LHC Computing Grid Project

Project Execution Board

Notes of the meeting of Tuesday July 20, 2004

DRAFT 3 21/July/2004 – including comments from David, Philippe, Les, Latchezar, Bernd

 

Present:

Dario Barberis, Latchezar Betev, Ian Bird, Nick Brook (phone), Philippe Charpentier, Dirk Duellmann, Frédéric Hemmer, Jürgen Knobloch (secretary), Massimo Lamanna, Erwin Laure, Bernd Panzer, Les Robertson (chair), David Stickland, Torre Wenaus (phone)

 

Actions: Actions are identified by bold blue italics.

Minutes of last meeting and matters arising

The LHCC comprehensive review will take 1.5 to 2 days, most likely on November 22-23. There is a conflict with EGEE. Les will find out from Jos Engelen how to resolve the conflict.

The application area would prefer NOT to have an internal review. There was no objection to this proposal.

 

Major decisions from recent meetings

Torre reported from the Architects Forum:

Gabriele Cosmo has been nominated simulation project leader.

 

Les reported from the recent GDB:

Kors Bos has been elected as new chair of the GDB.

Further information from the GDB will be made available by Mirco or Alberto.

 

Distributed Deployment of databases – Dirk Duellmann (transparencies)

Dirk summarized a proposal for an LCG Database Deployment Project. The project scope is based on discussions with the representatives of the experiments. The transparencies give a good description of the goals, and also of the “non-goals”, of the project. The intention is to support more than one database back-end, currently Oracle and MySQL, through a ‘relational abstraction layer’ (RAL) in central applications. Dirk also discussed possible distribution options and technologies. In the first phase, during 2005, some applications will still be bound to a database vendor and therefore to a particular tier: Oracle is expected to be used at the tier-0 and tier-1 centres and MySQL at the others. During that phase, the emphasis will be on bulk data transfer between tier-1 centres and on data extraction from tier-1 to tier-2 centres. A MySQL service will be defined together with interested tier-2 sites. In a second phase, the heterogeneous setup of the tier-2 sites would be extended to the tier-1 sites. The project will be decomposed into three work packages.

The project would be run as part of the LCG deployment area in close collaboration with the application area. As a first step, Dirk proposes that the project working groups be composed of two people from each experiment (one application specialist and one for grid deployment) and participants from major grid services such as LCG RLS and EGEE as well as technology experts from tier-1 and tier-2 sites. A more detailed work plan will be provided 2 months after project start.

ATLAS, CMS and LHCb expressed strong interest in the project. The ALICE model does not include a distributed RDBMS; ALICE's interest is therefore limited to gLite.

Some sites (Chicago and Argonne) have already expressed interest in setting up a MySQL service.

Philippe raised the question of online database support. It was concluded that this should be included in the requirements assessment.

 

After the first two months of the project a detailed work plan will be presented; manpower from the regional centres will then be required, and first contacts have already been made.

The PEB accepted the project proposal. The experiments should nominate participants in the near future, and the tier-1 and tier-2 centres should also get involved.

 

Lightweight disk pool manager – Ian

At the GDB last week all experiments said that this functionality will clearly be needed. The other solutions (dCache and CASTOR) will continue to be maintained at their current level.

The lightweight disk pool manager is considered a component of gLite that is required now. Ian’s group will develop the simplest possible solution along the lines of the document Ian distributed.

David Stickland expressed the following strong reservations:

"David stated for CMS that they consider this effort to be premature. While it is evident that dCache is not currently an out-of-the-box solution for lightweight needs there was a clear message, at least CMS understood this to be the message, from the GDB (for which unfortunately minutes have not yet appeared) that LCG should not be developing new solutions, but should be encouraged to clearly expose the issues and work with the existing technology providers, such as the dCache team, to develop a clear specification. Only if these efforts fail or LCG determines that the level of support required for such a program is not available elsewhere should a new program be envisaged.

CMS is not aware that sufficient discussion of the technical requirements and possible alternate solutions of a lightweight disk pool manager has yet been carried out. CMS opposes any activity beyond a technical definition and workplan; the need for a tapeless SE is evident, but we have yet to see any evidence that what is being proposed is an adequate solution or just the tip of a complex iceberg that will later have to be fleshed out and result in wasted (HEP Community) effort."

Les and Ian explained that dCache is well suited to larger centres, whereas for the many small centres its management overhead is too large. The aim is not a complicated solution.

Ian will provide a document with a clear work plan.

Dario reported that smaller ATLAS sites had responded in favour of having this as soon as possible.

LHCb supports the project and would have liked to have it two months ago. Having no storage element at most sites causes problems for the data challenges, since small sites cannot store the data locally.

Les concluded that the work should go ahead, with a status report to the PEB by the end of October.

 

Reports from data challenges

ATLAS:

Dario showed a plot of the jobs run on the participating grids (LCG, Grid3 and NorduGrid).

On that plot, time zero corresponds to the start of the data challenge on June 24th.

It was noted that Grid3 production started late. The slope (and therefore the throughput) of LCG is a factor of 3 to 4 below expectation. This is attributed to deficiencies in the resource broker. Using a second resource broker since July 19th has improved throughput by a factor of 2.

ATLAS will decide later in the week whether to use a separate job submission system to overcome the limitation.

Procedure of upgrades – change of definition of seconds and minutes

 

Les argued that the service challenges should be designed to identify potential problems for the DCs before these problems arise, rather than targeting artificial scenarios.

 

LHCb decided two weeks ago, in agreement with the LCG-DA, to pause their use of LCG resources in order to take the time needed to understand and fix the problems that had been observed. They are now progressively restarting on LCG2, but one remaining problem is the slowness of sites in upgrading to the middleware that fixes these problems. In the meantime they continued production at the same pace as before through DIRAC native sites. On LCG they are still facing Globus problems that cause jobs to be lost; Ian attributed this to wrongly configured nodes. Philippe emphasized the very good working relationship with the GDA group (LCG production deployment and certification testbed). LHCb plans to continue the data challenge beyond its scheduled end at the end of July in order to follow the new developments.

 

The ALICE data challenge is still going on, mostly running 500-700 AliEn jobs concurrently.

Job submission through LCG is now restarting, but it is progressing very slowly. The reasons are being investigated in close collaboration with the LCG team.

ALICE expects to run 500-600 jobs in parallel on LCG. They are now reconstructing the data, using local storage elements at the external sites. LCG has provided additional disk servers (3.5 TB), which are currently being configured as a local SE for processing at CERN.

A.o.b.

A discussion followed the announcement that CERN would no longer distribute security updates for RH 7.3, because these are now provided to CERN by a company (Progeny). Red Hat already stopped supporting 7.3 at the end of 2003.

Bernd considered three options:

1. Try to extend the contract with Progeny to also cover distribution to CERN-collaborating institutions (no response has been obtained yet)

2. Find and apply the security patches ourselves – this would require 0.5 FTE of a very knowledgeable person

3. Keep the CERN distribution in its current state, with the basic 7.3 distribution and the security patches released up to now, and expect external sites to take care of security issues themselves.

Bernd also noted that the download rate of security patches was very low.

In the absence of any better solution, the PEB endorsed option 3.

The security policy on Linux should be discussed further in the GDB.

 

 

Les reminded everyone that contributions to the quarterly report are due tomorrow, July 21. It is important to provide the contributions in order to allow for overall editing and consolidation of the report.

 

Les and Jürgen have drafted a list of topics to be discussed at upcoming PEB meetings. The list will be distributed soon.

 

 

Next meeting – August 3:

 

 

 

Actions

# | Date opened | Description | Responsible | Date closed
1 | 16dec03 | ALICE, CMS and LHCb to name someone responsible for coordinating deployment on LCG-2 | Federico, David S., Philippe | Done
2 | 16dec03 | Understand why the substantial resources in Liverpool are not available for LCG-2. 6jan04: visit to RAL organised for 24jan04 | Les | Done
3 | 16dec03 | Confirm that the absence of BNL in the LCG-2 deployment list is due to manpower shortage | Les | Done
4 | 16dec03 | Experiments to request through their national contacts that their resources in the core LCG-2 centres are integrated in LCG-2 | Federico, Dario, David S., Philippe | Done
5 | 16dec03 | Regional centres to be asked to clarify their mass storage plans. Presented by RCs in GDB of 13jan04 | Les | 13jan04
6 | 12jan04 | Revised proposed GAG mandate | Federico | 27jan0
7 | 27jan04 | Revised ARDA note | Les | 12feb04
8 | 27jan04 | Establish a weekly “Deployment Meeting” | Ian | 2feb04
9 | 27jan04 | Note on new project proposal from Trento | Federico | Done
10 | 12feb04 | Define new name for middleware | Bob | 11may04
11 | 12feb04 | Nominate ARDA contact persons | Experiments | 15mar04
12 | 12feb04 | Nominate people for Phase 2 requirements of the experiments | Experiments | Done
13 | 8jun04 | CMS to respond to the LCG Service Challenges proposal | David S. |
14 | 20jul04 | Clarify the date of the LHCC comprehensive review | Les |
15 | 20jul04 | Experiments to nominate participants to the database project | Experiments |
16 | 20jul04 | Status report of lightweight disk pool manager – Oct 2004 | Ian |