LHC Computing Grid Project
DRAFT 3, 21/July/2004, including comments from David, Philippe, Les, Latchezar, Bernd
Present:
Dario Barberis,
Actions: Actions are identified by bold blue italics.
The LHCC comprehensive review will take 1.5 to 2 days, most likely on November 22-23. There is a conflict with EGEE. Les will find out with Jos Engelen how to resolve the conflict.
The application area is interested in NOT having an internal review. There was no objection to this proposal.
Torre reported on the architect’s forum:
Gabriele Cosmo has been nominated simulation project leader.
Les reported from the recent GDB:
Kors Bos has been elected as new chair of the GDB.
Further information from the GDB will be made available by Mirco or Alberto.
Dirk summarized a proposal for an LCG Database Deployment Project. The project scope is based on discussions with the representatives of the experiments. The transparencies contain a good description of the goals and also of the “non-goals” of the project. It is intended to support more than one database implementation (currently Oracle and MySQL) through a ‘relational abstraction layer’ (RAL) in central applications. Dirk also discussed possible distribution options and technologies. In the first phase, during 2005, some applications will still be bound to a database vendor and therefore to a particular tier: it is expected that Oracle will be used at tier-0 and tier-1 centres and MySQL at the others. During that phase, emphasis will be on bulk data transfer issues between tier-1 centres and on tier-1 to tier-2 extraction. Together with interested tier-2 sites, a MySQL service will be defined. In a second phase the heterogeneous setup of tier-2 sites would be extended to tier-1 sites. The project will be decomposed into three work packages.
The project would be
run as part of the LCG deployment area in close collaboration with the
application area. As a first step, Dirk proposes that the project working
groups be composed of two people from each experiment (one application
specialist and one for grid deployment) and participants from major grid
services such as LCG RLS and EGEE as well as technology experts from tier-1 and
tier-2 sites. A more detailed work plan will be provided 2 months after project
start.
ATLAS, CMS and LHCb expressed strong interest in the project. Some sites (Chicago and Argonne) have already expressed interest in setting up a MySQL service.
Philippe raised the
question of online database support. It was concluded that this should be
included in the requirements assessment.
After the first 2
months of the project a detailed workplan will be
presented; then a workforce from regional centres will be required –
first contacts have already been made.
The PEB endorsed the project proposal. In the near future the experiments should nominate people. The tier-1 and tier-2 centres should also get involved.
At the GDB last week all experiments said that the functionality will clearly be needed. Other projects (dCache and CASTOR) will be maintained at the current level.
The lightweight disk
pool manager is considered a component of gLite that
is required now. Ian’s group will develop the simplest solution along the
lines of the document distributed by Ian.
David Stickland expressed the following strong reservations:
"David stated for CMS
that they consider this effort to be premature. While it is evident that dCache is not currently an out-of-the-box solution for
lightweight needs there was a clear message, at least CMS understood this to be
the message, from the GDB (for which unfortunately minutes have not yet
appeared) that LCG should not be developing new solutions, but should be
encouraged to clearly expose the issues and work with the existing technology
providers, such as the dCache team, to develop a
clear specification. Only if these efforts fail or LCG determines that the
level of support required for such a program is not available elsewhere should
a new program be envisaged.
CMS is not aware that
sufficient discussion of the technical requirements and possible alternate
solutions of a lightweight disk pool manager has yet been carried out. CMS
opposes any activity beyond a technical
definition and workplan; the need for a tapeless SE is evident, but we have yet to see any evidence
that what is being proposed is an adequate solution or just the tip of a
complex iceberg that will later have to be fleshed out and result in wasted
(HEP Community) effort."
Les and Ian explained that dCache is good for larger centres, while for the many small centres the management overhead is too large. The aim is not a complicated solution.
Ian will provide a
document with a clear work plan.
Dario got responses
from smaller ATLAS sites that are in favour of having this as soon as possible.
LHCb supports the project and would have liked to have it two months ago. Having no storage element at most sites poses problems for the data challenges, because small sites cannot store the data at their own site.
Les concluded that the work should go ahead, with a status report to the PEB by the end of October.
ATLAS:
Dario showed a plot
of the jobs run on the grids participating (LCG, Grid3, and NorduGrid).
On that plot time-zero
is the start of the data challenge on June 24th.
It was noted that Grid3 production started late. The slope (and therefore throughput) of LCG is a factor of 3 to 4 below expectation. This is attributed to deficiencies of the resource broker. An improvement by a factor of 2 has been obtained by using a second resource broker since July 19th.
ATLAS will decide
later in the week whether to use a separate job submission system to overcome
the limitation.
Procedure of upgrades
– change of definition of seconds and minutes
Les advocated that the service challenges should be designed to identify potential problems for the data challenges (DCs) before they arise. They should not target artificial things.
LHCb decided two weeks ago, in agreement with the LCG-DA, to pause their use of LCG resources in order to take the necessary time to understand and fix the problems that have been observed. They are now progressively restarting with LCG2, but one of the remaining problems is the inertia in getting sites to upgrade to the middleware that fixes the problems. In the meantime production continued at the same pace as before through Dirac native sites. On LCG they are still facing Globus problems that lose jobs. Ian attributed this to wrongly configured nodes. Philippe emphasized the very good working relationship with the GDA group (LCG production deployment and certification testbed). LHCb plans to continue the data challenge beyond the normal end date of end July to follow the new developments.
The LCG job submission is now starting again but it is going very slowly. The reasons are being investigated in close collaboration with the LCG team.
A discussion started after the announcement that CERN would no longer distribute security updates for RH 7.3, because those are now provided to CERN by a company (Progeny). RH already ceased to provide support for 7.3 at the end of 2003.
Bernd considered three options:
1. Try to extend the contract with Progeny to cover distribution to CERN-collaborating institutions as well (no response has been obtained yet)
2. Find and apply the security patches ourselves – this would require 0.5 FTE of a very knowledgeable person
3. Keep the CERN distribution in its current state, with the basic 7.3 distribution and the security patches issued up to now, expecting external sites to take care of security issues themselves.
Bernd also noted that the download rate of security patches was very low.
In the absence of any better solution, the PEB endorsed option 3.
The security policy on Linux should be discussed further in the GDB.
Les reminded the meeting that the contributions to the quarterly report are expected by tomorrow, July 21. It is important to provide the contributions in order to allow for an overall editing and consolidation of the report.
Les and Jürgen have drafted a list of topics to be discussed at upcoming PEB meetings. The list will be distributed soon.
Next meeting – August 3:
Actions

| # | Date opened | Description | Responsible | Date closed |
| 1 | 16dec03 | ALICE, CMS and LHCb to name someone responsible for coordinating deployment on LCG-2 | Federico, David S., Philippe | Done |
| 2 | 16dec03 | Understand why the substantial resources in 6jan04- visit to RAL organised for 24jan04 | Les | Done |
| 3 | 16dec03 | Confirm that the absence of BNL in the LCG-2 deployment list is due to manpower shortage | Les | Done |
| 4 | 16dec03 | Experiments to request through their national contacts that their resources in the core LCG-2 centres are integrated in LCG-2 | Federico, Dario, David S., Philippe | Done |
| 5 | 16dec03 | Regional centres to be asked to clarify their mass storage plans. Presented by RCs in GDB of 13jan04 | Les | 13jan04 |
| 6 | 12jan04 | Revised proposed GAG mandate | Federico | 27jan0 |
| 7 | 27jan04 | Revised ARDA note | Les | 12feb04 |
| 8 | 27jan04 | Establish a weekly “Deployment Meeting” | Ian | 2feb04 |
| 9 | 27jan04 | Note on new project proposal from | Federico | Done |
| 10 | 12feb04 | Define new name for middleware | Bob | 11may04 |
| 11 | 12feb04 | Nominate ARDA contact persons | Experiments | 15mar04 |
| 12 | 12feb04 | Nominate people for Phase 2 requirements of the experiments | Experiments | Done |
| 13 | 8jun04 | CMS to respond to the LCG Service Challenges proposal | David S. | |
| 14 | 20jul04 | Clarify the date of the LHCC comprehensive review | Les | |
| 15 | 20jul04 | Experiments to nominate participants to the database project | Experiments | |
| 16 | 20jul04 | Status report of lightweight disk pool manager, Oct 2004 | Ian | |