Concerns and Recommendations from the LHCC Comprehensive
Review of LCG - 27 December 2003
This note was agreed at
the Project Execution Board (PEB) on 2 March. It refers only to items that were noted in the
LHCC Report.
Management & General
- Concern over resources in Regional Centres: an MoU should be
developed over the next 12 months between the funding agencies and CERN.
- This has now started, under the direction of
the CSO.
- The LHCC expressed some reservations about the role and composition of
the new SC2, and these issues must be revisited.
- To be revisited in the autumn after the new
SC2 has been in operation for 6 months.
- Propose a new set of Level 1 milestones for the March meeting
- Being prepared for the March LHCC.
Middleware
- Concern that the existing middleware is generally too complex
and under-developed, and from past experience the main risk appears to be
the lack of product delivery.
- The Grid Deployment Area (GDA) has taken
over maintenance of the source code for all the components of EDG that
are used in LCG-2. They have
established a team that provides the first level support with a focus on
ensuring that the functionality is well adapted to the needs of the
experiments. There are agreements with some of the original developers in
order to have access to in-depth expertise as necessary. There is also an
agreement with VDT to provide support for the US
components, including the Globus toolkit. The strategy is to continue to
provide solid support for the middleware that constitutes LCG-2 until it
is replaced by improved products, such as those planned by the EGEE
project. The support agreements are valid for 2004. They will have to be
renegotiated in the autumn, when the timescales of potential successor
software are better known.
- Concern over the difficulties
in entering the analysis phase of the project – expecting the ARDA project
to have been planned by the end of January.
- The ARDA
project has now been agreed; initial planning is not expected before the end of March. All
four experiments have committed to participate in this project, aimed at
prototyping analysis systems. We will have to wait for about six months
to see how this develops in practice.
- The LHCC considers it very important for the middleware project
to ensure tight links and collaboration with the US
part of the effort and to establish a close and better collaboration with
all the Regional Centres.
- A closer relationship has been established
between the current middleware support team in the GDA and VDT, with some
funding coming from NSF. The aim is to provide support for VDT components
in the LCG-2 distribution, and to investigate the integration of EDG
components in VDT.
- The EGEE middleware team includes Miron
Livny, leader of the VDT project.
- The Grid Deployment Board (GDB) has
initiated a working group to investigate the interoperability issues between
different middleware toolkits, such as LCG-2 and the toolkit used by the
Grid3 activity in the US.
The leader of the working group is Vicky White, FNAL. The first report is
due at the March GDB.
- The strategy is to aim for compatibility
between the Tier 1 centres. It is likely that the US Open Science Grid (OSG)
initiatives will use software different from that used by EGEE, and that
we will have to live with these differences.
Fabric
- No concerns or recommendations
Grid Deployment
- Concern over resources and priorities in Regional Centres. The
Committee recommends that the Regional Centres should be queried on how
they believe funds will become available to achieve their required
computing capacity.
- The GDB members have agreed to provide information
on planned resources in Regional Centres – a two-quarter forward look.
The first results of this are due at the end of February.
- The MoU task force will deal with
establishing the funding plans of the Tier 0, Tier 1 and large Tier 2
centres.
- The GDB should ensure that there is more detailed technical discussion.
- The GDA has set up a weekly coordination
meeting to enable improved technical discussion between all the parties
involved in deploying, operating and using the service. In addition, there is a weekly phone
conference between the core site (Tier 1) system managers and the deployment
team to address service and operational issues.
- Installation is too complex.
- In the LCG-2 release, the installation of
the worker nodes of the batch farms is now much simpler and does not rely
on any particular tool. The
deployment team has taken over the maintenance of the source code base,
and will work on reducing the dependencies within the software and on
improving the way in which it is packaged and delivered. As part of this work, the installation
procedure for the service nodes will be simplified, taking into account
the wide range of requirements of the Tier 1 and Tier 2 sites.
Applications
- Concern over the long-term continuity of
personnel, and the long-term support of products, in particular the maths
library.
- This is a major concern, as the staffing
level of the Applications Area (AA) will begin to decay from the
beginning of next year, as the special funding for Phase 1 of the LCG
project comes to an end. Resources were foreseen to continue central
support at a reduced level, but this has not yet been funded. The AA has
begun to develop a programme of work for the coming year, which will be
presented to the PEB within the next month to six weeks. Approval of this
plan will take account of resource requirements for development and for
long-term support.
- This is being addressed as part of the
general Phase 2 planning that is scheduled to be completed by the summer.
- Requests further clarification of how
proposals made in the Architects Forum are to be incorporated into the
Applications Area.
- For proposals in areas already within the
scope of the Applications Area, the Architects Forum (AF) acts as the
decision-making body, subject to PEB endorsement of significant
decisions, particularly those requiring future allocation of resources.
Issues where agreement cannot be reached in the AF are taken to the PEB.
Taking the case of the proposal (and recommendation of the internal
review committee) to pursue a common LCG/ROOT dictionary as an example:
The proposal in general terms was discussed in the AF and agreed to be
interesting enough to explore its viability in technical discussions to
see whether they could lead to a concrete proposal. Technical discussions
then took place between the relevant AA project (SEAL) and the ROOT team.
The discussions went well and did in fact lead to a concrete proposal to
incorporate this objective into AA/SEAL/ROOT plans. The plan was
presented to the experiments and after some direct experiment-SEAL
iterations the AF agreed to incorporate this into a revised SEAL
workplan. At the AF meeting of 19 February, the SEAL workplan incorporating a dictionary convergence plan
was agreed. This plan will be presented to the PEB for its endorsement.
- For proposals in areas outside the present
scope of the Applications Area, AA projects develop
proposals and present them to the AF for modification and approval; the AF-agreed proposal
is then presented to the PEB (formerly to the SC2) for decision. This
was done in the case of the conditions DB last year. Another such
proposal is currently in preparation (for a detector geometry exchange
tool).
- For proposals within AA scope but for which
the AF is unable to reach agreement, a policy has been in place from
the beginning to escalate the issue to higher management (now the PEB)
for resolution and decision. To date, this course has not had to be
invoked.
- Stresses the importance of supporting the Monte Carlo generator
codes required by the LHC experiments. Such support appears to fit the
scope of the Simulation project.
- The point here is that the current MC
generator services (GENSER) sub-project of the AA Simulation Project does
not have a stable staffing plan. The leader of the sub-project, Paolo
Bartalini, is a CERN fellow whose contract expires in December. Other
major resources are provided by Russia in the context of the CERN-Russia computing protocol. This
provides for a team of three people, one of whom would be at CERN at any
particular time, on a three-month assignment. There is no commitment to
ensure stability of the team membership. At a minimum, one suitably
qualified person is required at CERN on a longer-term basis to lead this
activity and provide basic librarian services.