Concerns and Recommendations from the LHCC Comprehensive Review of LCG - 27 December 2003

 

This note was agreed at the Project Execution Board (PEB) on 2 March.  It refers only to items that were noted in the LHCC Report.

 

Management & General

  • Concern over resources in Regional Centres - MoU should be developed over the next 12 months between the funding agencies and CERN
    • This has now started, under the direction of the CSO.
  • LHCC expressed some reservation on the role and composition of the new SC2 and these issues must be re-visited.
    • To be re-visited in the autumn after the new SC2 has been in operation for 6 months.
  • Propose a new set of Level 1 milestones for the March meeting
    • Being prepared for the March LHCC.

 

 

Middleware

  • Concern that the existing middleware is generally too complex and under-developed, and from past experience the main risk appears to be the lack of product delivery.
    • The Grid Deployment Area (GDA) has taken over maintenance of the source code for all the components of EDG that are used in LCG-2. It has established a team that provides first-level support, with a focus on ensuring that the functionality is well adapted to the needs of the experiments. There are agreements with some of the original developers in order to have access to in-depth expertise as necessary. There is also an agreement with VDT to provide support for the US components, including the Globus toolkit. The strategy is to continue to provide solid support for the middleware that constitutes LCG-2 until it is replaced by improved products, such as that planned by the EGEE project. The support agreements are valid for 2004. They will have to be renegotiated in the autumn, when the timescales of potential successor software are better known.
  • Concern over the difficulties in entering the analysis phase of the project – expecting the ARDA project to have been planned by the end of January.
    • The ARDA project has now been agreed, but initial planning is not expected before the end of March. All four experiments have committed to participate in this project, which is aimed at prototyping analysis systems. We will have to wait for about six months to see how this develops in practice.
  • The LHCC considers it very important for the middleware project to ensure tight links and collaboration with the US part of the effort and to establish a close and better collaboration with all the Regional Centres.
    • A closer relationship has been established between the current middleware support team in the GDA and VDT, with some funding coming from NSF. The aim is to provide support for VDT components in the LCG-2 distribution, and to investigate the integration of EDG components in VDT.
    • The EGEE middleware team includes Miron Livny, leader of the VDT project.
    • The Grid Deployment Board (GDB) has initiated a working group to investigate the inter-operability issues between different middleware toolkits, such as LCG-2 and the toolkit used by the Grid3 activity in the US. The leader of the working group is Vicky White, FNAL. The first report is due at the March GDB.
    • The strategy is to aim for compatibility between the Tier 1 centres. It is likely that the US Open Science Grid (OSG) initiatives will use software different from that used by EGEE, and that we will have to live with these differences.

 

Fabric

  • No concerns or recommendations

 

Grid Deployment

  • Concern over resources and priorities in Regional Centres. The Committee recommends that the Regional Centres should be queried on how they believe funds will become available to achieve their required computing capacity.
    • The GDB members have agreed to provide information on planned resources in Regional Centres – a two-quarter forward look. The first results of this are due at the end of February.
    • The MoU task force will deal with establishing the funding plans of the Tier 0, Tier 1 and large Tier 2 centres.
  • The GDB should ensure that there is more detailed technical discussion.
    • The GDA has set up a weekly coordination meeting to enable improved technical discussion between all the parties involved in deploying, operating and using the service. In addition, there is a weekly phone conference between the core site (Tier 1) system managers and the deployment team to address service and operational issues.
  • Installation is too complex.
    • In the LCG-2 release, the installation of the worker nodes of the batch farms is now much simpler and does not rely on any particular tool.  The deployment team has taken over the maintenance of the source code base, and will work on reducing the dependencies within the software and the way in which it is packaged and delivered.  As part of this work, the installation procedure for the service nodes will be simplified, taking into account the wide range of requirements of the Tier 1 and Tier 2 sites.

 

Applications

  • Concern over the long-term continuity of personnel, and the long-term support of products, in particular the maths library.
    • This is a major concern, as the staffing level of the Applications Area (AA) will begin to decay from the beginning of next year, as the special funding for Phase 1 of the LCG project comes to an end. Resources were foreseen to continue central support at a reduced level, but this has not yet been funded. The AA has begun to develop a programme of work for the coming year, which will be presented to the PEB within the next month to six weeks. Approval of this plan will take account of resource requirements for development and for long-term support.
    • This is being addressed as part of the general Phase 2 planning that is scheduled to be completed by the summer.
  • Requests further clarification of how proposals made in the Architects Forum are to be incorporated into the Applications Area.
    • For proposals in areas already within the scope of the Applications Area, the Architects Forum (AF) acts as the decision-making body, subject to PEB endorsement of significant decisions, particularly those requiring future allocation of resources. Issues on which agreement cannot be reached in the AF are taken to the PEB. Take as an example the proposal (also a recommendation of the internal review committee) to pursue a common LCG/ROOT dictionary. The proposal was discussed in general terms in the AF, which agreed that it was interesting enough to warrant technical discussions to explore its viability and see whether they could lead to a concrete proposal. Technical discussions then took place between the relevant AA project (SEAL) and the ROOT team. These went well and led to a concrete proposal to incorporate this objective into the AA/SEAL/ROOT plans. The plan was presented to the experiments and, after some direct iterations between the experiments and SEAL, the AF agreed to incorporate it into a revised SEAL workplan. At the AF meeting on 19 February, the SEAL workplan incorporating a dictionary convergence plan was agreed. This plan will be presented to the PEB for its endorsement.
    • For proposals in areas outside the present scope of the Applications Area, AA projects develop proposals and present them to the AF for modification and approval; the AF-agreed proposal is then presented to the PEB (formerly to the SC2) for a decision. This was done in the case of the conditions DB last year. Another such proposal, for a detector geometry exchange tool, is currently in preparation.
    • For proposals within AA scope but for which the AF is unable to reach agreement, a policy has been in place from the beginning to escalate the issue to higher management (now the PEB) for resolution and decision. To date, this course has not had to be invoked.
  • Stresses the importance of supporting the Monte Carlo generator codes required by the LHC experiments. Such support appears to fit the scope of the Simulation project.
    • The point here is that the current MC generator services (GENSER) sub-project of the AA Simulation Project does not have a stable staffing plan. The leader of the sub-project, Paolo Bartalini, is a CERN fellow whose contract expires in December. Other major resources are provided by Russia in the context of the CERN-Russia computing protocol. This provides for a team of three people, one of whom would be at CERN at any particular time, on a three-month assignment. There is no commitment to ensure stability of the team membership. As a minimum, one suitably qualified person is required at CERN on a longer-term basis to lead this activity and provide basic librarian services.