2019 CC* sites that we expect to have GPU resources available for use:
For the following sites, we are waiting on action from the site admin:
Admin needs to create a separate GPU queue.
August 13, 2020
Admin needs to add GPU nodes. No timeline.
July 31, 2020
Admin needs to set up new cluster.
June 19, 2020
We have identified three distinct problems affecting sites that are missing from the Miron dashboard:
Missing GPU configuration in the CEs and Factory (AMNH, Clarkson): Operations will reach out to site admins and ask about GPU resources available to the OSG.
Topology and factory configuration mismatch (TCNJ, Wayne State): We tag CC* resources at the CE level in Topology, but because of the configuration mismatch, TCNJ and Wayne State records are associated only with their site. The Miron dashboard looks up CC* hours by CE, so it does not pick up these records.
Potential lack of GPU job pressure (TCNJ): We have verified that the factory and CE are configured to request GPUs and that pilots reporting to the Open Science pool are advertising their GPUs. However, there is a noticeable drop in job pressure within the OSG VO starting in May.
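The Topology mismatch described in the second item can be illustrated with a minimal sketch. It is not the dashboard's actual logic: the record layout, the field names (`site`, `ce`, `gpu_hours`), and the resource names are all hypothetical, chosen only to show why a record that carries a site but no matching CE drops out of a CE-keyed lookup.

```python
from collections import defaultdict

# Hypothetical Topology excerpt: CC* tags are attached to CE resource names.
CC_STAR_CES = {"EXAMPLE-CE-1"}

# Hypothetical accounting records. The second record is associated only
# with its site, mimicking the configuration mismatch described above.
RECORDS = [
    {"site": "Example Site A", "ce": "EXAMPLE-CE-1", "gpu_hours": 120.0},
    {"site": "Example Site B", "ce": None, "gpu_hours": 80.0},
]


def cc_star_hours_by_ce(records, cc_star_ces):
    """Sum GPU hours per CE, counting only records whose CE is CC*-tagged."""
    totals = defaultdict(float)
    for rec in records:
        if rec["ce"] in cc_star_ces:
            totals[rec["ce"]] += rec["gpu_hours"]
    return dict(totals)


def unmatched_records(records, cc_star_ces):
    """Return the records that a CE-keyed lookup would silently skip."""
    return [rec for rec in records if rec["ce"] not in cc_star_ces]


if __name__ == "__main__":
    print(cc_star_hours_by_ce(RECORDS, CC_STAR_CES))
    print([rec["site"] for rec in unmatched_records(RECORDS, CC_STAR_CES)])
```

Under these assumptions, the hours from the site-only record never reach the CE-keyed total, which is consistent with a site having usage but not appearing on the dashboard.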
Separately, we’d like to improve the monitoring of CC* GPU resources by advertising the total numbers of running, idle, and held GPU jobs to the central collector. Doing so involves implementing the following: