condor_gpu_discovery segfaults on chevre

Description

condor_gpu_discovery is segfaulting on chevre. It looks like the NVIDIA driver installed on chevre doesn't have MIG support, so the dlsym() lookups for the MIG-related functions return NULL. Nevertheless, we still try to call functions like nvmlDeviceGetMaxMigDeviceCount even though dlsym() didn't find them.
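For illustration, a minimal sketch of the kind of guard the description calls for. This is not the actual condor_gpu_discovery code: the helper names lookup_optional and max_mig_devices and the simplified function-pointer typedef are hypothetical (the real NVML signature uses nvmlDevice_t and nvmlReturn_t), and treating a missing symbol as "zero MIG devices" is an assumption about the desired fallback.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the NVML function type.
 * The real function takes an nvmlDevice_t and returns nvmlReturn_t. */
typedef int (*get_max_mig_count_fn)(void *device, unsigned int *count);

/* Look up a symbol that may legitimately be absent (e.g. a driver
 * without MIG support). Returns NULL when dlsym() cannot find it. */
static void *lookup_optional(void *handle, const char *name) {
    dlerror(); /* clear any stale error state */
    void *sym = dlsym(handle, name);
    if (sym == NULL) {
        fprintf(stderr, "optional symbol %s not found; skipping\n", name);
    }
    return sym;
}

/* Call through the pointer only if the lookup succeeded.
 * Assumption: a missing function means "no MIG devices", so *count
 * is set to 0 and 0 is returned instead of dereferencing NULL. */
static int max_mig_devices(get_max_mig_count_fn fn, void *device,
                           unsigned int *count) {
    assert(count != NULL);
    *count = 0;
    if (fn == NULL) {
        return 0; /* driver lacks MIG support; nothing to query */
    }
    return fn(device, count);
}
```

A caller would resolve the pointer once, for example with (get_max_mig_count_fn)lookup_optional(handle, "nvmlDeviceGetMaxMigDeviceCount"), and route every use through the guarded wrapper. The same check would be needed for each MIG-related symbol, not only this one.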

Activity

John (TJ) Knoeller
March 22, 2021, 3:30 PM

CODE REVIEW : I approve ToddM’s changes.

Greg Thain
March 18, 2021, 4:59 PM

CODE REVIEW

And with the most recent fix, it works again on chevre, detecting the “GPU” and not crashing.

Todd L Miller
March 18, 2021, 4:17 PM

Code Review

I approve TJ’s patch.

Greg Thain
March 15, 2021, 4:03 PM

 

John (TJ) Knoeller
March 15, 2021, 3:51 PM

Added null checks for nvmlDeviceGetMaxMigDeviceCount. Greg says it is now crashing on a different null pointer.

Assignee

Todd L Miller