condor_gpu_discovery segfaults on chevre
condor_gpu_discovery is segfaulting on chevre. It looks like the NVIDIA driver installed on chevre doesn't have MIG support, so dlsym() lookups for the MIG-related functions return NULL. Nevertheless, we are trying to call functions like nvmlDeviceGetMaxMigDeviceCount even though dlsym did not find them.
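For illustration, here is a minimal sketch of the kind of guard that avoids this crash. It is not the actual condor_gpu_discovery code. The helper names (lookup_mig_fn, safe_get_max_mig_count) are hypothetical, and the function-pointer type is an assumed simplification of the NVML prototype, with the device handle treated as an opaque pointer and 0 taken to mean success.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed, simplified prototype for nvmlDeviceGetMaxMigDeviceCount:
 * opaque device handle in, count out, 0 on success. */
typedef int (*get_max_mig_count_fn)(void *device, unsigned int *count);

/* Hypothetical helper: look up the MIG entry point in an already
 * dlopen'ed NVML library. Returns NULL if the library handle is NULL
 * or the driver does not export the symbol (no MIG support). */
static get_max_mig_count_fn lookup_mig_fn(void *lib)
{
    if (lib == NULL) {
        return NULL;
    }
    return (get_max_mig_count_fn)dlsym(lib, "nvmlDeviceGetMaxMigDeviceCount");
}

/* Hypothetical helper: call through the pointer only if dlsym found it.
 * Returns 0 and sets *count on success. Returns -1 and leaves *count
 * at 0 if the symbol is missing or the call reports an error. */
static int safe_get_max_mig_count(get_max_mig_count_fn fn, void *device,
                                  unsigned int *count)
{
    *count = 0;
    if (fn == NULL) {
        return -1;  /* older driver: treat the device as having no MIG instances */
    }
    return fn(device, count) == 0 ? 0 : -1;
}
```

With a guard like this, a driver without MIG support yields a count of 0 instead of a call through a null pointer. The same check would be needed for every MIG-related symbol that is looked up with dlsym, not just this one.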
CODE REVIEW: I approve ToddM’s changes.
And with the most recent fix, it works again on chevre, detecting the “GPU” and not crashing.
I approve TJ’s patch.
Added null checks for nvmlDeviceGetMaxMigDeviceCount. Greg says it is now crashing on a different null pointer.