HTCondor-CE jobs removed prematurely soon after they start running


Jobs are being removed prematurely by SYSTEM_PERIODIC_REMOVE on lhcb-ce:

This appears to be because we're using JobCurrentStartExecutingDate in our expression:

But the removed jobs don't appear to have this attribute. We should use JobCurrentStartDate instead.


Brian Lin
April 14, 2021, 7:57 PM

Similar story here: this ticket is tagged for 5.1.0 (released) and 4.5.2 (not released). How do you want to handle this?

Jaime Frey
March 23, 2021, 6:09 PM

Code Review

The changes in pull requests #439 and #440 look good.

Brian Lin
March 23, 2021, 5:52 PM

Passing this back to Jaime for additional code review, since we found another problem and needed to cherry-pick these changes back to v4 for OSG:

Jaime Frey
March 19, 2021, 7:15 PM

Code Review

This change looks good.

Jaime Frey
March 19, 2021, 7:14 PM

The issue is that JobCurrentStartDate is set when the shadow is born, and JobCurrentStartExecutingDate is set when the job starts (after file transfer).
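
To make the timing difference concrete, here is a minimal Python sketch. It models a job ad as a plain dict and is not ClassAd evaluation or any HTCondor API; the helper name `elapsed_since` and all timestamps are made up for illustration. It only shows that a job whose shadow has started but which has not yet begun executing has JobCurrentStartDate but no JobCurrentStartExecutingDate, so an elapsed-time calculation based on the latter has nothing to work with.

```python
from typing import Optional


def elapsed_since(ad: dict, attr: str, now: int) -> Optional[int]:
    """Return seconds since the timestamp stored in `attr`.

    Returns None when the attribute is missing from the ad
    (hypothetical helper, for illustration only).
    """
    start = ad.get(attr)
    if start is None:
        return None
    return now - start


# Hypothetical ad for a job whose shadow has started but whose
# input files are still being transferred.
transferring = {"JobCurrentStartDate": 1000}

# Hypothetical ad for the same job once it has started executing.
executing = {
    "JobCurrentStartDate": 1000,
    "JobCurrentStartExecutingDate": 1060,
}

now = 1100  # made-up "current time"

for label, ad in (("transferring", transferring), ("executing", executing)):
    since_shadow = elapsed_since(ad, "JobCurrentStartDate", now)
    since_exec = elapsed_since(ad, "JobCurrentStartExecutingDate", now)
    print(label, since_shadow, since_exec)
```

In this model, JobCurrentStartDate is usable as soon as the shadow exists, which is why an expression keyed on it does not depend on an attribute that may be absent.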

Brian Lin

