HTCondor-CE jobs removed prematurely soon after they start running

Description

Jobs are being removed prematurely by SYSTEM_PERIODIC_REMOVE on lhcb-ce:

This appears to be because we're using JobCurrentStartExecutingDate in our expression:

But the removed jobs don't appear to have this attribute. We should use JobCurrentStartDate instead.

Activity

Brian Lin
April 14, 2021, 7:57 PM

Similar story here: this ticket is tagged for 5.1.0 (released) and 4.5.2 (not released). How do you want to handle this?

Jaime Frey
March 23, 2021, 6:09 PM

Code Review

The changes in pull requests #439 and #440 look good.

Brian Lin
March 23, 2021, 5:52 PM

Passing back to Jaime for additional code review since we found another problem and needed to cherry pick these changes back to v4 for OSG:

Jaime Frey
March 19, 2021, 7:15 PM

Code Review

This change looks good.

Jaime Frey
March 19, 2021, 7:14 PM

The issue is that JobCurrentStartDate is set when the shadow is born, and JobCurrentStartExecutingDate is set when the job starts (after file transfer).
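To make the timing difference concrete, here is a minimal Python sketch. It is only an illustration: the plain dicts stand in for job ClassAds, the timestamps are made up, the helper `seconds_since` is hypothetical, and nothing here reproduces how HTCondor itself evaluates an expression that references a missing attribute.

```python
from typing import Optional


def seconds_since(job_ad: dict, attr: str, now: int) -> Optional[int]:
    """Return seconds elapsed since the timestamp stored in `attr`,
    or None when the job ad does not carry that attribute."""
    start = job_ad.get(attr)
    if start is None:
        return None
    return now - start


# Hypothetical ad for a job whose shadow started at t=1000 and which is
# still transferring input files: only JobCurrentStartDate is present.
transferring = {"JobStatus": 2, "JobCurrentStartDate": 1000}

# Hypothetical ad for the same job once file transfer finished at t=1060:
# JobCurrentStartExecutingDate is now present as well.
executing = {
    "JobStatus": 2,
    "JobCurrentStartDate": 1000,
    "JobCurrentStartExecutingDate": 1060,
}

now = 1100
for label, ad in (("transferring", transferring), ("executing", executing)):
    print(
        label,
        seconds_since(ad, "JobCurrentStartDate", now),
        seconds_since(ad, "JobCurrentStartExecutingDate", now),
    )
```

In this toy model, a check keyed on JobCurrentStartDate has a value to work with from the moment the shadow starts, while one keyed on JobCurrentStartExecutingDate has nothing to compare against until file transfer completes, or ever, if the ad never gains that attribute.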

Time remaining

0m

Assignee

Brian Lin

Is PATh development

Yes