transfer_output_files = . erroneously transfers .machine.ad, so jobs cannot restart

Description

LIGO reports that when users set

transfer_output_files = .

and

when_to_transfer_output = on_exit_or_evict

jobs that restart after an eviction are put on hold: they try to transfer back .machine.ad and .job.ad, but those files already exist and are owned by condor.
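
A minimal submit description matching the report might look like the sketch below. Only the transfer_output_files and when_to_transfer_output lines come from the report. The executable, its arguments, and the should_transfer_files line are assumptions added to make the fragment complete, not details from LIGO's actual jobs.

```
# Hypothetical reproducer; executable and arguments are placeholders.
executable              = /bin/sleep
arguments               = 600

# Assumed: file transfer is enabled for the job.
should_transfer_files   = YES

# The two settings named in the report.
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_output_files   = .

queue
```

Evicting such a job while it runs (for example with condor_vacate_job) and letting it restart should exercise the path described above, though this has not been verified here.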

Activity

Greg Thain
April 2, 2021, 5:09 PM

Note that the original documentation commit was tagged “HTCondor-267” [sic] and thus was not picked up in the list of commits. That doc commit is SHA e82d1c139052e24b258532b670d17aba760b733c

Todd L Miller
February 11, 2021, 10:09 PM

Needs a version history item. (It may be helpful to test if transfer_output_files = .job.ad broke things before this patch – and verify that it doesn’t after – for writing the version history item, to document the fix as applicable as it actually is.)

Todd L Miller
February 11, 2021, 10:09 PM

Code Review

The C++ is pretty; I approve. The code passes review, but brings up some follow-on issues I’ve written up as ticket HTCONDOR-268.

Time remaining

0m

Assignee

Greg Thain