Ensure that critical directories are present when HTCondor starts

Description

HTCondor should make sure that critical directories are present, creating them if necessary and setting proper ownership/permissions when it starts up, instead of relying on system packaging to do this. Relying on system packaging is a) more duplicative work (edit rpm, deb, msi, macport, conda, ...) and b) does not work for tarball and other personal condor installs, and c) fails if the admin changes path locations.

owner

group

mode

param

filename in packaging

comments

root

root

0700

SEC_PASSWORD_DIRECTORY

/etc/condor/passwords.d

 

root

root

0700

SEC_TOKEN_SYSTEM_DIRECTORY

/etc/condor/tokens.d

 

condor

condor

0755

LOCAL_DIR

/var/lib/condor

Probably always exists, condor's home directory in packaging

condor

condor

0755

EXECUTE

/var/lib/condor/execute

 

root

root

0755

SEC_CREDENTIAL_DIRECTORY_KRB

/var/lib/condor/krb_credentials

param conditionally defined

root

condor

2770

SEC_CREDENTIAL_DIRECTORY_OAUTH

/var/lib/condor/oauth_credentials

param conditionally defined

condor

condor

0755

SPOOL

/var/lib/condor/spool

 

condor

condor

0755

LOCAL_UNIV_EXECUTE

/var/lib/condor/spool/local_univ_execute

 

condor

condor

0755

LOCK

/var/lock/condor

Already created by master if root

condor

condor

1777

LOCAL_DISK_LOCK_DIR

/var/lock/condor/local

Already created by master if root, Only used by preen

condor

condor

0755

LOG

/var/log/condor

 

condor

condor

0755

RUN

/var/run/condor

Already created by master if root, not used in tarball

Copied from https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=7637

Activity

Show:
Todd Tannenbaum
March 1, 2021, 8:59 PM

All aspects of code review 2 have been completed.

Zach Miller
February 15, 2021, 9:22 PM

It's okay to Leave LOCAL_DISK_LOCK_DIR in the table, and KEEP the permissions as 01777. Here's why it's not a big deal:

  1. It is undefined in the param table. So in a normal installation no directory will be created by this code. (In fact, you probably want to add a conditional in the current code that checks if the param succeeded.)

  2. If someone HAS defined it, the directory probably already exists, in which case this code again does nothing.

  3. If someone HAS defined it, but the directory doesn't exist, it means they must be moving an old config to a new machine and installing, I guess? In which case, we should create the directory for them so their installation still works. They previously went out of their way to customize this and had to manually set the world-writable permissions on their previous installation, so they presumably did so for a reason.

Code History:

Matt Farallee added this as a param option in 2010. My hunch is that zero people in the world actually customize this value and instead let the hard-coded default of /tmp/condorLocks suffice.

However, for reasons 1 and 2 above, it does no harm in the typical case to leave it present in the table, other than minor code clutter.

Jaime Frey
February 12, 2021, 8:39 PM

EXECUTE and LOCAL_UNIV_EXECUTE don’t need to have permissions 1777. That is only true if they will be on NFS with root-squash. On a local filesystem, 0755 is appropriate.

When creating the job scratch directory, the starter stats the parent directory (EXECUTE or LOCAL_UNIV_EXECUTE). If it’s world-writable, the starter creates the job directory as the user. Otherwise, it creates the directory as condor and then chowns it to the user.

Todd Tannenbaum
February 11, 2021, 2:24 PM
Edited

Code Review #2: There are three things wrong here:

  1. With this patch, jobs cannot run because the condor_master continuously runs as root (instead of user “condor”). As a result, the condor_procd will only accept connections as root, resulting in the schedd never successfully starting up because it fails to connect to the procd. The problem patch 0e9f64 calls get_condor_uid() in the master’s actual main(), before daemon core main sets up the priv system. Patch incoming.

  2. The permissions on SPOOL in the patch do not match the permissions listed in the table on this ticket. Also fixed in incoming patch.

  3. TIm’s first question was never explored, and it should. Specifically, we need to convince ourselves that world-readable directories are really needed. Where did the permissions on the above table come from? Not only are they different than CHTC, they are different than the permissions on our own htcondor/mini:el7 container!

Todd L Miller
January 13, 2021, 12:13 AM
Edited

Confirmed that mkdir just returns -1 if passed the empty string.

List alphabetized.

Time remaining

0m

Assignee

Todd Tannenbaum