Compensate for short PID reads in procd

Description

In order to compensate for short PID reads in the procd (reads of the process table that omit PIDs which actually exist), we must first detect them. We propose the following detection methods; if any of them triggers, we consider the read to be short.

  1. If PID 1 is missing.

  2. If the procd’s PID is missing.

  3. If the procd’s parent (usually the master) is missing.

  4. If “too many” PIDs have gone missing since the last poll.

I expect that method (4) will require considerable tuning to avoid false positives; as a result, I presently intend to implement it as a warning instead. Methods (1), (2), and (3) indicate inarguably invalid PID reads. (Even if the master has exited since the last poll, the procd will have been reparented, so it still has a parent that should appear in the read.)
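The detection rules above can be sketched as two small checks. This is a minimal illustration in Python, not the procd's actual code. It assumes a PID snapshot is available as a set of integers. The function names and the MAX_MISSING_FRACTION threshold for method (4) are hypothetical; the ticket gives no value for "too many".

```python
from typing import Set

# Hypothetical threshold for method (4); the ticket does not say what "too many" is.
MAX_MISSING_FRACTION = 0.5


def is_short_read(pids: Set[int], my_pid: int, parent_pid: int) -> bool:
    """Methods (1)-(3): the read is inarguably short if any of these PIDs is absent."""
    return 1 not in pids or my_pid not in pids or parent_pid not in pids


def too_many_missing(previous: Set[int], current: Set[int],
                     max_fraction: float = MAX_MISSING_FRACTION) -> bool:
    """Method (4): heuristic, intended only to trigger a warning.

    Returns True if more than max_fraction of the PIDs seen in the
    previous poll are absent from the current one.
    """
    if not previous:
        return False
    missing = len(previous - current)
    return missing / len(previous) > max_fraction
```

On Linux, a snapshot could be built by listing the numeric entries of /proc, with my_pid and parent_pid taken from os.getpid() and os.getppid(). Re-reading the parent PID on every poll is what keeps method (3) valid after the procd has been reparented.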


We can compensate for short reads in a few different ways.

  1. Ignore the short read and use the previous result(s) until the next time we poll.

  2. Ignore the short read and use the previous result(s), but schedule the next poll to be soon.

  3. Immediately retry the read.

  4. Wait for a short amount of time and then retry the read.
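Strategies (1) and (2) differ only in when the next poll happens. The following is a hypothetical sketch of a poller that keeps the last complete snapshot and reports the delay before its next poll. PidPoller, NORMAL_INTERVAL, and SOON_INTERVAL are illustrative names and values, not anything taken from the procd.

```python
from dataclasses import dataclass, field
from typing import Callable, Set

# Hypothetical polling intervals, in seconds; not values from the procd.
NORMAL_INTERVAL = 5.0
SOON_INTERVAL = 0.5


@dataclass
class PidPoller:
    read_pids: Callable[[], Set[int]]        # takes a PID snapshot
    is_short: Callable[[Set[int]], bool]     # detection check, e.g. methods (1)-(3)
    poll_soon_after_short: bool = True       # True: strategy (2); False: strategy (1)
    last_good: Set[int] = field(default_factory=set)

    def poll(self) -> float:
        """Take a snapshot and return the number of seconds until the next poll."""
        pids = self.read_pids()
        if self.is_short(pids):
            # Discard the short read; callers keep using self.last_good.
            return SOON_INTERVAL if self.poll_soon_after_short else NORMAL_INTERVAL
        self.last_good = pids
        return NORMAL_INTERVAL
```

In this sketch, a short read never overwrites last_good, so consumers of the snapshot see the previous result until a complete read arrives.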

We don’t presently know how effective (3) will be, although it’s likely the simplest to implement (and will pose the fewest problems for daemons expecting up-to-date information). If (1) or (2) is too hard to do, (4) may be an acceptable substitute; blocking for, e.g., 50 milliseconds is unfortunate but much better than having any of the daemons involved EXCEPT (exit with a fatal error).
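Strategies (3) and (4) can share one retry loop. A zero delay gives an immediate retry, and a nonzero delay (e.g., 50 ms) gives a wait-and-retry. This is a sketch under assumed names (read_with_retry, retries, delay), not the procd's implementation. It returns None instead of raising, so the caller can fall back to the previous snapshot rather than EXCEPT.

```python
import time
from typing import Callable, Optional, Set


def read_with_retry(read_pids: Callable[[], Set[int]],
                    is_short: Callable[[Set[int]], bool],
                    retries: int = 3,
                    delay: float = 0.05,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[Set[int]]:
    """Strategies (3) and (4): re-read until the snapshot passes the short-read check.

    delay == 0 retries immediately (strategy 3); delay > 0 waits between
    attempts (strategy 4, e.g. 50 ms). Returns None if every attempt is short.
    """
    for attempt in range(retries + 1):
        pids = read_pids()
        if not is_short(pids):
            return pids
        if attempt < retries and delay > 0:
            sleep(delay)  # block briefly before the next attempt
    return None
```

The sleep parameter exists only so the loop can be exercised without actually blocking. The retry count of 3 is an arbitrary placeholder.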

Activity

Jaime Frey
January 25, 2021, 7:23 PM

Code Review

I’d remove the “Obviously”. Otherwise, the docs look good.

Todd L Miller
January 22, 2021, 7:49 PM

I’d like a second opinion on the documentation.

Jaime Frey
January 20, 2021, 8:53 PM

Code Review

Looks good.

Todd L Miller
January 20, 2021, 7:29 PM

Closing with the assumption that has dealt with the problem. We’ll open a new subtask under if necessary.

Todd L Miller
October 26, 2020, 10:40 AM

8.9.10-0.521521 was added to the daily repo for Moate to start testing in CHTC on Thursday, 2020-10-22; he was notified around 2:30 PM.

Due date

None

Time remaining

0m

Assignee

Todd L Miller

Is PATh development

None

Priority

Major

HTCondorCustomerGroup

CHTC

Reporter

Todd L Miller