Compensate for short PID reads in procd
To compensate for short PID reads in the procd, we must first detect them. We propose the following detection methods; if any of them triggers, we consider the read to be short.
1. If PID 1 is missing.
2. If the procd’s PID is missing.
3. If the procd’s parent (usually the master) is missing.
4. If “too many” PIDs have gone missing since the last poll.
I expect that method (4) will require considerable tuning to avoid false positives; as a result, I presently intend to implement it as a warning instead. Methods (1), (2), and (3) indicate inarguably invalid PID reads. (Even if the master has exited since the last poll, the procd will have been reparented, so its current parent should still appear in the read.)
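The detection methods above can be sketched roughly as follows. This is a minimal illustration in Python, not the procd's actual code: the function name `check_pid_read`, its parameters, and the `max_missing_fraction` threshold are all hypothetical, and the 0.5 default is a placeholder for the tuning that method (4) would need.

```python
import os


def check_pid_read(pids, previous_pids=None, self_pid=None,
                   parent_pid=None, max_missing_fraction=0.5):
    """Classify one PID snapshot.

    Returns (reasons, warnings). A non-empty `reasons` list means the
    read is short per methods (1)-(3); `warnings` carries method (4),
    which is only advisory. All names here are hypothetical.
    """
    pids = set(pids)
    # Look up the parent at check time, so that a reparented process
    # is compared against its current parent rather than a stale one.
    self_pid = os.getpid() if self_pid is None else self_pid
    parent_pid = os.getppid() if parent_pid is None else parent_pid

    reasons = []
    if 1 not in pids:
        reasons.append("PID 1 is missing")             # method (1)
    if self_pid not in pids:
        reasons.append("own PID is missing")           # method (2)
    if parent_pid not in pids:
        reasons.append("parent PID is missing")        # method (3)

    warnings = []
    if previous_pids:
        previous = set(previous_pids)
        missing = previous - pids
        # Method (4): the threshold is an untuned assumption.
        if len(missing) > max_missing_fraction * len(previous):
            warnings.append(
                f"{len(missing)} of {len(previous)} PIDs disappeared")
    return reasons, warnings
```

Keeping method (4) in a separate list means a caller can log it without treating the read as invalid, which matches the intent to implement it as a warning.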
We can compensate for short reads in a few different ways.
1. Ignore the short read and use the previous result(s) until the next time we poll.
2. Ignore the short read and use the previous result(s), but schedule the next poll to be soon.
3. Immediately retry the read.
4. Wait for a short amount of time and then retry the read.
We don’t presently know how effective (3) will be, although it’s likely the simplest to implement (and will pose the fewest problems for daemons expecting up-to-date information). If (1) or (2) is too hard to do, (4) may be an acceptable substitute; blocking for, e.g., 50 milliseconds is unfortunate, but much better than having any of the daemons involved EXCEPT.
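As a rough sketch of how these compensation options could be combined, the following Python function retries a short read and falls back to the previous result. It is illustrative only: `read_pids_with_retry` and all of its parameters are hypothetical, `read_pids` and `is_short` stand in for whatever the procd actually uses, and the retry count is an arbitrary assumption. A `delay_seconds` of 0 corresponds to option (3), a positive value (e.g. 0.05) to option (4), and the fallback to option (1).

```python
import time


def read_pids_with_retry(read_pids, is_short, previous,
                         retries=3, delay_seconds=0.05, sleep=time.sleep):
    """Retry a short PID read, falling back to the previous result.

    Returns (pids, fresh). `fresh` is False when every attempt was
    short and `previous` was returned instead. All names and defaults
    here are hypothetical.
    """
    for attempt in range(retries + 1):
        pids = read_pids()
        if not is_short(pids):
            return pids, True
        # Only wait if another attempt will actually follow.
        if attempt < retries and delay_seconds > 0:
            sleep(delay_seconds)
    # Every attempt was short: reuse the last known-good result.
    return previous, False
```

A caller that receives `fresh == False` could also shorten the interval until its next poll, which would correspond to option (2).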
I’d remove the “Obviously”. Otherwise, the docs look good.
I’d like a second opinion on the documentation.
Closing with the assumption that has dealt with the problem. We’ll open a new subtask under if necessary.
8.9.10-0.521521 was added to the daily repo for Moate to start testing in CHTC on Thursday, 2020-10-22; he was notified around 2:30 PM.