condor_starter crashes when receiving an updated proxy when AES is turned on.

Description

To reproduce: run a vanillia universe job that has an associated proxy:

While the job is running, update the proxy on the submit side. This will trigger the shadow to (eventually) send an updated proxy, which will cause the startd to crash.

On line 1199 of V9_0_0 you see this:

And it turns out p_sock->m_send_md_ctx is NULL, causing libcrypto to segfault.

It turns out that the context only gets created when encryption is turned off, probably under the assumption that we want to checksum all the clear text. Then by the time we finalize the checksum, encryption has been turned on. However, in this case this a new TCP connection resuming a non-negotiated session, so encryption just gets turned on immediately after the session resume, and nothing has been sent in the clear from the server to the client. Hence, the context object hadn’t been created. I found several other places where we call EVP_DigestFinal without checking that the context object exists, so in the interest of being through I am adding code in all of these places to create the context object if it hadn’t already been done so.

Activity

Jaime Frey May 12, 2021 at 2:54 AM
Edited

This fix assumes that the SHA256 of an empty buffer is all zeros. That is not the case. Instead, if there’s no context object when we want to call EVP_DigestFinal_ex(), we should stuff all zeros in the result buffer instead. shows what goes wrong with this fix. It also contains the proper fix.

Brian Bockelman May 10, 2021 at 7:15 PM

From the AES code, the diagnosis appears plausible and the corresponding fix looks correct. After this patch, we should have initialized the digest prior to finalizing.

I agree with Greg’s review comments about the stylistic change. Further, I’d like to see a regression test: this seems like a reasonably common thing to do and appears to crash in a default setup.

Greg Thain May 6, 2021 at 3:54 AM

CODE REVIEW I’m not a security expert, but the patch looks good to me. I reproduce the problem before the patch, and with the patch, no crash. Also, with the patch, no new test failures.

Perhaps in V9_0-branch, we could change:

bool init_EVP_context(std::unique_ptr<EVP_MD_CTX, decltype(&EVP_MD_CTX_destroy)> &ctx);
to

std::unique_ptr<EVP_MD_CTX, decltype(..)> init_EVP_context();

as mutating a std::unique_ptr is a bit of an antipattern, and with RVO, returning one “by value” doesn’t actually copy anything, and just looks cleaner, as the caller can do

my_context = init_EVP_context();

And check my_context for nullness (with the overloaded bool conversion) if needed, e.g.

if (my_context) { …

Done

Details

Time tracking

13h logged

Assignee

Strategic PI work

Yes

Fix versions

Priority

HTCondorCustomerGroup

condor-users

Components

Reporter

Created April 29, 2021 at 11:18 PM
Updated July 19, 2021 at 10:12 PM
Resolved May 10, 2021 at 7:40 PM