
Conversation

@bigbrett (Contributor) commented Jan 22, 2026

TL;DR: Makes wolfHSM server safe to use in multithreaded scenarios.

Overview

This pull request implements a generic framework for thread-safe access to shared server resources in wolfHSM, specifically targeting the NVM (non-volatile memory) and global key cache subsystems as the first shared data to be protected. Crypto is left to a subsequent PR but is the likely next candidate.

Note that a server context itself still cannot be shared across threads without proper serialization by the caller. This PR just adds the mechanisms such that, when multiple server contexts share an NVM instance or global keystore, access to those shared resources is properly serialized, allowing requests from multiple clients to be processed concurrently in separate threads.

Changes:

  • Introduces lock abstraction layer (wh_lock.{c,h}) with callback-based design for platform independence
  • Example POSIX lock implementation using pthread_mutex
  • Refactor NVM and keystore layer internals to use lock abstraction such that global keystore and NVM can be shared by multiple server contexts in a thread-safe manner
  • Thread safe functionality enabled with the WOLFHSM_CFG_THREADSAFE build option. When this option is NOT defined, all lock abstraction operations compile to no-ops, with zero overhead
  • Adds a "thread safe stress test" to the test suite that attempts to flush out data races via a large number of contention cases; meant to be run under ThreadSanitizer
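To make the callback-based design concrete, here is a minimal sketch of what such a lock abstraction layer can look like. The callback typedef shapes match the `whLockAcquireCb`-style typedefs quoted later in this review, but the struct layout, error-code values, and function body are illustrative assumptions, not the actual wolfHSM definitions:

```c
#include <stddef.h>

/* Illustrative error-code values, not the real wolfHSM ones */
#define WH_ERROR_OK       0
#define WH_ERROR_BADARGS (-2)

/* Platform-independent callback types: each takes an opaque context */
typedef int (*whLockAcquireCb)(void* context);
typedef int (*whLockReleaseCb)(void* context);

typedef struct {
    whLockAcquireCb acquire;
    whLockReleaseCb release;
} whLockCb;

typedef struct {
    const whLockCb* cb;      /* NULL: locking disabled, ops are no-ops */
    void*           context; /* opaque pointer passed to every callback */
} whLock;

/* Acquire dispatches to the platform callback; with no callback table
 * configured it degenerates to a successful no-op (single-threaded mode) */
static int wh_Lock_Acquire(whLock* lock)
{
    if (lock == NULL) {
        return WH_ERROR_BADARGS;
    }
    if (lock->cb == NULL || lock->cb->acquire == NULL) {
        return WH_ERROR_OK; /* disabled; real code compiles this away */
    }
    return lock->cb->acquire(lock->context);
}
```

In the real implementation the disabled path is decided at compile time via `WOLFHSM_CFG_THREADSAFE` rather than a runtime NULL check, giving the zero-overhead property described above.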

Gaps/future work:

  • Serializing access to global crypto state, specifically hardware crypto for ports. A bit of a tricky problem, since offload is provided at the port level and there isn't a good way for wolfHSM to know which algos will be accelerated and which won't. A naive implementation might consider simply locking the server crypto context, but this contains a mixture of local (CMAC) and quasi-global (RNG) elements and no abstraction for hardware. Locks also need to be synchronized with the wolfCrypt port mutex. We should refactor the server crypto context and perhaps split it into local and global structures, with the global part supporting hardware state. Future work.

@billphipps (Contributor) left a comment

Truly excellent! You solved this just the way I had hoped for!
My requested changes are very limited and not really functional. More just fleshing out the exact requirements for a real implementation and a few minor typos and renaming opportunities.

The stress testing framework is outstanding!

@@ -0,0 +1,149 @@
/*
* Copyright (C) 2024 wolfSSL Inc.

Suggested change
* Copyright (C) 2024 wolfSSL Inc.
* Copyright (C) 2026 wolfSSL Inc.

Happy new year!

typedef int (*whLockCleanupCb)(void* context);

/** Acquire exclusive lock (blocking) */
typedef int (*whLockAcquireCb)(void* context);

Perhaps a comment on why a non-blocking version of Acquire is not useful or not required?

* All callbacks receive a user-provided context pointer (from whLockConfig).
* Each lock instance protects exactly one resource.
*
* Return: WH_ERROR_OK on success, negative error code on failure

Add that no state is modified on error, as if the call had never happened.

Recommend that Initialize also acquires the lock, to stop initialization sequencing problems
Recommend that Cleanup is non-blocking and will either acquire the mutex and destroy it, or mark it to be destroyed by the owner when it next releases it (consider an atomic set of a flag in the state). This should stop race conditions at cleanup.

Recommend returning:
WH_ERROR_BADARGS: Invalid context pointer (follows EINVAL). Note NULL context means disable and all functions return WH_ERROR_OK.
WH_ERROR_BADARGS: Attempting to Acquire or Release a context that was not Initialized (follows EINVAL)
WH_ERROR_LOCKED: (optional) If non-owner attempts to release an acquired mutex OR if owner attempts to acquire the same mutex it already owns
WH_ERROR_OK: Releasing a lock that has not been acquired but is initialized
WH_ERROR_OK: Cleaning up a lock that has not been initialized.
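The recommended error semantics above can be sketched with a simple state field. This is a single-threaded illustration of the state machine only (the non-owner-release case isn't modeled, and no real mutex is involved); the state names and error-code values are assumptions for illustration:

```c
#include <stddef.h>

#define WH_ERROR_OK       0
#define WH_ERROR_BADARGS (-2)   /* illustrative values */
#define WH_ERROR_LOCKED  (-3)

enum { WH_LOCK_UNINIT = 0, WH_LOCK_IDLE, WH_LOCK_HELD };

typedef struct { int state; } whLock;

static int wh_Lock_Init(whLock* l)
{
    if (l == NULL) return WH_ERROR_BADARGS;
    l->state = WH_LOCK_IDLE;
    return WH_ERROR_OK;
}

static int wh_Lock_Acquire(whLock* l)
{
    /* NULL or not-initialized: BADARGS (follows EINVAL) */
    if (l == NULL || l->state == WH_LOCK_UNINIT) return WH_ERROR_BADARGS;
    /* owner re-acquiring its own lock: LOCKED (optional behavior) */
    if (l->state == WH_LOCK_HELD) return WH_ERROR_LOCKED;
    l->state = WH_LOCK_HELD;
    return WH_ERROR_OK;
}

static int wh_Lock_Release(whLock* l)
{
    if (l == NULL || l->state == WH_LOCK_UNINIT) return WH_ERROR_BADARGS;
    /* releasing an initialized-but-unheld lock is still OK */
    l->state = WH_LOCK_IDLE;
    return WH_ERROR_OK;
}

static int wh_Lock_Cleanup(whLock* l)
{
    if (l == NULL) return WH_ERROR_BADARGS;
    /* cleanup of an uninitialized lock is OK; zeroing the state here
     * means post-cleanup Acquire/Release fail with BADARGS */
    l->state = WH_LOCK_UNINIT;
    return WH_ERROR_OK;
}
```

Note that zeroing the state in Cleanup also gives the property requested later in this review: a zeroed (uninitialized) and a cleaned-up lock behave identically.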

*
* This function initializes the lock by calling the init callback.
* If config is NULL or config->cb is NULL, locking is disabled
* (single-threaded mode) and all lock operations become no-ops.

All functions return WH_ERROR_OK when locking is disabled.

Init should return WH_ERROR_BADARGS when lock is NULL.

*
* @param[in] lock Pointer to the lock structure.
* @return int Returns WH_ERROR_OK on success, or a negative error code on
* failure (e.g., WH_ERROR_LOCKED if acquisition failed).

Doesn't make sense to return ERROR_LOCKED if the acquisition failed. If the callback returned a status indicating it couldn't acquire the lock due to an error, then ABORTED would be a better call. LOCKED makes me think someone else has the lock, BUT this function is supposed to block until that is no longer true.

#include "wolfhsm/wh_lock.h"
#include "wolfhsm/wh_error.h"

#ifdef WOLFHSM_CFG_THREADSAFE

Is this the best name? Consider the more mundane WOLFHSM_CFG_LOCKS. Threadsafe may imply more than just locks, like cancelability.

memset(&lock, 0, sizeof(lock));

WH_TEST_PRINT("Testing lock lifecycle...\n");


Consider checking that acquire/release/cleanup fail properly (BADARGS?) when on an uninitialized (but zeroed) wh_lock. Note that this should NOT attempt to call any callbacks.

/* Cleanup should succeed */
rc = wh_Lock_Cleanup(&lock);
WH_TEST_ASSERT_RETURN(rc == WH_ERROR_OK);


Verify that Acquire/Release/Cleanup fail properly on a wh_lock after cleanup. This may be the same test as the one recommended above IF the cleanup zeroes the structure. Hint hint.

/* Configure lock with error-checking mutex for better debugging */
memset(&ctx->nvmLockCtx, 0, sizeof(ctx->nvmLockCtx));
pthread_mutexattr_init(&ctx->mutexAttr);
pthread_mutexattr_settype(&ctx->mutexAttr, PTHREAD_MUTEX_ERRORCHECK);

Note that this attribute can set errno values that other types cannot and may not block/fail in the same way a "default" or "normal" pthread mutex will. These values should be trapped in the port because it indicates an undefined behavior WOULD have occurred.


Consider adding posix to the name of this file, since it heavily uses POSIX to provide any real functionality.

@rizlik (Contributor) left a comment

I didn't look into tests yet.
Great work.
Is this lock enough to properly synchronize client requests?
Example, _HandleNvmRead:

    rc = wh_Nvm_GetMetadata(server->nvm, id, &meta);
    if (rc != WH_ERROR_OK) {
        return rc;
    }

    if (offset >= meta.len)
        return WH_ERROR_BADARGS;

    /* Clamp length to object size */
    if ((offset + len) > meta.len) {
        len = meta.len - offset;
    }

    rc = wh_Nvm_ReadChecked(server->nvm, id, offset, len, out_data);
    if (rc != WH_ERROR_OK)

metadata can be changed between GetMetadata and ReadChecked.
Also, when handling key request:

            /* get a new id if one wasn't provided */
            if (WH_KEYID_ISERASED(meta->id)) {
                ret     = wh_Server_KeystoreGetUniqueId(server, &meta->id);
                resp.rc = ret;
            }
            /* write the key */
            if (ret == WH_ERROR_OK) {
                ret     = wh_Server_KeystoreCacheKeyChecked(server, meta, in);
                resp.rc = ret;
            }

the id might not be unique anymore by the time wh_Server_KeystoreCacheKeyChecked is called.

Would coarser-grained locking at the request level simplify the design?
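For reference, the TOCTOU race above disappears if the lock spans the whole read request. This toy sketch (a stand-in model, not the wolfHSM API; all names are illustrative) shows the metadata check and the data read performed under one critical section, so a concurrent writer cannot shrink the object in between:

```c
#include <pthread.h>
#include <string.h>
#include <stddef.h>

/* Toy "NVM object" protected by a single request-level lock */
typedef struct {
    pthread_mutex_t lock;
    size_t          len;      /* object metadata: current length */
    unsigned char   data[64]; /* object contents */
} toyNvm;

/* Reads up to len bytes starting at offset. The lock is held across
 * BOTH the metadata check and the copy, closing the window where the
 * length could change between the two steps. */
static int toy_nvm_read(toyNvm* nvm, size_t offset, size_t len,
                        unsigned char* out, size_t* out_len)
{
    int rc = 0;

    pthread_mutex_lock(&nvm->lock);
    if (offset >= nvm->len) {
        rc = -1; /* equivalent of WH_ERROR_BADARGS */
    }
    else {
        if ((offset + len) > nvm->len) {
            len = nvm->len - offset; /* clamp under the same lock */
        }
        memcpy(out, nvm->data + offset, len);
        *out_len = len;
    }
    pthread_mutex_unlock(&nvm->lock);
    return rc;
}
```

The trade-off is the one the question implies: request-level locking is simpler to reason about but serializes entire requests, while the per-call locking in the PR allows more concurrency at the cost of composite operations needing their own outer lock.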

return ret;
}

/* Use unlocked variants since we already hold the lock */

probably this is assuming too much on the calling context

return WH_ERROR_BADARGS;
}

ret = _LockKeystore(server);

should we skip locking on local cache?
