Server thread safety #275
Conversation
…ety, serializing access to shared global resources like NVM and global keycache
billphipps
left a comment
Truly excellent! You solved this just the way I had hoped for!
My requested changes are very limited and not really functional. More just fleshing out the exact requirements for a real implementation and a few minor typos and renaming opportunities.
The stress testing framework is outstanding!
@@ -0,0 +1,149 @@
/*
 * Copyright (C) 2024 wolfSSL Inc.
- * Copyright (C) 2024 wolfSSL Inc.
+ * Copyright (C) 2026 wolfSSL Inc.
Happy new year!
typedef int (*whLockCleanupCb)(void* context);

/** Acquire exclusive lock (blocking) */
typedef int (*whLockAcquireCb)(void* context);
Perhaps a comment on why a non-blocking version of Acquire is not useful or not required?
 * All callbacks receive a user-provided context pointer (from whLockConfig).
 * Each lock instance protects exactly one resource.
 *
 * Return: WH_ERROR_OK on success, negative error code on failure
Add that no state is modified on error, as if the call had never happened.
Recommend that Initialize also acquires the lock, to stop initialization sequencing problems
Recommend that Cleanup is Non-blocking and will either acquire the mutex and destroy it, or mark it to be destroyed by the owner as it attempts to release it (consider atomic set of a flag in the state). This should stop race conditions at cleanup.
Recommend returning:
WH_ERROR_BADARGS: Invalid context pointer (follows EINVAL). Note NULL context means disable and all functions return WH_ERROR_OK.
WH_ERROR_BADARGS: Attempting to Acquire or Release a context that was not Initialized (follows EINVAL)
WH_ERROR_LOCKED: (optional) If non-owner attempts to release an acquired mutex OR if owner attempts to acquire the same mutex it already owns
WH_ERROR_OK: Releasing a lock that has not been acquired but is initialized
WH_ERROR_OK: Cleaning up a lock that has not been initialized.
 *
 * This function initializes the lock by calling the init callback.
 * If config is NULL or config->cb is NULL, locking is disabled
 * (single-threaded mode) and all lock operations become no-ops.
All functions return WH_ERROR_OK when locking is disabled.
Init should return WH_ERROR_BADARGS when lock is NULL.
 *
 * @param[in] lock Pointer to the lock structure.
 * @return int Returns WH_ERROR_OK on success, or a negative error code on
 *         failure (e.g., WH_ERROR_LOCKED if acquisition failed).
Doesn't make sense to return ERROR_LOCKED if the acquisition failed. If the callback returned a status indicating it couldn't acquire the lock due to an error, then ABORTED would be a better call. Locked makes me think someone else has the lock, BUT this function is supposed to block until that is no longer true.
#include "wolfhsm/wh_lock.h"
#include "wolfhsm/wh_error.h"

#ifdef WOLFHSM_CFG_THREADSAFE
Is this the best name? Consider the more mundane WOLFHSM_CFG_LOCKS. Threadsafe may imply more than just locks, like cancelability.
memset(&lock, 0, sizeof(lock));

WH_TEST_PRINT("Testing lock lifecycle...\n");
Consider checking that acquire/release/cleanup fail properly (BADARGS?) when called on an uninitialized (but zeroed) wh_lock. Note that this should NOT attempt to call any callbacks.
/* Cleanup should succeed */
rc = wh_Lock_Cleanup(&lock);
WH_TEST_ASSERT_RETURN(rc == WH_ERROR_OK);
Verify that Acquire/Release/Cleanup fail properly on a wh_lock after cleanup. This may be the same test as recommended above IF the cleanup zeros the structure. Hint hint.
/* Configure lock with error-checking mutex for better debugging */
memset(&ctx->nvmLockCtx, 0, sizeof(ctx->nvmLockCtx));
pthread_mutexattr_init(&ctx->mutexAttr);
pthread_mutexattr_settype(&ctx->mutexAttr, PTHREAD_MUTEX_ERRORCHECK);
Note that this attribute can set errno values that other types cannot and may not block/fail in the same way a "default" or "normal" pthread mutex will. These values should be trapped in the port because it indicates an undefined behavior WOULD have occurred.
Consider adding posix to the name of this file, since it heavily uses POSIX to provide any real functionality.
rizlik
left a comment
I didn't look into tests yet.
Great work.
Is this lock enough to properly synchronize client requests?
For example, in _HandleNvmRead:
rc = wh_Nvm_GetMetadata(server->nvm, id, &meta);
if (rc != WH_ERROR_OK) {
    return rc;
}
if (offset >= meta.len)
    return WH_ERROR_BADARGS;
/* Clamp length to object size */
if ((offset + len) > meta.len) {
    len = meta.len - offset;
}
rc = wh_Nvm_ReadChecked(server->nvm, id, offset, len, out_data);
if (rc != WH_ERROR_OK)
metadata can be changed between GetMetadata and ReadChecked.
Also, when handling key request:
/* get a new id if one wasn't provided */
if (WH_KEYID_ISERASED(meta->id)) {
    ret = wh_Server_KeystoreGetUniqueId(server, &meta->id);
    resp.rc = ret;
}
/* write the key */
if (ret == WH_ERROR_OK) {
    ret = wh_Server_KeystoreCacheKeyChecked(server, meta, in);
    resp.rc = ret;
}
the id might not be unique anymore by the time wh_Server_KeystoreCacheKeyChecked runs.
Would coarser-grained locking at the request level simplify the design?
return ret;
}

/* Use unlocked variants since we already hold the lock */
probably this is assuming too much about the calling context
return WH_ERROR_BADARGS;
}

ret = _LockKeystore(server);
should we skip locking on local cache?
TL;DR: Makes wolfHSM server safe to use in multithreaded scenarios.
Overview
This pull request implements a generic framework for thread-safe access to shared server resources in wolfHSM, specifically targeting the NVM (non-volatile memory) and global key cache subsystems as the first shared data to be protected. Crypto is left to a subsequent PR but is the likely next candidate.
Note that a server context itself still cannot be shared across threads without proper serialization by the caller. This PR just adds the mechanisms such that, when multiple server contexts share an NVM instance or global keystore, access to those shared resources is properly serialized, allowing requests from multiple clients to be processed concurrently in separate threads.
Changes:
- wh_lock.{c,h} with callback-based design for platform independence
- WOLFHSM_CFG_THREADSAFE build option. When this option is NOT defined, all lock abstraction operations compile to no-ops, with zero overhead

Gaps/future work: