Hello,
We've been investigating performance issues when using the hashing function provided by the tcrypto library.
We have isolated this issue to the way the IPP cryptography primitives are being used: intel/cryptography-primitives#93
To take the specific example of SHA-256, when calling `ippsHashMessage_rmf`, `ippsHashMethod_SHA256_TT` is called every time:
linux-sgx/sdk/tlibcrypto/ipp/sgx_sha256_msg.cpp, lines 49 to 68 in 7385e10
This seems innocent enough as this code should just return a static structure with function pointers to the specific functions for that hashing primitive. However, the `_TT` methods support dynamic dispatching to the NI implementations of those hashing primitives: https://www.intel.com/content/www/us/en/docs/ipp-crypto/developer-guide-reference/2021-9/one-way-hash-primitives.html
Because of the way this was implemented, calling `ippsHashMethod_SHA256_TT` repeatedly, on a platform supporting SHA-NI, results in every call setting the `method.hashUpdate` global function pointer to the normal implementation function pointer and then to the NI one: https://github.com/intel/cryptography-primitives/blob/59a3c2e80c8fccd0d37b7a58020671c5468ec49b/sources/ippcp/hash/sha256/pcphashmethod_sha256_tt.c#L49-L75
Since this structure is static and shared across all threads, calling this method from different threads makes every thread keep rewriting the same function pointer, with devastating consequences for the memory caches: on the pure ippcp sample, without SGX, perf showed a massive memory bottleneck due to L1D and L3 cache misses.
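To make the pattern concrete, here is a minimal, self-contained model of the behaviour described above. It does not use ippcp: `HashMethod`, `get_sha256_method_tt`, `cpu_has_sha_ni` and the two update routines are hypothetical stand-ins, and the store counter exists only for illustration. The point is that each call performs two stores to the same shared location, so threads calling it concurrently would keep invalidating each other's copy of that cache line.

```cpp
#include <cstddef>

// Hypothetical stand-in for IppsHashMethod, reduced to one function pointer.
struct HashMethod {
    void (*hashUpdate)(void* state, const unsigned char* msg, std::size_t len);
};

// Placeholder update routines; the real ones would compress message blocks.
static void update_generic(void*, const unsigned char*, std::size_t) {}
static void update_sha_ni(void*, const unsigned char*, std::size_t) {}

// Assumption for illustration: the CPU reports SHA-NI support.
static bool cpu_has_sha_ni() { return true; }

// Counts stores to the shared structure (single-threaded bookkeeping only).
static int g_store_count = 0;

// Models the getter as described above: each call rewrites the shared
// static structure, first with the generic routine, then with the NI one.
const HashMethod* get_sha256_method_tt() {
    static HashMethod method;
    method.hashUpdate = update_generic;
    ++g_store_count;
    if (cpu_has_sha_ni()) {
        method.hashUpdate = update_sha_ni;
        ++g_store_count;
    }
    return &method;
}
```

In this model, hashing N messages costs 2N stores to the shared structure even though its final contents never change after the first call.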
I'm unsure what the correct fix is for the libtcrypto functions. I have seen some internal code implementing CPU dispatching directly using ippcp internal functions:
linux-sgx/sdk/tlibcrypto/ipp/ipp_disp/intel64/ippsHashMessage_rmf.c, lines 61 to 76 in 7385e10
For our use case, we will use ippcp directly and make sure to call `ippsHashMethod_SHA256_TT` only once, caching the returned pointer.
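A minimal sketch of that call-once-and-cache workaround, assuming a C++11 or later toolchain. To stay self-contained it does not include ippcp: `underlying_getter` is a hypothetical stand-in for `ippsHashMethod_SHA256_TT()`, and `HashMethod` stands in for `IppsHashMethod`. In real code the cached pointer would be passed as the method argument of `ippsHashMessage_rmf`.

```cpp
#include <atomic>

// Hypothetical stand-in for IppsHashMethod.
struct HashMethod { int unused; };

// Counts how often the underlying getter runs.
static std::atomic<int> g_getter_calls{0};

// Hypothetical stand-in for ippsHashMethod_SHA256_TT().
const HashMethod* underlying_getter() {
    static HashMethod method{0};
    g_getter_calls.fetch_add(1);
    return &method;
}

// Since C++11, a function-local static is initialized exactly once, even
// when several threads reach it concurrently, so the getter runs one time.
const HashMethod* cached_sha256_method() {
    static const HashMethod* const method = underlying_getter();
    return method;
}
```

After the first call, every thread only reads the cached pointer, so the shared structure is no longer written on the hashing path.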