Killing Cache-Line Contention with Sharded Counters
A single hot AtomicU64 forces every writer to bounce the same cache line between cores. Sharding the counter across cache-line-padded atomics turns a contended fetch_add into uncontended local writes, at the cost of an O(SHARDS) read.
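As a rough illustration of that idea, here is a minimal sketch of a sharded counter. The names `ShardedCounter`, `PaddedCounter`, `SHARD_INDEX`, and `count_in_parallel` are hypothetical, as are the choice of 16 shards and the assumed 64-byte cache line (some CPUs use 128 bytes). Threads are assigned shards round-robin on first use, which is only one of several possible assignment schemes.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::thread;

// Assumption: 16 shards; tune to roughly the number of writer threads.
const SHARDS: usize = 16;

// Assumption: 64-byte cache lines. The alignment also pads the struct to
// 64 bytes, so adjacent shards never share a line.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

pub struct ShardedCounter {
    shards: [PaddedCounter; SHARDS],
}

// Hands out shard indices round-robin, once per thread.
static NEXT_SHARD: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Initialized lazily the first time each thread touches a counter.
    static SHARD_INDEX: usize = NEXT_SHARD.fetch_add(1, Ordering::Relaxed) % SHARDS;
}

impl ShardedCounter {
    pub fn new() -> Self {
        Self {
            shards: std::array::from_fn(|_| PaddedCounter(AtomicU64::new(0))),
        }
    }

    // Write path: one fetch_add on this thread's own shard.
    pub fn add(&self, n: u64) {
        let i = SHARD_INDEX.with(|i| *i);
        self.shards[i].0.fetch_add(n, Ordering::Relaxed);
    }

    // Read path: O(SHARDS) loads. The result is not an atomic snapshot if
    // writers are still running while the shards are summed.
    pub fn sum(&self) -> u64 {
        self.shards.iter().map(|s| s.0.load(Ordering::Relaxed)).sum()
    }
}

// Usage sketch: spawn `threads` writers, each adding 1 `per_thread` times,
// then read the total after all writers have finished.
pub fn count_in_parallel(threads: usize, per_thread: u64) -> u64 {
    let counter = ShardedCounter::new();
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    counter.add(1);
                }
            });
        }
    });
    counter.sum()
}
```

If more threads write than there are shards, some threads share a shard and contend on it. The total stays correct because each shard is still an atomic, but part of the speedup is lost.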