linux-mainline/lib
Thomas Gleixner ee1ee6db07 atomics: Provide rcuref - scalable reference counting
atomic_t based reference counting, including refcount_t, uses
atomic_inc_not_zero() for acquiring a reference. atomic_inc_not_zero() is
implemented with a atomic_try_cmpxchg() loop. High contention of the
reference count leads to retry loops and scales badly. There is nothing to
improve on this implementation as the semantics have to be preserved.

Provide rcuref as a scalable alternative solution which is suitable for RCU
managed objects. Similar to refcount_t it comes with overflow and underflow
detection and mitigation.

rcuref treats the underlying atomic_t as an unsigned integer and partitions
this space into zones:

  0x00000000 - 0x7FFFFFFF	valid zone (1 .. (INT_MAX + 1) references)
  0x80000000 - 0xBFFFFFFF	saturation zone
  0xC0000000 - 0xFFFFFFFE	dead zone
  0xFFFFFFFF   			no reference

rcuref_get() unconditionally increments the reference count with
atomic_add_negative_relaxed(). rcuref_put() unconditionally decrements the
reference count with atomic_add_negative_release().

This unconditional increment avoids the inc_not_zero() problem, but
requires a more complex implementation on the put() side when the count
drops from 0 to -1.

When this transition is detected then it is attempted to mark the reference
count dead, by setting it to the midpoint of the dead zone with a single
atomic_cmpxchg_release() operation. This operation can fail due to a
concurrent rcuref_get() elevating the reference count from -1 to 0 again.

If the unconditional increment in rcuref_get() hits a reference count which
is marked dead (or saturated) it will detect it after the fact and bring
back the reference count to the midpoint of the respective zone. The zones
provide enough tolerance which makes it practically impossible to escape
from a zone.

The racy implementation of rcuref_put() requires to protect rcuref_put()
against a grace period ending in order to prevent a subtle use after
free. As RCU is the only mechanism which allows to protect against that, it
is not possible to fully replace the atomic_inc_not_zero() based
implementation of refcount_t with this scheme.

The final drop is slightly more expensive than the atomic_dec_return()
counterpart, but that's not the case which this is optimized for. The
optimization is on the high frequeunt get()/put() pairs and their
scalability.

The performance of an uncontended rcuref_get()/put() pair where the put()
is not dropping the last reference is still on par with the plain atomic
operations, while at the same time providing overflow and underflow
detection and mitigation.

The performance of rcuref compared to plain atomic_inc_not_zero() and
atomic_dec_return() based reference counting under contention:

 -  Micro benchmark: All CPUs running a increment/decrement loop on an
    elevated reference count, which means the 0 to -1 transition never
    happens.

    The performance gain depends on microarchitecture and the number of
    CPUs and has been observed in the range of 1.3X to 4.7X

 - Conversion of dst_entry::__refcnt to rcuref and testing with the
    localhost memtier/memcached benchmark. That benchmark shows the
    reference count contention prominently.

    The performance gain depends on microarchitecture and the number of
    CPUs and has been observed in the range of 1.1X to 2.6X over the
    previous fix for the false sharing issue vs. struct
    dst_entry::__refcnt.

    When memtier is run over a real 1Gb network connection, there is a
    small gain on top of the false sharing fix. The two changes combined
    result in a 2%-5% total gain for that networked test.

Reported-by: Wangyang Guo <wangyang.guo@intel.com>
Reported-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230323102800.158429195@linutronix.de
2023-03-28 10:39:29 +02:00
..
842
crypto crypto: lib/blake2s - Split up test function to halve stack usage 2022-12-30 22:56:27 +08:00
dim
fonts lib/fonts: fix undefined behavior in bit shift for get_default_font 2022-11-18 13:55:09 -08:00
kunit kunit: Fix 'hooks.o' build by recursing into kunit 2023-02-27 14:49:03 -08:00
livepatch
lz4
lzo
math math64: favor kernel-doc from header files 2022-11-21 14:30:53 -07:00
mpi lib/mpi: Fix buffer overrun when SG is too long 2023-01-06 17:15:46 +08:00
pldmfw
raid6 for-6.2/block-2022-12-08 2022-12-13 10:43:59 -08:00
reed_solomon treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
test_fortify
vdso lib/vdso: use "grep -E" instead of "egrep" 2022-11-23 19:50:15 +01:00
xz
zlib_deflate lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH 2023-02-27 17:00:14 -08:00
zlib_dfltcc lib/zlib: remove redundation assignement of avail_in dfltcc_gdht() 2023-02-02 22:50:10 -08:00
zlib_inflate lib/zlib: Split deflate and inflate states for DFLTCC 2023-02-02 22:50:09 -08:00
zstd
.gitignore
argv_split.c
ashldi3.c
ashrdi3.c
asn1_decoder.c
asn1_encoder.c
assoc_array.c
atomic64_test.c
atomic64.c
audit.c
base64.c
bcd.c
bch.c
bitfield_kunit.c
bitmap.c
bitrev.c
bootconfig-data.S
bootconfig.c
bsearch.c
btree.c
bucket_locks.c
bug.c cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG 2023-01-31 15:01:45 +01:00
build_OID_registry
buildid.c
bust_spinlocks.c
check_signature.c
checksum.c
clz_ctz.c
clz_tab.c
cmdline_kunit.c
cmdline.c
cmpdi2.c
compat_audit.c
cpu_rmap.c
cpumask_kunit.c cpumask: re-introduce constant-sized cpumask optimizations 2023-03-05 14:30:34 -08:00
cpumask.c lib/cpumask: update comment for cpumask_local_spread() 2023-02-07 18:20:00 -08:00
crc4.c
crc7.c
crc8.c
crc16.c
crc32.c
crc32defs.h
crc32test.c
crc64-rocksoft.c
crc64.c
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c
ctype.c
debug_info.c
debug_locks.c
debugobjects.c Non-MM patches for 6.2-rc1. 2022-12-12 17:28:58 -08:00
dec_and_lock.c perf: Fix perf_event_pmu_context serialization 2023-01-31 20:37:18 +01:00
decompress_bunzip2.c
decompress_inflate.c
decompress_unlz4.c
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c
decompress_unzstd.c
decompress.c
devmem_is_allowed.c
devres.c
dhry_1.c lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
dhry_2.c lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
dhry_run.c lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
dhry.h lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
digsig.c
dump_stack.c
dynamic_debug.c
dynamic_queue_limits.c
earlycpio.c
errname.c printf: fix errname.c list 2023-02-15 15:44:43 +01:00
error-inject.c error-injection: remove EI_ETYPE_NONE 2023-02-02 22:50:00 -08:00
errseq.c
extable.c
fault-inject-usercopy.c
fault-inject.c fault-injection: make stacktrace filter works as expected 2022-12-15 16:40:44 -08:00
fdt_addresses.c
fdt_empty_tree.c
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
fdt.c
find_bit_benchmark.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
find_bit.c lib/find: introduce find_nth_and_andnot_bit 2023-02-07 18:20:00 -08:00
flex_proportions.c
fortify_kunit.c kunit/fortify: Validate __alloc_size attribute results 2022-11-22 21:08:28 -08:00
gen_crc32table.c
gen_crc64table.c
genalloc.c lib/genalloc: use try_cmpxchg in {set,clear}_bits_ll 2023-02-02 22:50:05 -08:00
generic-radix-tree.c
glob.c
globtest.c
group_cpus.c genirq/affinity: Only build SMP-only helper functions on SMP kernels 2023-01-18 12:16:47 +01:00
hashtable_test.c lib/hashtable_test.c: add test for the hashtable structure 2023-02-08 14:28:17 -07:00
hexdump.c
hweight.c
idr.c
inflate.c
interval_tree_test.c
interval_tree.c interval-tree: Add a utility to iterate over spans in an interval tree 2022-11-29 16:34:15 -04:00
iomap_copy.c
iomap.c
iommu-helper.c
iov_iter.c 46 fs/cifs (smb3 client) changesets, 37 in fs/cifs and 9 for related helper functions and cleanup outside from Dave Howells and Willy 2023-02-22 17:12:44 -08:00
irq_poll.c
irq_regs.c
is_signed_type_kunit.c lib: assume char is unsigned 2022-11-19 00:56:15 +01:00
is_single_threaded.c
kasprintf.c
Kconfig iommufd for 6.2 2022-12-14 09:15:43 -08:00
Kconfig.debug There is no particular theme here - mainly quick hits all over the tree. 2023-02-23 17:55:40 -08:00
Kconfig.kasan kasan: treat meminstrinsic as builtins in uninstrumented files 2023-03-02 21:54:22 -08:00
Kconfig.kcsan Kernel concurrency sanitizer (KCSAN) updates for v6.3 2023-02-25 13:02:20 -08:00
Kconfig.kfence
Kconfig.kgdb
Kconfig.kmsan kmsan: make sure PREEMPT_RT is off 2022-11-08 15:57:24 -08:00
Kconfig.ubsan
kfifo.c
klist.c
kobject_uevent.c
kobject.c kobject: make dynamic_kobj_ktype and kset_ktype const 2023-02-08 13:34:20 +01:00
kstrtox.c
kstrtox.h
libcrc32c.c
linear_ranges.c
list_debug.c
list_sort.c
list-test.c
llist.c llist: avoid extra memory read in llist_add_batch 2022-11-18 13:55:06 -08:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-rtmutex.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
lockref.c lockref: stop doing cpu_relax in the cmpxchg loop 2023-01-13 14:35:38 -06:00
logic_iomem.c
logic_pio.c
lru_cache.c lru_cache: remove unused lc_private, lc_set, lc_index_of 2022-11-22 19:38:39 -07:00
lshrdi3.c
Makefile atomics: Provide rcuref - scalable reference counting 2023-03-28 10:39:29 +02:00
maple_tree.c maple_tree: reduce stack usage with gcc-9 and earlier 2023-02-16 20:43:55 -08:00
memcat_p.c
memcpy_kunit.c kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST 2023-01-25 12:24:40 -08:00
memory-notifier-error-inject.c
memregion.c
memweight.c
muldi3.c
net_utils.c mac_pton: Don't access memory over expected length 2022-11-09 19:28:02 -08:00
netdev-notifier-error-inject.c
nlattr.c netlink: prevent potential spectre v1 gadgets 2023-01-20 17:52:32 -08:00
nmi_backtrace.c x86/nmi: Print reasons why backtrace NMIs are ignored 2023-01-19 15:55:12 -08:00
notifier-error-inject.c lib/notifier-error-inject: fix error when writing -errno to debugfs file 2022-11-30 16:13:16 -08:00
notifier-error-inject.h
objagg.c
of-reconfig-notifier-error-inject.c
oid_registry.c lib/oid_registry.c: remove redundant assignment to variable num 2022-11-18 13:55:06 -08:00
once.c
overflow_kunit.c overflow: Introduce overflows_type() and castable_to_type() 2022-11-02 12:39:27 -07:00
packing.c lib: packing: replace bit_reverse() with bitrev8() 2022-12-12 15:06:30 -08:00
parman.c
parser.c lib: parser: update documentation for match_NUMBER functions 2023-03-02 21:54:22 -08:00
pci_iomap.c
percpu_counter.c lib/percpu_counter: percpu_counter_add_batch() overflow/underflow 2023-02-02 22:50:01 -08:00
percpu_test.c
percpu-refcount.c percpu-refcount: Use call_rcu_hurry() for atomic switch 2022-11-30 13:16:40 -08:00
plist.c
pm-notifier-error-inject.c
polynomial.c
radix-tree.c lib/radix-tree.c: fix uninitialized variable compilation warning 2022-11-30 16:13:17 -08:00
random32.c
ratelimit.c
rbtree_test.c
rbtree.c
rcuref.c atomics: Provide rcuref - scalable reference counting 2023-03-28 10:39:29 +02:00
ref_tracker.c
refcount.c
rhashtable.c rhashtable: Allow rhashtable to be used from irq-safe contexts 2022-12-09 10:42:56 +00:00
sbitmap.c sbitmap: correct wake_batch recalculation to avoid potential IO hung 2023-01-29 20:03:01 -07:00
scatterlist.c lib/scatterlist: Fix to calculate the last_pg properly 2023-01-16 12:08:31 -04:00
seq_buf.c
sg_pool.c
sg_split.c
show_mem.c
siphash_kunit.c siphash: Convert selftest to KUnit 2022-11-01 10:04:52 -07:00
siphash.c
slub_kunit.c linux-kselftest-kunit-next-6.2-rc1 2022-12-12 16:42:57 -08:00
smp_processor_id.c
sort.c
stackdepot.c lib/stackdepot: move documentation comments to stackdepot.h 2023-02-16 20:43:52 -08:00
stackinit_kunit.c kernel/range: Uplevel the cxl subsystem's range_contains() helper 2023-02-10 17:32:37 -08:00
stmp_device.c
string_helpers.c
string.c lib/string: Use strchr() in strpbrk() 2023-01-27 11:42:57 -08:00
strncpy_from_user.c
strnlen_user.c
strscpy_kunit.c fortify: Short-circuit known-safe calls to strscpy() 2022-11-01 10:04:52 -07:00
syscall.c
test_bitmap.c
test_bitops.c
test_bits.c
test_blackhole_dev.c
test_bpf.c net: remove skb->vlan_present 2022-11-11 18:18:05 -08:00
test_debug_virtual.c
test_dynamic_debug.c
test_firmware.c test_firmware: Use kstrtobool() instead of strtobool() 2023-01-20 14:09:47 +01:00
test_fprobe.c treewide: use get_random_u32_{above,below}() instead of manual loop 2022-11-18 02:15:22 +01:00
test_fpu.c
test_free_pages.c
test_hash.c
test_hexdump.c treewide: use get_random_u32_inclusive() when possible 2022-11-18 02:18:02 +01:00
test_hmm_uapi.h
test_hmm.c
test_ida.c
test_kmod.c test_kmod: stop kernel-doc warnings 2023-01-25 14:07:21 -08:00
test_kprobes.c test_kprobes: Add recursed kprobe test case 2023-02-21 08:52:42 +09:00
test_linear_ranges.c lib/test_linear_ranges: Use LINEAR_RANGE() 2022-11-16 13:32:32 +00:00
test_list_sort.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
test_lockup.c
test_maple_tree.c test_maple_tree: test modifications while iterating 2023-02-09 16:51:31 -08:00
test_memcat_p.c
test_meminit.c
test_min_heap.c
test_module.c
test_objagg.c
test_parman.c
test_printf.c mm: discard __GFP_ATOMIC 2023-02-02 22:33:13 -08:00
test_ref_tracker.c
test_rhashtable.c Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
test_scanf.c
test_sort.c
test_static_key_base.c
test_static_keys.c
test_string.c
test_sysctl.c testing: use the copyleft-next-0.3.1 SPDX tag 2022-11-08 15:44:02 +01:00
test_ubsan.c
test_user_copy.c
test_uuid.c
test_vmalloc.c lib/test_vmalloc.c: add parameter use_huge for fix_size_alloc_test 2023-01-18 17:12:41 -08:00
test_xarray.c
test-kstrtox.c
test-string_helpers.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
textsearch.c
timerqueue.c
trace_readwrite.c asm-generic/io: Add _RET_IP_ to MMIO trace for more accurate debug info 2022-11-21 22:02:10 +01:00
ts_bm.c
ts_fsm.c
ts_kmp.c
ubsan.c hardening updates for v6.3-rc1 2023-02-21 11:07:23 -08:00
ubsan.h arm64: Support Clang UBSAN trap codes for better reporting 2023-02-08 15:26:58 -08:00
ucmpdi2.c
ucs2_string.c
usercopy.c uaccess: Add speculation barrier to copy_from_user() 2023-02-21 14:45:22 -08:00
uuid.c
vsprintf.c Random number generator updates for Linux 6.2-rc1. 2022-12-12 16:22:22 -08:00
win_minmax.c lib/win_minmax: use /* notation for regular comments 2023-01-11 16:14:21 -08:00
xarray.c
xxhash.c