linux-stable/Documentation
Willem de Bruijn b71a75739a bpf: Adjust free target to avoid global starvation of LRU map
[ Upstream commit d4adf1c9ee ]

BPF_MAP_TYPE_LRU_HASH can recycle the most recent elements well before
the map is full, due to percpu reservations and force shrink before
neighbor stealing. Once a CPU is unable to borrow from the global map,
it steals one element from a neighbor once, and from then on each time
flushes this one element to the global list and immediately recycles it.

Batch value LOCAL_FREE_TARGET (128) will exhaust a 10K element map
with 79 CPUs. CPU 79 will observe this behavior even while its
neighbors hold 78 * 127 + 1 * 15 == 9921 free elements (99%).
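
The arithmetic above can be reproduced with a small model. This is a sketch only, not the kernel implementation: it assumes each CPU refills its local free list once with up to LOCAL_FREE_TARGET elements from the global list and then inserts exactly one element. The names lru_model_result and lru_model_run are hypothetical.

```c
#include <assert.h>

#define LOCAL_FREE_TARGET 128 /* batch size named in the commit message */

/* Hypothetical summary of the simplified model; not a kernel struct. */
struct lru_model_result {
	unsigned int global_free;      /* elements left on the global free list */
	unsigned int local_free_total; /* free elements parked in percpu caches */
	unsigned int in_use;           /* elements actually holding map entries */
};

/*
 * Assumption: every CPU takes one batch of up to free_target elements
 * from the global list and then uses exactly one of them.
 */
static struct lru_model_result lru_model_run(unsigned int nr_elems,
					     unsigned int nr_cpus,
					     unsigned int free_target)
{
	struct lru_model_result r = { .global_free = nr_elems };
	unsigned int cpu;

	assert(free_target > 0);

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		unsigned int take = r.global_free < free_target ?
				    r.global_free : free_target;

		if (take == 0)
			break; /* global list exhausted, nothing to borrow */

		r.global_free -= take;
		r.in_use += 1;                  /* one element inserted */
		r.local_free_total += take - 1; /* the rest stay reserved */
	}
	return r;
}
```

Under these assumptions, lru_model_run(10000, 79, LOCAL_FREE_TARGET) leaves the global list empty while 9921 elements sit unused in percpu caches, so the next CPU to insert has nothing to borrow.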

CPUs need not be active concurrently. The issue can appear with
affinity migration, e.g., irqbalance. Each CPU can reserve and then
hold onto its 128 elements indefinitely.

Avoid global list exhaustion by limiting aggregate percpu caches to
half of map size, by adjusting LOCAL_FREE_TARGET based on cpu count.
This change has no effect on sufficiently large tables.
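
One way to read the limit described above: give each CPU half of its even share of the map, clamped between 1 and LOCAL_FREE_TARGET. The helper below is a sketch under that assumption. The name lru_free_target is hypothetical, and the exact expression used in the kernel may differ.

```c
#include <assert.h>

#define LOCAL_FREE_TARGET 128 /* upper bound; the previous fixed batch size */

/*
 * Hypothetical helper: per-CPU free target such that all percpu caches
 * together reserve at most about half of the map.
 */
static unsigned int lru_free_target(unsigned int nr_elems,
				    unsigned int nr_cpus)
{
	unsigned int target;

	assert(nr_cpus > 0);

	target = (nr_elems / nr_cpus) / 2; /* half of the even share */
	if (target < 1)
		target = 1;                /* always allow some progress */
	if (target > LOCAL_FREE_TARGET)
		target = LOCAL_FREE_TARGET; /* large maps keep the old batch */
	return target;
}
```

With 10000 elements and 80 CPUs this yields 62, so the caches reserve at most 80 * 62 == 4960 elements. With 1000000 elements the result stays at 128, which matches the statement that sufficiently large tables are unaffected.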

Similar to LOCAL_NR_SCANS and lru->nr_scans, introduce a map variable
lru->free_target. The extra field fits in a hole in struct bpf_lru.
The cacheline is already warm where read in the hot path. The field is
only accessed with the lru lock held.

Tested-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20250618215803.3587312-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17 18:35:21 +02:00
ABI x86/bugs: Add a Transient Scheduler Attacks mitigation 2025-07-10 16:03:21 +02:00
accel
accounting
admin-guide x86/bugs: Add a Transient Scheduler Attacks mitigation 2025-07-10 16:03:21 +02:00
arch x86/bugs: Rename MDS machinery to something more generic 2025-07-10 16:03:21 +02:00
block
bpf bpf: Adjust free target to avoid global starvation of LRU map 2025-07-17 18:35:21 +02:00
cdrom
core-api module: Provide EXPORT_SYMBOL_GPL_FOR_MODULES() helper 2025-07-10 16:03:18 +02:00
cpu-freq
crypto
dev-tools
devicetree dt-bindings: serial: 8250: Make clocks and clock-frequency exclusive 2025-07-06 11:00:13 +02:00
doc-guide
driver-api serial: mctrl_gpio: split disable_ms into sync and no_sync APIs 2025-06-04 14:42:07 +02:00
fault-injection
fb
features
filesystems
firmware_class
firmware-guide
fpga
gpu
hid
hwmon hwmon: (dell-smm) Increment the number of fans 2025-06-04 14:42:00 +02:00
i2c
iio
images
infiniband
input
isdn
kbuild
kernel-hacking
leds
litmus-tests
livepatch
locking
maintainer
mhi
misc-devices
mm
netlabel
netlink netlink: specs: rt-link: adjust mctp attribute naming 2025-04-25 10:45:43 +02:00
networking strparser: Add read_sock callback 2025-02-27 04:10:50 -08:00
nvdimm
nvme
PCI
pcmcia
peci
power
powerpc
process
RCU
riscv
rust
scheduler sched/topology: Consolidate and clean up access to a CPU's max compute capacity 2025-05-02 07:50:41 +02:00
scsi
security
sound
sphinx
sphinx-static
spi
staging
target
timers sched/isolation: Prevent boot crash when the boot CPU is nohz_full 2025-03-22 12:50:37 -07:00
tools
trace
translations
usb
userspace-api
virt
w1
watchdog
wmi
.gitignore
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py
docutils.conf
dontdiff
index.rst
Kconfig
Makefile
memory-barriers.txt
SubmittingPatches
subsystem-apis.rst