Patch series "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory".

Utilizing DAMON for memory tiering usually requires manual tuning and/or tedious controls. Let it self-tune hotness and coldness thresholds for promotion and demotion, aiming for high utilization of high memory tiers, by introducing new DAMOS quota goal metrics that represent the used and the free memory ratios of specific NUMA nodes. Also introduce a sample DAMON module that demonstrates how the new feature can be used for memory tiering use cases.

Backgrounds
===========

A type of tiered memory system exposes the memory tiers as NUMA nodes. A straightforward page placement strategy for such systems is placing access-hot and cold pages on upper and lower tiers, respectively, pursuing higher utilization of upper tiers. Since access temperature can be dynamic, periodically finding and migrating hot pages and cold pages to the proper tiers (promoting and demoting) is also required. The Linux kernel provides several features for such dynamic and transparent page placement.

Page Faults and LRU
-------------------

One widely known way is using NUMA balancing in tiering mode (a.k.a. NUMAB-2) together with the reclaim-based demotion feature. In this setup, NUMAB-2 finds hot pages using access check-purpose page faults (a.k.a. prot_none) and promotes them inside each process' context, until there are no more pages to promote, or until the upper tier is filled up and memory pressure happens. In the latter case, the LRU-based reclaim logic wakes up as a response to the memory pressure and demotes cold pages to lower tiers in asynchronous (kswapd) and/or synchronous (direct reclaim) ways.

DAMON
-----

Yet another available solution is using DAMOS with the migrate_hot and migrate_cold DAMOS actions for promotions and demotions, respectively. To make it optimal, users need to specify the aggressiveness and the access temperature thresholds for promotions and demotions in a good balance that results in high utilization of upper tiers. The number of parameters is not small, and the optimum parameter values depend on the characteristics of the underlying hardware and the workload. As a result, it often requires manual, time-consuming and repetitive tuning of the DAMOS schemes for each combination of workload and system.

Self-tuned DAMON-based Memory Tiering
=====================================

To solve such manual tuning problems, DAMOS provides aim-oriented feedback-driven quota self-tuning. Using the feature, we design a self-tuned DAMON-based memory tiering for general multi-tier memory systems.

For each memory tier node, if it has a lower tier, run a DAMOS scheme that demotes cold pages of the node, auto-tuning the aggressiveness while aiming for a given amount of free space on the node. The free space is for keeping the headroom that avoids significant memory pressure during upper tier memory usage spikes, and for promoting hot pages from the lower tier.

For each memory tier node, if it has an upper tier, run a DAMOS scheme that promotes hot pages of the current node to the upper tier, auto-tuning the aggressiveness while aiming for a high utilization ratio of the upper tier. The target ratio is to ensure higher tiers are utilized as much as possible. It should match the headroom of the demotion scheme, but have a slight overlap, to ensure promotion and demotion are not entirely stopped.

The aim-oriented aggressiveness auto-tuning of DAMOS is already available. Hence, to implement such a tiering solution, only new quota goal metrics for the utilization and the free space ratio of a specific NUMA node need to be developed.
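For illustration, below is a rough sketch of how the new goal metrics could be attached to promotion and demotion schemes via the DAMON kernel API. The damos_new_quota_goal()/damos_add_quota_goal() calls and metric names are those this series introduces; the function name and scheme variables are hypothetical, and the target values assume the basis-point unit implied by the metric names (9970 bp and 50 bp, matching the 99.7 % and 0.5 % defaults discussed below).

    /* hypothetical helper: attach the new goal metrics to two schemes */
    static int example_set_tiering_goals(struct damos *promote_scheme,
                    struct damos *demote_scheme, int upper_nid)
    {
            struct damos_quota_goal *goal;

            /* promotion: aim for 99.7% (9970 bp) used memory of the upper node */
            goal = damos_new_quota_goal(DAMOS_QUOTA_NODE_MEM_USED_BP, 9970);
            if (!goal)
                    return -ENOMEM;
            goal->nid = upper_nid;
            damos_add_quota_goal(&promote_scheme->quota, goal);

            /* demotion: aim for 0.5% (50 bp) free memory of the same node */
            goal = damos_new_quota_goal(DAMOS_QUOTA_NODE_MEM_FREE_BP, 50);
            if (!goal)
                    return -ENOMEM;
            goal->nid = upper_nid;
            damos_add_quota_goal(&demote_scheme->quota, goal);
            return 0;
    }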
Discussions
===========

The design imposes the below discussion points.

Expected Behaviors
------------------

The system will let the upper tier memory node accommodate as much hot data as possible. If the total amount of the data is less than the top tier memory's promotion/demotion target utilization, the entire data will simply be placed on the top tier. The promotion scheme will do nothing since there is no data to promote. The demotion scheme will also do nothing since the free space ratio of the top tier is higher than the goal. Only if the amount of data is larger than the top tier's target utilization will the demotion scheme demote cold pages and ensure the headroom free space. Since the promotion and demotion schemes for a single node have a small overlap at their target utilization and free space goals, promotions and demotions will continue working with a moderate aggressiveness level. It will keep all data placed according to access hotness under dynamic access patterns, while minimizing the migration overhead. In any case, each node will keep its headroom free space, and the upper tiers will be utilized as much as possible.

Ease of Use
-----------

Users still need to set the target utilization and free space ratios, but these will be easier to set. We argue 99.7 % utilization and 0.5 % free space ratios can be good default values. They can easily be adjusted based on the desired headroom size of the given use case. Users are also still required to answer the minimum coldness and hotness thresholds. Together with the monitoring intervals auto-tuning[2], DAMON will always show a meaningful amount of hot and cold memory. And DAMOS quota's prioritization mechanism will make good decisions as long as the source information is that colorful. Hence, users can set the minimum criteria very naively. We believe any access observation and no access observation within the last aggregation interval are enough as the minimum hot and cold region criteria, respectively.

General Tiered Memory Setup Applicability
-----------------------------------------

The design can be applied to any number of tiers having any performance characteristics, as long as they can be hierarchical. Hence, applying the system to a different tiered memory system will be straightforward. Note that this assumes only the single CPU NUMA node case. Because today's DAMON is not aware of which CPU made each access, applying this on systems having multiple CPU NUMA nodes can be complicated. We are planning to extend DAMON for the use case, but that's out of the scope of this patch series.

How To Use
----------

Users can implement the auto-tuned DAMON-based memory tiering using the DAMON sysfs interface. It can be easily done using a DAMON user-space tool like damo. The evaluation results section below shows an example DAMON user-space tool command for that.

For wider and simpler deployment, having a kernel module that sets up and runs the DAMOS schemes via the DAMON kernel API can be useful. The module can enable the memory tiering at boot time via a kernel command line parameter, or at run time with a single command. This patch series implements a sample DAMON kernel module that shows how such a module can be implemented.
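For a rough idea of what such a module's setup path could look like, below is a hypothetical sketch using only the existing DAMON kernel API (the sample module under samples/damon/ is the reference; the function name, node ids and quota values here are illustrative only):

    /* set up one cold-page demotion scheme from node 0 to node 1 */
    static int example_tiering_start(void)
    {
            struct damos_access_pattern pattern = {
                    .min_sz_region = DAMON_MIN_REGION,
                    .max_sz_region = ULONG_MAX,
                    .min_nr_accesses = 0,
                    .max_nr_accesses = 0,           /* not accessed: cold */
                    .min_age_region = 0,
                    .max_age_region = UINT_MAX,
            };
            struct damos_quota quota = {
                    .reset_interval = 1000,         /* 1 second */
                    .sz = 200 * 1024 * 1024,        /* up to 200 MiB per second */
            };
            struct damos_watermarks wmarks = {
                    .metric = DAMOS_WMARK_NONE,     /* always active */
            };
            struct damon_target *target;
            struct damon_ctx *ctx;
            struct damos *scheme;
            unsigned long start = 0, end = 0;

            ctx = damon_new_ctx();
            if (!ctx)
                    return -ENOMEM;
            if (damon_select_ops(ctx, DAMON_OPS_PADDR))
                    goto out;

            target = damon_new_target();
            if (!target)
                    goto out;
            damon_add_target(ctx, target);
            if (damon_set_region_biggest_system_ram_default(target, &start, &end))
                    goto out;

            scheme = damon_new_scheme(&pattern, DAMOS_MIGRATE_COLD, 0,
                            &quota, &wmarks, 1 /* target_nid */);
            if (!scheme)
                    goto out;
            damon_set_schemes(ctx, &scheme, 1);

            return damon_start(&ctx, 1, true);
    out:
            damon_destroy_ctx(ctx);
            return -ENOMEM;
    }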
Comparison To Page Faults and LRU-based Approaches
--------------------------------------------------

The existing page faults based promotion (NUMAB-2) does hot page detection and migration in the process context. When there are many pages to promote, it can block the progress of the application's real work. DAMOS works in an asynchronous worker thread, so it doesn't block the real work.

NUMAB-2 doesn't provide a way to control the aggressiveness of promotion other than the maximum amount of pages to promote per given time window. If hot pages are found, promotions can happen at the upper-bound speed, regardless of the upper tier's memory pressure. If the maximum speed is not well set for the given workload, it can result in slow promotion or unnecessary memory pressure. Self-tuned DAMON-based memory tiering alleviates the problem by adjusting the speed based on the current utilization of the upper tier.

LRU-based demotion can be triggered in both asynchronous (kswapd) and synchronous (direct reclaim) ways. Other than the way of finding cold pages, asynchronous LRU-based demotion and DAMON-based demotion have no big difference. DAMON-based demotion can make a better balance with DAMON-based promotion, though. The LRU-based demotion can do better than DAMON-based demotion when the tier is under significant memory pressure. It would be wise to use DAMON-based demotion as a proactive and primary one, while utilizing LRU-based demotion together as a fast backup solution.

Evaluation
==========

In short, under a setup that requires fast and frequent promotions, self-tuned DAMON-based memory tiering's hot pages promotion improves performance by about 4.42 %. We believe this shows self-tuned DAMON-based promotion's effectiveness. Meanwhile, NUMAB-2's hot pages promotion degrades the performance by about 7.34 %. We suspect the degradation is mostly due to NUMAB-2's synchronous nature that can block the application's progress, which highlights the advantage of the DAMON-based solution's asynchronous nature.

Note that the test was done with the RFC version of this patch series. We didn't run it again since this patch series got no meaningful change after the RFC, while the test takes a pretty long time.

Setup
-----

Hardware. Use a machine equipped with a 250 GiB DRAM memory tier and a 50 GiB CXL memory tier. The tiers are exposed as NUMA nodes 0 and 1, respectively.

Kernel. Use Linux kernel v6.13, modified as follows. Add all DAMON patches that were available on the mm tree as of 2025-03-15, and this patch series. Also modify it to ignore mempolicy() system calls, to avoid bad effects from the application's optimizations that assume traditional NUMA systems.

Workload. Use a modified version of the Taobench benchmark[3] that is available in the DCPerf benchmark suite. It represents an in-memory caching workload. We set its 'memsize', 'warmup_time', and 'test_time' parameters to 340 GiB, 2,500 seconds and 1,440 seconds, respectively. The parameters are chosen to ensure the workload uses more memory than the DRAM memory tier. Its RSS under the parameters grows to 270 GiB within the warmup time.

It turned out the workload has a very static access pattern. Only about 13 % of the RSS is frequently accessed from the beginning to the end. Hence promotion shows no meaningful performance difference regardless of different designs and implementations. We therefore modify the kernel to periodically demote up to 10 GiB of hot pages and promote up to 10 GiB of cold pages once per minute. The intention is to simulate periodic access pattern changes. The hotness and coldness thresholds are very naively set so that it is more like a random access pattern change rather than a strict hot/cold pages exchange. This is why we call the workload "modified".
It is implemented as two DAMOS schemes, each running on an asynchronous thread. It can be reproduced with the DAMON user-space tool like below.

    # ./damo start \
        --ops paddr --numa_node 0 --monitoring_intervals 10s 200s 200s \
        --damos_action migrate_hot 1 \
        --damos_quota_interval 60s --damos_quota_space 10G \
        --ops paddr --numa_node 1 --monitoring_intervals 10s 200s 200s \
        --damos_action migrate_cold 0 \
        --damos_quota_interval 60s --damos_quota_space 10G \
        --nr_schemes 1 1 --nr_targets 1 1 --nr_ctxs 1 1

System configurations. Use the below variant system configurations.

- Baseline. No memory tiering features are turned on.

- Numab_tiering. On the baseline, enable NUMAB-2 and reclaim-based demotion. In detail, the following commands are executed:

    echo 2 > /proc/sys/kernel/numa_balancing;
    echo 1 > /sys/kernel/mm/numa/demotion_enabled;
    echo 7 > /proc/sys/vm/zone_reclaim_mode

- DAMON_tiering. On the baseline, utilize the self-tuned DAMON-based memory tiering implementation via the DAMON user-space tool. It utilizes two kernel threads, namely a promotion thread and a demotion thread. The demotion thread monitors the access pattern of the DRAM node using DAMON with auto-tuned monitoring intervals aiming for a 4% DAMON-observed access ratio, and demotes the coldest pages up to 200 MiB per second, aiming for 0.5% free space of the DRAM node. The promotion thread monitors the CXL node using the same intervals auto-tuning, and promotes hot pages in the same way, but aiming for 99.7% utilization of the DRAM node. Because DAMON provides only best-effort accuracy, add young page DAMOS filters to allow only young pages at promoting and to reject all young pages at demoting, respectively. It can be reproduced with the DAMON user-space tool like below.

    # ./damo start \
        --numa_node 0 --monitoring_intervals_goal 4% 3 5ms 10s \
        --damos_action migrate_cold 1 --damos_access_rate 0% 0% \
        --damos_apply_interval 1s \
        --damos_quota_interval 1s --damos_quota_space 200MB \
        --damos_quota_goal node_mem_free_bp 0.5% 0 \
        --damos_filter reject young \
        --numa_node 1 --monitoring_intervals_goal 4% 3 5ms 10s \
        --damos_action migrate_hot 0 --damos_access_rate 5% max \
        --damos_apply_interval 1s \
        --damos_quota_interval 1s --damos_quota_space 200MB \
        --damos_quota_goal node_mem_used_bp 99.7% 0 \
        --damos_filter allow young \
        --damos_nr_quota_goals 1 1 --damos_nr_filters 1 1 \
        --nr_targets 1 1 --nr_schemes 1 1 --nr_ctxs 1 1

Measurement Results
-------------------

On each system configuration, run the modified version of Taobench and collect 'score'. 'score' is a metric calculated and provided by Taobench to represent the performance of the run on the system. To handle the measurement errors, repeat the measurement five times. The results are as below.

    Config          Score   Stdev   (%)     Normalized
    Baseline        1.6165  0.0319  1.9764  1.0000
    Numab_tiering   1.4976  0.0452  3.0209  0.9264
    DAMON_tiering   1.6881  0.0249  1.4767  1.0443

'Config' column shows the system config of the measurement. 'Score' column shows the 'score' measurement in average of the five runs on the system config. 'Stdev' column shows the standard deviation of the five measurements of the scores. '(%)' column shows the 'Stdev' to 'Score' ratio in percentage. Finally, 'Normalized' column shows the averaged score values of the configs, normalized to that of 'Baseline'.

The periodic hot pages demotion and cold pages promotion that was conducted to simulate a dynamic access pattern was started from the beginning of the workload. It resulted in the DRAM tier utilization always being under the watermark, and hence no real demotion happened for all test runs.
This means the above results show no difference between LRU-based and DAMON-based demotions. Only the difference between NUMAB-2 and DAMON-based promotions is represented in the results.

Numab_tiering config degraded the performance by about 7.36 %. We suspect this happened because NUMAB-2's synchronous promotion was blocking Taobench's real work progress.

DAMON_tiering config improved the performance by about 4.43 %. We believe this shows the effectiveness of DAMON-based promotion, which didn't block Taobench's real work progress due to its asynchronous nature. Also, this means DAMON's monitoring results are accurate enough to provide a visible amount of improvement.

Evaluation Limitations
----------------------

As mentioned above, this evaluation shows only a comparison of promotion mechanisms. DAMON-based tiering is recommended to be used together with reclaim-based demotion as a faster backup under significant memory pressure, though.

From some perspective, the modified version of Taobench may seem to distort the picture too much. It would be better to evaluate with a more realistic workload, or with more finely tuned micro benchmarks.

Patch Sequence
==============

The first patch (patch 1) implements two new quota goal metrics on the core layer and exposes them via the DAMON core kernel API. The second and third ones (patches 2 and 3) further link them to the DAMON sysfs interface. Three following patches (patches 4-6) document the new feature and sysfs file on the design, usage, and ABI documents. The final one (patch 7) implements a working version of a self-tuned DAMON-based memory tiering solution in an incomplete but easy to understand form, as a kernel module under the samples/damon/ directory.

References
==========

[1] https://lore.kernel.org/20231112195602.61525-1-sj@kernel.org/
[2] https://lore.kernel.org/20250303221726.484227-1-sj@kernel.org
[3] https://github.com/facebookresearch/DCPerf/blob/main/packages/tao_bench/README.md

This patch (of 7):

Used and free space ratios for specific NUMA nodes can be useful inputs for NUMA-specific DAMOS schemes' aggressiveness self-tuning feedback loop. Implement DAMOS quota goal metrics for such self-tuned schemes.

Link: https://lkml.kernel.org/r/20250420194030.75838-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250420194030.75838-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Yunjeong Mun <yunjeong.mun@sk.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * DAMON api
 *
 * Author: SeongJae Park <sj@kernel.org>
 */

#ifndef _DAMON_H_
#define _DAMON_H_

#include <linux/memcontrol.h>
#include <linux/mutex.h>
#include <linux/time64.h>
#include <linux/types.h>
#include <linux/random.h>

/* Minimal region size.  Every damon_region is aligned by this. */
#define DAMON_MIN_REGION	PAGE_SIZE
/* Max priority score for DAMON-based operation schemes */
#define DAMOS_MAX_SCORE		(99)

/* Get a random number in [l, r) */
static inline unsigned long damon_rand(unsigned long l, unsigned long r)
{
	return l + get_random_u32_below(r - l);
}

/**
 * struct damon_addr_range - Represents an address region of [@start, @end).
 * @start:	Start address of the region (inclusive).
 * @end:	End address of the region (exclusive).
 */
struct damon_addr_range {
	unsigned long start;
	unsigned long end;
};

/**
 * struct damon_size_range - Represents size for filter to operate on [@min, @max].
 * @min:	Min size (inclusive).
 * @max:	Max size (inclusive).
 */
struct damon_size_range {
	unsigned long min;
	unsigned long max;
};

/**
 * struct damon_region - Represents a monitoring target region.
 * @ar:			The address range of the region.
 * @sampling_addr:	Address of the sample for the next access check.
 * @nr_accesses:	Access frequency of this region.
 * @nr_accesses_bp:	@nr_accesses in basis point (0.01%) that is updated for
 *			each sampling interval.
 * @list:		List head for siblings.
 * @age:		Age of this region.
 *
 * @nr_accesses is reset to zero for every &damon_attrs->aggr_interval and
 * increased for every &damon_attrs->sample_interval if an access to the region
 * during the last sampling interval is found.  The update of this field should
 * not be done with direct access but with the helper function,
 * damon_update_region_access_rate().
 *
 * @nr_accesses_bp is another representation of @nr_accesses in basis point
 * (1 in 10,000) that is updated for every &damon_attrs->sample_interval in a
 * manner similar to moving sum.  By the algorithm, this value becomes
 * @nr_accesses * 10000 for every &struct damon_attrs->aggr_interval.  This can
 * be used when the aggregation interval is too huge and therefore cannot wait
 * for it before getting the access monitoring results.
 *
 * @age is initially zero, increased for each aggregation interval, and reset
 * to zero again if the access frequency is significantly changed.  If two
 * regions are merged into a new region, both @nr_accesses and @age of the new
 * region are set as region size-weighted average of those of the two regions.
 */
struct damon_region {
	struct damon_addr_range ar;
	unsigned long sampling_addr;
	unsigned int nr_accesses;
	unsigned int nr_accesses_bp;
	struct list_head list;

	unsigned int age;
/* private: Internal value for age calculation. */
	unsigned int last_nr_accesses;
};

/**
 * struct damon_target - Represents a monitoring target.
 * @pid:		The PID of the virtual address space to monitor.
 * @nr_regions:		Number of monitoring target regions of this target.
 * @regions_list:	Head of the monitoring target regions of this target.
 * @list:		List head for siblings.
 *
 * Each monitoring context could have multiple targets.  For example, a context
 * for virtual memory address spaces could have multiple target processes.  The
 * @pid should be set for appropriate &struct damon_operations including the
 * virtual address spaces monitoring operations.
 */
struct damon_target {
	struct pid *pid;
	unsigned int nr_regions;
	struct list_head regions_list;
	struct list_head list;
};

/**
 * enum damos_action - Represents an action of a Data Access Monitoring-based
 * Operation Scheme.
 *
 * @DAMOS_WILLNEED:	Call ``madvise()`` for the region with MADV_WILLNEED.
 * @DAMOS_COLD:		Call ``madvise()`` for the region with MADV_COLD.
 * @DAMOS_PAGEOUT:	Call ``madvise()`` for the region with MADV_PAGEOUT.
 * @DAMOS_HUGEPAGE:	Call ``madvise()`` for the region with MADV_HUGEPAGE.
 * @DAMOS_NOHUGEPAGE:	Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
 * @DAMOS_LRU_PRIO:	Prioritize the region on its LRU lists.
 * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
 * @DAMOS_MIGRATE_HOT:	Migrate the regions prioritizing warmer regions.
 * @DAMOS_MIGRATE_COLD:	Migrate the regions prioritizing colder regions.
 * @DAMOS_STAT:		Do nothing but count the stat.
 * @NR_DAMOS_ACTIONS:	Total number of DAMOS actions
 *
 * The support of each action is up to running &struct damon_operations.
 * &enum DAMON_OPS_VADDR and &enum DAMON_OPS_FVADDR supports all actions except
 * &enum DAMOS_LRU_PRIO and &enum DAMOS_LRU_DEPRIO.  &enum DAMON_OPS_PADDR
 * supports only &enum DAMOS_PAGEOUT, &enum DAMOS_LRU_PRIO, &enum
 * DAMOS_LRU_DEPRIO, and &DAMOS_STAT.
 */
enum damos_action {
	DAMOS_WILLNEED,
	DAMOS_COLD,
	DAMOS_PAGEOUT,
	DAMOS_HUGEPAGE,
	DAMOS_NOHUGEPAGE,
	DAMOS_LRU_PRIO,
	DAMOS_LRU_DEPRIO,
	DAMOS_MIGRATE_HOT,
	DAMOS_MIGRATE_COLD,
	DAMOS_STAT,		/* Do nothing but only record the stat */
	NR_DAMOS_ACTIONS,
};

/**
 * enum damos_quota_goal_metric - Represents the metric to be used as the goal
 *
 * @DAMOS_QUOTA_USER_INPUT:	User-input value.
 * @DAMOS_QUOTA_SOME_MEM_PSI_US:	System level some memory PSI in us.
 * @DAMOS_QUOTA_NODE_MEM_USED_BP:	MemUsed ratio of a node.
 * @DAMOS_QUOTA_NODE_MEM_FREE_BP:	MemFree ratio of a node.
 * @NR_DAMOS_QUOTA_GOAL_METRICS:	Number of DAMOS quota goal metrics.
 *
 * Metrics equal to or larger than @NR_DAMOS_QUOTA_GOAL_METRICS are
 * unsupported.
 */
enum damos_quota_goal_metric {
	DAMOS_QUOTA_USER_INPUT,
	DAMOS_QUOTA_SOME_MEM_PSI_US,
	DAMOS_QUOTA_NODE_MEM_USED_BP,
	DAMOS_QUOTA_NODE_MEM_FREE_BP,
	NR_DAMOS_QUOTA_GOAL_METRICS,
};

/**
 * struct damos_quota_goal - DAMOS scheme quota auto-tuning goal.
 * @metric:		Metric to be used for representing the goal.
 * @target_value:	Target value of @metric to achieve with the tuning.
 * @current_value:	Current value of @metric.
 * @last_psi_total:	Last measured total PSI.
 * @nid:		Node id.
 * @list:		List head for siblings.
 *
 * Data structure for getting the current score of the quota tuning goal.  The
 * score is calculated by how close @current_value and @target_value are.  Then
 * the score is entered to DAMON's internal feedback loop mechanism to get the
 * auto-tuned quota.
 *
 * If @metric is DAMOS_QUOTA_USER_INPUT, @current_value should be manually
 * entered by the user, probably inside the kdamond callbacks.  Otherwise,
 * DAMON sets @current_value with self-measured value of @metric.
 */
struct damos_quota_goal {
	enum damos_quota_goal_metric metric;
	unsigned long target_value;
	unsigned long current_value;
	/* metric-dependent fields */
	union {
		u64 last_psi_total;
		int nid;
	};
	struct list_head list;
};

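/*
 * Illustrative sketch (not part of the mainline header): which union member
 * to set depends on @metric.  Assumes the damos_new_quota_goal() and
 * damos_add_quota_goal() APIs declared under CONFIG_DAMON below; error
 * handling is omitted for brevity.
 */
static inline void damos_quota_goal_example(struct damos_quota *q,
		unsigned long user_measured_value)
{
	struct damos_quota_goal *g;

	/* user-fed metric: the caller keeps @current_value updated */
	g = damos_new_quota_goal(DAMOS_QUOTA_USER_INPUT, 10000);
	g->current_value = user_measured_value;
	damos_add_quota_goal(q, g);

	/* node metric: DAMON self-measures; only the node id is needed */
	g = damos_new_quota_goal(DAMOS_QUOTA_NODE_MEM_FREE_BP, 50 /* 0.5% */);
	g->nid = 1;
	damos_add_quota_goal(q, g);
}
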
/**
 * struct damos_quota - Controls the aggressiveness of the given scheme.
 * @reset_interval:	Charge reset interval in milliseconds.
 * @ms:			Maximum milliseconds that the scheme can use.
 * @sz:			Maximum bytes of memory that the action can be applied.
 * @goals:		Head of quota tuning goals (&damos_quota_goal) list.
 * @esz:		Effective size quota in bytes.
 *
 * @weight_sz:		Weight of the region's size for prioritization.
 * @weight_nr_accesses:	Weight of the region's nr_accesses for prioritization.
 * @weight_age:		Weight of the region's age for prioritization.
 *
 * To avoid consuming too much CPU time or IO resources for applying the
 * &struct damos->action to large memory, DAMON allows users to set time and/or
 * size quotas.  The quotas can be set by writing non-zero values to &ms and
 * &sz, respectively.  If the time quota is set, DAMON tries to use only up to
 * &ms milliseconds within &reset_interval for applying the action.  If the
 * size quota is set, DAMON tries to apply the action only up to &sz bytes
 * within &reset_interval.
 *
 * To reconcile the different types of quotas and goals, DAMON internally
 * converts those into one single size quota called "effective quota".  DAMON
 * internally uses it as the only real quota.  The conversion is made as
 * follows.
 *
 * The time quota is transformed to a size quota using estimated throughput of
 * the scheme's action.  DAMON then compares it against &sz and uses the
 * smaller one as the effective quota.
 *
 * If @goals is not empty, DAMON calculates yet another size quota based on the
 * goals using its internal feedback loop algorithm, for every @reset_interval.
 * Then, if the new size quota is smaller than the effective quota, it uses the
 * new size quota as the effective quota.
 *
 * The resulting effective size quota in bytes is set to @esz.
 *
 * For selecting regions within the quota, DAMON prioritizes current scheme's
 * target memory regions using the &struct damon_operations->get_scheme_score.
 * You could customize the prioritization logic by setting &weight_sz,
 * &weight_nr_accesses, and &weight_age, because monitoring operations are
 * encouraged to respect those.
 */
struct damos_quota {
	unsigned long reset_interval;
	unsigned long ms;
	unsigned long sz;
	struct list_head goals;
	unsigned long esz;

	unsigned int weight_sz;
	unsigned int weight_nr_accesses;
	unsigned int weight_age;

	/* private: */
	/* For throughput estimation */
	unsigned long total_charged_sz;
	unsigned long total_charged_ns;

	/* For charging the quota */
	unsigned long charged_sz;
	unsigned long charged_from;
	struct damon_target *charge_target_from;
	unsigned long charge_addr_from;

	/* For prioritization */
	unsigned int min_score;

	/* For feedback loop */
	unsigned long esz_bp;
};

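/*
 * Illustrative sketch (not part of the mainline header; the function name is
 * hypothetical): a quota allowing up to 100ms of work and up to 200 MiB of
 * memory per 1s @reset_interval.  With an estimated throughput of, say,
 * 1 GiB/s, the 100ms time quota converts to ~100 MiB, so @esz becomes
 * min(100 MiB, 200 MiB) = 100 MiB; a non-empty @goals list can only lower it
 * further.
 */
static inline void damos_quota_example_setup(struct damos_quota *q)
{
	q->reset_interval = 1000;	/* 1 second */
	q->ms = 100;			/* up to 100ms of work per interval */
	q->sz = 200 * 1024 * 1024;	/* and up to 200 MiB per interval */
	q->weight_sz = 0;
	q->weight_nr_accesses = 100;
	q->weight_age = 100;
}
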
/**
 * enum damos_wmark_metric - Represents the watermark metric.
 *
 * @DAMOS_WMARK_NONE:		Ignore the watermarks of the given scheme.
 * @DAMOS_WMARK_FREE_MEM_RATE:	Free memory rate of the system in [0,1000].
 * @NR_DAMOS_WMARK_METRICS:	Total number of DAMOS watermark metrics
 */
enum damos_wmark_metric {
	DAMOS_WMARK_NONE,
	DAMOS_WMARK_FREE_MEM_RATE,
	NR_DAMOS_WMARK_METRICS,
};

/**
 * struct damos_watermarks - Controls when a given scheme should be activated.
 * @metric:	Metric for the watermarks.
 * @interval:	Watermarks check time interval in microseconds.
 * @high:	High watermark.
 * @mid:	Middle watermark.
 * @low:	Low watermark.
 *
 * If &metric is &DAMOS_WMARK_NONE, the scheme is always active.  Being active
 * means DAMON does monitoring and applying the action of the scheme to
 * appropriate memory regions.  Else, DAMON checks &metric of the system for at
 * least every &interval microseconds and works as below.
 *
 * If &metric is higher than &high, the scheme is inactivated.  If &metric is
 * between &mid and &low, the scheme is activated.  If &metric is lower than
 * &low, the scheme is inactivated.
 */
struct damos_watermarks {
	enum damos_wmark_metric metric;
	unsigned long interval;
	unsigned long high;
	unsigned long mid;
	unsigned long low;

	/* private: */
	bool activated;
};

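/*
 * Illustrative sketch (not part of the mainline header): per the rules above,
 * this configuration, checked every 5 seconds, deactivates the scheme while
 * more than 50% of system memory is free (no pressure) or less than 5% is
 * free (too much pressure; defer to reclaim), and activates it when the free
 * memory rate falls into (&low, &mid].
 */
static const struct damos_watermarks damos_watermarks_example = {
	.metric = DAMOS_WMARK_FREE_MEM_RATE,	/* [0, 1000] scale */
	.interval = 5 * 1000 * 1000,		/* 5 seconds, in us */
	.high = 500,	/* >50% free: inactivate */
	.mid = 400,	/* free rate in (low, 40%]: activate */
	.low = 50,	/* <5% free: inactivate */
};
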
/**
 * struct damos_stat - Statistics on a given scheme.
 * @nr_tried:	Total number of regions that the scheme is tried to be applied.
 * @sz_tried:	Total size of regions that the scheme is tried to be applied.
 * @nr_applied:	Total number of regions that the scheme is applied.
 * @sz_applied:	Total size of regions that the scheme is applied.
 * @sz_ops_filter_passed:
 *		Total bytes that passed ops layer-handled DAMOS filters.
 * @qt_exceeds: Total number of times the quota of the scheme has exceeded.
 *
 * "Tried an action to a region" in this context means the DAMOS core logic
 * determined the region as eligible to apply the action.  The access pattern
 * (&struct damos_access_pattern), quotas (&struct damos_quota), watermarks
 * (&struct damos_watermarks) and filters (&struct damos_filter) that are
 * handled on the core logic can affect this.  The core logic asks the
 * operation set (&struct damon_operations) to apply the action to the region.
 *
 * "Applied an action to a region" in this context means the operation set
 * (&struct damon_operations) successfully applied the action to the region, at
 * least to a part of the region.  The filters (&struct damos_filter) that are
 * handled on the operation set layer, the type of the action, and the pages of
 * the region can affect this.  For example, if a filter is set to exclude
 * anonymous pages and the region has only anonymous pages, applying the action
 * to the region will fail.  Similarly, if the action is &DAMOS_PAGEOUT and all
 * pages of the region are already paged out, applying the action will fail.
 */
struct damos_stat {
	unsigned long nr_tried;
	unsigned long sz_tried;
	unsigned long nr_applied;
	unsigned long sz_applied;
	unsigned long sz_ops_filter_passed;
	unsigned long qt_exceeds;
};

/**
 * enum damos_filter_type - Type of memory for &struct damos_filter
 * @DAMOS_FILTER_TYPE_ANON:	Anonymous pages.
 * @DAMOS_FILTER_TYPE_ACTIVE:	Active pages.
 * @DAMOS_FILTER_TYPE_MEMCG:	Specific memcg's pages.
 * @DAMOS_FILTER_TYPE_YOUNG:	Recently accessed pages.
 * @DAMOS_FILTER_TYPE_HUGEPAGE_SIZE:	Page is part of a hugepage.
 * @DAMOS_FILTER_TYPE_UNMAPPED:	Unmapped pages.
 * @DAMOS_FILTER_TYPE_ADDR:	Address range.
 * @DAMOS_FILTER_TYPE_TARGET:	Data Access Monitoring target.
 * @NR_DAMOS_FILTER_TYPES:	Number of filter types.
 *
 * The anon pages type and memcg type filters are handled by underlying
 * &struct damon_operations as a part of scheme action trying, and therefore
 * accounted as 'tried'.  In contrast, other types are handled by core layer
 * before trying of the action and therefore not accounted as 'tried'.
 *
 * The support of the filters that are handled by &struct damon_operations
 * depends on the running &struct damon_operations.
 * &enum DAMON_OPS_PADDR supports both anon pages type and memcg type filters,
 * while &enum DAMON_OPS_VADDR and &enum DAMON_OPS_FVADDR don't support any of
 * the two types.
 */
enum damos_filter_type {
	DAMOS_FILTER_TYPE_ANON,
	DAMOS_FILTER_TYPE_ACTIVE,
	DAMOS_FILTER_TYPE_MEMCG,
	DAMOS_FILTER_TYPE_YOUNG,
	DAMOS_FILTER_TYPE_HUGEPAGE_SIZE,
	DAMOS_FILTER_TYPE_UNMAPPED,
	DAMOS_FILTER_TYPE_ADDR,
	DAMOS_FILTER_TYPE_TARGET,
	NR_DAMOS_FILTER_TYPES,
};

/**
 * struct damos_filter - DAMOS action target memory filter.
 * @type:	Type of the target memory.
 * @matching:	Whether this is for @type-matching memory.
 * @allow:	Whether to include or exclude the @matching memory.
 * @memcg_id:	Memcg id in question if @type is DAMOS_FILTER_TYPE_MEMCG.
 * @addr_range:	Address range if @type is DAMOS_FILTER_TYPE_ADDR.
 * @target_idx:	Index of the &struct damon_target of
 *		&damon_ctx->adaptive_targets if @type is
 *		DAMOS_FILTER_TYPE_TARGET.
 * @sz_range:	Size range if @type is DAMOS_FILTER_TYPE_HUGEPAGE_SIZE.
 * @list:	List head for siblings.
 *
 * Before applying the &damos->action to a memory region, DAMOS checks if each
 * byte of the region matches the given condition and avoids applying the
 * action if so.  Support of each filter type depends on the running &struct
 * damon_operations and the type.  Refer to &enum damos_filter_type for more
 * details.
 */
struct damos_filter {
	enum damos_filter_type type;
	bool matching;
	bool allow;
	union {
		unsigned short memcg_id;
		struct damon_addr_range addr_range;
		int target_idx;
		struct damon_size_range sz_range;
	};
	struct list_head list;
};

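/*
 * Illustrative sketch (not part of the mainline header): let a promotion
 * scheme touch only young (recently accessed) pages, as in the cover
 * letter's "allow young" filter.  Assumes @scheme is an already-constructed
 * &struct damos, and uses the damos_new_filter()/damos_add_filter() APIs
 * declared under CONFIG_DAMON below.
 */
static inline int damos_filter_example(struct damos *scheme)
{
	struct damos_filter *f;

	/* matching=true: young pages; allow=true: let them pass */
	f = damos_new_filter(DAMOS_FILTER_TYPE_YOUNG, true, true);
	if (!f)
		return -ENOMEM;
	damos_add_filter(scheme, f);
	return 0;
}
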
struct damon_ctx;
struct damos;

/**
 * struct damos_walk_control - Control damos_walk().
 *
 * @walk_fn:	Function to be called back for each region.
 * @data:	Data that will be passed to walk functions.
 *
 * Control damos_walk(), which requests specific kdamond to invoke the given
 * function to each region that is eligible to apply actions of the kdamond's
 * schemes.  Refer to damos_walk() for more details.
 */
struct damos_walk_control {
	void (*walk_fn)(void *data, struct damon_ctx *ctx,
			struct damon_target *t, struct damon_region *r,
			struct damos *s, unsigned long sz_filter_passed);
	void *data;
	/* private: internal use only */
	/* informs if the kdamond finished handling of the walk request */
	struct completion completion;
	/* informs if the walk is canceled. */
	bool canceled;
};

/**
 * struct damos_access_pattern - Target access pattern of the given scheme.
 * @min_sz_region:	Minimum size of target regions.
 * @max_sz_region:	Maximum size of target regions.
 * @min_nr_accesses:	Minimum ``->nr_accesses`` of target regions.
 * @max_nr_accesses:	Maximum ``->nr_accesses`` of target regions.
 * @min_age_region:	Minimum age of target regions.
 * @max_age_region:	Maximum age of target regions.
 */
struct damos_access_pattern {
	unsigned long min_sz_region;
	unsigned long max_sz_region;
	unsigned int min_nr_accesses;
	unsigned int max_nr_accesses;
	unsigned int min_age_region;
	unsigned int max_age_region;
};

/**
 * struct damos - Represents a Data Access Monitoring-based Operation Scheme.
 * @pattern:		Access pattern of target regions.
 * @action:		&damos_action to be applied to the target regions.
 * @apply_interval_us:	The time between applying the @action.
 * @quota:		Control the aggressiveness of this scheme.
 * @wmarks:		Watermarks for automated (in)activation of this scheme.
 * @target_nid:		Destination node if @action is "migrate_{hot,cold}".
 * @filters:		Additional set of &struct damos_filter for &action.
 * @ops_filters:	ops layer handling &struct damos_filter objects list.
 * @last_applied:	Last @action applied ops-managing entity.
 * @stat:		Statistics of this scheme.
 * @list:		List head for siblings.
 *
 * For each @apply_interval_us, DAMON finds regions which fit in the
 * &pattern and applies &action to those.  To avoid consuming too much
 * CPU time or IO resources for the &action, &quota is used.
 *
 * If @apply_interval_us is zero, &damon_attrs->aggr_interval is used instead.
 *
 * To do the work only when needed, schemes can be activated for specific
 * system situations using &wmarks.  If all schemes that registered to the
 * monitoring context are inactive, DAMON stops monitoring as well, and just
 * repeatedly checks the watermarks.
 *
 * @target_nid is used to set the migration target node for migrate_hot or
 * migrate_cold actions, which means it's only meaningful when @action is
 * either "migrate_hot" or "migrate_cold".
 *
 * Before applying the &action to a memory region, &struct damon_operations
 * implementation could check pages of the region and skip &action to respect
 * &filters.
 *
 * The minimum entity that @action can be applied depends on the underlying
 * &struct damon_operations.  Since it may not be aligned with the core layer
 * abstract, namely &struct damon_region, &struct damon_operations could apply
 * @action to the same entity multiple times.  Large folios underlying multiple
 * &struct damon_region objects could be such examples.  The &struct
 * damon_operations can use @last_applied to avoid that.  DAMOS core logic
 * unsets @last_applied when each walk of the regions for applying the scheme
 * is finished.
 *
 * After applying the &action to each region, &stat_count and &stat_sz are
 * updated to reflect the number of regions and total size of regions that the
 * &action is applied.
 */
struct damos {
	struct damos_access_pattern pattern;
	enum damos_action action;
	unsigned long apply_interval_us;
	/* private: internal use only */
	/*
	 * number of sample intervals that should be passed before applying
	 * @action
	 */
	unsigned long next_apply_sis;
	/* informs if ongoing DAMOS walk for this scheme is finished */
	bool walk_completed;
	/*
	 * If the current region in the filtering stage is allowed by core
	 * layer-handled filters.  If true, operations layer allows it, too.
	 */
	bool core_filters_allowed;
	/* whether to reject core/ops filters unmatched regions */
	bool core_filters_default_reject;
	bool ops_filters_default_reject;
/* public: */
	struct damos_quota quota;
	struct damos_watermarks wmarks;
	union {
		int target_nid;
	};
	struct list_head filters;
	struct list_head ops_filters;
	void *last_applied;
	struct damos_stat stat;
	struct list_head list;
};

/**
 * enum damon_ops_id - Identifier for each monitoring operations implementation
 *
 * @DAMON_OPS_VADDR:	Monitoring operations for virtual address spaces
 * @DAMON_OPS_FVADDR:	Monitoring operations for only fixed ranges of virtual
 *			address spaces
 * @DAMON_OPS_PADDR:	Monitoring operations for the physical address space
 * @NR_DAMON_OPS:	Number of monitoring operations implementations
 */
enum damon_ops_id {
	DAMON_OPS_VADDR,
	DAMON_OPS_FVADDR,
	DAMON_OPS_PADDR,
	NR_DAMON_OPS,
};

/**
 * struct damon_operations - Monitoring operations for given use cases.
 *
 * @id:				Identifier of this operations set.
 * @init:			Initialize operations-related data structures.
 * @update:			Update operations-related data structures.
 * @prepare_access_checks:	Prepare next access check of target regions.
 * @check_accesses:		Check the accesses to target regions.
 * @get_scheme_score:		Get the score of a region for a scheme.
 * @apply_scheme:		Apply a DAMON-based operation scheme.
 * @target_valid:		Determine if the target is valid.
 * @cleanup:			Clean up the context.
 *
 * DAMON can be extended for various address spaces and usages.  For this,
 * users should register the low level operations for their target address
 * space and usecase via the &damon_ctx.ops.  Then, the monitoring thread
 * (&damon_ctx.kdamond) calls @init and @prepare_access_checks before starting
 * the monitoring, @update after each &damon_attrs.ops_update_interval, and
 * @check_accesses, @target_valid and @prepare_access_checks after each
 * &damon_attrs.sample_interval.
 *
 * Each &struct damon_operations instance having valid @id can be registered
 * via damon_register_ops() and selected by damon_select_ops() later.
 * @init should initialize operations-related data structures.  For example,
 * this could be used to construct proper monitoring target regions and link
 * those to @damon_ctx.adaptive_targets.
 * @update should update the operations-related data structures.  For example,
 * this could be used to update monitoring target regions for current status.
 * @prepare_access_checks should manipulate the monitoring regions to be
 * prepared for the next access check.
 * @check_accesses should check the accesses to each region that were made
 * after the last preparation and update the number of observed accesses of
 * each region.  It should also return the max number of observed accesses
 * made as a result of its update.  The value will be used for regions
 * adjustment threshold.
 * @get_scheme_score should return the priority score of a region for a scheme
 * as an integer in [0, &DAMOS_MAX_SCORE].
 * @apply_scheme is called from @kdamond when a region for user provided
 * DAMON-based operation scheme is found.  It should apply the scheme's action
 * to the region and return bytes of the region that the action is successfully
 * applied.  It should also report how many bytes of the region have passed
 * filters (&struct damos_filter) that are handled by itself.
 * @target_valid should check whether the target is still valid for the
 * monitoring.
 * @cleanup is called from @kdamond just before its termination.
 */
struct damon_operations {
	enum damon_ops_id id;
	void (*init)(struct damon_ctx *context);
	void (*update)(struct damon_ctx *context);
	void (*prepare_access_checks)(struct damon_ctx *context);
	unsigned int (*check_accesses)(struct damon_ctx *context);
	int (*get_scheme_score)(struct damon_ctx *context,
			struct damon_target *t, struct damon_region *r,
			struct damos *scheme);
	unsigned long (*apply_scheme)(struct damon_ctx *context,
			struct damon_target *t, struct damon_region *r,
			struct damos *scheme, unsigned long *sz_filter_passed);
	bool (*target_valid)(struct damon_target *t);
	void (*cleanup)(struct damon_ctx *context);
};

/**
 * struct damon_callback - Monitoring events notification callbacks.
 *
 * @after_wmarks_check:	Called after each schemes' watermarks check.
 * @after_aggregation:	Called after each aggregation.
 * @before_terminate:	Called before terminating the monitoring.
 *
 * The monitoring thread (&damon_ctx.kdamond) calls @before_terminate just
 * before finishing the monitoring.
 *
 * The monitoring thread calls @after_wmarks_check after each DAMON-based
 * operation schemes' watermarks check.  If users need to make changes to the
 * attributes of the monitoring context while it's deactivated due to the
 * watermarks, this is a good place to do so.
 *
 * The monitoring thread calls @after_aggregation for each of the aggregation
 * intervals.  Therefore, users can safely access the monitoring results
 * without additional protection.  For the reason, users are recommended to use
 * these callbacks for the accesses to the results.
 *
 * If any callback returns non-zero, monitoring stops.
 */
struct damon_callback {
	int (*after_wmarks_check)(struct damon_ctx *context);
	int (*after_aggregation)(struct damon_ctx *context);
	void (*before_terminate)(struct damon_ctx *context);
};

/*
 * struct damon_call_control - Control damon_call().
 *
 * @fn:			Function to be called back.
 * @data:		Data that will be passed to @fn.
 * @return_code:	Return code from @fn invocation.
 *
 * Control damon_call(), which requests specific kdamond to invoke a given
 * function.  Refer to damon_call() for more details.
 */
struct damon_call_control {
	int (*fn)(void *data);
	void *data;
	int return_code;
	/* private: internal use only */
	/* informs if the kdamond finished handling of the request */
	struct completion completion;
	/* informs if the kdamond canceled @fn invocation */
	bool canceled;
};

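/*
 * Illustrative sketch (not part of the mainline header): ask the kdamond of
 * @ctx to run a function in its own context, e.g., to read monitoring
 * results without extra locking.  Uses damon_call() declared under
 * CONFIG_DAMON below; the callback and wrapper names are hypothetical.
 */
static int damon_call_example_fn(void *data)
{
	/* runs in the kdamond context; results are stable here */
	return 0;
}

static inline int damon_call_example(struct damon_ctx *ctx)
{
	struct damon_call_control control = {
		.fn = damon_call_example_fn,
		.data = NULL,
	};
	int err = damon_call(ctx, &control);

	return err ? err : control.return_code;
}
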
/**
 * struct damon_intervals_goal - Monitoring intervals auto-tuning goal.
 *
 * @access_bp:		Access events observation ratio to achieve in bp.
 * @aggrs:		Number of aggregations to achieve @access_bp within.
 * @min_sample_us:	Minimum resulting sampling interval in microseconds.
 * @max_sample_us:	Maximum resulting sampling interval in microseconds.
 *
 * DAMON automatically tunes &damon_attrs->sample_interval and
 * &damon_attrs->aggr_interval aiming for the ratio in bp (1/10,000) of
 * DAMON-observed access events to the theoretical maximum amount within @aggrs
 * aggregations to be the same as @access_bp.  The logic increases
 * &damon_attrs->aggr_interval and &damon_attrs->sample_interval in the same
 * ratio if the current access events observation ratio is lower than the
 * target for each @aggrs aggregations, and vice versa.
 *
 * If @aggrs is zero, the tuning is disabled and hence this struct is ignored.
 */
struct damon_intervals_goal {
	unsigned long access_bp;
	unsigned long aggrs;
	unsigned long min_sample_us;
	unsigned long max_sample_us;
};

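/*
 * Illustrative sketch (not part of the mainline header): the tuning used in
 * the cover letter's evaluation.  Aim for 4% of the theoretical maximum
 * access events for each 3 aggregations, keeping the sampling interval
 * within [5ms, 10s].
 */
static const struct damon_intervals_goal damon_intervals_goal_example = {
	.access_bp = 400,			/* 4% in basis points */
	.aggrs = 3,
	.min_sample_us = 5 * 1000,		/* 5ms */
	.max_sample_us = 10 * 1000 * 1000,	/* 10s */
};
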
/**
 * struct damon_attrs - Monitoring attributes for accuracy/overhead control.
 *
 * @sample_interval:		The time between access samplings.
 * @aggr_interval:		The time between monitor results aggregations.
 * @ops_update_interval:	The time between monitoring operations updates.
 * @intervals_goal:		Intervals auto-tuning goal.
 * @min_nr_regions:		The minimum number of adaptive monitoring
 *				regions.
 * @max_nr_regions:		The maximum number of adaptive monitoring
 *				regions.
 *
 * For each @sample_interval, DAMON checks whether each region is accessed or
 * not during the last @sample_interval.  If such access is found, DAMON
 * aggregates the information by increasing &damon_region->nr_accesses for
 * @aggr_interval time.  For each @aggr_interval, the count is reset.  DAMON
 * also checks whether the target memory regions need update (e.g., by
 * ``mmap()`` calls from the application, in case of virtual memory monitoring)
 * and applies the changes for each @ops_update_interval.  All time intervals
 * are in micro-seconds.  Please refer to &struct damon_operations and &struct
 * damon_callback for more detail.
 */
struct damon_attrs {
	unsigned long sample_interval;
	unsigned long aggr_interval;
	unsigned long ops_update_interval;
	struct damon_intervals_goal intervals_goal;
	unsigned long min_nr_regions;
	unsigned long max_nr_regions;
	/* private: internal use only */
	/*
	 * @aggr_interval to @sample_interval ratio.
	 * Core-external components call damon_set_attrs() with &damon_attrs
	 * having this field unset.  In that case, damon_set_attrs() sets this
	 * field of the resulting &damon_attrs.  Core-internal components such
	 * as kdamond_tune_intervals() call damon_set_attrs() with &damon_attrs
	 * having this field set.  In that case, damon_set_attrs() just keeps
	 * it.
	 */
	unsigned long aggr_samples;
};

/**
 * struct damon_ctx - Represents a context for each monitoring.  This is the
 * main interface that allows users to set the attributes and get the results
 * of the monitoring.
 *
 * @attrs:		Monitoring attributes for accuracy/overhead control.
 * @kdamond:		Kernel thread who does the monitoring.
 * @kdamond_lock:	Mutex for the synchronizations with @kdamond.
 *
 * For each monitoring context, one kernel thread for the monitoring is
 * created.  The pointer to the thread is stored in @kdamond.
 *
 * Once started, the monitoring thread runs until explicitly required to be
 * terminated or every monitoring target is invalid.  The validity of the
 * targets is checked via the &damon_operations.target_valid of @ops.  The
 * termination can also be explicitly requested by calling damon_stop().
 * The thread sets @kdamond to NULL when it terminates.  Therefore, users can
 * know whether the monitoring is ongoing or terminated by reading @kdamond.
 * Reads and writes to @kdamond from outside of the monitoring thread must
 * be protected by @kdamond_lock.
 *
 * Note that the monitoring thread protects only @kdamond via @kdamond_lock.
 * Accesses to other fields must be protected by themselves.
 *
 * @ops:	Set of monitoring operations for given use cases.
 * @callback:	Set of callbacks for monitoring events notifications.
 *
 * @adaptive_targets:	Head of monitoring targets (&damon_target) list.
 * @schemes:		Head of schemes (&damos) list.
 */
struct damon_ctx {
	struct damon_attrs attrs;

	/* private: internal use only */
	/* number of sample intervals that passed since this context started */
	unsigned long passed_sample_intervals;
	/*
	 * number of sample intervals that should be passed before next
	 * aggregation
	 */
	unsigned long next_aggregation_sis;
	/*
	 * number of sample intervals that should be passed before next ops
	 * update
	 */
	unsigned long next_ops_update_sis;
	/*
	 * number of sample intervals that should be passed before next
	 * intervals tuning
	 */
	unsigned long next_intervals_tune_sis;
	/* for waiting until the execution of the kdamond_fn is started */
	struct completion kdamond_started;
	/* for scheme quotas prioritization */
	unsigned long *regions_score_histogram;

	struct damon_call_control *call_control;
	struct mutex call_control_lock;

	struct damos_walk_control *walk_control;
	struct mutex walk_control_lock;

	/* public: */
	struct task_struct *kdamond;
	struct mutex kdamond_lock;

	struct damon_operations ops;
	struct damon_callback callback;

	struct list_head adaptive_targets;
	struct list_head schemes;
};

static inline struct damon_region *damon_next_region(struct damon_region *r)
{
	return container_of(r->list.next, struct damon_region, list);
}

static inline struct damon_region *damon_prev_region(struct damon_region *r)
{
	return container_of(r->list.prev, struct damon_region, list);
}

static inline struct damon_region *damon_last_region(struct damon_target *t)
{
	return list_last_entry(&t->regions_list, struct damon_region, list);
}

static inline struct damon_region *damon_first_region(struct damon_target *t)
{
	return list_first_entry(&t->regions_list, struct damon_region, list);
}

static inline unsigned long damon_sz_region(struct damon_region *r)
{
	return r->ar.end - r->ar.start;
}


#define damon_for_each_region(r, t) \
	list_for_each_entry(r, &t->regions_list, list)

#define damon_for_each_region_from(r, t) \
	list_for_each_entry_from(r, &t->regions_list, list)

#define damon_for_each_region_safe(r, next, t) \
	list_for_each_entry_safe(r, next, &t->regions_list, list)

#define damon_for_each_target(t, ctx) \
	list_for_each_entry(t, &(ctx)->adaptive_targets, list)

#define damon_for_each_target_safe(t, next, ctx)	\
	list_for_each_entry_safe(t, next, &(ctx)->adaptive_targets, list)

#define damon_for_each_scheme(s, ctx) \
	list_for_each_entry(s, &(ctx)->schemes, list)

#define damon_for_each_scheme_safe(s, next, ctx) \
	list_for_each_entry_safe(s, next, &(ctx)->schemes, list)

#define damos_for_each_quota_goal(goal, quota) \
	list_for_each_entry(goal, &quota->goals, list)

#define damos_for_each_quota_goal_safe(goal, next, quota) \
	list_for_each_entry_safe(goal, next, &(quota)->goals, list)

#define damos_for_each_filter(f, scheme) \
	list_for_each_entry(f, &(scheme)->filters, list)

#define damos_for_each_filter_safe(f, next, scheme) \
	list_for_each_entry_safe(f, next, &(scheme)->filters, list)

#define damos_for_each_ops_filter(f, scheme) \
	list_for_each_entry(f, &(scheme)->ops_filters, list)

#define damos_for_each_ops_filter_safe(f, next, scheme) \
	list_for_each_entry_safe(f, next, &(scheme)->ops_filters, list)

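/*
 * Illustrative sketch (not part of the mainline header; the function name is
 * hypothetical): the iteration macros above compose naturally, e.g., for
 * summing the sizes of all regions that @ctx currently monitors.
 */
static inline unsigned long damon_example_total_regions_sz(struct damon_ctx *ctx)
{
	struct damon_target *t;
	struct damon_region *r;
	unsigned long sz = 0;

	damon_for_each_target(t, ctx)
		damon_for_each_region(r, t)
			sz += damon_sz_region(r);
	return sz;
}
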
#ifdef CONFIG_DAMON

struct damon_region *damon_new_region(unsigned long start, unsigned long end);

/*
 * Add a region between two other regions
 */
static inline void damon_insert_region(struct damon_region *r,
		struct damon_region *prev, struct damon_region *next,
		struct damon_target *t)
{
	__list_add(&r->list, &prev->list, &next->list);
	t->nr_regions++;
}

void damon_add_region(struct damon_region *r, struct damon_target *t);
void damon_destroy_region(struct damon_region *r, struct damon_target *t);
int damon_set_regions(struct damon_target *t, struct damon_addr_range *ranges,
		unsigned int nr_ranges);
void damon_update_region_access_rate(struct damon_region *r, bool accessed,
		struct damon_attrs *attrs);

struct damos_filter *damos_new_filter(enum damos_filter_type type,
		bool matching, bool allow);
void damos_add_filter(struct damos *s, struct damos_filter *f);
bool damos_filter_for_ops(enum damos_filter_type type);
void damos_destroy_filter(struct damos_filter *f);

struct damos_quota_goal *damos_new_quota_goal(
		enum damos_quota_goal_metric metric,
		unsigned long target_value);
void damos_add_quota_goal(struct damos_quota *q, struct damos_quota_goal *g);
void damos_destroy_quota_goal(struct damos_quota_goal *goal);

struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
			enum damos_action action,
			unsigned long apply_interval_us,
			struct damos_quota *quota,
			struct damos_watermarks *wmarks,
			int target_nid);
void damon_add_scheme(struct damon_ctx *ctx, struct damos *s);
void damon_destroy_scheme(struct damos *s);
int damos_commit_quota_goals(struct damos_quota *dst, struct damos_quota *src);

struct damon_target *damon_new_target(void);
void damon_add_target(struct damon_ctx *ctx, struct damon_target *t);
bool damon_targets_empty(struct damon_ctx *ctx);
void damon_free_target(struct damon_target *t);
void damon_destroy_target(struct damon_target *t);
unsigned int damon_nr_regions(struct damon_target *t);

struct damon_ctx *damon_new_ctx(void);
void damon_destroy_ctx(struct damon_ctx *ctx);
int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs);
void damon_set_schemes(struct damon_ctx *ctx,
			struct damos **schemes, ssize_t nr_schemes);
int damon_commit_ctx(struct damon_ctx *old_ctx, struct damon_ctx *new_ctx);
int damon_nr_running_ctxs(void);
bool damon_is_registered_ops(enum damon_ops_id id);
int damon_register_ops(struct damon_operations *ops);
int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id);

static inline bool damon_target_has_pid(const struct damon_ctx *ctx)
{
	return ctx->ops.id == DAMON_OPS_VADDR || ctx->ops.id == DAMON_OPS_FVADDR;
}

static inline unsigned int damon_max_nr_accesses(const struct damon_attrs *attrs)
{
	/* {aggr,sample}_interval are unsigned long, hence could overflow */
	return min(attrs->aggr_interval / attrs->sample_interval,
			(unsigned long)UINT_MAX);
}


int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive);
int damon_stop(struct damon_ctx **ctxs, int nr_ctxs);

int damon_call(struct damon_ctx *ctx, struct damon_call_control *control);
int damos_walk(struct damon_ctx *ctx, struct damos_walk_control *control);

int damon_set_region_biggest_system_ram_default(struct damon_target *t,
		unsigned long *start, unsigned long *end);

#endif	/* CONFIG_DAMON */

#endif	/* _DAMON_H_ */