From d49fbf766d27bc721caa48b391103d71e90804fa Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 28 Jul 2016 15:44:34 -0700 Subject: proc, oom: drop bogus task_lock and mm check Series "Handle oom bypass more gracefully", V5 The following 10 patches should put some order to very rare cases of mm shared between processes and make the paths which bypass the oom killer oom reapable and therefore much more reliable finally. Even though mm shared outside of thread group is rare (either vforked tasks for a short period, use_mm by kernel threads or exotic thread model of clone(CLONE_VM) without CLONE_SIGHAND) it is better to cover them. Not only it makes the current oom killer logic quite hard to follow and reason about it can lead to weird corner cases. E.g. it is possible to select an oom victim which shares the mm with unkillable process or bypass the oom killer even when other processes sharing the mm are still alive and other weird cases. Patch 1 drops bogus task_lock and mm check from oom_{score_}adj_write. This can be considered a bug fix with a low impact as nobody has noticed for years. Patch 2 drops sighand lock because it is not needed anymore as pointed by Oleg. Patch 3 is a clean up of oom_score_adj handling and a preparatory work for later patches. Patch 4 enforces oom_adj_score to be consistent between processes sharing the mm to behave consistently with the regular thread groups. This can be considered a user visible behavior change because one thread group updating oom_score_adj will affect others which share the same mm via clone(CLONE_VM). I argue that this should be acceptable because we already have the same behavior for threads in the same thread group and sharing the mm without signal struct is just a different model of threading. This is probably the most controversial part of the series, I would like to find some consensus here. 
There were some suggestions to hook some counter/oom_score_adj into the mm_struct but I feel that this is not necessary right now and we can rely on proc handler + oom_kill_process to DTRT. I can be convinced otherwise but I strongly think that whatever we do the userspace has to have a way to see the current oom priority as consistently as possible. Patch 5 makes sure that no vforked task is selected if it is sharing the mm with oom unkillable task. Patch 6 ensures that all user tasks sharing the mm are killed which in turn makes sure that all oom victims are oom reapable. Patch 7 guarantees that task_will_free_mem will always imply reapable bypass of the oom killer. Patch 8 is new in this version and it addresses an issue pointed out by 0-day OOM report where an oom victim was reaped several times. Patch 9 puts an upper bound on how many times oom_reaper tries to reap a task and hides it from the oom killer to move on when no progress can be made. This will give an upper bound to how long an oom_reapable task can block the oom killer from selecting another victim if the oom_reaper is not able to reap the victim. Patch 10 tries to plug the (hopefully) last hole when we can still lock up when the oom victim is shared with oom unkillable tasks (kthreads and global init). We just try to be best effort in that case and rather fallback to kill something else than risk a lockup. This patch (of 10): Both oom_adj_write and oom_score_adj_write are using task_lock, check for task->mm and fail if it is NULL. This is not needed because the oom_score_adj is per signal struct so we do not need mm at all. The code has been introduced by 3d5992d2ac7d ("oom: add per-mm oom disable count") but we do not do per-mm oom disable since c9f01245b6a7 ("oom: remove oom_disable_count"). The task->mm check is even not correct because the current thread might have exited but the thread group might be still alive - e.g. 
thread group leader would lead that echo $VAL > /proc/pid/oom_score_adj would always fail with EINVAL while /proc/pid/task/$other_tid/oom_score_adj would succeed. This is unexpected at best. Remove the lock along with the check to fix the unexpected behavior and also because there is not real need for the lock in the first place. Link: http://lkml.kernel.org/r/1466426628-15074-2-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko Reviewed-by: Vladimir Davydov Acked-by: Oleg Nesterov Cc: David Rientjes Cc: Tetsuo Handa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/base.c | 22 ++++------------------ 1 file changed, 4 insertions(+), 18 deletions(-) (limited to 'fs/proc') diff --git a/fs/proc/base.c b/fs/proc/base.c index a11eb7196ec8..2a0f5ee9b623 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1083,15 +1083,9 @@ static ssize_t oom_adj_write(struct file *file, const char __user *buf, goto out; } - task_lock(task); - if (!task->mm) { - err = -EINVAL; - goto err_task_lock; - } - if (!lock_task_sighand(task, &flags)) { err = -ESRCH; - goto err_task_lock; + goto err_put_task; } /* @@ -1121,8 +1115,7 @@ static ssize_t oom_adj_write(struct file *file, const char __user *buf, trace_oom_score_adj_update(task); err_sighand: unlock_task_sighand(task, &flags); -err_task_lock: - task_unlock(task); +err_put_task: put_task_struct(task); out: return err < 0 ? 
err : count; @@ -1186,15 +1179,9 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf, goto out; } - task_lock(task); - if (!task->mm) { - err = -EINVAL; - goto err_task_lock; - } - if (!lock_task_sighand(task, &flags)) { err = -ESRCH; - goto err_task_lock; + goto err_put_task; } if ((short)oom_score_adj < task->signal->oom_score_adj_min && @@ -1210,8 +1197,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf, err_sighand: unlock_task_sighand(task, &flags); -err_task_lock: - task_unlock(task); +err_put_task: put_task_struct(task); out: return err < 0 ? err : count; -- cgit v1.2.3 From f913da596a407d3be7c8c220697beacd8fe7312a Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 28 Jul 2016 15:44:37 -0700 Subject: proc, oom: drop bogus sighand lock Oleg has pointed out that can simplify both oom_adj_{read,write} and oom_score_adj_{read,write} even further and drop the sighand lock. The main purpose of the lock was to protect p->signal from going away but this will not happen since ea6d290ca34c ("signals: make task_struct->signal immutable/refcountable"). The other role of the lock was to synchronize different writers, especially those with CAP_SYS_RESOURCE. Introduce a mutex for this purpose. Later patches will need this lock anyway. 
Suggested-by: Oleg Nesterov Link: http://lkml.kernel.org/r/1466426628-15074-3-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Oleg Nesterov Cc: Vladimir Davydov Cc: David Rientjes Cc: Tetsuo Handa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/base.c | 51 +++++++++++++++++---------------------------------- 1 file changed, 17 insertions(+), 34 deletions(-) (limited to 'fs/proc') diff --git a/fs/proc/base.c b/fs/proc/base.c index 2a0f5ee9b623..f7dc1050387f 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1024,23 +1024,21 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count, char buffer[PROC_NUMBUF]; int oom_adj = OOM_ADJUST_MIN; size_t len; - unsigned long flags; if (!task) return -ESRCH; - if (lock_task_sighand(task, &flags)) { - if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MAX) - oom_adj = OOM_ADJUST_MAX; - else - oom_adj = (task->signal->oom_score_adj * -OOM_DISABLE) / - OOM_SCORE_ADJ_MAX; - unlock_task_sighand(task, &flags); - } + if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MAX) + oom_adj = OOM_ADJUST_MAX; + else + oom_adj = (task->signal->oom_score_adj * -OOM_DISABLE) / + OOM_SCORE_ADJ_MAX; put_task_struct(task); len = snprintf(buffer, sizeof(buffer), "%d\n", oom_adj); return simple_read_from_buffer(buf, count, ppos, buffer, len); } +static DEFINE_MUTEX(oom_adj_mutex); + /* * /proc/pid/oom_adj exists solely for backwards compatibility with previous * kernels. 
The effective policy is defined by oom_score_adj, which has a @@ -1057,7 +1055,6 @@ static ssize_t oom_adj_write(struct file *file, const char __user *buf, struct task_struct *task; char buffer[PROC_NUMBUF]; int oom_adj; - unsigned long flags; int err; memset(buffer, 0, sizeof(buffer)); @@ -1083,11 +1080,6 @@ static ssize_t oom_adj_write(struct file *file, const char __user *buf, goto out; } - if (!lock_task_sighand(task, &flags)) { - err = -ESRCH; - goto err_put_task; - } - /* * Scale /proc/pid/oom_score_adj appropriately ensuring that a maximum * value is always attainable. @@ -1097,10 +1089,11 @@ static ssize_t oom_adj_write(struct file *file, const char __user *buf, else oom_adj = (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE; + mutex_lock(&oom_adj_mutex); if (oom_adj < task->signal->oom_score_adj && !capable(CAP_SYS_RESOURCE)) { err = -EACCES; - goto err_sighand; + goto err_unlock; } /* @@ -1113,9 +1106,8 @@ static ssize_t oom_adj_write(struct file *file, const char __user *buf, task->signal->oom_score_adj = oom_adj; trace_oom_score_adj_update(task); -err_sighand: - unlock_task_sighand(task, &flags); -err_put_task: +err_unlock: + mutex_unlock(&oom_adj_mutex); put_task_struct(task); out: return err < 0 ? 
err : count; @@ -1133,15 +1125,11 @@ static ssize_t oom_score_adj_read(struct file *file, char __user *buf, struct task_struct *task = get_proc_task(file_inode(file)); char buffer[PROC_NUMBUF]; short oom_score_adj = OOM_SCORE_ADJ_MIN; - unsigned long flags; size_t len; if (!task) return -ESRCH; - if (lock_task_sighand(task, &flags)) { - oom_score_adj = task->signal->oom_score_adj; - unlock_task_sighand(task, &flags); - } + oom_score_adj = task->signal->oom_score_adj; put_task_struct(task); len = snprintf(buffer, sizeof(buffer), "%hd\n", oom_score_adj); return simple_read_from_buffer(buf, count, ppos, buffer, len); @@ -1152,7 +1140,6 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf, { struct task_struct *task; char buffer[PROC_NUMBUF]; - unsigned long flags; int oom_score_adj; int err; @@ -1179,25 +1166,21 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf, goto out; } - if (!lock_task_sighand(task, &flags)) { - err = -ESRCH; - goto err_put_task; - } - + mutex_lock(&oom_adj_mutex); if ((short)oom_score_adj < task->signal->oom_score_adj_min && !capable(CAP_SYS_RESOURCE)) { err = -EACCES; - goto err_sighand; + goto err_unlock; } task->signal->oom_score_adj = (short)oom_score_adj; if (has_capability_noaudit(current, CAP_SYS_RESOURCE)) task->signal->oom_score_adj_min = (short)oom_score_adj; + trace_oom_score_adj_update(task); -err_sighand: - unlock_task_sighand(task, &flags); -err_put_task: +err_unlock: + mutex_unlock(&oom_adj_mutex); put_task_struct(task); out: return err < 0 ? err : count; -- cgit v1.2.3 From 1d5f0acbc675506f98a2c42a40a35f5afb0d1c21 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 28 Jul 2016 15:44:40 -0700 Subject: proc, oom_adj: extract oom_score_adj setting into a helper Currently we have two proc interfaces to set oom_score_adj. The legacy /proc/<pid>/oom_adj and /proc/<pid>/oom_score_adj which both have their specific handlers. 
Big part of the logic is duplicated so extract the common code into __set_oom_adj helper. Legacy knob still expects some details slightly different so make sure those are handled same way - e.g. the legacy mode ignores oom_score_adj_min and it warns about the usage. This patch shouldn't introduce any functional changes. Link: http://lkml.kernel.org/r/1466426628-15074-4-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Oleg Nesterov Cc: Vladimir Davydov Cc: David Rientjes Cc: Tetsuo Handa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/base.c | 94 +++++++++++++++++++++++++++------------------------------- 1 file changed, 43 insertions(+), 51 deletions(-) (limited to 'fs/proc') diff --git a/fs/proc/base.c b/fs/proc/base.c index f7dc1050387f..13a345952cc7 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1037,7 +1037,47 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count, return simple_read_from_buffer(buf, count, ppos, buffer, len); } -static DEFINE_MUTEX(oom_adj_mutex); +static int __set_oom_adj(struct file *file, int oom_adj, bool legacy) +{ + static DEFINE_MUTEX(oom_adj_mutex); + struct task_struct *task; + int err = 0; + + task = get_proc_task(file_inode(file)); + if (!task) + return -ESRCH; + + mutex_lock(&oom_adj_mutex); + if (legacy) { + if (oom_adj < task->signal->oom_score_adj && + !capable(CAP_SYS_RESOURCE)) { + err = -EACCES; + goto err_unlock; + } + /* + * /proc/pid/oom_adj is provided for legacy purposes, ask users to use + * /proc/pid/oom_score_adj instead. 
+ */ + pr_warn_once("%s (%d): /proc/%d/oom_adj is deprecated, please use /proc/%d/oom_score_adj instead.\n", + current->comm, task_pid_nr(current), task_pid_nr(task), + task_pid_nr(task)); + } else { + if ((short)oom_adj < task->signal->oom_score_adj_min && + !capable(CAP_SYS_RESOURCE)) { + err = -EACCES; + goto err_unlock; + } + } + + task->signal->oom_score_adj = oom_adj; + if (!legacy && has_capability_noaudit(current, CAP_SYS_RESOURCE)) + task->signal->oom_score_adj_min = (short)oom_adj; + trace_oom_score_adj_update(task); +err_unlock: + mutex_unlock(&oom_adj_mutex); + put_task_struct(task); + return err; +} /* * /proc/pid/oom_adj exists solely for backwards compatibility with previous @@ -1052,7 +1092,6 @@ static DEFINE_MUTEX(oom_adj_mutex); static ssize_t oom_adj_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { - struct task_struct *task; char buffer[PROC_NUMBUF]; int oom_adj; int err; @@ -1074,12 +1113,6 @@ static ssize_t oom_adj_write(struct file *file, const char __user *buf, goto out; } - task = get_proc_task(file_inode(file)); - if (!task) { - err = -ESRCH; - goto out; - } - /* * Scale /proc/pid/oom_score_adj appropriately ensuring that a maximum * value is always attainable. @@ -1089,26 +1122,7 @@ static ssize_t oom_adj_write(struct file *file, const char __user *buf, else oom_adj = (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE; - mutex_lock(&oom_adj_mutex); - if (oom_adj < task->signal->oom_score_adj && - !capable(CAP_SYS_RESOURCE)) { - err = -EACCES; - goto err_unlock; - } - - /* - * /proc/pid/oom_adj is provided for legacy purposes, ask users to use - * /proc/pid/oom_score_adj instead. 
- */ - pr_warn_once("%s (%d): /proc/%d/oom_adj is deprecated, please use /proc/%d/oom_score_adj instead.\n", - current->comm, task_pid_nr(current), task_pid_nr(task), - task_pid_nr(task)); - - task->signal->oom_score_adj = oom_adj; - trace_oom_score_adj_update(task); -err_unlock: - mutex_unlock(&oom_adj_mutex); - put_task_struct(task); + err = __set_oom_adj(file, oom_adj, true); out: return err < 0 ? err : count; } @@ -1138,7 +1152,6 @@ static ssize_t oom_score_adj_read(struct file *file, char __user *buf, static ssize_t oom_score_adj_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { - struct task_struct *task; char buffer[PROC_NUMBUF]; int oom_score_adj; int err; @@ -1160,28 +1173,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf, goto out; } - task = get_proc_task(file_inode(file)); - if (!task) { - err = -ESRCH; - goto out; - } - - mutex_lock(&oom_adj_mutex); - if ((short)oom_score_adj < task->signal->oom_score_adj_min && - !capable(CAP_SYS_RESOURCE)) { - err = -EACCES; - goto err_unlock; - } - - task->signal->oom_score_adj = (short)oom_score_adj; - if (has_capability_noaudit(current, CAP_SYS_RESOURCE)) - task->signal->oom_score_adj_min = (short)oom_score_adj; - - trace_oom_score_adj_update(task); - -err_unlock: - mutex_unlock(&oom_adj_mutex); - put_task_struct(task); + err = __set_oom_adj(file, oom_score_adj, false); out: return err < 0 ? err : count; } -- cgit v1.2.3 From 44a70adec910d6929689e42b6e5cee5b7d202d20 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 28 Jul 2016 15:44:43 -0700 Subject: mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj oom_score_adj is shared for the thread groups (via struct signal) but this is not sufficient to cover processes sharing mm (CLONE_VM without CLONE_SIGHAND) and so we can easily end up in a situation when some processes update their oom_score_adj and confuse the oom killer. 
In the worst case some of those processes might hide from the oom killer altogether via OOM_SCORE_ADJ_MIN while others are eligible. OOM killer would then pick up those eligible but won't be allowed to kill others sharing the same mm so the mm wouldn't release the mm and so the memory. It would be ideal to have the oom_score_adj per mm_struct because that is the natural entity OOM killer considers. But this will not work because some programs are doing vfork() set_oom_adj() exec() We can achieve the same though. oom_score_adj write handler can set the oom_score_adj for all processes sharing the same mm if the task is not in the middle of vfork. As a result all the processes will share the same oom_score_adj. The current implementation is rather pessimistic and checks all the existing processes by default if there is more than 1 holder of the mm but we do not have any reliable way to check for external users yet. Link: http://lkml.kernel.org/r/1466426628-15074-5-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Oleg Nesterov Cc: Vladimir Davydov Cc: David Rientjes Cc: Tetsuo Handa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/base.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) (limited to 'fs/proc') diff --git a/fs/proc/base.c b/fs/proc/base.c index 13a345952cc7..31370da2ee7c 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1040,6 +1040,7 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count, static int __set_oom_adj(struct file *file, int oom_adj, bool legacy) { static DEFINE_MUTEX(oom_adj_mutex); + struct mm_struct *mm = NULL; struct task_struct *task; int err = 0; @@ -1069,10 +1070,55 @@ static int __set_oom_adj(struct file *file, int oom_adj, bool legacy) } } + /* + * Make sure we will check other processes sharing the mm if this is + * not vfrok which wants its own oom_score_adj. 
+ * pin the mm so it doesn't go away and get reused after task_unlock + */ + if (!task->vfork_done) { + struct task_struct *p = find_lock_task_mm(task); + + if (p) { + if (atomic_read(&p->mm->mm_users) > 1) { + mm = p->mm; + atomic_inc(&mm->mm_count); + } + task_unlock(p); + } + } + task->signal->oom_score_adj = oom_adj; if (!legacy && has_capability_noaudit(current, CAP_SYS_RESOURCE)) task->signal->oom_score_adj_min = (short)oom_adj; trace_oom_score_adj_update(task); + + if (mm) { + struct task_struct *p; + + rcu_read_lock(); + for_each_process(p) { + if (same_thread_group(task, p)) + continue; + + /* do not touch kernel threads or the global init */ + if (p->flags & PF_KTHREAD || is_global_init(p)) + continue; + + task_lock(p); + if (!p->vfork_done && process_shares_mm(p, mm)) { + pr_info("updating oom_score_adj for %d (%s) from %d to %d because it shares mm with %d (%s). Report if this is unexpected.\n", + task_pid_nr(p), p->comm, + p->signal->oom_score_adj, oom_adj, + task_pid_nr(task), task->comm); + p->signal->oom_score_adj = oom_adj; + if (!legacy && has_capability_noaudit(current, CAP_SYS_RESOURCE)) + p->signal->oom_score_adj_min = (short)oom_adj; + } + task_unlock(p); + } + rcu_read_unlock(); + mmdrop(mm); + } err_unlock: mutex_unlock(&oom_adj_mutex); put_task_struct(task); -- cgit v1.2.3 From 50658e2e04c12d5cd628381c1b9cb69d0093a9c0 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 28 Jul 2016 15:46:14 -0700 Subject: mm: move page mapped accounting to the node Reclaim makes decisions based on the number of pages that are mapped but it's mixing node and zone information. Account NR_FILE_MAPPED and NR_ANON_PAGES pages on the node. 
Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka Acked-by: Michal Hocko Cc: Hillf Danton Acked-by: Johannes Weiner Cc: Joonsoo Kim Cc: Minchan Kim Cc: Rik van Riel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/meminfo.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs/proc') diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index cf301a9ef512..b8d52aa2f19a 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -140,8 +140,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v) K(i.freeswap), K(global_page_state(NR_FILE_DIRTY)), K(global_page_state(NR_WRITEBACK)), - K(global_page_state(NR_ANON_PAGES)), - K(global_page_state(NR_FILE_MAPPED)), + K(global_node_page_state(NR_ANON_PAGES)), + K(global_node_page_state(NR_FILE_MAPPED)), K(i.sharedram), K(global_page_state(NR_SLAB_RECLAIMABLE) + global_page_state(NR_SLAB_UNRECLAIMABLE)), -- cgit v1.2.3 From 4b9d0fab7166c9323f06d708518a35cf3a90426c Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 28 Jul 2016 15:46:17 -0700 Subject: mm: rename NR_ANON_PAGES to NR_ANON_MAPPED NR_FILE_PAGES is the number of file pages. NR_FILE_MAPPED is the number of mapped file pages. NR_ANON_PAGES is the number of mapped anon pages. This is unhelpful naming as it's easy to confuse NR_FILE_MAPPED and NR_ANON_PAGES for mapped pages. This patch renames NR_ANON_PAGES so we have NR_FILE_PAGES is the number of file pages. NR_FILE_MAPPED is the number of mapped file pages. NR_ANON_MAPPED is the number of mapped anon pages. 
Link: http://lkml.kernel.org/r/1467970510-21195-19-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka Cc: Hillf Danton Cc: Johannes Weiner Cc: Joonsoo Kim Cc: Michal Hocko Cc: Minchan Kim Cc: Rik van Riel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/meminfo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/proc') diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index b8d52aa2f19a..40f108783d59 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -140,7 +140,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v) K(i.freeswap), K(global_page_state(NR_FILE_DIRTY)), K(global_page_state(NR_WRITEBACK)), - K(global_node_page_state(NR_ANON_PAGES)), + K(global_node_page_state(NR_ANON_MAPPED)), K(global_node_page_state(NR_FILE_MAPPED)), K(i.sharedram), K(global_page_state(NR_SLAB_RECLAIMABLE) + -- cgit v1.2.3 From 11fb998986a72aa7e997d96d63d52582a01228c5 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 28 Jul 2016 15:46:20 -0700 Subject: mm: move most file-based accounting to the node There are now a number of accounting oddities such as mapped file pages being accounted for on the node while the total number of file pages are accounted on the zone. This can be coped with to some extent but it's confusing so this patch moves the relevant file-based accounted. Due to throttling logic in the page allocator for reliable OOM detection, it is still necessary to track dirty and writeback pages on a per-zone basis. 
[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting] Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka Acked-by: Michal Hocko Cc: Hillf Danton Acked-by: Johannes Weiner Cc: Joonsoo Kim Cc: Minchan Kim Cc: Rik van Riel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/meminfo.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'fs/proc') diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index 40f108783d59..c1fdcc1a907a 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -40,7 +40,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v) si_swapinfo(&i); committed = percpu_counter_read_positive(&vm_committed_as); - cached = global_page_state(NR_FILE_PAGES) - + cached = global_node_page_state(NR_FILE_PAGES) - total_swapcache_pages() - i.bufferram; if (cached < 0) cached = 0; @@ -138,8 +138,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v) #endif K(i.totalswap), K(i.freeswap), - K(global_page_state(NR_FILE_DIRTY)), - K(global_page_state(NR_WRITEBACK)), + K(global_node_page_state(NR_FILE_DIRTY)), + K(global_node_page_state(NR_WRITEBACK)), K(global_node_page_state(NR_ANON_MAPPED)), K(global_node_page_state(NR_FILE_MAPPED)), K(i.sharedram), @@ -152,9 +152,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v) #ifdef CONFIG_QUICKLIST K(quicklist_total_size()), #endif - K(global_page_state(NR_UNSTABLE_NFS)), + K(global_node_page_state(NR_UNSTABLE_NFS)), K(global_page_state(NR_BOUNCE)), - K(global_page_state(NR_WRITEBACK_TEMP)), + K(global_node_page_state(NR_WRITEBACK_TEMP)), K(vm_commit_limit()), K(committed), (unsigned long)VMALLOC_TOTAL >> 10, @@ -164,9 +164,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v) , atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10) #endif 
#ifdef CONFIG_TRANSPARENT_HUGEPAGE - , K(global_page_state(NR_ANON_THPS) * HPAGE_PMD_NR) - , K(global_page_state(NR_SHMEM_THPS) * HPAGE_PMD_NR) - , K(global_page_state(NR_SHMEM_PMDMAPPED) * HPAGE_PMD_NR) + , K(global_node_page_state(NR_ANON_THPS) * HPAGE_PMD_NR) + , K(global_node_page_state(NR_SHMEM_THPS) * HPAGE_PMD_NR) + , K(global_node_page_state(NR_SHMEM_PMDMAPPED) * HPAGE_PMD_NR) #endif #ifdef CONFIG_CMA , K(totalcma_pages) -- cgit v1.2.3 From d30dd8be06a5ae640766b20ea9ae288832bd12ac Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 28 Jul 2016 15:48:14 -0700 Subject: mm: track NR_KERNEL_STACK in KiB instead of number of stacks Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a zone. This only makes sense if each kernel stack exists entirely in one zone, and allowing vmapped stacks could break this assumption. Since frv has THREAD_SIZE < PAGE_SIZE, we need to track kernel stack allocations in a unit that divides both THREAD_SIZE and PAGE_SIZE on all architectures. Keep it simple and use KiB. 
Link: http://lkml.kernel.org/r/083c71e642c5fa5f1b6898902e1b2db7b48940d4.1468523549.git.luto@kernel.org Signed-off-by: Andy Lutomirski Cc: Vladimir Davydov Acked-by: Johannes Weiner Cc: Michal Hocko Reviewed-by: Josh Poimboeuf Reviewed-by: Vladimir Davydov Acked-by: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/meminfo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/proc') diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index c1fdcc1a907a..09e18fdf61e5 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -147,7 +147,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v) global_page_state(NR_SLAB_UNRECLAIMABLE)), K(global_page_state(NR_SLAB_RECLAIMABLE)), K(global_page_state(NR_SLAB_UNRECLAIMABLE)), - global_page_state(NR_KERNEL_STACK) * THREAD_SIZE / 1024, + global_page_state(NR_KERNEL_STACK_KB), K(global_page_state(NR_PAGETABLE)), #ifdef CONFIG_QUICKLIST K(quicklist_total_size()), -- cgit v1.2.3