[CentOS6] Page allocation failure

Updated: 2022-09-25 23:24:42

We have hit page allocation failure errors several times in production.

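Monitoring from those moments showed that memory was not exhausted and there was still plenty of headroom. A quick Google search turned up many reports of the same problem on CentOS 6.2, and it is suspected to be an OS bug: https://bugzilla.redhat.com/show_bug.cgi?id=767127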

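The suggested solutions mostly come down to adjusting two parameters: set vm.zone_reclaim_mode=1, and double or quadruple min_free_kbytes. Since making these changes we have not seen the problem recur so far.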
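As a rough sketch of that adjustment, the Python snippet below builds the two sysctl.conf-style lines from the current min_free_kbytes value. The helper name suggest_sysctl_lines and the fallback value 67584 are illustrative assumptions, not part of the original fix; /proc/sys/vm/min_free_kbytes is the usual procfs location for this setting.

```python
from pathlib import Path
from typing import List

# Usual procfs location of the setting on Linux.
MIN_FREE_PATH = Path("/proc/sys/vm/min_free_kbytes")


def suggest_sysctl_lines(current_min_free_kb: int, factor: int = 2) -> List[str]:
    """Build sysctl.conf-style lines for the two parameters discussed above.

    factor is 2 (double) or 4 (quadruple), matching the suggested workaround.
    This helper is hypothetical and only formats text; it changes nothing.
    """
    if factor not in (2, 4):
        raise ValueError("factor should be 2 (double) or 4 (quadruple)")
    return [
        "vm.zone_reclaim_mode = 1",
        "vm.min_free_kbytes = {}".format(current_min_free_kb * factor),
    ]


if __name__ == "__main__":
    # Use the live value when available, otherwise an illustrative number.
    if MIN_FREE_PATH.exists():
        current = int(MIN_FREE_PATH.read_text())
    else:
        current = 67584  # assumed example value
    print("\n".join(suggest_sysctl_lines(current)))
```

The printed lines could be reviewed and then added to /etc/sysctl.conf and loaded with sysctl -p. Keeping the script read-only is deliberate, since a min_free_kbytes value that is too high can be harmful.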

Here is a quick look at what the two parameters mean.

1. zone_reclaim_mode

  zone_reclaim_mode allows someone to set more or less aggressive approaches to reclaim memory when a zone runs out of memory. If it is set to zero then no zone reclaim occurs.
1   = Zone reclaim on
2   = Zone reclaim writes dirty pages out
4   = Zone reclaim swaps pages
  zone_reclaim_mode is set during bootup to 1 if it is determined that pages from remote zones will cause a measurable performance reduction. The page allocator will then reclaim easily reusable pages (those page cache pages that are currently not used) before allocating off node pages.
  It may be beneficial to switch off zone reclaim if the system is used for a file server and all of memory should be used for caching files from disk. In that case the caching effect is more important than data locality.
  Allowing zone reclaim to write out pages stops processes that are writing large amounts of data from dirtying pages on other nodes. Zone reclaim will write out dirty pages if a zone fills up and so effectively throttle the process.
  This may decrease the performance of a single process since it cannot use all of system memory to buffer the outgoing writes anymore but it preserves the memory on other nodes so that the performance of other processes running on other nodes will not be affected.
  Allowing regular swap effectively restricts allocations to the local node unless explicitly overridden by memory policies or cpuset configurations.
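The values 1, 2 and 4 listed above behave as bit flags that can be combined, so a setting such as 3 means "zone reclaim on" plus "write out dirty pages". A minimal Python sketch for decoding a value follows; the function name describe_zone_reclaim_mode and the label strings are assumptions made for illustration.

```python
from typing import List

# Bit flags as listed in the documentation quoted above.
ZONE_RECLAIM_FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def describe_zone_reclaim_mode(mode: int) -> List[str]:
    """Return the labels of every flag that is set in mode."""
    if mode == 0:
        return ["no zone reclaim"]
    return [label for bit, label in ZONE_RECLAIM_FLAGS.items() if mode & bit]


if __name__ == "__main__":
    for value in (0, 1, 3, 7):
        print(value, describe_zone_reclaim_mode(value))
```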

For the detailed algorithm, see: http://www.orczhou.com/index.php/2011/02/linux-memory-management-3/

2. min_free_kbytes

  This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark [WMARK_MIN] value for each lowmem zone in the system.
  Each lowmem zone gets a number of reserved free pages based proportionally on its size.
  Some minimal amount of memory is needed to satisfy PF_MEMALLOC allocations; if you set this to lower than 1024KB, your system will become subtly broken, and prone to deadlock under high loads.
  Setting this too high will OOM your machine instantly.

The min, low and high watermarks are computed as follows:

watermark[min] = min_free_kbytes

watermark[low] = watermark[min] * 5 / 4

watermark[high] = watermark[min] * 3 / 2

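When free memory falls below watermark[low], the kernel thread kswapd (one per NUMA node) starts reclaiming memory, and it stops once the zone's free memory reaches watermark[high]. If memory is requested too quickly and free memory drops to watermark[min], the kernel performs direct reclaim: it reclaims pages in the context of the allocating process and then uses the reclaimed pages to satisfy the request. This blocks the application, adds response latency, and may trigger the OOM killer. The reason is that memory below watermark[min] is the system's own reserve for special uses, so it is not handed out to ordinary user-space allocations.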
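The three formulas and the thresholds described above can be put into a small Python sketch. It treats the whole min_free_kbytes value as one zone's watermark[min], which is a simplification (the kernel splits it across zones in proportion to their size), and the function names watermarks_kb and reclaim_state are assumptions made for illustration.

```python
from typing import Dict


def watermarks_kb(min_free_kbytes: int) -> Dict[str, int]:
    """Apply the formulas above: low = min * 5 / 4, high = min * 3 / 2."""
    wmin = min_free_kbytes
    return {"min": wmin, "low": wmin * 5 // 4, "high": wmin * 3 // 2}


def reclaim_state(free_kb: int, marks: Dict[str, int]) -> str:
    """Classify a free-memory level against the watermarks (simplified)."""
    if free_kb <= marks["min"]:
        return "direct reclaim"      # allocating process reclaims and blocks
    if free_kb < marks["low"]:
        return "kswapd woken"        # background reclaim until 'high' is reached
    return "above low watermark"


if __name__ == "__main__":
    marks = watermarks_kb(67584)  # assumed example value
    print(marks)
    for free in (60000, 80000, 120000):
        print(free, reclaim_state(free, marks))
```

Doubling min_free_kbytes raises all three thresholds, so background reclaim starts earlier and a burst of allocations is less likely to reach the direct-reclaim level.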

The watermarks of each zone can be inspected in /proc/zoneinfo.
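A small Python sketch for pulling the min, low and high values out of /proc/zoneinfo is shown below. It assumes the common layout in which each zone starts with a "Node N, zone NAME" line followed by indented "min", "low" and "high" lines; the exact layout varies between kernel versions, and the values are reported in pages rather than kilobytes. The function name parse_zone_watermarks is hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Tuple

ZONE_HEADER = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")
# Watermark lines have no colon; per-CPU pageset lines ("high:  186") do.
WATERMARK = re.compile(r"^(min|low|high)\s+(\d+)$")


def parse_zone_watermarks(text: str) -> Dict[Tuple[int, str], Dict[str, int]]:
    """Map (node, zone name) to its min/low/high watermarks, in pages."""
    zones: Dict[Tuple[int, str], Dict[str, int]] = {}
    current = None
    for line in text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            zones[current] = {}
            continue
        mark = WATERMARK.match(line.strip())
        if mark and current is not None:
            # Keep the first occurrence per zone.
            zones[current].setdefault(mark.group(1), int(mark.group(2)))
    return zones


if __name__ == "__main__":
    zoneinfo = Path("/proc/zoneinfo")
    if zoneinfo.exists():
        for (node, zone), marks in parse_zone_watermarks(zoneinfo.read_text()).items():
            print("node", node, "zone", zone, marks)
```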


This article is reposted from MIKE老毕's 51CTO blog. Original link: http://blog.51cto.com/boylook/1335662. To repost it, please contact the original author.