Swap Memory Usage While vm.swappiness=1

Long story short: Even if memory is not overcommitted and vm.swappiness is set to 1, disk IO heavy workloads on Hadoop can still cause some swapping. To further minimize the chance of it, also increase vm.vfs_cache_pressure (to 200, say) and vm.min_free_kbytes (to 5242880 kB = 5 GB, say).

For performance reasons, setting the kernel parameter vm.swappiness to 1 is a fundamental preparation step before deploying a Hadoop cluster. Hadoopers also know that vm.swappiness=1 does not really turn off swapping; it just tells the kernel to avoid swapping whenever possible. When memory is not overcommitted, this is usually enough to avoid writing pages out to disk.
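To confirm which value a node is actually running with, the parameter can be read from procfs. This is only a minimal sketch: read_vm_param is a made-up helper name, and its optional second argument exists solely so the function can be pointed at a different directory.

```shell
# Print the current value of a vm.* kernel parameter.
# read_vm_param is a hypothetical helper; $2 optionally overrides the procfs directory.
read_vm_param() {
    param="$1"
    dir="${2:-/proc/sys/vm}"
    cat "$dir/$param"
}

# Example (same result as `sysctl -n vm.swappiness`):
#   read_vm_param swappiness
```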

But sometimes more is needed, especially when an application running on the cluster performs disk IO intensive operations, for instance when it writes or reads data at a high rate for a longer period. A typical case is stressing the cluster with a Teragen job with replication=1, using as many mappers as the number of disks we have in the data nodes. This type of workload can easily fill up the kernel cache within a few minutes, as shown by the report of the free command.

[root@hdp-worker-1.datavelo.com]# free -h
              total        used        free      shared  buff/cache   available
Mem:           250G        5.7G         18G         31M        226G        244G
Swap:          127G        5.7M        127G

The RAM available for applications is 244 GB, but at the same time almost all of the physical memory is occupied by kernel cache, which is under continuous pressure due to the IO intensive job. If an application now wants to allocate a larger amount of memory quickly, there is a high probability that not all of it can be allotted immediately without swapping out a few kB or MB. Any swapping turns the cluster services red in Cloudera Manager to warn administrators about a potential drop in cluster performance.
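Whether a node has actually swapped can be checked from /proc/meminfo, where swap in use is SwapTotal minus SwapFree. The following is a rough sketch: swap_used_kb is a made-up helper name, and the optional path argument is only there so it can be run against a saved copy of the file.

```shell
# Print swap currently in use, in kB, computed as SwapTotal - SwapFree.
# swap_used_kb is a hypothetical helper; $1 optionally overrides the meminfo path.
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {unused = $2}
         END           {printf "%d\n", total - unused}' "${1:-/proc/meminfo}"
}

# Example:
#   swap_used_kb
```

Running `vmstat 1` alongside the job shows swap-in and swap-out activity per second in the si and so columns.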

The issue can be solved by always reserving some free memory for immediate (no wait) application allocation requests, by configuring vm.min_free_kbytes and increasing vm.vfs_cache_pressure. The values should be the lowest ones that, by experiment, still keep the kernel from swapping even under IO heavy workloads. Also note that the higher the value of vm.vfs_cache_pressure, the more aggressively the kernel reclaims the memory it uses for caching directory and inode objects. Memory reclamation has a negative impact on performance, so never set this value too high.
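As a sketch of how these settings could be applied, the snippet below converts the reserve from GB to kB (1 GB = 1024 * 1024 kB here, which is how 5 GB becomes 5242880) and prints the parameters in sysctl.conf syntax. The function names and the file name 90-hadoop-vm.conf are assumptions for illustration, and 5 and 200 are the example values from this post, not recommendations.

```shell
# Convert GB to kB, the unit vm.min_free_kbytes expects (1 GB = 1024 * 1024 kB).
gb_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print the three settings in sysctl.conf syntax.
# $1 = memory to keep free, in GB; $2 = vm.vfs_cache_pressure value.
render_vm_settings() {
    min_free_gb="$1"
    cache_pressure="$2"
    printf 'vm.swappiness = 1\n'
    printf 'vm.vfs_cache_pressure = %s\n' "$cache_pressure"
    printf 'vm.min_free_kbytes = %s\n' "$(gb_to_kb "$min_free_gb")"
}

# Example (as root): write the fragment, then load it.
#   render_vm_settings 5 200 > /etc/sysctl.d/90-hadoop-vm.conf   # file name is an assumption
#   sysctl -p /etc/sysctl.d/90-hadoop-vm.conf
```

Generating the kB value instead of typing it by hand avoids mixing up bytes and kilobytes, which on a real node would reserve far more memory than intended.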

For further details on the above kernel properties refer to this and this articles.

Hope it helps.

2 thoughts on “Swap Memory Usage While vm.swappiness=1”

  1. Good advice except…
    Setting vm.min_free_kbytes to 5368709120 will crash the server: 5368709120 bytes is 5 GB, but the setting is in kB, so this would set it to 5120 GB (5.12 TB). The correct setting for vm.min_free_kbytes is 5242880 kB, or 5 GB. Cheers
