Long story short: even if memory is not overcommitted and vm.swappiness is set to 1, if you run disk-IO-heavy workloads on Hadoop, also increase vm.vfs_cache_pressure (to 200, say) and vm.min_free_kbytes (to 5242880 kB (= 5 GB), say) to further minimize the chance of swapping.
For performance reasons, setting the kernel parameter vm.swappiness to 1 is a fundamental preparation step before deploying a Hadoop cluster. Hadoopers also know that vm.swappiness=1 does not really turn off swapping; it only tells the kernel to avoid swapping where possible. When memory is not overcommitted, this is usually enough to avoid writing pages out to disk.
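To confirm that a node really runs with the intended value, the setting can be read back from procfs. The following is a minimal sketch, assuming a Linux host that exposes the vm.* parameters under /proc/sys/vm; the helper names are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def read_vm_setting(name: str, root: str = "/proc/sys/vm") -> int:
    """Return the integer value of a vm.* kernel parameter, e.g. name='swappiness'."""
    return int(Path(root, name).read_text().strip())


def has_recommended_swappiness(root: str = "/proc/sys/vm") -> bool:
    """True if vm.swappiness equals 1, the value recommended above."""
    return read_vm_setting("swappiness", root) == 1
```

On a data node, has_recommended_swappiness() with the default root checks the live kernel value; the root argument exists only so the helper can be pointed at a different directory.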
But sometimes more is needed, especially when an application running on the cluster performs disk-IO-intensive operations, for instance, writing or reading data at a high rate for an extended period. A typical case is stressing the cluster by running a Teragen job with replication=1, using as many mappers as there are disks in the data nodes. This type of workload can easily fill up the kernel cache within a few minutes, as shown by the output of the free command.
[email@example.com]# free -h
              total        used        free      shared  buff/cache   available
Mem:           250G        5.7G         18G         31M        226G        244G
Swap:          127G        5.7M        127G
The RAM available to applications is 244 GB, but at the same time almost all of the physical memory is occupied by the kernel cache, which is under continuous pressure from the IO-intensive job. If an application now suddenly requests a large amount of memory, there is a high probability that it cannot all be allotted immediately without swapping out a few kB or MB. Swapping turns the cluster services red in Cloudera Manager to warn administrators about the potential drop in cluster performance.
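Whether the kernel actually swapped during such a job can also be checked directly on a node. The sketch below assumes a Linux host and compares the cumulative pswpin and pswpout counters from /proc/vmstat between two samples; the function names and the 10-second interval are illustrative choices, not part of any Hadoop or Cloudera tooling.

```python
import time
from pathlib import Path


def read_swap_counters(vmstat_text: str) -> dict:
    """Extract the cumulative swap-in/swap-out page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swapped_between(before: dict, after: dict) -> bool:
    """True if any page was swapped in or out between the two samples."""
    return any(after.get(k, 0) > before.get(k, 0) for k in ("pswpin", "pswpout"))


def watch_swap(interval_s: float = 10.0, path: str = "/proc/vmstat") -> bool:
    """Sample the counters twice, interval_s seconds apart (arbitrary default)."""
    first = read_swap_counters(Path(path).read_text())
    time.sleep(interval_s)
    second = read_swap_counters(Path(path).read_text())
    return swapped_between(first, second)
```

Calling watch_swap() while the IO-heavy job is running returns True if pages were swapped during the interval, which is a quick way to judge whether a given pair of parameter values is sufficient.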
The issue can be solved by always reserving some free memory for immediate (no-wait) application allocation requests, by configuring vm.min_free_kbytes and increasing vm.vfs_cache_pressure. The values should be the lowest ones, found by experiment, that still keep the kernel from swapping even under IO-heavy workloads. Also note that the higher the value of vm.vfs_cache_pressure, the more aggressively the kernel reclaims memory used by the kernel cache. Memory reclamation has a negative impact on performance, however, so never set this value too high.
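As a rough illustration, the example values from this post can be turned into a sysctl configuration fragment. This is only a sketch: the numbers (5 GB, 200, 1) are the sample values mentioned above, not tuned recommendations, and the function names are hypothetical.

```python
def gib_to_kib(gib: int) -> int:
    """Convert GiB to KiB, the unit expected by vm.min_free_kbytes."""
    return gib * 1024 * 1024


def render_sysctl_fragment(min_free_gib: int = 5,
                           cache_pressure: int = 200,
                           swappiness: int = 1) -> str:
    """Render the three vm.* settings as sysctl.conf-style 'key = value' lines."""
    settings = {
        "vm.swappiness": swappiness,
        "vm.vfs_cache_pressure": cache_pressure,
        "vm.min_free_kbytes": gib_to_kib(min_free_gib),
    }
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"


print(render_sysctl_fragment(), end="")
# vm.swappiness = 1
# vm.vfs_cache_pressure = 200
# vm.min_free_kbytes = 5242880
```

The output could be saved to a file under /etc/sysctl.d/ (any file name ending in .conf) and loaded with sysctl --system, or a single value can be tried at runtime with sysctl -w before making it persistent. Start from lower values and raise them only while swapping is still observed.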
Hope it helps.