In the Linux world, whenever your database server crashes or gets terminated, you need to find the cause. There can be several reasons. It can be a SIGSEGV, which is a crash caused by a bug in the backend server, but this is the least likely reason. The most common reasons are running out of disk space or running out of memory. If you are getting the “running out of space” error, the only solution is to clear some space and restart your database.

Out-Of-Memory Killer

When your server runs out of memory, Linux has two ways to handle it: let the operating system crash, which takes the whole system down, or kill the process (application) that is making the system run out of memory. The second option is the better bet, because it sacrifices one process and saves the OS from crashing. In short, the Out-Of-Memory Killer is the kernel mechanism responsible for terminating an application in order to save the kernel from crashing. Let’s first discuss how the OOM-Killer works and how to control it, and later we will discuss how it decides which application to kill.

One of the primary jobs of a Linux operating system is to allocate memory to a process when the process requests it. In most cases, a process requests memory from the OS but does not use all of the memory it asked for. If the OS backed every request with real memory, including memory that processes never use, it would soon run out of memory and the system would crash. To handle this scenario, the operating system can commit memory to a process without actually allocating it. The allocation happens only when the process actually uses that memory. At times the OS may commit memory that it does not currently have available, on the expectation that it will be available by the time the process uses it. The downside of this feature is that the OS will sometimes commit memory and then, at the time of allocation, have no memory available to allocate, and the system would crash. The OOM-Killer plays a vital role in this scenario: it kills one or more processes in order to save the kernel from panicking. Whenever your PostgreSQL process gets killed this way, you will see a message about it in the log file.

Whenever the system is low on memory and cannot find free memory, the out_of_memory() function is called. There is only one thing it can do at this late point to make memory available: kill one or more processes. Should the OOM-Killer kill a process immediately or wait for some time? Sometimes an apparent out-of-memory condition is only temporary, for example while the system waits for I/O or for a page to be swapped out to disk. Therefore the OOM-Killer first performs some checks, and it terminates a process only if all of those checks are true.

Process Selection

Whenever an out-of-memory failure occurs, the out_of_memory() function is called. Within it, the select_bad_process() function is used, which gets a score for each process from the badness() function. The most ‘bad’ process is the one that will be sacrificed. The badness() function follows these rules when selecting the process:

  1. The kernel needs to obtain a minimum amount of memory for itself
  2. Try to reclaim a large amount of memory
  3. Don’t kill a process using a small amount of memory
  4. Try to kill the minimum number of processes
  5. Some meticulous algorithms that elevate the sacrifice priority on processes the user wants to kill

After all these checks, the OOM-Killer looks at the score (oom_score). The kernel assigns an oom_score to each process, based largely on how much memory the process uses. Processes with bigger values have a higher probability of being terminated by the OOM-Killer. Depending on the kernel version, processes that belong to a privileged user get a lower score and are less likely to be killed.

Suppose the Postgres process ID is 3813. In another shell, you can get its score by reading the file /proc/3813/oom_score.
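The same lookup can be sketched in Python. This is a minimal illustration, not part of the original article: the helper names read_oom_score and rank_by_oom_score and the proc_root parameter are hypothetical, and the only assumption about the system is that /proc/<pid>/oom_score and /proc/<pid>/comm exist, as they do on Linux.

```python
from pathlib import Path


def read_oom_score(pid, proc_root="/proc"):
    """Return the kernel's current oom_score for one PID as an int."""
    return int((Path(proc_root) / str(pid) / "oom_score").read_text().strip())


def rank_by_oom_score(proc_root="/proc", limit=5):
    """Return up to `limit` (score, pid, name) tuples, highest score first."""
    ranked = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/sys
        try:
            score = int((entry / "oom_score").read_text().strip())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # the process exited or is not readable
        ranked.append((score, int(entry.name), name))
    # Highest score first; ties are broken by the lower PID.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked[:limit]
```

Calling read_oom_score(3813) returns the score for the example PID, and rank_by_oom_score() lists the processes the OOM-Killer would currently consider the most likely candidates.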

If you really want your process not to be killed by the OOM-Killer, there is another per-process setting, oom_score_adj. It accepts values from -1000 to 1000. A large negative value reduces the chance that your process gets killed, and -1000 exempts the process from the OOM-Killer entirely.
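As a rough sketch of how that value can be inspected and changed from a program, the following Python reads and writes /proc/<pid>/oom_score_adj. The function names and the proc_root parameter are hypothetical and exist only for illustration; lowering the value typically requires elevated privileges.

```python
from pathlib import Path

OOM_SCORE_ADJ_MIN = -1000  # exempts the process from the OOM-Killer
OOM_SCORE_ADJ_MAX = 1000   # makes the process the preferred victim


def read_oom_score_adj(pid="self", proc_root="/proc"):
    """Return the current oom_score_adj of a process (default: this one)."""
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    return int(path.read_text().strip())


def write_oom_score_adj(value, pid="self", proc_root="/proc"):
    """Validate the range, then write the new oom_score_adj value."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(
            f"oom_score_adj must be between {OOM_SCORE_ADJ_MIN} "
            f"and {OOM_SCORE_ADJ_MAX}, got {value}"
        )
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    path.write_text(f"{value}\n")
```

For example, write_oom_score_adj(-500, pid=3813) would make the example Postgres process a much less attractive target without exempting it completely.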

To set oom_score_adj for a service, you can set OOMScoreAdjust in the systemd service unit, or use the oomprotect option of the rcctl command.

Killing a Process

Once one or more processes are selected, the OOM-Killer calls the oom_kill_task() function, which is responsible for sending the terminate/kill signal to the process. In an out-of-memory situation, oom_kill() calls this function so that it can send the SIGKILL signal to the process. A kernel log message is generated.
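To find those kernel log messages after the fact, a small filter can help. The sketch below assumes the message contains the phrase "Killed process <pid> (<name>)"; the exact wording varies between kernel versions, so treat the pattern as an assumption to adjust. The function name find_oom_kills is hypothetical.

```python
import re

# Assumed message shape: "... Killed process <pid> (<name>) ..."
# Kernel versions differ in the text around this phrase.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def find_oom_kills(log_lines):
    """Return a list of (pid, process_name) for each matching log line."""
    kills = []
    for line in log_lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

The input can be any iterable of lines, for example the output of dmesg or journalctl -k split on newlines.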

How to control OOM-Killer

Linux provides a way to enable and disable the OOM-Killer, but disabling it is not recommended. The kernel parameter vm.oom-kill is used to enable and disable the OOM-Killer. This parameter is not present on every kernel, so check that /proc/sys/vm/oom-kill exists before relying on it. To enable the OOM-Killer at runtime, use the sysctl command to set the parameter to 1 (sysctl -w vm.oom-kill=1).

To disable the OOM-Killer, use the same command with the value 0 (sysctl -w vm.oom-kill=0).

This command does not make the change permanent, and a machine reboot resets it. To make it permanent, add the setting (for example, vm.oom-kill = 1) as a line in the /etc/sysctl.conf file.

The other way to control this behavior is the panic_on_oom variable; you can always check its value in /proc/sys/vm/panic_on_oom.

When the value is 0, the kernel does not panic when an out-of-memory error occurs; it invokes the OOM-Killer instead.

When the value is 1, the kernel panics on an out-of-memory error.

There are some more settings for the OOM-Killer other than enabling and disabling it. As we already mentioned, Linux can overcommit memory to processes without allocating it, and this behavior can be controlled by a kernel setting: the vm.overcommit_memory variable.

The vm.overcommit_memory variable can be set to the following values:

0: The kernel decides whether to overcommit or not. This is the default value for most versions of Linux.

1: The kernel always overcommits. This is a risky setting, because there is a good chance that processes end up using the memory committed to them, which can lead to the kernel running out of memory.

2: The kernel does not overcommit beyond the limit set by overcommit_ratio, another kernel setting that specifies the percentage of physical memory that can be committed. Total commitments may not exceed the swap space plus overcommit_ratio percent of RAM. If a request would exceed that limit, the memory allocation function fails and the overcommit is denied. This is the safest option and the recommended value for PostgreSQL.
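The limit that applies with the value 2 can be worked out with simple arithmetic. The sketch below is an approximation under stated assumptions: it ignores huge pages and the alternative overcommit_kbytes setting, and the function names are hypothetical. It also parses text in the format of /proc/meminfo, whose CommitLimit and Committed_AS fields report the limit and the amount currently committed.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """Approximate commit limit: swap + RAM * overcommit_ratio / 100."""
    return swap_kb + ram_kb * overcommit_ratio // 100


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer value} dict."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo):
    """Return how much more memory can be committed before the limit."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]
```

For a machine with 16 GiB of RAM, 4 GiB of swap, and an overcommit_ratio of 50, commit_limit_kb(16777216, 4194304, 50) gives 12582912 kB, that is 12 GiB. On a live system, parse_meminfo(open("/proc/meminfo").read()) supplies the input for commit_headroom_kb.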

The second thing that can affect the OOM-Killer is swappiness. This behavior is controlled by the variable /proc/sys/vm/swappiness, which tells the kernel how aggressively to swap pages out. The bigger the value, the lower the chance that the OOM-Killer kills the process, but the extra I/O hurts database efficiency. A smaller value means there is a higher chance of the OOM-Killer kicking in, but it also improves database performance. The default value is 60, but if your entire database fits in memory, it is recommended to set this value to 1.
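To see how these settings are currently configured on a host, the values can be read straight from /proc/sys/vm. This is a minimal sketch; the function name read_vm_settings and the vm_dir parameter are hypothetical, and a setting that is missing or unreadable is reported as None rather than raising an error.

```python
from pathlib import Path

# Settings discussed in this article; each is a file under /proc/sys/vm.
VM_SETTINGS = ("overcommit_memory", "overcommit_ratio", "panic_on_oom", "swappiness")


def read_vm_settings(vm_dir="/proc/sys/vm", names=VM_SETTINGS):
    """Return {name: int value, or None if the file is missing or unreadable}."""
    settings = {}
    for name in names:
        try:
            settings[name] = int((Path(vm_dir) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings
```

Printing read_vm_settings() before and after a configuration change is a quick way to confirm that the new values took effect.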

Summary

Don’t be misled by the name Killer (OOM-Killer). The killer is not always harmful; it is a savior for your system. It kills the worst offending process and saves your system from crashing. To avoid having the OOM-Killer kill PostgreSQL, it is recommended to set vm.overcommit_memory to 2. This will not avoid the OOM-Killer 100% of the time, but it will reduce the chance of the PostgreSQL process being killed.


Francesco

When PS 8.0.16? Upstream is already at .17… That would fix the memory bug… 🙂

Ibrar Ahmed

No, check the kernel documentation.

1 – Always overcommit. Appropriate for some scientific
applications. Classic example is code using sparse arrays
and just relying on the virtual memory consisting almost
entirely of zero pages.

https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

Bruno Lavoie

My error.. it disables handling of by always committing to memory requests… this docs should also add the “always” keyword ☺