Connection Pooling in Pgpool-II

Pgpool-II is a cluster management tool for PostgreSQL that can cache connections to PostgreSQL servers. This blog introduces Pgpool-II connection pooling feature and shows how to configure connection pooling in Pgpool-II.

What is connection pooling?

Establishing and maintaining Database connections are expensive. The reason is that you have to establish a network connection, perform authentication and so on. Database connection pooling is a cache of database connections to keep database connections open so that the connection can be reused when a connection is required by a future request.

Reusing an active connection rather than establishing a new connection each time a connection is requested can improve performance and save system resources.

How does connection pooling work in Pgpool-II?

Pgpool-II caches established connections to the PostgreSQL servers and reuses them whenever a new request with the same properties (i.e. user name, database, protocol version, and other connection parameters if any) comes in.

First, let me explain how connection pooling works in Pgpool-II.
  1. At startup, Pgpool-II parent process preforks num_init_children child processes, and each child process can cache connections up to the configured value of max_pool.
  2. Pgpool-II waits for connection requests from clients.
  3. One Pgpool-II child process receives connection request from a client.
  4. This Pgpool-II child process looks for existing connections which has requested database/user pair in the pool up to max_pool.
  5. If found, reuse it.
  6. If not found, the child process opens new connections to PostgreSQL servers and registers it to the pool. If the pool has no empty slot, Pgpool-II closes the oldest connection to PostgreSQL and reuses the slot.
  7. Do some queries processing until the client sends session close request.
  8. Pgpool-II closes the connection to client but keeps the connection to PostgreSQL servers for future use.
  9. Go to #1

Sometimes users ask how to configure Pgpool-II's num_init_children (the number of preforked Pgpool-II child processes) and PostgreSQL's max_connections. Note that Pgpool-II may establish num_init_children * max_pool connections to each PostgreSQL server.

And PostgreSQL allows concurrent connections for non-superusers up to max_connections - superuser_reserved_connections. In addition, canceling a query creates another connection to PostgreSQL.


Therefore, max_pool, num_init_children, max_connections, superuser_reserved_connections must satisfy the following formula:

(no query canceling neede)
max_pool*num_init_children <= (max_connections - superuser_reserved_connections)

(query canceling needed)
max_pool*num_init_children*2 <= (max_connections - superuser_reserved_connections)
Indicates whether Pgpool-II caches connections.
Default is on.
The maximum number of cached connections in each Pgpool-II child process.
Default is 4.
Specifies the time in seconds to remove the cached connections.
Default is 0.
Specifies the time in seconds to terminate a Pgpool-II child process if it remains idle.
Default is 300 seconds.
Terminate the child process after receiving that many connections.
Default is 0.
Specifies the time to disconnect a client if it remains idle since the last query.
Default is 0.

Here are some useful configurations of the above parameters.

  • If a memory leak is detected in Pgpool-II, you can configure child_life_time parameter to prevent Pgpool-II child processes memory leak. Default value is 300 seconds. If a child process remains idle for more than the time specified in child_life_time, Pgpool-II will terminate the child process and spawn a new child process immediately.
  • To prevent the Pgpool-II child processes from being occupied for long time by a lazy clients or broken TCP/IP connection between client and Pgpool-II, you can set client_idle_limit to a non-zero value. Default value is 0.
However, if your application is using Tomcat JDBC connection pool, Tomcat may create persistent connections to Pgpool-II and will leave lots of idle connections. If Pgpool-II disconnects the connection due to the setting of client_idle_limit, your application will get an error message the next time Tomcat tries to send request to Pgpool-II. In order to eliminate such errors, you need to configure Tomcat parameters to clean up or validate idle connections.

For example:
  • To configure lifetime time of idle connections:
minIdle="0"
timeBetweenEvictionRunsMillis="5000" (default)
minEvictableIdleTimeMillis="60000" (default)

  • To validate connections:
timeBetweenEvictionRunsMillis="5000" (default)
testOnBorrow="true"
testWhileIdle="true"
validationQuery="SELECT 1"

Verify Pgpool-II's connection pooling

Let's verify Pgpool-II's connection pooling.

Edit pgpool.conf to enable connection pooling.

connection_cache = on

Use psql to connect to Pgpool-II and exit.
Note that connections to template0, template1, postgres and regression databases are not cached even if connection_cache is on.

$ psql -h /tmp -p 11000 -U user1 db1
db1=> \q

Verify the connections are cached. One way to verify Pgpool-II's connection pooling is to use SHOW POOL_POOLS command.

In the result of "SHOW POOL_POOLS" below, each row represents one connection.

As you can see in the result, the connections (db1/user1) are cached even if client has closed the connection. About "pool_connected", 0 represents this process is waiting for connection and 1 represents it is being used by frontend. "pool_connected" is useful for checking the number of connections from frontend (Formatted output is needed).

$ psql -h /tmp -p 11000 -U pengbo postgres
postgres=# SHOW POOL_POOLS;
 pool_pid |...| pool_id | backend_id | database | username |     create_time     |...| pool_counter | pool_backendpid | pool_connected
----------+---+---------+------------+----------+----------+---------------------+---+--------------+-----------------+----------------
 11973    |...| 0       | 0          | postgres | pengbo   | 2020-07-29 22:49:13 |...| 1            | 11987           | 1
 11973    |...| 0       | 1          | postgres | pengbo   | 2020-07-29 22:49:13 |...| 1            | 11988           | 1
 11973    |...| 1       | 0          |          |          |                     |...| 0            | 0               | 0
 11973    |...| 1       | 1          |          |          |                     |...| 0            | 0               | 0
 11973    |...| 2       | 0          |          |          |                     |...| 0            | 0               | 0
 11973    |...| 2       | 1          |          |          |                     |...| 0            | 0               | 0
 11974    |...| 0       | 0          |          |          |                     |...| 0            | 0               | 0
 11974    |...| 0       | 1          |          |          |                     |...| 0            | 0               | 0
 11974    |...| 1       | 0          |          |          |                     |...| 0            | 0               | 0
 11974    |...| 1       | 1          |          |          |                     |...| 0            | 0               | 0
 11974    |...| 2       | 0          |          |          |                     |...| 0            | 0               | 0
 11974    |...| 2       | 1          |          |          |                     |...| 0            | 0               | 0
 11975    |...| 0       | 0          | db1      | user1    | 2020-07-29 22:49:16 |...| 1            | 11990           | 0
 11975    |...| 0       | 1          | db1      | user1    | 2020-07-29 22:49:16 |...| 1            | 11991           | 0

 11975    |...| 1       | 0          |          |          |                     |...| 0            | 0               | 0
 11975    |...| 1       | 1          |          |          |                     |...| 0            | 0               | 0
 11975    |...| 2       | 0          |          |          |                     |...| 0            | 0               | 0

Normally we use pg_stat_activity to show the current connections. However, how can we find out the Pgpool-II process PID that corresponds to the backend process? In this case, the result of SHOW POOL_POOLS is very useful to display the Pgpool-II child process PID (pool_pid) which is connecting to backend process PID (pool_backendpid).

Conclusion

In this post I gave an overview of connection pooling and introduced how to configure connection pooling in Pgpool-II. Connection pooling can enhance performance and save your system resources. I encourage you to try Pgpool-II's connection pooling.


Comments

Popular posts from this blog

Installing Pgpool-II on Debian/Ubuntu

Query Load Balancing in Pgpool-II