Let's make PostgreSQL multi-threaded

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Let's make PostgreSQL multi-threaded
Date:	2023-06-05 14:51:57
Message-ID:	31cc6df9-53fe-3cd9-af5b-ac0d801163f4@iki.fi
Lists:	pgsql-hackers

I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
so that the whole server runs in a single process, with multiple
threads. It has been discussed many times in the past, last thread on
pgsql-hackers was back in 2017 when Konstantin made some experiments [0].

I feel that there is now pretty strong consensus that it would be a good
thing, more so than before. Lots of work to get there, and lots of
details to be hashed out, but no objections to the idea at a high level.

The purpose of this email is to make that silent consensus explicit. If
you have objections to switching from the current multi-process
architecture to a single-process, multi-threaded architecture, please
speak up.

If there are no major objections, I'm going to update the developer FAQ,
removing the excuses there for why we don't use threads [1]. And we can
start to talk about the path to get there. Below is a list of some
hurdles and proposed high-level solutions. This isn't an exhaustive
list, just some of the most obvious problems:

# Transition period

The transition surely cannot be done fully in one release. Even if we
could pull it off in core, extensions will need more time to adapt.
There will be a transition period of at least one release, probably
more, where you can choose multi-process or multi-thread model using a
GUC. Depending on how it goes, we can document it as experimental at first.

# Thread per connection

To get started, it's most straightforward to have one thread per
connection, just replacing backend process with a backend thread. In the
future, we might want to have a thread pool with some kind of a
scheduler to assign active queries to worker threads. Or multiple
threads per connection, or spawn additional helper threads for specific
tasks. But that's future work.

# Global variables

We have a lot of global and static variables:

$ objdump -t bin/postgres | grep -e "\.data" -e "\.bss" | grep -v "data.rel.ro" | wc -l
1666

Some of them are pointers to shared memory structures and can stay as
they are. But many of them are per-connection state. The most
straightforward conversion for those is to turn them into thread-local
variables, like Konstantin did in [0].

It might be good to have some kind of a Session context struct that we
pass everywhere, or maybe have a single thread-local variable to hold
it. Many of the global variables would become fields in the Session. But
that's future work.
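
As a rough illustration of the two ideas above (this is not code from
Konstantin's patch or from PostgreSQL), here is a minimal C11 sketch in
which a single thread-local pointer holds a hypothetical Session struct.
The Session fields, CurrentSession, backend_thread_main and
run_two_backends are all made-up names, and the sketch assumes a
platform with pthreads:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; the fields are illustrative only. */
typedef struct Session
{
    int backend_id;
    int xact_nesting_level;
} Session;

/* C11 thread-local storage: every thread gets its own copy of this pointer. */
static _Thread_local Session *CurrentSession = NULL;

/* Stand-in for a backend's main function: touches only its own Session. */
static void *
backend_thread_main(void *arg)
{
    Session session = {.backend_id = *(int *) arg, .xact_nesting_level = 0};

    CurrentSession = &session;
    CurrentSession->xact_nesting_level++;

    /* Encode what this thread observed so the caller can check it. */
    return (void *) (intptr_t) (CurrentSession->backend_id * 10 +
                                CurrentSession->xact_nesting_level);
}

/* Returns 1 if each thread saw only its own session, 0 if not, -1 on error. */
int
run_two_backends(void)
{
    pthread_t t1, t2;
    int id1 = 1, id2 = 2;
    void *r1 = NULL, *r2 = NULL;

    if (pthread_create(&t1, NULL, backend_thread_main, &id1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, backend_thread_main, &id2) != 0)
    {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);

    /* The calling thread never set its own copy, so it is still NULL. */
    return (intptr_t) r1 == 11 && (intptr_t) r2 == 21 && CurrentSession == NULL;
}
```

Keeping one thread-local pointer, rather than hundreds of separate
thread-local variables, is what would later make it possible to pass
the Session around explicitly.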

# Extensions

A lot of extensions also contain global variables or other things that
break in a multi-threaded environment. We need a way to label extensions
that support multi-threading. And in the future, also extensions that
*require* a multi-threaded server.

Let's add flags to the control file to mark if the extension is
thread-safe and/or process-safe. If you try to load an extension that's
not compatible with the server's mode, throw an error.

We might need new functions in addition to _PG_init, called at connection
startup and shutdown. And the background worker API probably needs some changes.
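
A sketch of what the load-time check could look like. The flags are only
proposed in this email, so every name here (ServerMode, ExtensionFlags,
extension_compatible) is hypothetical:

```c
#include <stdbool.h>

/* Hypothetical: the mode the server was started in (chosen via a GUC). */
typedef enum ServerMode
{
    SERVER_MULTI_PROCESS,
    SERVER_MULTI_THREAD
} ServerMode;

/* Hypothetical: flags parsed from an extension's control file. */
typedef struct ExtensionFlags
{
    bool thread_safe;
    bool process_safe;
} ExtensionFlags;

/*
 * Runtime check: returns true if the extension may be loaded in the given
 * mode. A caller would throw an error when this returns false.
 */
bool
extension_compatible(ExtensionFlags flags, ServerMode mode)
{
    if (mode == SERVER_MULTI_THREAD)
        return flags.thread_safe;
    return flags.process_safe;
}
```

Because the mode is a function argument rather than a compile-time
constant, the same extension binary could be checked against either
server mode.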

# Exposed PIDs

We expose backend process PIDs to users in a few places.
pg_stat_activity.pid and pg_terminate_backend(), for example. They need
to be replaced, or we can assign a fake PID to each connection when
running in multi-threaded mode.

# Signals

We use signals for communication between backends. SIGURG in latches,
and SIGUSR1 in procsignal, for example. Those primitives need to be
rewritten with some other signalling mechanism in multi-threaded mode.
In principle, it's possible to set per-thread signal handlers, and send
a signal to a particular thread (pthread_kill), but I think it's better
to just rewrite them.
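
One possible shape for such a rewrite, as a minimal sketch only: a
latch-like primitive built on a pthread mutex and condition variable
instead of a signal. ThreadLatch and the thread_latch_* functions are
invented names for this example, not existing PostgreSQL APIs, and the
real latch code also has to handle sockets and timeouts, which this
ignores:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical in-process latch: a flag protected by a mutex. */
typedef struct ThreadLatch
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            is_set;
} ThreadLatch;

void
thread_latch_init(ThreadLatch *latch)
{
    pthread_mutex_init(&latch->mutex, NULL);
    pthread_cond_init(&latch->cond, NULL);
    latch->is_set = false;
}

/* Wake up any thread waiting on the latch; no signal is involved. */
void
thread_latch_set(ThreadLatch *latch)
{
    pthread_mutex_lock(&latch->mutex);
    latch->is_set = true;
    pthread_cond_broadcast(&latch->cond);
    pthread_mutex_unlock(&latch->mutex);
}

/* Block until the latch is set; the loop guards against spurious wakeups. */
void
thread_latch_wait(ThreadLatch *latch)
{
    pthread_mutex_lock(&latch->mutex);
    while (!latch->is_set)
        pthread_cond_wait(&latch->cond, &latch->mutex);
    pthread_mutex_unlock(&latch->mutex);
}

void
thread_latch_reset(ThreadLatch *latch)
{
    pthread_mutex_lock(&latch->mutex);
    latch->is_set = false;
    pthread_mutex_unlock(&latch->mutex);
}

static void *
setter_thread(void *arg)
{
    thread_latch_set((ThreadLatch *) arg);
    return NULL;
}

/* Returns 1 if a second thread woke us through the latch, -1 on error. */
int
latch_demo(void)
{
    ThreadLatch latch;
    pthread_t   setter;
    bool        was_set;

    thread_latch_init(&latch);
    if (pthread_create(&setter, NULL, setter_thread, &latch) != 0)
        return -1;
    thread_latch_wait(&latch);
    pthread_join(setter, NULL);

    was_set = latch.is_set;
    thread_latch_reset(&latch);
    was_set = was_set && !latch.is_set;

    pthread_cond_destroy(&latch.cond);
    pthread_mutex_destroy(&latch.mutex);
    return was_set ? 1 : 0;
}
```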

We also document that you can send SIGINT, SIGTERM or SIGHUP to an
individual backend process. I think we need to deprecate that, and maybe
come up with some convenient replacement. E.g. send a message with
backend ID to a unix domain socket, and a new pg_kill executable to send
those messages.

# Restart on crash

If a backend process crashes, postmaster terminates all other backends
and restarts the system. That's hard (impossible?) to do safely if
everything runs in one process. We can continue to have a separate
postmaster process that just monitors the main process and restarts it
on crash.

# Thread-safe libraries

Need to switch to thread-safe versions of library functions, e.g.
uselocale() instead of setlocale().
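
For example, a minimal sketch of the per-thread pattern, assuming a
POSIX.1-2008 platform such as glibc (other platforms may need a
different header). The function name with_thread_locale is made up for
this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stddef.h>

/*
 * Switch only the calling thread to the named locale, do some work, and
 * switch back. Returns 1 if the locale was active for this thread, or -1
 * if the locale could not be created.
 */
int
with_thread_locale(const char *name)
{
    locale_t loc;
    locale_t old;
    int      active;

    loc = newlocale(LC_ALL_MASK, name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return -1;

    /* Unlike setlocale(), this does not affect other threads. */
    old = uselocale(loc);

    /* ... locale-sensitive work would go here ... */

    /* uselocale((locale_t) 0) queries the current thread's locale. */
    active = (uselocale((locale_t) 0) == loc);

    uselocale(old);
    freelocale(loc);
    return active;
}
```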

The Python interpreter has a Global Interpreter Lock. It's not possible
to create two completely independent Python interpreters in the same
process, there will be some lock contention on the GIL. Fortunately, the
python community just accepted https://peps.python.org/pep-0684/. That's
exactly what we need: it makes it possible for separate interpreters to
have their own GILs. It's not clear to me if that's in Python 3.12
already, or under development for some future version, but by the time
we make the switch in Postgres, there probably will be a solution in
cpython.

At a quick glance, I think perl and TCL are fine, you can have multiple
interpreters in one process. Need to check any other libraries we use.

[0]
https://www.postgresql.org/message-id/flat/9defcb14-a918-13fe-4b80-a0b02ff85527%40postgrespro.ru

[1]
https://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F

--
Heikki Linnakangas
Neon (https://neon.tech)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 15:18:27
Message-ID:	4178104.1685978307@sss.pgh.pa.us
Lists:	pgsql-hackers

Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
> so that the whole server runs in a single process, with multiple
> threads. It has been discussed many times in the past, last thread on
> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].

> I feel that there is now pretty strong consensus that it would be a good
> thing, more so than before. Lots of work to get there, and lots of
> details to be hashed out, but no objections to the idea at a high level.

> The purpose of this email is to make that silent consensus explicit. If
> you have objections to switching from the current multi-process
> architecture to a single-process, multi-threaded architecture, please
> speak up.

For the record, I think this will be a disaster. There is far too much
code that will get broken, largely silently, and much of it is not
under our control.

regards, tom lane

From:	"Tristan Partin" <tristan(at)neon(dot)tech>
To:	"Heikki Linnakangas" <hlinnaka(at)iki(dot)fi>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 15:28:50
Message-ID:	CT4TNFJFKOY3.22XP8JDT7U0C5@gonk
Lists:	pgsql-hackers

On Mon Jun 5, 2023 at 9:51 AM CDT, Heikki Linnakangas wrote:
> # Global variables
>
> We have a lot of global and static variables:
>
> $ objdump -t bin/postgres | grep -e "\.data" -e "\.bss" | grep -v "data.rel.ro" | wc -l
> 1666
>
> Some of them are pointers to shared memory structures and can stay as
> they are. But many of them are per-connection state. The most
> straightforward conversion for those is to turn them into thread-local
> variables, like Konstantin did in [0].
>
> It might be good to have some kind of a Session context struct that we
> pass everywhere, or maybe have a single thread-local variable to hold
> it. Many of the global variables would become fields in the Session. But
> that's future work.

+1 to the session context idea after the more simple thread_local
storage idea.

> # Extensions
>
> A lot of extensions also contain global variables or other things that
> break in a multi-threaded environment. We need a way to label extensions
> that support multi-threading. And in the future, also extensions that
> *require* a multi-threaded server.
>
> Let's add flags to the control file to mark if the extension is
> thread-safe and/or process-safe. If you try to load an extension that's
> not compatible with the server's mode, throw an error.
>
> We might need new functions in addition to _PG_init, called at connection
> startup and shutdown. And the background worker API probably needs some changes.

It would be a good idea to start exposing a variable through pkg-config
to tell whether the backend is multi-threaded or multi-process.

> # Exposed PIDs
>
> We expose backend process PIDs to users in a few places.
> pg_stat_activity.pid and pg_terminate_backend(), for example. They need
> to be replaced, or we can assign a fake PID to each connection when
> running in multi-threaded mode.

Would it be possible to just transparently slot in the thread ID
instead?

> # Thread-safe libraries
>
> Need to switch to thread-safe versions of library functions, e.g.
> uselocale() instead of setlocale().

Seems like a good starting point.

> The Python interpreter has a Global Interpreter Lock. It's not possible
> to create two completely independent Python interpreters in the same
> process, there will be some lock contention on the GIL. Fortunately, the
> python community just accepted https://peps.python.org/pep-0684/. That's
> exactly what we need: it makes it possible for separate interpreters to
> have their own GILs. It's not clear to me if that's in Python 3.12
> already, or under development for some future version, but by the time
> we make the switch in Postgres, there probably will be a solution in
> cpython.

3.12 is the currently in-development version of Python. 3.12 is planned
for release in October of this year.

A workaround that some projects seem to do is to use multiple Python
interpreters[0], though it seems uncommon. It might be important to note
depending on the minimum version of Python Postgres aims to support (not
sure on this policy).

The C-API of Python also provides mechanisms for releasing the GIL. I am
not familiar with how Postgres uses Python, but I have seen huge
improvements to performance with well-placed GIL releases in
multi-threaded contexts. Surely this API would just become a no-op after
the PEP is implemented.

[0]: https://peps.python.org/pep-0684/#existing-use-of-multiple-interpreters

--
Tristan Partin
Neon (https://neon.tech)

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 15:33:57
Message-ID:	4ce6c0f8-e8a4-1672-93fd-49d3fa975ee5@iki.fi
Lists:	pgsql-hackers

On 05/06/2023 11:18, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
>> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
>> so that the whole server runs in a single process, with multiple
>> threads. It has been discussed many times in the past, last thread on
>> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>
>> I feel that there is now pretty strong consensus that it would be a good
>> thing, more so than before. Lots of work to get there, and lots of
>> details to be hashed out, but no objections to the idea at a high level.
>
>> The purpose of this email is to make that silent consensus explicit. If
>> you have objections to switching from the current multi-process
>> architecture to a single-process, multi-threaded architecture, please
>> speak up.
>
> For the record, I think this will be a disaster. There is far too much
> code that will get broken, largely silently, and much of it is not
> under our control.

Noted. Other large projects have gone through this transition. It's not
easy, but it's a lot easier now than it was 10 years ago. The platform
and compiler support is there now, all libraries have thread-safe
interfaces, etc.

I don't expect you or others to buy into any particular code change at
this point, or to contribute time into it. Just to accept that it's a
worthwhile goal. If the implementation turns out to be a disaster, then
it won't be accepted, of course. But I'm optimistic.

--
Heikki Linnakangas
Neon (https://neon.tech)

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Tristan Partin <tristan(at)neon(dot)tech>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 15:43:54
Message-ID:	3185fb3b-bbff-4b3e-78c1-3fb9befe6ef8@iki.fi
Lists:	pgsql-hackers

On 05/06/2023 11:28, Tristan Partin wrote:
> On Mon Jun 5, 2023 at 9:51 AM CDT, Heikki Linnakangas wrote:
>> # Extensions
>>
>> A lot of extensions also contain global variables or other things that
>> break in a multi-threaded environment. We need a way to label extensions
>> that support multi-threading. And in the future, also extensions that
>> *require* a multi-threaded server.
>>
>> Let's add flags to the control file to mark if the extension is
>> thread-safe and/or process-safe. If you try to load an extension that's
>> not compatible with the server's mode, throw an error.
>>
>> We might need new functions in addition to _PG_init, called at connection
>> startup and shutdown. And the background worker API probably needs some changes.
>
> It would be a good idea to start exposing a variable through pkg-config
> to tell whether the backend is multi-threaded or multi-process.

I think we need to support both modes without having to recompile the
server or the extensions. So it needs to be a runtime check.

>> # Exposed PIDs
>>
>> We expose backend process PIDs to users in a few places.
>> pg_stat_activity.pid and pg_terminate_backend(), for example. They need
>> to be replaced, or we can assign a fake PID to each connection when
>> running in multi-threaded mode.
>
> Would it be possible to just transparently slot in the thread ID
> instead?

Perhaps. It might break applications that use the PID directly with e.g.
'kill <PID>', though.

>> The Python interpreter has a Global Interpreter Lock. It's not possible
>> to create two completely independent Python interpreters in the same
>> process, there will be some lock contention on the GIL. Fortunately, the
>> python community just accepted https://peps.python.org/pep-0684/. That's
>> exactly what we need: it makes it possible for separate interpreters to
>> have their own GILs. It's not clear to me if that's in Python 3.12
>> already, or under development for some future version, but by the time
>> we make the switch in Postgres, there probably will be a solution in
>> cpython.
>
> 3.12 is the currently in-development version of Python. 3.12 is planned
> for release in October of this year.
>
> A workaround that some projects seem to do is to use multiple Python
> interpreters[0], though it seems uncommon. It might be important to note
> depending on the minimum version of Python Postgres aims to support (not
> sure on this policy).
>
> The C-API of Python also provides mechanisms for releasing the GIL. I am
> not familiar with how Postgres uses Python, but I have seen huge
> improvements to performance with well-placed GIL releases in
> multi-threaded contexts. Surely this API would just become a no-op after
> the PEP is implemented.
>
> [0]: https://peps.python.org/pep-0684/#existing-use-of-multiple-interpreters

Oh, cool. I'm inclined to jump straight to PEP-684 and require python
3.12 in multi-threaded mode, though, or just accept that it's slow. But
let's see what the state of the world is when we get there.

--
Heikki Linnakangas
Neon (https://neon.tech)

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 17:10:52
Message-ID:	ZH4XHEIT0NVHUIDs@momjian.us
Lists:	pgsql-hackers

On Mon, Jun 5, 2023 at 05:51:57PM +0300, Heikki Linnakangas wrote:
> # Restart on crash
>
> If a backend process crashes, postmaster terminates all other backends and
> restarts the system. That's hard (impossible?) to do safely if everything
> runs in one process. We can continue to have a separate postmaster process that
> just monitors the main process and restarts it on crash.

It would be good to know what new class of errors would cause server
restarts, e.g., memory allocation failures?

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 17:29:16
Message-ID:	14ee96fe-817a-ff77-95c1-6b5bc0efa616@iki.fi
Lists:	pgsql-hackers

On 05/06/2023 13:10, Bruce Momjian wrote:
> On Mon, Jun 5, 2023 at 05:51:57PM +0300, Heikki Linnakangas wrote:
>> # Restart on crash
>>
>> If a backend process crashes, postmaster terminates all other backends and
>> restarts the system. That's hard (impossible?) to do safely if everything
>> runs in one process. We can continue to have a separate postmaster process that
>> just monitors the main process and restarts it on crash.
>
> It would be good to know what new class of errors would cause server
> restarts, e.g., memory allocation failures?

You mean "out of memory"? No, that would be horrible.

I don't think there would be any new class of errors that would cause
server restarts. In theory, having a separate address space for each
backend gives you some protection. In practice, there are a lot of
shared memory structures anyway that you can stomp over, and a segfault
or unexpected exit of any backend process causes postmaster to restart
the whole system anyway.

--
Heikki Linnakangas
Neon (https://neon.tech)

From:	"Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 17:40:13
Message-ID:	cd27db0c-01ae-ba4d-f141-8712cbb4188c@postgresql.org
Lists:	pgsql-hackers

On 6/5/23 11:33 AM, Heikki Linnakangas wrote:
> On 05/06/2023 11:18, Tom Lane wrote:
>> Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
>>> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
>>> so that the whole server runs in a single process, with multiple
>>> threads. It has been discussed many times in the past, last thread on
>>> pgsql-hackers was back in 2017 when Konstantin made some experiments
>>> [0].
>>
>>> I feel that there is now pretty strong consensus that it would be a good
>>> thing, more so than before. Lots of work to get there, and lots of
>>> details to be hashed out, but no objections to the idea at a high level.
>>
>>> The purpose of this email is to make that silent consensus explicit. If
>>> you have objections to switching from the current multi-process
>>> architecture to a single-process, multi-threaded architecture, please
>>> speak up.
>>
>> For the record, I think this will be a disaster. There is far too much
>> code that will get broken, largely silently, and much of it is not
>> under our control.
>
> Noted. Other large projects have gone through this transition. It's not
> easy, but it's a lot easier now than it was 10 years ago. The platform
> and compiler support is there now, all libraries have thread-safe
> interfaces, etc.
>
> I don't expect you or others to buy into any particular code change at
> this point, or to contribute time into it. Just to accept that it's a
> worthwhile goal. If the implementation turns out to be a disaster, then
> it won't be accepted, of course. But I'm optimistic.

I don't have enough expertise in this area to comment on if it'd be a
"disaster" or not. My zoomed out observations are two-fold:

1. It seems like there's a lack of consensus on which of processes vs.
threads yield the best performance benefit, and from talking to folks
with greater expertise than me, this can vary between workloads. I
believe one DB even gives users a choice if they want to run in processes
vs. threads.

2. While I wouldn't want to necessarily discourage a moonshot effort, I
would ask if developer time could be better spent on tackling some of
the other problems around vertical scalability? Per some PGCon
discussions, there's still room for improvement in how PostgreSQL can
best utilize resources available on very large "commodity" machines (a
448-core / 24TB RAM instance comes to mind).

I'm purposely giving a nonanswer on whether it's a worthwhile goal, but
rather I'd be curious where it could stack up against some other efforts
to continue to help PostgreSQL improve performance and handle very large
workloads.

Thanks,

Jonathan

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 18:04:01
Message-ID:	ZH4jkbtflmhgyyue@momjian.us
Lists:	pgsql-hackers

On Mon, Jun 5, 2023 at 08:29:16PM +0300, Heikki Linnakangas wrote:
> On 05/06/2023 13:10, Bruce Momjian wrote:
> > On Mon, Jun 5, 2023 at 05:51:57PM +0300, Heikki Linnakangas wrote:
> > > # Restart on crash
> > >
> > > If a backend process crashes, postmaster terminates all other backends and
> > > restarts the system. That's hard (impossible?) to do safely if everything
> > > runs in one process. We can continue to have a separate postmaster process that
> > > just monitors the main process and restarts it on crash.
> >
> > It would be good to know what new class of errors would cause server
> > restarts, e.g., memory allocation failures?
>
> You mean "out of memory"? No, that would be horrible.
>
> I don't think there would be any new class of errors that would cause server
> restarts. In theory, having a separate address space for each backend gives
> you some protection. In practice, there are a lot of shared memory
> structures anyway that you can stomp over, and a segfault or unexpected exit
> of any backend process causes postmaster to restart the whole system anyway.

Uh, yes, but don't we detect failures while modifying shared memory and
force a restart? Wouldn't the scope of failures be much larger?

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 18:30:28
Message-ID:	9e8a5399-29da-e517-81c6-9b78d9c39804@iki.fi
Lists:	pgsql-hackers

On 05/06/2023 14:04, Bruce Momjian wrote:
> On Mon, Jun 5, 2023 at 08:29:16PM +0300, Heikki Linnakangas wrote:
>> I don't think there would be any new class of errors that would cause server
>> restarts. In theory, having a separate address space for each backend gives
>> you some protection. In practice, there are a lot of shared memory
>> structures anyway that you can stomp over, and a segfault or unexpected exit
>> of any backend process causes postmaster to restart the whole system anyway.
>
> Uh, yes, but don't we detect failures while modifying shared memory and
> force a restart? Wouldn't the scope of failures be much larger?

If one process writes over shared memory that it shouldn't, it can cause
a crash in that process or some other process that reads it. Same with
multiple threads, no difference there.

With a single process, one thread can modify another thread's "backend
private" memory, and cause the other thread to crash. Perhaps that's
what you meant?

In practice, I don't think it's so bad. Even in a multi-threaded
environment, common bugs like buffer overflows and use-after-free are
still much more likely to access memory owned by the same thread, thanks
to how memory allocators work. And a completely random memory access is
still more likely to cause a segfault than corrupting another thread's
memory. And tools like CLOBBER_FREED_MEMORY/MEMORY_CONTEXT_CHECKING and
valgrind are pretty good at catching memory access bugs at development
time, whether it's multiple processes or threads.

--
Heikki Linnakangas
Neon (https://neon.tech)

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 18:51:50
Message-ID:	31ec84ad-c10c-9351-bf9f-19679c832b73@dunslane.net
Lists:	pgsql-hackers

On 2023-06-05 Mo 11:18, Tom Lane wrote:
> Heikki Linnakangas<hlinnaka(at)iki(dot)fi> writes:
>> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
>> so that the whole server runs in a single process, with multiple
>> threads. It has been discussed many times in the past, last thread on
>> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>> I feel that there is now pretty strong consensus that it would be a good
>> thing, more so than before. Lots of work to get there, and lots of
>> details to be hashed out, but no objections to the idea at a high level.
>> The purpose of this email is to make that silent consensus explicit. If
>> you have objections to switching from the current multi-process
>> architecture to a single-process, multi-threaded architecture, please
>> speak up.
> For the record, I think this will be a disaster. There is far too much
> code that will get broken, largely silently, and much of it is not
> under our control.
>
>

If we were starting out today we would probably choose a threaded
implementation. But moving to threaded now seems to me like a
multi-year-multi-person project with the prospect of years to come
chasing bugs and the prospect of fairly modest advantages. The risk to
reward doesn't look great.

That's my initial reaction. I could be convinced otherwise.

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 20:08:24
Message-ID:	16a3acca-4dd2-cbcd-d078-1e01802a0b74@joeconway.com
Lists:	pgsql-hackers

On 6/5/23 14:51, Andrew Dunstan wrote:
>
> On 2023-06-05 Mo 11:18, Tom Lane wrote:
>> Heikki Linnakangas<hlinnaka(at)iki(dot)fi> writes:
>>> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
>>> so that the whole server runs in a single process, with multiple
>>> threads. It has been discussed many times in the past, last thread on
>>> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>>> I feel that there is now pretty strong consensus that it would be a good
>>> thing, more so than before. Lots of work to get there, and lots of
>>> details to be hashed out, but no objections to the idea at a high level.
>>> The purpose of this email is to make that silent consensus explicit. If
>>> you have objections to switching from the current multi-process
>>> architecture to a single-process, multi-threaded architecture, please
>>> speak up.
>> For the record, I think this will be a disaster. There is far too much
>> code that will get broken, largely silently, and much of it is not
>> under our control.
>
> If we were starting out today we would probably choose a threaded
> implementation. But moving to threaded now seems to me like a
> multi-year-multi-person project with the prospect of years to come
> chasing bugs and the prospect of fairly modest advantages. The risk to
> reward doesn't look great.
>
> That's my initial reaction. I could be convinced otherwise.

I read through the thread thus far, and Andrew's response is the one
that best aligns with my reaction.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From:	"Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 21:07:52
Message-ID:	CADUqk8UyyiLsVSFB+6LYwvRos123-333YT5d3eZ-eUBvXVyDBQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jun 5, 2023 at 8:18 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> For the record, I think this will be a disaster. There is far too much
> code that will get broken, largely silently, and much of it is not
> under our control.
>

While I've long been in favor of a multi-threaded implementation, now in my
old age, I tend to agree with Tom. I'd be interested in Konstantin's
thoughts (and PostgresPro's experience) of multi-threaded vs. internal
pooling with the current process-based model. I recall looking at and
playing with Konstantin's implementations of both, which were impressive.
Yes, the latter doesn't solve the same issues, but it does address many of the
real-world ones for which multi-threading is argued. Personally, I think there would be not only
a significant amount of time spent dealing with in-the-field stability
regressions before a multi-threaded implementation matures, but it would
also increase the learning curve for anyone trying to start with internals
development.

--
Jonah H. Harris

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 23:26:15
Message-ID:	ZH5vFw3SHyHRNb6G@momjian.us
Lists:	pgsql-hackers

On Mon, Jun 5, 2023 at 09:30:28PM +0300, Heikki Linnakangas wrote:
> If one process writes over shared memory that it shouldn't, it can cause a
> crash in that process or some other process that reads it. Same with
> multiple threads, no difference there.
>
> With a single process, one thread can modify another thread's "backend
> private" memory, and cause the other thread to crash. Perhaps that's what
> you meant?
>
> In practice, I don't think it's so bad. Even in a multi-threaded
> environment, common bugs like buffer overflows and use-after-free are still
> much more likely to access memory owned by the same thread, thanks to how
> memory allocators work. And a completely random memory access is still more
> likely to cause a segfault than corrupting another thread's memory. And
> tools like CLOBBER_FREED_MEMORY/MEMORY_CONTEXT_CHECKING and valgrind are
> pretty good at catching memory access bugs at development time, whether it's
> multiple processes or threads.

I remember we used to have macros we called before we modified critical
parts of shared memory, and if a process exited while in those blocks,
the server would restart. Unfortunately, I can't find that in the code
now.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-05 23:50:11
Message-ID:	CAH2-WznPEvcnP3RShC3M2GvwVyY1Ww660=X048CBSgepK46VXw@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jun 5, 2023 at 4:26 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I remember we used to have macros we called before we modified critical
> parts of shared memory, and if a process exited while in those blocks,
> the server would restart. Unfortunately, I can't find that in the code
> now.

Isn't that what we call a critical section? They effectively "promote"
any ERROR (e.g., from an OOM) into a PANIC.

I thought that we only used critical sections for things that are
WAL-logged, but I double checked just now. Turns out that I was wrong:
PGSTAT_BEGIN_WRITE_ACTIVITY() contains its own START_CRIT_SECTION(),
despite not being involved in WAL logging. And so critical sections
could indeed be described as something that we use whenever shared
memory cannot be left in an inconsistent state (which often coincides
with WAL logging, but need not).
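
The promotion rule can be modelled in a few lines of stand-alone C. This is only a sketch: the names below (crit_section_count, effective_level and so on) are simplified stand-ins invented for illustration, not PostgreSQL's actual definitions, and the real error machinery does far more than this.

```c
#include <assert.h>

/* Simplified stand-in for the server's error levels (illustrative only). */
typedef enum { LVL_ERROR, LVL_FATAL, LVL_PANIC } Level;

/* Nesting depth of critical sections for the current backend. */
int crit_section_count = 0;

void start_crit_section(void)
{
    crit_section_count++;
}

void end_crit_section(void)
{
    assert(crit_section_count > 0);
    crit_section_count--;
}

/*
 * Decide the level an error is actually reported at. Inside a critical
 * section shared state may be half-updated, so a plain ERROR cannot be
 * recovered from and is treated as a PANIC instead.
 */
Level effective_level(Level requested)
{
    if (requested == LVL_ERROR && crit_section_count > 0)
        return LVL_PANIC;
    return requested;
}
```

Outside any critical section effective_level(LVL_ERROR) stays LVL_ERROR; between start_crit_section() and end_crit_section() it becomes LVL_PANIC.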

--
Peter Geoghegan

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 00:15:56
Message-ID:	ZH56vC2CSXUvTsqU@momjian.us
Lists:	pgsql-hackers

On Mon, Jun 5, 2023 at 04:50:11PM -0700, Peter Geoghegan wrote:
> On Mon, Jun 5, 2023 at 4:26 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > I remember we used to have macros we called before we modified critical
> > parts of shared memory, and if a process exited while in those blocks,
> > the server would restart. Unfortunately, I can't find that in the code
> > now.
>
> Isn't that what we call a critical section? They effectively "promote"
> any ERROR (e.g., from an OOM) into a PANIC.
>
> I thought that we only used critical sections for things that are
> WAL-logged, but I double checked just now. Turns out that I was wrong:
> PGSTAT_BEGIN_WRITE_ACTIVITY() contains its own START_CRIT_SECTION(),
> despite not being involved in WAL logging. And so critical sections
> could indeed be described as something that we use whenever shared
> memory cannot be left in an inconsistent state (which often coincides
> with WAL logging, but need not).

Yes, sorry, critical sections is what I was remembering. My question is
whether all unexpected backend exits should be treated as critical
sections?

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

From:	Jeremy Schneider <schneider(at)ardentperf(dot)com>
To:	"Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 00:27:11
Message-ID:	87feba4c-537c-9778-9175-289b76971aac@ardentperf.com
Lists:	pgsql-hackers

On 6/5/23 2:07 PM, Jonah H. Harris wrote:
> On Mon, Jun 5, 2023 at 8:18 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> For the record, I think this will be a disaster. There is far too much
> code that will get broken, largely silently, and much of it is not
> under our control.
>
>
> While I've long been in favor of a multi-threaded implementation, now in
> my old age, I tend to agree with Tom. I'd be interested in Konstantin's
> thoughts (and PostgresPro's experience) of multi-threaded vs. internal
> pooling with the current process-based model. I recall looking at and
> playing with Konstantin's implementations of both, which were
> impressive. Yes, the latter doesn't solve the same issues, but many
> real-world ones where multi-threaded is argued. Personally, I think
> there would be not only a significant amount of time spent dealing with
> in-the-field stability regressions before a multi-threaded
> implementation matures, but it would also increase the learning curve
> for anyone trying to start with internals development.

To me, processes feel just a little easier to observe and inspect, a
little easier to debug, and a little easier to reason about. Tooling
does exist for threads - but operating systems track more things at a
process level and I like having the full arsenal of unix process-based
tooling at my disposal.

Even simple things, like being able to see at a glance from "ps" or
"top" output which process is the bgwriter or the checkpointer, and
being able to attach gdb only on that process without pausing the whole
system. Or to a single backend.

A thread model certainly has advantages but I do feel that some useful
things might be lost here.

And for the record, just within the past few weeks I saw a small mistake
in some C code which smashed the stack of another thread in the same
process space. It manifested as unpredictable periodic random SIGSEGV
and SIGBUS with core dumps that were useless gibberish, and it was
rather difficult to root cause.

But one interesting outcome of that incident was learning from my
colleague Josh that apparently SUSv2 and C99 contradict each other: when
snprintf() is called with size=0 then SUSv2 stipulates an unspecified
return value less than 1, while C99 allows str to be NULL in this case,
and gives the return value (as always) as the number of characters that
would have been written in case the output string has been large enough.
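
The C99 behaviour is what makes the common "measure, then allocate" idiom work. Below is a minimal sketch of that idiom, assuming a C99-conforming library; format_alloc is a hypothetical helper name used only for this example. Code written against the SUSv2 wording could not rely on the first call returning a usable length.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a freshly malloc'd buffer of exactly the right size.
 * Relies on the C99 rule that vsnprintf(NULL, 0, ...) returns the number
 * of characters the output would need, not counting the terminating NUL.
 * Returns NULL on a formatting or allocation failure; the caller frees.
 */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    char   *buf;
    int     needed;

    va_start(ap, fmt);
    va_copy(ap2, ap);               /* the list is consumed by each call */

    needed = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (needed < 0)
    {
        va_end(ap2);
        return NULL;
    }

    buf = malloc((size_t) needed + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t) needed + 1, fmt, ap2);
    va_end(ap2);
    return buf;
}
```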

So long story short... I think the robustness angle on the process model
shouldn't be underestimated either.

-Jeremy

--
http://about.me/jeremy_schneider

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 00:50:04
Message-ID:	CAH2-WzkFGDrcm=khqQv45V+AoozveJZWhYp=1AKny4NO4fbM+A@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jun 5, 2023 at 5:15 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Isn't that what we call a critical section? They effectively "promote"
> > any ERROR (e.g., from an OOM) into a PANIC.

> Yes, sorry, critical sections is what I was remembering. My question is
> whether all unexpected backend exits should be treated as critical
> sections?

I think that it boils down to this: critical sections help us to avoid
various inconsistencies that might otherwise be introduced to critical
state, usually in shared memory. And so critical sections are mostly
about protecting truly crucial state, even in the presence of
irrecoverable problems (e.g., those caused by corruption that was
missed before the critical section was reached, fsync() reporting
failure on recent Postgres versions). This is mostly about the state
itself -- it's not about cleaning up from routine errors at all. The
server isn't supposed to PANIC, and won't unless some fundamental
assumption that the system makes isn't met.

I said that an OOM could cause a PANIC. But that really shouldn't be
possible in practice, since it can only happen when code in a critical
section actually attempts to allocate memory in the first place. There
is an assertion in palloc() that will catch code that violates that
rule. It has been known to happen from time to time, but theoretically
it should never happen.

Discussion about the robustness of threads versus processes seems to
only be concerned with what can happen after something "impossible"
takes place. Not before. Backend code is not supposed to corrupt
memory, whether shared or local, with or without threads. Code in
critical sections isn't supposed to even attempt memory allocation.
Jeremy and others have suggested that processes have significant
robustness advantages. Maybe they do, but it's hard to say either way
because these benefits only apply "when the impossible happens". In
any given case it's reasonable to wonder if the user was protected by
our multi-process architecture, or protected by dumb luck. Could even
be both.

--
Peter Geoghegan

From:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To:	andrew(at)dunslane(dot)net
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 01:30:05
Message-ID:	20230606.103005.1460825148194820240.t-ishii@sranhm.sra.co.jp
Lists:	pgsql-hackers

>> For the record, I think this will be a disaster. There is far too
>> much
>> code that will get broken, largely silently, and much of it is not
>> under our control.
>>
>>
>
>
> If we were starting out today we would probably choose a threaded
> implementation. But moving to threaded now seems to me like a
> multi-year-multi-person project with the prospect of years to come
> chasing bugs and the prospect of fairly modest advantages. The risk to
> reward doesn't look great.

+1.

A long time ago (in the PostgreSQL 7 days) I modified PostgreSQL to a
threaded implementation so that it would run on Windows, because there
was no Windows port of PostgreSQL at that time. I don't remember the
details, but it was desperately hard for me.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

From:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 12:06:19
Message-ID:	4658520e-5cd0-6242-e54c-c3af0a85890a@garret.ru
Lists:	pgsql-hackers

On 06.06.2023 12:07 AM, Jonah H. Harris wrote:
> On Mon, Jun 5, 2023 at 8:18 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> For the record, I think this will be a disaster. There is far too
> much
> code that will get broken, largely silently, and much of it is not
> under our control.
>
>
> While I've long been in favor of a multi-threaded implementation, now
> in my old age, I tend to agree with Tom. I'd be interested in
> Konstantin's thoughts (and PostgresPro's experience) of multi-threaded
> vs. internal pooling with the current process-based model. I recall
> looking at and playing with Konstantin's implementations of both,
> which were impressive. Yes, the latter doesn't solve the same issues,
> but many real-world ones where multi-threaded is argued. Personally, I
> think there would be not only a significant amount of time spent
> dealing with in-the-field stability regressions before a
> multi-threaded implementation matures, but it would also increase the
> learning curve for anyone trying to start with internals development.
>
> --
> Jonah H. Harris
>

Let me share my experience with porting Postgres to threads (by the way,
the repository is still alive at
https://github.com/postgrespro/postgresql.pthreads
but I have not kept it in sync with recent versions of Postgres).

1. Solving the problem with static variables was not as difficult as I
expected, thanks to TLS and its support in modern compilers.
So the only thing we need to do is add a special modifier to the
variable declarations:

-static int MyLockNo = 0;
-static bool holdingAllLocks = false;
+static session_local int MyLockNo = 0;
+static session_local bool holdingAllLocks = false;

But there are about 2k such variables whose storage class has to be changed.
This is one of the reasons why I do not agree with the proposal to
define some session context, place all session-specific variables in
such a context and pass it everywhere. It would be very inconvenient to
maintain a structure with 2k fields and to add a new field to this struct
each time you need some non-local variable, even if it can be hidden in
some macro like DEF_SESSION_VAR(type, name).
It also requires changing all Postgres code that works with these
variables, not just the declarations.
So the patch would be 100x larger, and almost every line of Postgres code
would have to be changed.
And I do not see any reason for it except portability and avoiding a
dependency on the compiler.
The implementation of TLS is quite efficient (at least on x86): there is
a special register pointing to the TLS area, so accessing a TLS variable is
not more expensive than accessing a static variable.

2. The performance improvement from switching to threads was not so large
(~10%). But please note that I have not changed any Postgres sync
primitives.
(But I am still not sure that using, for example, pthread_rwlock instead of
our own LWLock would bring any gain in performance.)

3. Multithreading significantly simplifies concurrent query execution and
interaction between workers.
Right now, with the dynamic shared memory stuff, we can support working
with variable-size data in shared memory, but
in a multithreaded program it can be done much more easily.

4. The multithreaded model opens a way to fix many existing Postgres
problems: lack of a shared catalog and prepared statement cache, changing
the page pool size (shared buffers) at runtime, ...

5. During this porting I had the most trouble with the following
components: GUCs, signals, error handling and the file descriptor cache.
The file descriptor cache really becomes a bottleneck because now all
backends are competing for file descriptors, whose number is usually
limited to 1024 (without editing the system configuration). Protecting it
with a mutex causes significant performance degradation. So I had to
maintain a thread-local cache.

6. It is not clear how to support external extensions.

7. It will be hard to support non-multithreaded PL languages (like
python), but for example support of Java will be more natural and efficient.
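
Point 1 above can be tried out in isolation. The following is only a sketch: it assumes that session_local expands to the compiler's thread-local storage class (here C11's _Thread_local), which is a guess made for illustration and not taken from the actual patch, and the function names are invented.

```c
#include <assert.h>
#include <pthread.h>

/* Assumption: session_local maps to thread-local storage. */
#define session_local _Thread_local

/* One copy of this variable exists per thread ("session"). */
static session_local int MyLockNo = 0;

/* Each "backend" thread bumps its own copy and reports the final value. */
static void *backend_main(void *arg)
{
    int n = *(int *) arg;

    for (int i = 0; i < n; i++)
        MyLockNo++;
    *(int *) arg = MyLockNo;
    return NULL;
}

/*
 * Run two threads with different workloads. Returns 1 if each thread saw
 * only its own increments and the caller's copy stayed untouched,
 * 0 if the copies interfered, -1 if a thread could not be started.
 */
int run_two_backends(void)
{
    pthread_t t1, t2;
    int a = 1000, b = 2500;

    if (pthread_create(&t1, NULL, backend_main, &a) != 0)
        return -1;
    if (pthread_create(&t2, NULL, backend_main, &b) != 0)
    {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return a == 1000 && b == 2500 && MyLockNo == 0;
}
```

No locking is needed around MyLockNo because the threads never touch the same copy, which is why only the declarations have to change.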

I do not think that development of a multithreaded application is more
complex or requires a large "learning curve".
When you deal with parallel execution you should be careful in any case.
The advantage of the process model is that there is a much clearer
distinction between shared and private variables.
Concerning debugging and profiling: it is more convenient with
multithreading in some cases and less convenient in others.
But programmers have been living with threads for more than 30 years, so
now most tools support threads at least no worse than processes.
And for many developers, working with threads is now more natural and
convenient.

OOM and local backend memory consumption seem to be one of the main
challenges for the multithreading model:
right now some queries can cause high memory consumption. work_mem is
just a hint and real memory consumption can be much higher.
Even if it doesn't cause OOM, not all of the allocated memory is
returned to the OS after query completion, which increases memory
fragmentation.
Right now, restarting a single backend suffering from memory fragmentation
eliminates this problem. But that will be impossible for multithreaded
Postgres.

So, as I see from this thread, most of the authoritative members of the
Postgres community are still very pessimistic (or conservative :)
about making Postgres multi-threaded. And it is really a huge amount of
work, which will cause significant code divergence. It significantly
complicates backpatching and support of external extensions. It cannot be
done without support and approval by most of the committers. This is why
this work was stalled in PgPro.

My personal opinion is that Postgres has almost reached its "limit of
evolution" or is close to it.
Making major changes such as multithreading, undo log, or a columnar
store with a vector executor
requires so many changes and causes so many conflicts with existing code
that it would be easier to develop a new system from scratch
rather than trying to plug a new approach into the old architecture.
Maybe I am wrong. It may be my personal fault that I was not able to bring
multithreaded Postgres, the builtin connection pooler, the vectorized
executor, libpq compression and my other PRs to commit.
I have a feeling that it is not possible to merge into mainstream
something non-trivial that affects the Postgres core without the interest
and help of several
committers. On the other hand, the presence of such Postgres forks as
TimescaleDB, OrioleDB, GreenPlum demonstrates that Postgres still has
high potential for extension.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 13:40:28
Message-ID:	CA+TgmoaQRwmsFxNYF3QAtu4W9XpONpuBm8FZeY+hvMvrocqYYA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jun 5, 2023 at 10:52 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
> so that the whole server runs in a single process, with multiple
> threads. It has been discussed many times in the past, last thread on
> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>
> I feel that there is now pretty strong consensus that it would be a good
> thing, more so than before. Lots of work to get there, and lots of
> details to be hashed out, but no objections to the idea at a high level.

I'm not sure that there's a strong consensus, but I do think it's a good idea.

> # Transition period
>
> The transition surely cannot be done fully in one release. Even if we
> could pull it off in core, extensions will need more time to adapt.
> There will be a transition period of at least one release, probably
> more, where you can choose multi-process or multi-thread model using a
> GUC. Depending on how it goes, we can document it as experimental at first.

I think the transition period should probably be effectively infinite.
There might be some distant future day when we'd remove the process
support, if things go incredibly well with threads, but I don't think
it would be any time soon. If nothing else, considering that we don't
want to force a hard compatibility break for extensions.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 14:13:47
Message-ID:	CA+TgmoY=hioNYW124e0CZ6Lbo_pVyeMw_rK+9fzzH6Aa85RYgw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jun 6, 2023 at 9:40 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm not sure that there's a strong consensus, but I do think it's a good idea.

Let me elaborate on this a bit.

I think one of PostgreSQL's bigger problems right now is that it
doesn't scale as far as users would like. Beyond a couple of hundred
connections, everything goes to heck. Back in the day, the big
scalability problems were around locking, but we've done a pretty good
job cleaning that stuff up over the years. Now, the problem when you
run a ton of PostgreSQL connections isn't so much that PostgreSQL
stops working as it is that the OS stops working. PostgreSQL backends
use a lot of memory, even if they're idle. Some of that is for stuff
that we could optimize but haven't, like catcache and relcache
entries, and some of it is for stuff that we can't do anything about,
like per-process page tables. But the problem isn't just RAM, either.
I've seen machines running >1000 PostgreSQL backends where kill -9
took many *minutes* to work because the OS was overwhelmed. I don't
know exactly what goes wrong inside the kernel, but clearly something
does.

Not all databases have this problem, and PostgreSQL isn't going to be
able to stop having it without some kind of major architectural
change. Changing from a process model to a threaded model might be
insufficient, because while I think that threads consume fewer OS
resources than processes, what is really needed, in all likelihood, is
the ability to have idle connections have neither a process nor a
thread associated with them until they cease being idle. That's a huge
project and I'm not volunteering to do it, but if we want to have the
same kind of scalability as some competing products, that is probably
a place to which we ultimately need to go. Getting out of the current
model where every backend has an arbitrarily large amount of state
hanging off of random global variables, not all of which are even
known to any central system, is a critical step in that journey.

Also, programming with DSA and shm_mq sucks. It's doable (proof by
example) but it's awkward and it takes a long time and the performance
isn't great. Here again, threads instead of processes is no panacea.
For as long as we support a process model - and my guess is that we're
talking about a very long time - new features are going to have to
work with those systems or else be optional. But the amount of sheer
mental energy that is required to deal with DSA means we're unlikely
to ever have a rich library of parallel primitives. Maybe we wouldn't
anyway, volunteer efforts are hard to predict, but this is certainly
not helping. I do think that there's some danger that if sharing
memory becomes as easy as calling palloc(), we'll end up with memory
leaks that could eventually take the whole system down. We need to
give some thought to how to avoid or manage that danger.

Even think about something like the main lock table. That's a fixed
size hash table, so lock exhaustion is a real possibility. If we
weren't limited to a fixed-size shared memory segment, we could let
that thing grow without a server restart. We might not want to let it
grow infinitely, but we could raise the maximum size by 100x and
allocate as required and I think we'd just be better off. Doing that
as things stand would require nailing down that amount of memory
forever whether it's ever needed or not, which doesn't seem like a
good idea. But doing something where the memory can be allocated only
if it's needed would avoid user-facing errors with relatively little
cost.
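
A rough sketch of the "raise the cap, allocate only as needed" idea, using ordinary heap memory as a threaded server could. Everything here is hypothetical: LockTable and its functions are invented for illustration, and a flat array stands in for the real hash table to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct
{
    int    *entries;    /* lock identifiers; a real table would hash these */
    size_t  used;       /* slots currently occupied */
    size_t  allocated;  /* slots currently backed by memory */
    size_t  max;        /* hard upper bound, e.g. 100x the old fixed size */
} LockTable;

/* Start small; memory for the remaining slots is not reserved up front. */
bool lock_table_init(LockTable *t, size_t initial, size_t max)
{
    if (initial == 0 || initial > max)
        return false;
    t->entries = malloc(initial * sizeof(int));
    if (t->entries == NULL)
        return false;
    t->used = 0;
    t->allocated = initial;
    t->max = max;
    return true;
}

/* Add an entry, doubling the allocation on demand up to the hard cap. */
bool lock_table_add(LockTable *t, int lock_id)
{
    if (t->used == t->allocated)
    {
        size_t  bigger;
        int    *p;

        if (t->allocated >= t->max)
            return false;           /* genuine exhaustion at the cap */
        bigger = t->allocated * 2;
        if (bigger > t->max)
            bigger = t->max;
        p = realloc(t->entries, bigger * sizeof(int));
        if (p == NULL)
            return false;
        t->entries = p;
        t->allocated = bigger;
    }
    t->entries[t->used++] = lock_id;
    return true;
}

void lock_table_free(LockTable *t)
{
    free(t->entries);
    t->entries = NULL;
    t->used = t->allocated = 0;
}
```

With an initial size of 4 and a cap of 10, the allocation grows 4, 8, 10, and only the eleventh insert fails.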

I think doing something like this is going to be a huge effort, and
frankly, there's probably no point in anybody other than a handful of
people (Heikki, Andres, a handful of others) even trying. There's too
many ways to go wrong, and this has to be done really well to be worth
doing at all. But if somebody with the requisite expertise wants to
have a go at it, I don't think we should tell them "no, we don't want
that" on principle. Let's talk about whether a specific proposal is
good or bad, and why it's good or bad, rather than falling back on an
essentially religious argument. It's not an article of faith that
PostgreSQL should not use threads: it's a technology decision. The
difficulty of reversing the decision made long ago should weigh
heavily in evaluating any proposal to do so, but the potential
benefits of such a change should be considered, too.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 15:46:48
Message-ID:	6e3082dc-ff29-9cbf-847e-5f570828b46b@iki.fi
Lists:	pgsql-hackers

On 06/06/2023 09:40, Robert Haas wrote:
> On Mon, Jun 5, 2023 at 10:52 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
>> so that the whole server runs in a single process, with multiple
>> threads. It has been discussed many times in the past, last thread on
>> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>>
>> I feel that there is now pretty strong consensus that it would be a good
>> thing, more so than before. Lots of work to get there, and lots of
>> details to be hashed out, but no objections to the idea at a high level.
>
> I'm not sure that there's a strong consensus, but I do think it's a good idea.

The consensus is not as strong as I hoped for... To summarize:

Tom, Andrew, Joe are worried that it will break a lot of stuff. That's a
valid point. The transition needs to be done well and not break things,
I agree with that. But if we can make the transition smooth, that's not
an objection to the idea itself.

Many comments have been along the lines of "it's hard, not worth the
effort". That's fair, but also not an objection to the idea itself, if
someone decides to spend the time on it.

Bruce was worried about the loss of isolation that the separate address
spaces gives, and Jeremy shared an anecdote on that. That is an
objection to the idea itself, i.e. even if transition was smooth,
bug-free and effortless, that point remains. I personally think the
isolation we get from separate address spaces is overrated. Yes, it
gives you some protection, but given how much shared memory there is,
the blast radius is large even with separate backend processes.

So I think there's hope. I didn't hear any _strong_ objections to the
idea itself, assuming the transition can be done smoothly.

>> # Transition period
>>
>> The transition surely cannot be done fully in one release. Even if we
>> could pull it off in core, extensions will need more time to adapt.
>> There will be a transition period of at least one release, probably
>> more, where you can choose multi-process or multi-thread model using a
>> GUC. Depending on how it goes, we can document it as experimental at first.
>
> I think the transition period should probably be effectively infinite.
> There might be some distant future day when we'd remove the process
> support, if things go incredibly well with threads, but I don't think
> it would be any time soon.

I don't think this is worth it, unless we plan to eventually remove the
multi-process mode. We could e.g. make lock table expandable in threaded
mode, and fixed-size in process mode, but the big gains would come from
being able to share things between threads and have variable-length
shared data structures more easily. As long as you need to also support
processes, you need to code to the lowest common denominator and don't
really get the benefits.

I don't know how long a transition period we need. Maybe 1 release, maybe 5.

> If nothing else, considering that we don't want to force a hard
> compatibility break for extensions.

Extensions regularly need small tweaks to adapt to new major Postgres
versions, I don't think this would be too different.

--
Heikki Linnakangas
Neon (https://neon.tech)

From:	chap(at)anastigmatix(dot)net
To:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 15:48:23
Message-ID:	6c99fe026eea03fd8aac91aac7143301@anastigmatix.net
Lists:	pgsql-hackers

On 2023-06-06 08:06, Konstantin Knizhnik wrote:
> 7. It will be hard to support non-multithreaded PL languages (like
> python), but for example support of Java will be more natural and
> efficient.

To this I say ...

Hmm.

Surely, the current situation with a JVM in each backend process
(that calls for one) has been often seen as heavier than desirable.

At the same time, I am not sure how manageable one giant process
with one giant JVM instance would prove to be, either.

It is somewhat nice to be able to tweak JVM settings in a session
and see what happens, without disrupting other sessions. There may
also exist cases for different JVM settings in per-user or per-
database GUCs.

Like Python with the GIL, it is documented for JNI_CreateJavaVM
that "Creation of multiple VMs in a single process is not
supported."[1]

And the devs of Java, in their immeasurable wisdom, have announced
a "JDK Enhancement Proposal" (that's just what these things are
called, don't blame Orwell), JEP 411[2][3], in which all of the
Security Manager features that PL/Java relies on for bounds on
'trusted' behavior are deprecated for eventual removal with no
functional replacement. I'd be even more leery of using one big
shared JVM for everybody's work after that happens.

Might the work toward allowing a run-time choice between a
process or threaded model also make possible some
intermediate models as well? A backend process for
connections to a particular database, or with particular
authentication credentials? Go through the authentication
handshake and then sendfd the connected socket to the
appropriate process. (Has every supported platform got
something like sendfd?)
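
On Unix-like systems the usual way to hand an open descriptor to another process is an SCM_RIGHTS control message over a Unix-domain socket. A minimal sketch follows; send_fd and recv_fd are names made up for this example, and it says nothing about platforms without Unix-domain sockets.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
typedef union
{
    struct cmsghdr hdr;
    char           buf[CMSG_SPACE(sizeof(int))];
} FdControl;

/* Send descriptor fd over the Unix-domain socket chan. Returns 0 on success. */
int send_fd(int chan, int fd)
{
    char            byte = 'F';     /* one byte of ordinary data carries the message */
    struct iovec    iov = { .iov_base = &byte, .iov_len = 1 };
    FdControl       ctl;
    struct msghdr   msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor from chan. Returns the new descriptor, or -1. */
int recv_fd(int chan)
{
    char            byte;
    struct iovec    iov = { .iov_base = &byte, .iov_len = 1 };
    FdControl       ctl;
    struct msghdr   msg;
    struct cmsghdr *cmsg;
    int             fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open socket, so a dispatcher could finish authentication and then pass the connection on.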

That way, there could be some flexibility to arrange how many
distinct backends (and, for Java purposes, how many JVMs) get
fired up, and have each sharing sessions that have something in
common.

Or, would that just require all the complexity of both
approaches to synchronization, with no sufficient benefit?

Regards,
-Chap

[1]
https://docs.oracle.com/en/java/javase/17/docs/specs/jni/invocation.html#jni_createjavavm
[2]
https://blogs.apache.org/netbeans/entry/jep-411-deprecate-the-security1
[3] https://github.com/tada/pljava/wiki/JEP-411

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	chap(at)anastigmatix(dot)net, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 16:24:11
Message-ID:	d8868273-9797-ea1f-4a2d-ddfe1ce43ffc@iki.fi
Lists:	pgsql-hackers

On 06/06/2023 11:48, chap(at)anastigmatix(dot)net wrote:
> And the devs of Java, in their immeasurable wisdom, have announced
> a "JDK Enhancement Proposal" (that's just what these things are
> called, don't blame Orwell), JEP 411[2][3], in which all of the
> Security Manager features that PL/Java relies on for bounds on
> 'trusted' behavior are deprecated for eventual removal with no
> functional replacement. I'd be even more leery of using one big
> shared JVM for everybody's work after that happens.

Ouch.

> Might the work toward allowing a run-time choice between a
> process or threaded model also make possible some
> intermediate models as well? A backend process for
> connections to a particular database, or with particular
> authentication credentials? Go through the authentication
> handshake and then sendfd the connected socket to the
> appropriate process. (Has every supported platform got
> something like sendfd?)

I'm afraid having multiple processes and JVMs doesn't help that. If you
can escape the one JVM in one backend process, it's game over. Backend
processes are not a security barrier, and you have the same problems
with the current multi-process architecture, too.

https://github.com/greenplum-db/plcontainer is one approach. It launches
a separate process for the PL, separate from the backend process, and
sandboxes that.

--
Heikki Linnakangas
Neon (https://neon.tech)

From:	chap(at)anastigmatix(dot)net
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 17:00:11
Message-ID:	ff642e48c9b52427034d40d929457888@anastigmatix.net
Lists:	pgsql-hackers

On 2023-06-06 12:24, Heikki Linnakangas wrote:
> I'm afraid having multiple processes and JVMs doesn't help that.
> If you can escape the one JVM in one backend process, it's game over.

So there's escape and there's escape, right? Java still prioritizes
(and has, in fact, strengthened) barriers against breaking module
encapsulation, or getting access to arbitrary native memory or code.

The features that have been deprecated, to eventually go away, are
the ones that offer fine-grained control over operations that there
are Java APIs for. Eventually it won't be as easy as it is now to say
"ok, your function gets to open these files or these sockets but
not those ones."

Even for those things, there may yet be solutions. There are Java
APIs for virtualizing the view of the file system, for example. It's
yet to be seen how things will shake out. Configuration may get
trickier, and there may be some incentive to include, say,
sepgsql in the picture.

Sure, even access to a file API can be game over, depending on
what file you open, but that's already the risk for every PL
with an 'untrusted' flavor.

Regards,
-Chap

From:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 17:04:08
Message-ID:	4924300c-b919-afb5-7d1a-a411eec0315e@garret.ru
Lists:	pgsql-hackers

On 06.06.2023 5:13 PM, Robert Haas wrote:
> On Tue, Jun 6, 2023 at 9:40 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I'm not sure that there's a strong consensus, but I do think it's a good idea.
> Let me elaborate on this a bit.
>
>
>
> Not all databases have this problem, and PostgreSQL isn't going to be
> able to stop having it without some kind of major architectural
> change. Changing from a process model to a threaded model might be
> insufficient, because while I think that threads consume fewer OS
> resources than processes, what is really needed, in all likelihood, is
> the ability to have idle connections have neither a process nor a
> thread associated with them until they cease being idle. That's a huge
> project and I'm not volunteering to do it, but if we want to have the
> same kind of scalability as some competing products, that is probably
> a place to which we ultimately need to go. Getting out of the current
> model where every backend has an arbitrarily large amount of state
> hanging off of random global variables, not all of which are even
> known to any central system, is a critical step in that journey.

This looks like a built-in connection pooler, doesn't it?
Actually, a built-in connection pooler has a lot in common with
multithreaded Postgres.
It also needs to keep session context.
The main difference is that there is no need to place all Postgres
global/static variables there, because the lifetime of most of them is
shorter than a transaction. So it is enough to place the variables that
do outlive a transaction in a single struct.
This is how the built-in connection pooler was implemented in PgPro.
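
To illustrate the single-struct idea, here is a minimal sketch. It is not
the PgPro implementation; the struct fields and function names are
hypothetical and only show how session-lifetime state can be swapped in
and out of a backend when everything lives behind one pointer.

```c
#include <stddef.h>

/* Hypothetical session-lifetime state; field names are illustrative only. */
typedef struct SessionContext
{
    int  session_id;
    char search_path[64];
    int  statement_timeout_ms;
} SessionContext;

/* The one session currently scheduled on this backend (NULL if none). */
static SessionContext *current_session = NULL;

/* Attach the backend to another session; returns the one it replaces. */
SessionContext *
switch_session(SessionContext *next)
{
    SessionContext *prev = current_session;

    current_session = next;
    return prev;
}

/* Code reads session settings through the current session pointer. */
int
current_statement_timeout(void)
{
    return current_session ? current_session->statement_timeout_ms : 0;
}
```

Because the state is reachable only through current_session, parking an
idle session is just a pointer swap; nothing has to be copied.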

Reading all the concerns about multithreading Postgres makes me think
that it may be reasonable to combine the two approaches:
still have processes (backends), but be able to spawn multiple threads
inside a process (for example, for parallel query execution).
One could argue that such an approach only increases the complexity of
the implementation and combines the drawbacks of both approaches.
But actually such an approach would:
1. Support old (external, non-reentrant) extensions - they would be
executed by dedicated backends.
2. Simplify parallel query execution and make it more efficient.
3. Allow the most efficient use of multithreaded PLs (like JVM-based
ones). As there would be no single VM for all connections, but only for
some group of them (for example, those belonging to one user), most
complaints concerning sharing a VM between different connections can be
avoided.
4. Avoid or minimize problems with OOM and memory fragmentation.
5. Combine with a connection pooler (saving inactive connection state
without having a process or thread for it).

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 17:59:59
Message-ID:	CA+TgmoaWMgRf4xv8N7_1XqZyT-KWeD=n0mN2fm4i_Mwmg6oepg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jun 6, 2023 at 11:46 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Bruce was worried about the loss of isolation that the separate address
> spaces gives, and Jeremy shared an anecdote on that. That is an
> objection to the idea itself, i.e. even if transition was smooth,
> bug-free and effortless, that point remains. I personally think the
> isolation we get from separate address spaces is overrated. Yes, it
> gives you some protection, but given how much shared memory there is,
> the blast radius is large even with separate backend processes.

An interesting idea might be to look at the places where we ereport or
elog FATAL due to some kind of backend data structure corruption and
ask whether there would be an argument for elevating the level to
PANIC if we changed this. There are definitely some places where we
argue that the only corrupted state is backend-local and thus we don't
need to PANIC if it's corrupted. I wonder to what extent this change
would undermine that argument.

Even if it does, I think it's worth it. Corrupted backend-local data
structures aren't that common, thankfully.

> I don't think this is worth it, unless we plan to eventually remove the
> multi-process mode. We could e.g. make lock table expandable in threaded
> mode, and fixed-size in process mode, but the big gains would come from
> being able to share things between threads and have variable-length
> shared data structures more easily. As long as you need to also support
> processes, you need to code to the lowest common denominator and don't
> really get the benefits.
>
> I don't know how long a transition period we need. Maybe 1 release, maybe 5.

I think 1 release is wildly optimistic. Even if someone wrote a patch
for this and got it committed this release cycle, it's likely that
there would be follow-up commits needed over a period of several years
before it really worked as well as we'd like. Only after that could we
consider deprecating the per-process way. But I don't think that's
necessarily a huge problem. I originally intended DSM as an optional
feature: if you didn't have it, then you couldn't use features that
depended on it, but the rest of the system still worked. Eventually,
other people liked it enough that we decided to introduce hard
dependencies on it. I think that's a good model for a change like
this. When the inventor of a new system thinks that we should have a
hard dependency on it, MEH. When there's a groundswell of other,
unaffiliated hackers making that argument, COOL.

I'm also not quite convinced that there's no long-term use case for
multi-process mode. Maybe you're right and there isn't, but that
amounts to arguing that every extension in the world will be happy to
run in a multi-threaded world rather than not. I don't know if I quite
believe that. It also amounts to arguing that performance is going to
be better for everyone in this new multi-threaded mode, and that it
won't cause unforeseen problems for any significant numbers of users,
and maybe those things are true, but I think we need to get this new
system in place and get some real-world experience before we can judge
these kinds of things. I agree that, in theory, it would be nice to
get to a place where the multi-process mode is a dinosaur and that we
can just rip it out ... but I don't share your confidence that we can
get there in any short time period.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Kirk Wolak <wolakk(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 18:50:38
Message-ID:	CACLU5mRkBsVXwXhu0fTsWWAH=G7s+MbfHFx0T0kPFy3kgefnrw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jun 6, 2023 at 2:00 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
> I'm also not quite convinced that there's no long-term use case for
> multi-process mode. Maybe you're right and there isn't, but that
> amounts to arguing that every extension in the world will be happy to
> run in a multi-threaded world rather than not. I don't know if I quite
> believe that. It also amounts to arguing that performance is going to
> be better for everyone in this new multi-threaded mode, and that it
> won't cause unforeseen problems for any significant numbers of users,
> and maybe those things are true, but I think we need to get this new
> system in place and get some real-world experience before we can judge
> these kinds of things. I agree that, in theory, it would be nice to
> get to a place where the multi-process mode is a dinosaur and that we
> can just rip it out ... but I don't share your confidence that we can
> get there in any short time period.
>

First, I am enjoying the activity of this thread. But my first question is
"to what end"?
Do I consider threads better? (yes... and no)

I do wonder if we could add better threading within any given
session/process to get a hybrid?
[maybe this gets us closer to solving some of the problems incrementally?]

If I could have anything (today)... I would prefer a Master-Master
implementation leveraging some of the ultra-fast server-server
communication protocols to help sync things. Then I wouldn't care.
I could avoid the O/S overwhelm caused by excessive processes by
spinning up machines.
[Unfortunately I know that PG leverages the filesystem cache, etc. to such
a degree that communicating from one master to another would require a
really special architecture there. And the N! communication lines.]

Kirk...

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kirk Wolak <wolakk(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 18:55:55
Message-ID:	CA+TgmoassRX3R_8=_ocVm=P1cpevpWOThB2egomZ4MbFK31aeg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jun 6, 2023 at 2:51 PM Kirk Wolak <wolakk(at)gmail(dot)com> wrote:
> I do wonder if we could add better threading within any given session/process to get a hybrid?
> [maybe this gets us closer to solving some of the problems incrementally?]

I don't think it helps much -- if anything, I think that would be more
complicated.

> If I could have anything (today)... I would prefer a Master-Master Implementation leveraging some
> of the ultra-fast server-server communication protocols to help sync things. Then I wouldn't care.
> I could avoid the O/S Overwhelm caused by excessive processes, via spinning up machines.
> [Unfortunately I know that PG leverages the filesystem cache, etc to such a degree that communicating
> from one master to another would require a really special architecture there. And the N! communication lines].

I think there's plenty of interesting things to improve in this area,
but they're different things than what this thread is about.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Greg Stark <stark(at)mit(dot)edu>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 20:14:41
Message-ID:	CAM-w4HPne2ab_ppKO6xSY+gyrczMu7CnFzggP+4mXqD1ctjh-A@mail.gmail.com
Lists:	pgsql-hackers

On Mon, 5 Jun 2023 at 10:52, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
> so that the whole server runs in a single process, with multiple
> threads. It has been discussed many times in the past, last thread on
> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>
> I feel that there is now pretty strong consensus that it would be a good
> thing, more so than before. Lots of work to get there, and lots of
> details to be hashed out, but no objections to the idea at a high level.
>
> The purpose of this email is to make that silent consensus explicit. If
> you have objections to switching from the current multi-process
> architecture to a single-process, multi-threaded architecture, please
> speak up.

I suppose I should reiterate my comments that I gave at the time. I'm
not sure they qualify as "objections" but they're some kind of general
concern.

I think of processes and threads as fundamentally the same things,
just a slightly different API -- namely that in one memory is by
default unshared and needs to be explicitly shared and in the other
it's default shared and needs to be explicitly unshared. There are
obvious practical API differences too like how signals are handled but
those are just implementation details.
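
A small sketch of that difference in defaults, assuming a POSIX system
with fork() and pthreads (the function names are made up for the
illustration): the same global is private to a forked child but shared
with a thread.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;         /* an ordinary global variable */

static void *
bump(void *arg)
{
    (void) arg;
    counter++;                  /* same address space as the creator */
    return NULL;
}

/* Parent's view of counter after a child process increments it; -1 on error. */
int
counter_after_fork(void)
{
    pid_t pid;

    counter = 0;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        counter++;              /* changes the child's private copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;
}

/* Creator's view of counter after a thread increments it; -1 on error. */
int
counter_after_thread(void)
{
    pthread_t t;

    counter = 0;
    if (pthread_create(&t, NULL, bump, NULL) != 0)
        return -1;
    pthread_join(t, NULL);      /* join orders the write before our read */
    return counter;
}
```

With processes the increment is lost unless the variable is explicitly
placed in shared memory; with threads it is visible unless the variable
is explicitly made thread-local.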

So the question is whether defaulting to shared memory or defaulting
to unshared memory is better -- and whether the implementation details
are significant enough to override that.

And my general concern was that in my experience default shared memory
leads to hugely complex and chaotic shared data structures with often
very loose rules for ownership of shared data and who is responsible
for making updates, handling errors, or releasing resources.

So all else equal I feel like having a good infrastructure for
explicitly allocating shared memory segments and managing them is
superior.

However all else is not equal. The discussion in the hallway turned to
whether we could just use pthread primitives like mutexes and
condition variables instead of our own locks -- and the point was
raised that those libraries assume these objects will be in threads of
one process not shared across completely different processes.

And that's probably not the only library we're stuck reimplementing
because of this. So the question is are these things worth taking the
risk of having data structures shared implicitly and having unclear
ownership rules?

I was going to say supporting both modes relieves that fear since it
would force that extra discipline and allow testing under the more
restrictive rule. However I don't think that will actually work. As
long as we support both modes we lose all the advantages of threads.
We still wouldn't be able to use pthreads and would still need to
provide and maintain our homegrown replacement infrastructure.

--
greg

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-06 22:26:07
Message-ID:	CA+hUKG+kfGGT5DTSFse1+v+bcuUtTD4szNnvruTDscDXaF39Pw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jun 6, 2023 at 6:52 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> If we were starting out today we would probably choose a threaded implementation. But moving to threaded now seems to me like a multi-year-multi-person project with the prospect of years to come chasing bugs and the prospect of fairly modest advantages. The risk to reward doesn't look great.
>
> That's my initial reaction. I could be convinced otherwise.

Here is one thing I often think about when contemplating threads.
Take a look at dsa.c. It calls itself a shared memory allocator, but
really it has two jobs, the second being to provide software emulation
of virtual memory. That’s behind dshash.c and now the stats system,
and various parts of the parallel executor code. It’s slow and
complicated, and far from the state of the art. I wrote that code
(building on allocator code from Robert) with the expectation that it
was a transitional solution to unblock a bunch of projects. I always
expected that we'd eventually be deleting it. When I explain that
subsystem to people who are not steeped in the lore of PostgreSQL, it
sounds completely absurd. I mean, ... it is, right? My point is
that we’re doing pretty unreasonable and inefficient contortions to
develop new features -- we're not just happily chugging along without
threads at no cost.
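
For readers who have not looked at that code, the core trick can be
sketched roughly like this. This is not dsa.c itself, and rel_ptr,
rel_to_addr and the list helpers are hypothetical names: because a
segment may be mapped at a different address in each process, data
structures store offsets from the segment base and translate them on
every access.

```c
#include <stddef.h>

typedef size_t rel_ptr;         /* offset from segment base; 0 means "null" */

typedef struct Node
{
    int     value;
    rel_ptr next;               /* offset of the next node, not an address */
} Node;

/* Translate an offset into an address valid for this mapping. */
static void *
rel_to_addr(void *base, rel_ptr p)
{
    return p == 0 ? NULL : (char *) base + p;
}

/* Build the list 1 -> 2 -> 3 in a segment of at least 4 Nodes. */
rel_ptr
build_list(void *seg)
{
    Node *nodes = seg;

    for (int i = 1; i <= 3; i++)
    {
        nodes[i].value = i;
        nodes[i].next = (i < 3) ? (rel_ptr) (i + 1) * sizeof(Node) : 0;
    }
    return sizeof(Node);        /* head lives in slot 1; slot 0 is unused */
}

/* Sum the list starting at offset head inside the segment at base. */
int
sum_list(void *base, rel_ptr head)
{
    int sum = 0;

    for (Node *n = rel_to_addr(base, head); n != NULL;
         n = rel_to_addr(base, n->next))
        sum += n->value;
    return sum;
}
```

Every pointer dereference pays for an extra add (and, in a real
allocator, a segment lookup). In a single address space a plain pointer
would do.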

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 02:02:02
Message-ID:	268998.1686103322@sss.pgh.pa.us
Lists:	pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> ... My point is
> that we’re doing pretty unreasonable and inefficient contortions to
> develop new features -- we're not just happily chugging along without
> threads at no cost.

Sure, but it's not like chugging along *with* threads would be no-cost.
Others have already pointed out the permanent downsides of that, such
as loss of isolation between sessions leading to debugging headaches
(and, I predict, more than one security-grade bug).

I agree that if we were building this system from scratch today,
we'd probably choose thread-per-session not process-per-session.
But the costs of getting to that from where we are will be enormous.
I seriously doubt that the net benefits could justify that work,
no matter how long you want to look forward. It's not really
significantly different from "let's rewrite the server in
C++/Rust/$latest_hotness".

regards, tom lane

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 04:05:39
Message-ID:	CAFiTN-tF+K0QwOS84Dbz4K2sr1bBo=-pAeTta9wg0sVBwDRawQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jun 6, 2023 at 11:30 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Jun 6, 2023 at 11:46 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > Bruce was worried about the loss of isolation that the separate address
> > spaces gives, and Jeremy shared an anecdote on that. That is an
> > objection to the idea itself, i.e. even if transition was smooth,
> > bug-free and effortless, that point remains. I personally think the
> > isolation we get from separate address spaces is overrated. Yes, it
> > gives you some protection, but given how much shared memory there is,
> > the blast radius is large even with separate backend processes.
>
> An interesting idea might be to look at the places where we ereport or
> elog FATAL due to some kind of backend data structure corruption and
> ask whether there would be an argument for elevating the level to
> PANIC if we changed this. There are definitely some places where we
> argue that the only corrupted state is backend-local and thus we don't
> need to PANIC if it's corrupted. I wonder to what extent this change
> would undermine that argument.

With the threaded model, that shouldn't change, right? Even though all
memory space is now shared across threads, we can maintain the same
rules for modifying critical shared data structures, i.e. modifying
such memory should still fall under the CRITICAL SECTION, so I guess
the rules for promoting error level to PANIC will remain the same.
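
As a simplified model of that rule (not the actual elog.c code; the
names below are hypothetical, and making the counter thread-local is an
assumption about how a threaded build might look):

```c
typedef enum
{
    LVL_WARNING,
    LVL_ERROR,
    LVL_FATAL,
    LVL_PANIC
} Level;

/* Per-thread nesting depth of critical sections. */
static _Thread_local int crit_section_depth = 0;

void
enter_crit_section(void)
{
    crit_section_depth++;
}

void
leave_crit_section(void)
{
    crit_section_depth--;
}

/* Any error raised inside a critical section is promoted to PANIC. */
Level
effective_level(Level requested)
{
    if (requested >= LVL_ERROR && crit_section_depth > 0)
        return LVL_PANIC;
    return requested;
}
```

The promotion depends only on whether the reporting thread is inside a
critical section, not on how many address spaces exist.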

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 04:12:48
Message-ID:	CAFiTN-tpOGd+hLrZ4V8YVx+-0EfVnYMphUuhtscGjddFGB+9bw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 7, 2023 at 7:32 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > ... My point is
> > that we’re doing pretty unreasonable and inefficient contortions to
> > develop new features -- we're not just happily chugging along without
> > threads at no cost.
>
> Sure, but it's not like chugging along *with* threads would be no-cost.
> Others have already pointed out the permanent downsides of that, such
> as loss of isolation between sessions leading to debugging headaches
> (and, I predict, more than one security-grade bug).

I agree that in some cases debugging would be harder, but I feel there
are cases where the thread model will make the debugging experience
better, e.g. breaking at the entry point of a new parallel worker or
other worker is hard with the process model, but in my experience it
would be very smooth with the thread model.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 12:46:51
Message-ID:	6a6d5677-d26f-5222-ce11-760e2534e909@joeconway.com
Lists:	pgsql-hackers

On 6/6/23 22:02, Tom Lane wrote:
> (and, I predict, more than one security-grade bug).

*That* is what worries me the most

> I agree that if we were building this system from scratch today,
> we'd probably choose thread-per-session not process-per-session.
> But the costs of getting to that from where we are will be enormous.
> I seriously doubt that the net benefits could justify that work,
> no matter how long you want to look forward. It's not really
> significantly different from "let's rewrite the server in
> C++/Rust/$latest_hotness".

Agreed.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 12:53:24
Message-ID:	CA+TgmoZKrgkd+jEbRRpOYoG14Ue9GLWTH2kKH_Yhac3s6Ofemg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jun 6, 2023 at 10:02 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I agree that if we were building this system from scratch today,
> we'd probably choose thread-per-session not process-per-session.
> But the costs of getting to that from where we are will be enormous.
> I seriously doubt that the net benefits could justify that work,
> no matter how long you want to look forward. It's not really
> significantly different from "let's rewrite the server in
> C++/Rust/$latest_hotness".

Well, I don't know, I think that's a bunch of things that are not all
the same. Rewriting the server in a whole different programming
language would be a massive effort. I can't really see anyone
volunteering to rewrite a million lines of C (or whatever we've got)
in Rust, and I'm not sure who would use the result if they did, or
why. We could, perhaps, allow new source files to be written in Rust
while keeping old ones written in C, but then every hacker has to know
two languages, and having code written in both languages manipulating
the same data structures would probably be a recipe for confusion and
bugs. It's hard to believe that the upsides would be worth the pain.
Maybe transition to C++ would be easier, or maybe it wouldn't, I'm not
sure. But from my point of view, the issue here is simply that
stop-the-world-and-change-everything is not a viable way forward for a
project the size of PostgreSQL, but incremental changes are
potentially acceptable if the benefits outweigh the drawbacks.

So what are the costs, exactly, of transition to a threaded model? It
seems to me that there's basically one problem: global variables.
Sure, there's a bunch of stuff around process management that would
likely have to be revised in some way, but that's not that much code
and wouldn't have that much impact on unrelated development. However,
the project's widespread and often gratuitous use of global variables
would have to be addressed in some way, and I think that will pretty
much inevitably involve touching all of those global variable
declarations in some way. Now, if we can get away with simply marking
all of those thread-local, then it's of the same general flavor as
PGDLLIMPORT. I am aware that you think that PGDLLIMPORT markings are
ugly as sin, and these would be more widespread since they'd have to
be applied to literally every global variable, including file-local
ones. However, it's hard to imagine that adding such markings would
cause PostgreSQL development to grind to a halt. It would cause minor
rebasing pain and that's about it. I hope that we'd have some tool
that would make the build fail if any markings are missing and
everybody would be annoyed until they finished rebasing all of their
WIP patches and then that would just be how things are. It's not
*lovely* but it doesn't sound that bad either.
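
A rough sketch of what such a marking could look like, assuming C11
_Thread_local and pthreads. SESSION_LOCAL is a hypothetical macro name,
not an existing PostgreSQL one, and nesting_level is just a stand-in for
any backend global.

```c
#include <pthread.h>

/* Hypothetical marker, in the same spirit as PGDLLIMPORT. */
#define SESSION_LOCAL _Thread_local

static SESSION_LOCAL int nesting_level = 0;

static void *
worker(void *arg)
{
    int n = *(int *) arg;

    for (int i = 0; i < n; i++)
        nesting_level++;            /* touches this thread's copy only */
    *(int *) arg = nesting_level;   /* report this thread's final value */
    return NULL;
}

/*
 * Run two workers with different increment counts. Returns the calling
 * thread's own nesting_level afterwards, or -1 if a thread could not be
 * created.
 */
int
run_two_workers(int *a, int *b)
{
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, worker, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker, b) != 0)
    {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return nesting_level;
}
```

Each thread sees its own copy, so existing code that reads and writes
the variable keeps working unchanged; only the declaration is touched.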

In my mind, the bigger question is how much further than that do you
have to go? I think I remember a previous conversation with Andres
where he opined that thread-local variables are "really expensive"
(and I apologize in advance if I'm mis-remembering this). Now, Andres
is not a man who accepts a tax on performance of any size without a
fight, so his "really expensive" might turn out to resemble my "pretty
cheap." However, if widespread use of TLS is too expensive and we have
to start rewriting code to not depend on global variables, that's
going to be more of a problem. If we can get by with doing such
rewrites only in performance-critical places, it still might not be
too bad. Personally, I think the degree of dependence that PostgreSQL
has on global variables is pretty excessive and I don't think that a
certain amount of refactoring to reduce it would be a bad thing. If it
turns into an infinite series of hastily-written patches to rejigger
every source file we have, though, then I'm not really on board with
that.

Heikki mentions the idea of having a central Session object and just
passing that around. I have a hard time believing that's going to work
out nicely. First, it's not extensible. Right now, if you need a bit
of additional session-local state, you just declare a variable and
you're all set. That's not a perfect system and does cause some
problems, but we can't go from there to a system where it's impossible
to add session-local state without hacking core. Second, we will be
sad if session.h ends up #including every other header file that
defines a data structure anywhere in the backend. Or at least I'll be
sad. I'm not actually against the idea of having some kind of session
object that we pass around, but I think it either needs to be limited
to a relatively small set of well-defined things, or else it needs to
be designed in some kind of extensible way that doesn't require it to
know the full details of every sort of object that's being used as
session-local state anywhere in the system. I haven't really seen any
convincing design ideas around this yet.

But I think jumping to the conclusion that the migration path here is
akin to rewriting the whole code base in Rust is jumping too far. I do
see some problems here that I don't know how to solve, but that's
nowhere near in the same category as find . -name '*.c' -exec rm {} \;

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 13:08:38
Message-ID:	CAExHW5uPNB57_3FM-vpwYMFC5rZLJ6Ni6Kk3UpC6sODT+qvhAQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jun 5, 2023 at 8:22 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
> so that the whole server runs in a single process, with multiple
> threads. It has been discussed many times in the past, last thread on
> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>
> I feel that there is now pretty strong consensus that it would be a good
> thing, more so than before. Lots of work to get there, and lots of
> details to be hashed out, but no objections to the idea at a high level.
>
> The purpose of this email is to make that silent consensus explicit. If
> you have objections to switching from the current multi-process
> architecture to a single-process, multi-threaded architecture, please
> speak up.
>
> If there are no major objections, I'm going to update the developer FAQ,
> removing the excuses there for why we don't use threads [1]. And we can
> start to talk about the path to get there. Below is a list of some
> hurdles and proposed high-level solutions. This isn't an exhaustive
> list, just some of the most obvious problems:
>
> # Transition period
>
> The transition surely cannot be done fully in one release. Even if we
> could pull it off in core, extensions will need more time to adapt.
> There will be a transition period of at least one release, probably
> more, where you can choose multi-process or multi-thread model using a
> GUC. Depending on how it goes, we can document it as experimental at first.
>
> # Thread per connection
>
> To get started, it's most straightforward to have one thread per
> connection, just replacing backend process with a backend thread. In the
> future, we might want to have a thread pool with some kind of a
> scheduler to assign active queries to worker threads. Or multiple
> threads per connection, or spawn additional helper threads for specific
> tasks. But that's future work.

With multiple processes, we can use all the available cores (at least
theoretically, if all those processes are independent). But is that
guaranteed with a single-process, multi-thread model? A web search
didn't turn up a definitive answer to that. Usually it depends upon
the OS and architecture.

Maybe a good start is to use threads instead of parallel
workers, e.g. for parallel vacuum, parallel query and so on, while
leaving the processes for connections and leaders. That itself might
take significant time. Based on that experience, we can move to a
completely threaded model. Based on my experience with other similar
products, I think we will settle on a multi-process multi-thread model.

--
Best Wishes,
Ashutosh Bapat

From:	Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 16:05:54
Message-ID:	8c105e05-2e05-68fa-cc6e-1c9b92e60d64@postgrespro.ru
Lists:	pgsql-hackers

07.06.2023 15:53, Robert Haas wrote:
> Right now, if you need a bit
> of additional session-local state, you just declare a variable and
> you're all set. That's not a perfect system and does cause some
> problems, but we can't go from there to a system where it's impossible
> to add session-local state without hacking core.

> or else it needs to
> be design in some kind of extensible way that doesn't require it to
> know the full details of every sort of object that's being used as
> session-local state anywhere in the system.
And it is quite possible. Although with indirection involved.

For example, we want to add session variable "my_hello_var".
We first need to declare "offset variable".
Then register it in a session.
And then use function and/or macros to get actual address:

/* session.h */
extern size_t RegisterSessionVar(size_t size);
extern void *CurSessionVar(size_t offset);

/* session.c */
typedef struct Session {
    char *vars;    /* per-session storage for all registered variables */
} Session;

static _Thread_local Session *curSession;
static size_t sessionVarsSize = 0;

/* Reserve "size" bytes in every session; returns the offset of the slot. */
size_t
RegisterSessionVar(size_t size)
{
    size_t off = sessionVarsSize;
    sessionVarsSize += size;
    return off;
}

/* Translate an offset into an address within the current session. */
void *
CurSessionVar(size_t offset)
{
    return curSession->vars + offset;
}

/* module_internal.h */
typedef int my_hello_var_t;
extern size_t my_hello_var_offset;

/* access macro */
#define my_hello_var (*(my_hello_var_t *) CurSessionVar(my_hello_var_offset))

/* module.c */
size_t my_hello_var_offset = 0;

void
_PG_init(void)
{
    my_hello_var_offset = RegisterSessionVar(sizeof(my_hello_var_t));
}

For security reasons, offset could be mangled.
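
To make the idea concrete, here is a minimal self-contained sketch of the
same scheme, assuming C11 `_Thread_local` and one session per thread. The
helpers `SessionInit` and `SessionDestroy` and the variable `hello_counter`
are hypothetical names for illustration only, not an existing PostgreSQL
API. Rounding each slot up to the strictest alignment is an added
assumption so that every variable stays suitably aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One Session per backend thread; vars holds all registered slots. */
typedef struct Session
{
    char *vars;
} Session;

static _Thread_local Session *curSession = NULL;
static size_t sessionVarsSize = 0;   /* grows during module load only */

/* Reserve a slot; must be called before any session is created. */
size_t
RegisterSessionVar(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    size_t off = sessionVarsSize;

    assert(size > 0);
    /* round the slot up so the next one stays suitably aligned */
    sessionVarsSize += (size + align - 1) / align * align;
    return off;
}

/* Resolve a registered offset to this thread's private storage. */
void *
CurSessionVar(size_t offset)
{
    assert(curSession != NULL && offset < sessionVarsSize);
    return curSession->vars + offset;
}

/* Hypothetical helper: create the calling thread's session. */
void
SessionInit(void)
{
    curSession = malloc(sizeof(Session));
    assert(curSession != NULL);
    /* calloc zero-fills, so every variable starts out as 0 */
    curSession->vars = calloc(1, sessionVarsSize ? sessionVarsSize : 1);
    assert(curSession->vars != NULL);
}

/* Hypothetical helper: release the calling thread's session. */
void
SessionDestroy(void)
{
    free(curSession->vars);
    free(curSession);
    curSession = NULL;
}

/* What an extension would write */
static size_t hello_counter_offset;
#define hello_counter (*(int *) CurSessionVar(hello_counter_offset))
```

An extension would set `hello_counter_offset = RegisterSessionVar(sizeof(int))`
at load time and afterwards read and write `hello_counter` like an ordinary
variable; each thread that has called `SessionInit()` sees its own copy.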

------

regards,
Yura Sokolov

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 19:20:15
Message-ID:	29fe5f48-a6ed-d896-45ed-16b5904353a9@enterprisedb.com
Lists:	pgsql-hackers

On 6/5/23 17:33, Heikki Linnakangas wrote:
> On 05/06/2023 11:18, Tom Lane wrote:
>> Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
>>> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
>>> so that the whole server runs in a single process, with multiple
>>> threads. It has been discussed many times in the past, last thread on
>>> pgsql-hackers was back in 2017 when Konstantin made some experiments
>>> [0].
>>
>>> I feel that there is now pretty strong consensus that it would be a good
>>> thing, more so than before. Lots of work to get there, and lots of
>>> details to be hashed out, but no objections to the idea at a high level.
>>
>>> The purpose of this email is to make that silent consensus explicit. If
>>> you have objections to switching from the current multi-process
>>> architecture to a single-process, multi-threaded architecture, please
>>> speak up.
>>
>> For the record, I think this will be a disaster. There is far too much
>> code that will get broken, largely silently, and much of it is not
>> under our control.
>
> Noted. Other large projects have gone through this transition. It's not
> easy, but it's a lot easier now than it was 10 years ago. The platform
> and compiler support is there now, all libraries have thread-safe
> interfaces, etc.
>

Is the platform support really there for all platforms we want/intend to
support? I have no problem believing that for modern Linux/BSD systems,
but what about the older stuff we currently support?

Also, which other projects did this transition? Is there something we
could learn from them? Were they restricted to a much smaller list of
platforms?

> I don't expect you or others to buy into any particular code change at
> this point, or to contribute time into it. Just to accept that it's a
> worthwhile goal. If the implementation turns out to be a disaster, then
> it won't be accepted, of course. But I'm optimistic.
>

I personally am not opposed to the effort in principle, but how do you
even evaluate cost and benefits for a transition like this? I have no
idea how to quantify the costs/benefits for this as a single change.

I've seen some benchmarks in the past, but it's hard to say which of
these improvements are possible only with threads, and what would be
doable with less invasive changes with the process model.

IMHO the only way to move this forward is to divide this into smaller
changes, each of which gives us some benefit we'd want anyway. For
example, this thread already mentioned improving handling of many
connections. AFAICS that requires isolating "session state", which seems
useful even without a full switch to threads as it makes connection
pooling simpler. It should be easier to get a buy-in for these changes,
while introducing abstractions simplifying the switch to threads.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 19:59:16
Message-ID:	CA+hUKGKeJNen_vy=XM8_FKWVzn1p5E+OJ+5Ezc1J-G8dKpkbpg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 7:20 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> Is the platform support really there for all platforms we want/intend to
> support? I have no problem believing that for modern Linux/BSD systems,
> but what about the older stuff we currently support.

There is a conversation to be had about whether/when/how to adopt
C11/C17 threads (= same API on Windows and Unix, but sadly two
straggler systems don't have required OS support yet (macOS,
OpenBSD)), but POSIX + NT threads were all worked out in the 90s. We
have last-mover advantage here.

> Also, which other projects did this transition? Is there something we
> could learn from them? Were they restricted to much smaller list of
> platforms?

Apache may be interesting. Wide ecosystem of extensions.

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 21:30:17
Message-ID:	20230607213017.zotqumqatch4zpbq@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-05 17:51:57 +0300, Heikki Linnakangas wrote:
> If there are no major objections, I'm going to update the developer FAQ,
> removing the excuses there for why we don't use threads [1].

I think we should do this even if there's no consensus to slowly change to
threads. There's clearly no consensus on the opposite either.

> # Transition period
>
> The transition surely cannot be done fully in one release. Even if we could
> pull it off in core, extensions will need more time to adapt. There will be
> a transition period of at least one release, probably more, where you can
> choose multi-process or multi-thread model using a GUC. Depending on how it
> goes, we can document it as experimental at first.

One interesting bit around the transition is what tooling we ought to provide
to detect problems. It could e.g. be reasonably feasible to write something
checking how many read-write global variables an extension has on Linux
systems.

> # Extensions
>
> A lot of extensions also contain global variables or other things that break
> in a multi-threaded environment. We need a way to label extensions that
> support multi-threading. And in the future, also extensions that *require* a
> multi-threaded server.
>
> Let's add flags to the control file to mark if the extension is thread-safe
> and/or process-safe. If you try to load an extension that's not compatible
> with the server's mode, throw an error.

I don't think the control file is the right place - that seems more like
something that should be signalled via PG_MODULE_MAGIC. We need to check this
not just during CREATE EXTENSION, but also during loading of libraries - think
of shared_preload_libraries.
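
As a rough sketch of what that could look like: the loader reads a
capability flag out of the library's magic block and refuses libraries that
do not match the server's mode. Everything here (`ModuleMagic`,
`MODULE_THREAD_SAFE`, `module_is_loadable`) is a made-up stand-in, not the
real `Pg_magic_struct` or `PG_MODULE_MAGIC` definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits a library would advertise. */
#define MODULE_PROCESS_SAFE 0x01
#define MODULE_THREAD_SAFE  0x02

typedef enum ServerMode
{
    SERVER_MULTI_PROCESS,
    SERVER_MULTI_THREAD
} ServerMode;

/* Stand-in for the magic block embedded in every loadable library. */
typedef struct ModuleMagic
{
    int      abi_version;
    unsigned flags;
} ModuleMagic;

/*
 * Meant to be called on every library load (CREATE EXTENSION, LOAD,
 * shared_preload_libraries), so that no load path can bypass the check.
 */
bool
module_is_loadable(const ModuleMagic *magic, int server_abi, ServerMode mode)
{
    unsigned needed;

    assert(magic != NULL);
    if (magic->abi_version != server_abi)
        return false;
    needed = (mode == SERVER_MULTI_THREAD) ? MODULE_THREAD_SAFE
                                           : MODULE_PROCESS_SAFE;
    return (magic->flags & needed) != 0;
}
```

A library built without the thread-safe bit would then fail to load into a
threaded server no matter which path triggered the load.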

> # Restart on crash
>
> If a backend process crashes, postmaster terminates all other backends and
> restarts the system. That's hard (impossible?) to do safely if everything
> runs in one process. We can continue have a separate postmaster process that
> just monitors the main process and restarts it on crash.

Yea, we definitely need the supervisor function in a separate
process. Presumably that means we need to split off some of the postmaster
responsibilities - e.g. I don't think it'd make sense to handle connection
establishment in the supervisor process. I wonder if this is something that
could end up being beneficial even in the process world.

A related issue is that we won't get SIGCHLD in the supervisor process
anymore. So we'd need to come up with some design for that.

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	"Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 21:37:21
Message-ID:	20230607213721.al3etgcgtija3ytz@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> would ask if developer time could be better spent on tackling some of the
> other problems around vertical scalability? Per some PGCon discussions,
> there's still room for improvement in how PostgreSQL can best utilize
> resources available very large "commodity" machines (a 448-core / 24TB RAM
> instance comes to mind).

I think we're starting to hit quite a few limits related to the process model,
particularly on bigger machines. The overhead of cross-process context
switches is inherently higher than switching between threads in the same
process - and my suspicion is that that overhead will continue to
increase. Once you have a significant number of connections we end up spending
a *lot* of time in TLB misses, and that's inherent to the process model,
because you can't share the TLB across processes.

The amount of duplicated code we have to deal with due to the process model
is quite substantial. We have local memory, statically allocated shared memory
and dynamically allocated shared memory variants for some things. And that's
just going to continue.

> I'm purposely giving a nonanswer on whether it's a worthwhile goal, but
> rather I'd be curious where it could stack up against some other efforts to
> continue to help PostgreSQL improve performance and handle very large
> workloads.

There's plenty of things we can do before, but in the end I think tackling the
issues you mention and moving to threads are quite tightly linked.

Greetings,

Andres Freund

From:	Peter Eisentraut <peter(at)eisentraut(dot)org>
To:	Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 21:39:01
Message-ID:	cb999631-790f-72b2-ada3-b1945be9763a@eisentraut.org
Lists:	pgsql-hackers

On 07.06.23 23:30, Andres Freund wrote:
> Yea, we definitely need the supervisor function in a separate
> process. Presumably that means we need to split off some of the postmaster
> responsibilities - e.g. I don't think it'd make sense to handle connection
> establishment in the supervisor process. I wonder if this is something that
> could end up being beneficial even in the process world.

Something to think about perhaps ... how would that be different from
using an existing external supervisor process like systemd or supervisord.

From:	Thomas Kellerer <shammat(at)gmx(dot)net>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 21:39:54
Message-ID:	41c1e20d-f179-f87e-5929-80ca9ee0c105@gmx.net
Lists:	pgsql-hackers

Tomas Vondra schrieb am 07.06.2023 um 21:20:
> Also, which other projects did this transition? Is there something we
> could learn from them? Were they restricted to much smaller list of
> platforms?

Firebird did this a while ago if I'm not mistaken.

Not open source, but Oracle was historically multi-threaded on Windows and multi-process on all other platforms.
I _think_ starting with 19c you can optionally run it multi-threaded on Linux as well.

But I doubt they are willing to share any insights ;)

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 21:45:02
Message-ID:	20230607214502.cm5vhj3ipntdoskf@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-05 20:15:56 -0400, Bruce Momjian wrote:
> Yes, sorry, critical sections is what I was remembering. My question is
> whether all unexpected backend exits should be treated as critical
> sections?

Yes.

People have argued that the process model is more robust. But it turns out
that we have to crash-restart for just about any "bad failure" anyway. It used
to be (a long time ago) that we didn't, but that was just broken.

There are some advantages in debuggability, because it's a *tad* harder for a
bug in one process to cause another to crash, if less state is shared. But
that's by far outweighed by most debugging / validation tools not
understanding the multi-processes-with-shared-shmem model.

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 21:48:22
Message-ID:	20230607214822.ga6in2f7762sxnqs@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-07 23:39:01 +0200, Peter Eisentraut wrote:
> On 07.06.23 23:30, Andres Freund wrote:
> > Yea, we definitely need the supervisor function in a separate
> > process. Presumably that means we need to split off some of the postmaster
> > responsibilities - e.g. I don't think it'd make sense to handle connection
> > establishment in the supervisor process. I wonder if this is something that
> > could end up being beneficial even in the process world.
>
> Something to think about perhaps ... how would that be different from using
> an existing external supervisor process like systemd or supervisord.

I think that's not really comparable. A postgres internal solution can
maintain resources like shared memory allocations, listening sockets, etc
across crash restarts. With something like systemd that's much harder to make
work well. And then there's the fact that you now need to deal with much more
drastic cross-platform behavioural differences.

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 21:58:09
Message-ID:	20230607215809.igzd2mkgmaarawb3@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-07 08:53:24 -0400, Robert Haas wrote:
> In my mind, the bigger question is how much further than that do you
> have to go? I think I remember a previous conversation with Andres
> where he opined that thread-local variables are "really expensive"
> (and I apologize in advance if I'm mis-remembering this).

It really is architecture and OS dependent. I think time has reduced the cost
somewhat, due to older architectures / OSs aging out. But yea, it's not free.

I suspect that we'd gain *far* more from the higher TLB hit rate than we'd
lose due to using many thread local variables. Even with a stupid
search-and-replace approach.

But we'd gain more if we reduced the number of thread local variables...

> Now, Andres is not a man who accepts a tax on performance of any size
> without a fight, so his "really expensive" might turn out to resemble my
> "pretty cheap." However, if widespread use of TLS is too expensive and we
> have to start rewriting code to not depend on global variables, that's going
> to be more of a problem. If we can get by with doing such rewrites only in
> performance-critical places, it might not still be too bad. Personally, I
> think the degree of dependence that PostgreSQL has on global variables is
> pretty excessive and I don't think that a certain amount of refactoring to
> reduce it would be a bad thing. If it turns into an infinite series of
> hastily-written patches to rejigger every source file we have, though, then
> I'm not really on board with that.

I think a lot of such rewrites would be a good idea, even if we right now all
agree to swear we'll never go to threads. Not having any sort of grouping of
global variables makes it IMO considerably harder to debug. I can easily ask
somebody to print out a variable pointing to a struct describing the state of
a subsystem. I can't really do that for 50 variables.

And once you do that, I think you reduce the TLS cost substantially. The
variable pointing to the struct is already likely in a register. Whereas each
individual variable being in TLS makes the job harder for the compiler.
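
A small sketch of the grouping, assuming C11 `_Thread_local`. The
`DemoSubsystemState` struct and its fields are invented for the example and
are not actual PostgreSQL variables. Instead of three separate thread-local
variables there is one thread-local pointer, which a function loads once
and then uses like any other pointer.

```c
#include <assert.h>
#include <stddef.h>

/* All per-thread state of one (made-up) subsystem lives in one struct. */
typedef struct DemoSubsystemState
{
    int  nest_level;
    long rows_processed;
    int  last_error;
} DemoSubsystemState;

/* The only thread-local: a pointer to the current thread's state. */
static _Thread_local DemoSubsystemState *demo_state = NULL;

/* Bind a state struct to the calling thread. */
void
demo_attach(DemoSubsystemState *state)
{
    assert(state != NULL);
    demo_state = state;
}

/*
 * The TLS pointer is read once; the loop then works through a plain
 * local pointer that the compiler is free to keep in a register.
 */
long
demo_process_rows(int nrows)
{
    DemoSubsystemState *st = demo_state;

    assert(st != NULL);
    st->nest_level++;
    for (int i = 0; i < nrows; i++)
        st->rows_processed++;
    st->nest_level--;
    return st->rows_processed;
}
```

It also covers the debugging point: printing `*demo_state` in a debugger
shows the whole subsystem state at once.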

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Greg Stark <stark(at)mit(dot)edu>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 22:09:19
Message-ID:	20230607220919.pilzcbqnp2rwfbl4@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-06 16:14:41 -0400, Greg Stark wrote:
> I think of processes and threads as fundamentally the same things,
> just a slightly different API -- namely that in one memory is by
> default unshared and needs to be explicitly shared and in the other
> it's default shared and needs to be explicitly unshared.

In theory that's true, in practice it's entirely wrong.

For one, the amount of complexity you need to deal with to share state across
processes, post fork, is *substantial*. You can share file descriptors across
processes, but it's extremely platform dependent, requires cooperation between
both processes etc. You can share memory allocations made after the processes
forked, but you're typically not going to be able to guarantee they're at the
same pointer values. Etc.
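
The pointer-value problem is why data structures placed in dynamically
mapped shared memory tend to store offsets from the segment base rather
than raw pointers. A minimal sketch, where a heap copy of a buffer stands
in for the same segment mapped at a different address in another process
(`RelPtr`, `rel_store`, `rel_resolve` and `segment_clone` are illustrative
names, not an existing API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef size_t RelPtr;      /* byte offset from segment base; 0 means NULL */

typedef struct Node
{
    int    value;
    RelPtr next;            /* offset of the next Node, not an address */
} Node;

/* Convert an address inside the segment into an offset. */
RelPtr
rel_store(const char *base, const void *ptr)
{
    if (ptr == NULL)
        return 0;
    assert((const char *) ptr > base);
    return (size_t) ((const char *) ptr - base);
}

/* Convert an offset back into an address valid for this mapping. */
void *
rel_resolve(char *base, RelPtr rel)
{
    return rel == 0 ? NULL : base + rel;
}

/* Walk a list using only offsets, so it works at any base address. */
int
list_sum(char *base, RelPtr head)
{
    int sum = 0;

    for (Node *n = rel_resolve(base, head); n != NULL;
         n = rel_resolve(base, n->next))
        sum += n->value;
    return sum;
}

/* Simulate another process mapping the same bytes at another address. */
char *
segment_clone(const char *src, size_t len)
{
    char *copy = malloc(len);

    assert(copy != NULL);
    memcpy(copy, src, len);
    return copy;
}
```

Every access pays for the extra addition, which is part of the complexity
that threads sharing one address space would not need.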

But more importantly, there are crucial performance differences between
threads and processes. Having the same memory mapping between threads allows
the hardware to share the TLB (on x86 via process context identifiers), which
isn't realistically possible with different processes.

> However all else is not equal. The discussion in the hallway turned to
> whether we could just use pthread primitives like mutexes and
> condition variables instead of our own locks -- and the point was
> raised that those libraries assume these objects will be in threads of
> one process not shared across completely different processes.

Independent of threads vs processes, I am -many on using pthread mutexes and
condition variables. From experiments, that *loses* performance, and we lose
a lot of control and increase cross-platform behavioural differences. I also
don't see any benefit in going in that direction.

> And that's probably not the only library we're stuck reimplementing
> because of this. So the question is are these things worth taking the
> risk of having data structures shared implicitly and having unclear
> ownership rules?
>
> I was going to say supporting both modes relieves that fear since it
> would force that extra discipline and allow testing under the more
> restrictive rule. However I don't think that will actually work. As
> long as we support both modes we lose all the advantages of threads.

I don't think that has to be true. We could e.g. eventually decide that we
don't support parallel query without threading support - which would allow us
to get rid of a very significant amount of code and runtime overhead.

Greetings,

Andres Freund

From:	Jeremy Schneider <schneider(at)ardentperf(dot)com>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Thomas Kellerer <shammat(at)gmx(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 22:37:31
Message-ID:	d9656e32-f1ea-5401-d07b-dab34569236a@ardentperf.com
Lists:	pgsql-hackers

On 6/7/23 2:39 PM, Thomas Kellerer wrote:
> Tomas Vondra schrieb am 07.06.2023 um 21:20:
>> Also, which other projects did this transition? Is there something we
>> could learn from them? Were they restricted to much smaller list of
>> platforms?
>
> Not open source, but Oracle was historically multi-threaded on Windows
> and multi-process on all other platforms.
> I _think_ starting with 19c you can optionally run it multi-threaded on
> Linux as well.
Looks like it actually became publicly available in 12c. AFAICT Oracle
supports both modes today, with a config parameter to switch between them.

This is a very interesting case study.

Concepts Manual:

https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/process-architecture.html#GUID-4B460E97-18A0-4F5A-A62F-9608FFD43664

Reference:

https://docs.oracle.com/en/database/oracle/oracle-database/23/refrn/THREADED_EXECUTION.html#GUID-7A668A49-9FC5-4245-AD27-10D90E5AE8A8

List of Oracle process types, which ones can run as threads and which
ones always run as processes:

https://docs.oracle.com/en/database/oracle/oracle-database/23/refrn/background-processes.html#GUID-86184690-5531-405F-AA05-BB935F57B76D

Looks like they have four processes that will never run in threads:
* dbwriter (writes dirty blocks in background)
* process monitor (cleanup after process crash to avoid full server
restarts) <jealous>
* process spawner (like postmaster)
* time keeper process

Per Tim Hall's oracle-base, it seems that plenty of people are sticking
with the process model, and that one use case for threads was:
"consolidating lots of instances onto a single server without using the
multitenant option. Without the multithreaded model, the number of OS
processes could get very high."

https://oracle-base.com/articles/12c/multithreaded-model-using-threaded_execution_12cr1

I did google search for "oracle threaded_execution" and browsed a bit;
didn't see anything that seems earth shattering so far.

Ludovico Caldara and Martin Bach published blogs when it was first
released, which just introduced but didn't test or hammer on it. The
feature has existed for 10 years now and I don't see any blog posts
saying that "everyone should use this because it doubles your
performance" or anything like that. I think if there were really
significant performance gains then there would be many interesting blog
posts on the internet by now from the independent Oracle professional
community - I know many of these people.

In fact, there's an interesting blog by Kamil Stawiarski from 2015 where
he actually observed one case of /slower/ performance with threads. That
blog post ends with: "So I raise the question: why and when use threaded
execution? If ever?"

https://blog.ora-600.pl/2015/12/17/oracle-12c-internals-of-threaded-execution/

I'm not sure if he ever got an answer.

-Jeremy

--
http://about.me/jeremy_schneider

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Jeremy Schneider <schneider(at)ardentperf(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Thomas Kellerer <shammat(at)gmx(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-07 23:37:00
Message-ID:	CA+hUKGLuV68xKe9JwFkeQJc9A2vSe8+nhahAF1my4Qg=UD2PwQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 10:37 AM Jeremy Schneider
<schneider(at)ardentperf(dot)com> wrote:
> On 6/7/23 2:39 PM, Thomas Kellerer wrote:
> > Tomas Vondra schrieb am 07.06.2023 um 21:20:
> >> Also, which other projects did this transition? Is there something we
> >> could learn from them? Were they restricted to much smaller list of
> >> platforms?
> >
> > Not open source, but Oracle was historically multi-threaded on Windows
> > and multi-process on all other platforms.
> > I _think_ starting with 19c you can optionally run it multi-threaded on
> > Linux as well.
> Looks like it actually became publicly available in 12c. AFAICT Oracle
> supports both modes today, with a config parameter to switch between them.

It's old, but this describes the 4 main models and which well known
RDBMSes use them in section 2.3:

https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf

TL;DR DB2 is the winner, it can do process-per-connection,
thread-per-connection, process-pool or thread-pool.

I understand this thread to be about thread-per-connection (= backend,
session, socket) for now.

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 03:38:34
Message-ID:	CAFiTN-vhBZQ7Y-OTZAxZeqJdpGre785Bifu7kHkxmfW4-9ApNA@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 3:00 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>

> Yea, we definitely need the supervisor function in a separate
> process. Presumably that means we need to split off some of the postmaster
> responsibilities - e.g. I don't think it'd make sense to handle connection
> establishment in the supervisor process. I wonder if this is something that
> could end up being beneficial even in the process world.
>
> A related issue is that we won't get SIGCHLD in the supervisor process
> anymore. So we'd need to come up with some design for that.

If we fork the main Postgres process from the supervisor process then
any exit to the main process will send SIGCHLD in the supervisor
process, right? I agree we can handle all connection establishment
and other thread-related stuff in the main Postgres process. But I
assume this main process should be forked out of the supervisor
process.
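
A minimal POSIX sketch of that arrangement: the supervisor forks the main
server process and blocks in `waitpid()`, which is where the child's exit
(the event that SIGCHLD announces) is collected, and restarts the child
after an abnormal exit. `supervise`, `ServerMain`, `demo_server_main` and
the restart limit are hypothetical names chosen for the example, and EINTR
handling is left out for brevity.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*ServerMain)(int generation);

/*
 * Run server_main in a child process, restarting it after a crash.
 * Returns the number of restarts that were needed, or -1 if fork/waitpid
 * fails or the restart limit is exceeded.
 */
int
supervise(ServerMain server_main, int max_restarts)
{
    assert(server_main != NULL);

    for (int generation = 0; generation <= max_restarts; generation++)
    {
        int   status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(server_main(generation));  /* child: the "main process" */

        /* parent: reap the child; a SIGCHLD handler would do the same */
        if (waitpid(pid, &status, 0) != pid)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return generation;               /* clean shutdown */
        /* killed by a signal or nonzero exit: loop around and restart */
    }
    return -1;
}

/* Demo server: "crashes" on its first run, shuts down cleanly afterwards. */
int
demo_server_main(int generation)
{
    if (generation == 0)
        raise(SIGKILL);                      /* simulate a crash */
    return 0;
}
```

With `demo_server_main`, `supervise(demo_server_main, 3)` sees one crash,
restarts once and then observes a clean exit. All threads of the main
process die together with it, so the supervisor only ever has one child to
watch.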

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	"Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To:	Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Thomas Kellerer <shammat(at)gmx(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 05:55:34
Message-ID:	65e31a5b-60f9-5f8b-a2da-b5800fbc3294@gmail.com
Lists:	pgsql-hackers

Hi,

On 6/8/23 12:37 AM, Jeremy Schneider wrote:
> On 6/7/23 2:39 PM, Thomas Kellerer wrote:
>> Tomas Vondra schrieb am 07.06.2023 um 21:20:
>
> I did google search for "oracle threaded_execution" and browsed a bit;
> didn't see anything that seems earth shattering so far.

FWIW, I recall Karl Arao's wiki page: https://karlarao.github.io/karlaraowiki/#%2212c%20threaded_execution%22
where some performance and memory consumption studies have been done.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 09:54:17
Message-ID:	CAMT0RQTAFrx5xVJ9SXyjP71=SmU+U9+4dDm+XEZWWoy6pbCF5g@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 7, 2023 at 11:37 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> > 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> > would ask if developer time could be better spent on tackling some of the
> > other problems around vertical scalability? Per some PGCon discussions,
> > there's still room for improvement in how PostgreSQL can best utilize
> > resources available very large "commodity" machines (a 448-core / 24TB RAM
> > instance comes to mind).
>
> I think we're starting to hit quite a few limits related to the process model,
> particularly on bigger machines. The overhead of cross-process context
> switches is inherently higher than switching between threads in the same
> process - and my suspicion is that that overhead will continue to
> increase. Once you have a significant number of connections we end up spending
> a *lot* of time in TLB misses, and that's inherent to the process model,
> because you can't share the TLB across processes.

This part was touched on in the "AMA with a Linux Kernel Hacker"
Unconference session, where he mentioned that he had proposed an
'mshare' syscall for this.

So maybe a more fruitful way of fixing the perceived issues with the
process model is to push for small changes in Linux to overcome them,
avoiding a wholesale rewrite?

>
>
> The amount of duplicated code we have to deal with due to the process model
> is quite substantial. We have local memory, statically allocated shared memory
> and dynamically allocated shared memory variants for some things. And that's
> just going to continue.

Maybe we can already remove the distinction between static and dynamic
shared memory?

Though I already heard some complaints in the conference discussions
that having the dynamic version available has made some developers
sloppy in using it, resulting in wastefulness.

>
>
> > I'm purposely giving a nonanswer on whether it's a worthwhile goal, but
> > rather I'd be curious where it could stack up against some other efforts to
> > continue to help PostgreSQL improve performance and handle very large
> > workloads.
>
> There's plenty of things we can do before, but in the end I think tackling the
> issues you mention and moving to threads are quite tightly linked.

Still, we should be focusing our attention on solving the issues, and
not on "moving to threads" and hoping that this will fix the issues by
itself.

Cheers
Hannu

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Thomas Kellerer <shammat(at)gmx(dot)net>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 09:56:37
Message-ID:	CAMT0RQR=Mvd+7jQ8hxioGPsGek4zKcPqSYyZ+-pGqRz+auCcfQ@mail.gmail.com
Lists:	pgsql-hackers

I think I remember that in the early days of development somebody did
send a patch-set for making PostgreSQL threaded on Solaris.

I don't remember why this did not catch on.

On Wed, Jun 7, 2023 at 11:40 PM Thomas Kellerer <shammat(at)gmx(dot)net> wrote:
>
> Tomas Vondra schrieb am 07.06.2023 um 21:20:
> > Also, which other projects did this transition? Is there something we
> > could learn from them? Were they restricted to much smaller list of
> > platforms?
>
> Firebird did this a while ago if I'm not mistaken.
>
> Not open source, but Oracle was historically multi-threaded on Windows and multi-process on all other platforms.
> I _think_ starting with 19c you can optionally run it multi-threaded on Linux as well.
>
> But I doubt, they are willing to share any insights ;)
>
>
>

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 10:04:05
Message-ID:	CAMT0RQTC8ku23ZfmgkmLQ8GGMAirnU547Lp9dofn-t-BWs18xQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 12:09 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
...

> We could e.g. eventually decide that we
> don't support parallel query without threading support - which would allow us
> to get rid of a very significant amount of code and runtime overhead.

Here I was hoping to go in the opposite direction and support parallel
query across replicas.

This looks much more doable based on the process model than the single
process / multiple threads model.

---
Cheers
Hannu

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 10:15:58
Message-ID:	CAMT0RQR1kAEhHmiy2SfhqbDvoq3mrezcENM=3vozL5cpkNDPOA@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 11:54 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> On Wed, Jun 7, 2023 at 11:37 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> > > 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> > > would ask if developer time could be better spent on tackling some of the
> > > other problems around vertical scalability? Per some PGCon discussions,
> > > there's still room for improvement in how PostgreSQL can best utilize
> > > resources available very large "commodity" machines (a 448-core / 24TB RAM
> > > instance comes to mind).
> >
> > I think we're starting to hit quite a few limits related to the process model,
> > particularly on bigger machines. The overhead of cross-process context
> > switches is inherently higher than switching between threads in the same
> > process - and my suspicion is that that overhead will continue to
> > increase. Once you have a significant number of connections we end up spending
> > a *lot* of time in TLB misses, and that's inherent to the process model,
> > because you can't share the TLB across processes.
>
>
> This part was touched on in the "AMA with a Linux Kernel Hacker"
> unconference session, where he mentioned that he had proposed an
> 'mshare' syscall for this.

Also, *static* huge pages already let you solve this problem now
by sharing the page tables.

Cheers
Hannu

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Thomas Kellerer <shammat(at)gmx(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 10:37:37
Message-ID:	3d8ffaa5-b9a1-9538-9ac3-ffa751449f4b@enterprisedb.com
Lists:	pgsql-hackers

On 6/8/23 01:37, Thomas Munro wrote:
> On Thu, Jun 8, 2023 at 10:37 AM Jeremy Schneider
> <schneider(at)ardentperf(dot)com> wrote:
>> On 6/7/23 2:39 PM, Thomas Kellerer wrote:
>>> Tomas Vondra schrieb am 07.06.2023 um 21:20:
>>>> Also, which other projects did this transition? Is there something we
>>>> could learn from them? Were they restricted to much smaller list of
>>>> platforms?
>>>
>>> Not open source, but Oracle was historically multi-threaded on Windows
>>> and multi-process on all other platforms.
>>> I _think_ starting with 19c you can optionally run it multi-threaded on
>>> Linux as well.
>> Looks like it actually became publicly available in 12c. AFAICT Oracle
>> supports both modes today, with a config parameter to switch between them.
>
> It's old, but this describes the 4 main models and which well known
> RDBMSes use them in section 2.3:
>
> https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf
>
> TL;DR DB2 is the winner, it can do process-per-connection,
> thread-per-connection, process-pool or thread-pool.
>

I think the basic architectures are known, especially from the user
perspective. I'm more interested in challenges the projects faced while
moving from one architecture to the other, or how / why they support
more than just one, etc.

In [1] Heikki argued that:

I don't think this is worth it, unless we plan to eventually remove
the multi-process mode. ... As long as you need to also support
processes, you need to code to the lowest common denominator and
don't really get the benefits.

But these projects clearly support multiple architectures, and have no
intention to ditch some of them. So how did they do that? Surely they
think there are benefits.

One option would be to just have separate code paths for processes and
threads, but the effort required to maintain and improve that would be
deadly. So the only feasible option seems to be that they managed to abstract
the subsystems enough for the "regular" code not to care about the model.

[1]
https://www.postgresql.org/message-id/6e3082dc-ff29-9cbf-847e-5f570828b46b@iki.fi

> I understand this thread to be about thread-per-connection (= backend,
> session, socket) for now.

Maybe, although people also proposed to switch the parallel query to
threads (so that'd be multiple threads per session). But I don't think
it really matters, the concerns are mostly about moving from one
architecture to another and/or supporting both.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 12:00:49
Message-ID:	a31dcb51-83cd-d827-48b7-4d04ebfbd593@dunslane.net
Lists:	pgsql-hackers

On 2023-06-07 We 17:58, Andres Freund wrote:
> Hi,
>
> On 2023-06-07 08:53:24 -0400, Robert Haas wrote:
>> Now, Andres is not a man who accepts a tax on performance of any size
>> without a fight, so his "really expensive" might turn out to resemble my
>> "pretty cheap." However, if widespread use of TLS is too expensive and we
>> have to start rewriting code to not depend on global variables, that's going
>> to be more of a problem. If we can get by with doing such rewrites only in
>> performance-critical places, it might not still be too bad. Personally, I
>> think the degree of dependence that PostgreSQL has on global variables is
>> pretty excessive and I don't think that a certain amount of refactoring to
>> reduce it would be a bad thing. If it turns into an infinite series of
>> hastily-written patches to rejigger every source file we have, though, then
>> I'm not really on board with that.
> I think a lot of such rewrites would be a good idea, even if we right now all
> agree to swear we'll never go to threads. Not having any sort of grouping of
> global variables makes it IMO considerably harder to debug. I can easily ask
> somebody to print out a variable pointing to a struct describing the state of
> a subsystem. I can't really do that for 50 variables.
>
> And once you do that, I think you reduce the TLS cost substantially. The
> variable pointing to the struct is already likely in a register. Whereas each
> individual variable being in TLS makes the job harder for the compiler.
>

I could certainly get on board with a project to tame the use of global
variables.

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

From:	Jose Luis Tallon <jltallon(at)adv-solutions(dot)net>
To:	Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 12:01:16
Message-ID:	d4bbfc52-24e6-2f09-13b6-62399829dcdf@adv-solutions.net
Lists:	pgsql-hackers

On 7/6/23 23:37, Andres Freund wrote:
> [snip]
> I think we're starting to hit quite a few limits related to the process model,
> particularly on bigger machines. The overhead of cross-process context
> switches is inherently higher than switching between threads in the same
> process - and my suspicion is that that overhead will continue to
> increase. Once you have a significant number of connections we end up spending
> a *lot* of time in TLB misses, and that's inherent to the process model,
> because you can't share the TLB across processes.

IMHO, as one sysadmin who has previously played with Postgres on "quite
large" machines, I'd propose what most would call a "hybrid model"....

* Threads are a very valuable addition for the "frontend" of the server.
Most would call this a built-in session-aware connection pooler :)

Heikki's (and others') efforts towards separating connection state
into discrete structs are clearly a prerequisite for this.
Implementation-wise, just toss the connState into a TLS (thread-local
storage) variable and many problems just vanish.

Postgres wouldn't be the first to adopt this approach, either...

* For "heavyweight" queries, the scalability of "almost independent"
processes w.r.t. NUMA is just _impossible to achieve_ (locality of
reference!) with a pure threaded system. When CPU+mem-bound
(bandwidth-wise), threads add nothing IMO.

Indeed a separate postmaster is very much needed in order to control the
processes / guard overall integrity.

Hence, my humble suggestion is to consider a hybrid architecture which
benefits from each model's strengths. I am quite convinced that the
transition would be much safer and simpler (I do share most of Tom's and
others' concerns...)

Other projects to draw inspiration from:

* Postfix -- multi-process, postfix's master guards processes and
performs privileged operations; unprivileged "subsystems". Interesting
IPC solutions
* Apache -- MPMs provide flexibility and support for e.g. non-threaded
workloads (PHP is the most popular; cfr. "prefork" multi-process MPM)
* NginX is actually multi-process (one per CPU) + event-based
(multiplexing) ...
* PowerDNS is internally threaded, but has a "guardian" process. Seems
to be evolving to a more hybrid model.

I would suggest something along the lines of :

* postmaster -- process supervision and (potentially privileged)
operations; process coordination (i.e. descriptor passing); mostly as-is
* *frontend* -- connection/session handling; possibly even event-driven
* backends -- process heavyweight queries as independently as possible.
Can span worker threads AND processes when needed
* *dispatcher* -- takes care of cached/lightweight queries (cached
catalog / full snapshot visibility+processing)
* utility processes can be left "as is" mostly, except to be made
multi-threaded for heavy-sync ones (e.g. vacuum workers, stat workers)

For fixed-size buffers, i.e. pages / chunks, I'd say mmaped (anonymous)
shared memory isn't that bad... but haven't read the actual code in years.

For message queues / invalidation messages, I guess that shmem-based
sync is really a nuisance. My understanding is that Linux-specific (i.e.
eventfd) mechanisms aren't really considered... or are they?

> The amount of duplicated code we have to deal with due to the process model
> is quite substantial. We have local memory, statically allocated shared memory
> and dynamically allocated shared memory variants for some things. And that's
> just going to continue.

Code duplication is indeed a problem... but I wouldn't call "different
approaches/solutions for very similar problems depending on
context/requirement" a duplicate. I might well be wrong / lack detail,
though... (again: haven't read PG's code for some years already).

Just my two cents.

Thanks,

J.L.

--
Parkinson's Law: Work expands to fill the time alloted to it.

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 12:15:33
Message-ID:	CAEze2Whojjh++1Ybw8ohxo46DPeDS4aD5dkPsOse4Q-8r56awQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 8 Jun 2023 at 11:54, Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> On Wed, Jun 7, 2023 at 11:37 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> > > 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> > > would ask if developer time could be better spent on tackling some of the
> > > other problems around vertical scalability? Per some PGCon discussions,
> > > there's still room for improvement in how PostgreSQL can best utilize
> > > resources available very large "commodity" machines (a 448-core / 24TB RAM
> > > instance comes to mind).
> >
> > I think we're starting to hit quite a few limits related to the process model,
> > particularly on bigger machines. The overhead of cross-process context
> > switches is inherently higher than switching between threads in the same
> > process - and my suspicion is that that overhead will continue to
> > increase. Once you have a significant number of connections we end up spending
> > a *lot* of time in TLB misses, and that's inherent to the process model,
> > because you can't share the TLB across processes.
>
>
> This part was touched on in the "AMA with a Linux Kernel Hacker"
> unconference session, where he mentioned that he had proposed an
> 'mshare' syscall for this.
>
> So maybe a more fruitful way of fixing the perceived issues with the
> process model is to push for small changes in Linux to overcome them,
> avoiding a wholesale rewrite?

We support not just Linux, but also Windows and several (?) BSDs. I'm
not against pushing Linux to make things easier for us, but Linux is
an open source project, too, where someone needs to put in time to get
the shiny things that you want. And I'd rather see our time spent in
PostgreSQL, as Linux is only used by a part of our user base.

> > The amount of duplicated code we have to deal with due to the process model
> > is quite substantial. We have local memory, statically allocated shared memory
> > and dynamically allocated shared memory variants for some things. And that's
> > just going to continue.
>
> Maybe we can already remove the distinction between static and dynamic
> shared memory ?

That sounds like a bad idea: dynamic shared memory is more expensive
to maintain than our static shared memory systems, not least
because DSM is not guaranteed to share the same addresses in each
process' address space.

> Though I already heard some complaints at the conference discussions
> that having the dynamic version available has made some developers
> sloppy in using it resulting in wastefulness.

Do you know any examples of this wastefulness?

> > > I'm purposely giving a nonanswer on whether it's a worthwhile goal, but
> > > rather I'd be curious where it could stack up against some other efforts to
> > > continue to help PostgreSQL improve performance and handle very large
> > > workloads.
> >
> > There's plenty of things we can do before, but in the end I think tackling the
> > issues you mention and moving to threads are quite tightly linked.
>
> Still we should be focusing our attention at solving the issues and
> not at "moving to threads" and hoping this will fix the issues by
> itself.

I suspect that it is much easier to solve some of the issues when
working in a shared address space.
E.g. resizing shared_buffers is difficult right now due to the use of
a static allocation of shared memory, but if we had access to a single
shared address space, it'd be easier to do any cleanup necessary for
dynamically increasing/decreasing its size.
Same with parallel workers - if we have a shared address space, the
workers can pass any sized objects around without being required to
move the tuples through DSM and waiting for the leader process to
empty that buffer when it gets full.

Sure, most of that is probably possible with DSM as well, it's just
that I see a lot more issues that you need to take care of when you
don't have a shared address space (such as the pointer translation we
do in dsa_get_address).
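
To make the pointer-translation point concrete, here is a minimal sketch of
what storing offsets instead of raw pointers looks like when a segment may
be mapped at a different address in each process. This is not PostgreSQL's
actual dsa code; rel_ptr, Node, rel_to_abs, build_list, sum_list and
map_elsewhere are all hypothetical names, and the "second mapping" is
simulated by copying the bytes to another heap address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical relative pointer: a byte offset from the segment start.
 * Offset 0 is reserved to mean "null". */
typedef size_t rel_ptr;
#define REL_PTR_NULL ((rel_ptr) 0)

/* Hypothetical list node living inside the segment; the link is an offset. */
typedef struct Node {
    int     value;
    rel_ptr next;
} Node;

/* Translate an offset into an address that is valid in *this* mapping. */
void *rel_to_abs(void *base, rel_ptr p)
{
    assert(base != NULL);
    return p == REL_PTR_NULL ? NULL : (void *) ((char *) base + p);
}

/* Build a two-node list (40 -> 2) inside the segment and return the head
 * offset. The segment must hold at least 3 * sizeof(Node) bytes. */
rel_ptr build_list(void *base)
{
    rel_ptr first = sizeof(Node);
    rel_ptr second = 2 * sizeof(Node);
    Node *a = rel_to_abs(base, first);
    Node *b = rel_to_abs(base, second);

    a->value = 40;
    a->next = second;
    b->value = 2;
    b->next = REL_PTR_NULL;
    return first;
}

/* Walk the list, translating the link on every hop. */
int sum_list(void *base, rel_ptr head)
{
    int total = 0;

    for (Node *n = rel_to_abs(base, head); n != NULL;
         n = rel_to_abs(base, n->next))
        total += n->value;
    return total;
}

/* Simulate another process mapping the same bytes at a different address. */
void *map_elsewhere(const void *base, size_t size)
{
    void *copy = malloc(size);

    if (copy != NULL)
        memcpy(copy, base, size);
    return copy;
}
```

Because only offsets are stored, the same head value works against either
base address. With a single shared address space the translation step
disappears and plain pointers can be stored directly.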

Kind regards,

Matthias van de Meent
Neon, Inc.

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 12:44:11
Message-ID:	CAMT0RQSfoUCNskuweVBhmEiWh76q+eqDhX+5_bWpq8Qq-KuTTg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 2:15 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
>
> On Thu, 8 Jun 2023 at 11:54, Hannu Krosing <hannuk(at)google(dot)com> wrote:
> >
> > On Wed, Jun 7, 2023 at 11:37 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > Hi,
> > >
> > > On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> > > > 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> > > > would ask if developer time could be better spent on tackling some of the
> > > > other problems around vertical scalability? Per some PGCon discussions,
> > > > there's still room for improvement in how PostgreSQL can best utilize
> > > > resources available very large "commodity" machines (a 448-core / 24TB RAM
> > > > instance comes to mind).
> > >
> > > I think we're starting to hit quite a few limits related to the process model,
> > > particularly on bigger machines. The overhead of cross-process context
> > > switches is inherently higher than switching between threads in the same
> > > process - and my suspicion is that that overhead will continue to
> > > increase. Once you have a significant number of connections we end up spending
> > > a *lot* of time in TLB misses, and that's inherent to the process model,
> > > because you can't share the TLB across processes.
> >
> >
> > This part was touched on in the "AMA with a Linux Kernel Hacker"
> > unconference session, where he mentioned that he had proposed an
> > 'mshare' syscall for this.
> >
> > So maybe a more fruitful way of fixing the perceived issues with the
> > process model is to push for small changes in Linux to overcome them,
> > avoiding a wholesale rewrite?
>
> We support not just Linux, but also Windows and several (?) BSDs. I'm
> not against pushing Linux to make things easier for us, but Linux is
> an open source project, too, where someone needs to put in time to get
> the shiny things that you want. And I'd rather see our time spent in
> PostgreSQL, as Linux is only used by a part of our user base.

Do we have any statistics on the distribution of our user base?

My gut feeling says that for performance-critical use the non-Linux
share is in the low single-digit percentages at best.

My fascination with open source started with the realisation that instead of
workarounds you can actually fix the problem at the source. So if the
specific problem is that the TLB is not shared, then the proper fix is
making it shared instead of rewriting everything else to get around
it. None of us is limited to writing code in PostgreSQL only. If the
easiest and most generic fix can be done in Linux, then so be it.

It is also possible that Windows and *BSD already have a similar feature.

>
> > > The amount of duplicated code we have to deal with due to the process model
> > > is quite substantial. We have local memory, statically allocated shared memory
> > > and dynamically allocated shared memory variants for some things. And that's
> > > just going to continue.
> >
> > Maybe we can already remove the distinction between static and dynamic
> > shared memory ?
>
> That sounds like a bad idea: dynamic shared memory is more expensive
> to maintain than our static shared memory systems, not least
> because DSM is not guaranteed to share the same addresses in each
> process' address space.

Then this too needs to be fixed.

>
> > Though I already heard some complaints at the conference discussions
> > that having the dynamic version available has made some developers
> > sloppy in using it resulting in wastefulness.
>
> Do you know any examples of this wastefulness?

No. Just somebody mentioned it in a hallway conversation and the rest
of the developers present mumbled approvingly :)

> > > > I'm purposely giving a nonanswer on whether it's a worthwhile goal, but
> > > > rather I'd be curious where it could stack up against some other efforts to
> > > > continue to help PostgreSQL improve performance and handle very large
> > > > workloads.
> > >
> > > There's plenty of things we can do before, but in the end I think tackling the
> > > issues you mention and moving to threads are quite tightly linked.
> >
> > Still we should be focusing our attention at solving the issues and
> > not at "moving to threads" and hoping this will fix the issues by
> > itself.
>
> I suspect that it is much easier to solve some of the issues when
> working in a shared address space.

Probably. But it would come at the cost of needing to change a lot of
other parts of PostgreSQL.

I am not against making code cleaner for potential threaded model
support. I am just a bit sceptical about the actual switch being easy,
or doable in the next 10-15 years.

> E.g. resizing shared_buffers is difficult right now due to the use of
> a static allocation of shared memory, but if we had access to a single
> shared address space, it'd be easier to do any cleanup necessary for
> dynamically increasing/decreasing its size.

This again could be done with shared memory mapping + dynamic shared memory.

> Same with parallel workers - if we have a shared address space, the
> workers can pass any sized objects around without being required to
> move the tuples through DSM and waiting for the leader process to
> empty that buffer when it gets full.

Larger shared memory :)

Same for shared plan cache and shared schema cache.

> Sure, most of that is probably possible with DSM as well, it's just
> that I see a lot more issues that you need to take care of when you
> don't have a shared address space (such as the pointer translation we
> do in dsa_get_address).

All of the above seems to point to the need for a single thing: having
an option for shared memory mappings.

So let's focus on fixing things with the minimal required change.

And this would not have an adverse effect on systems that cannot
share mappings; they just won't become faster. And they are all welcome
to add the option for shared mappings too if they see enough value in
it.

It could sound like the same thing as the threaded model, but it should need
far fewer changes, and likely no changes for most out-of-tree
extensions.

---
Cheers
Hannu

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 13:38:16
Message-ID:	CA+TgmoYfYX6qyxN7O1zhfj+KagSADpo5AhObj3ghPAo-XA2Zrw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 6:04 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> Here I was hoping to go in the opposite direction and support parallel
> query across replicas.
>
> This looks much more doable based on the process model than the single
> process / multiple threads model.

I don't think this is any more or less difficult to support in one
model vs. the other. The problems seem pretty much unrelated.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 13:47:48
Message-ID:	86b55b50-a408-1604-1967-b4f87e9d7d0c@garret.ru
Lists:	pgsql-hackers

On 07.06.2023 3:53 PM, Robert Haas wrote:
> I think I remember a previous conversation with Andres
> where he opined that thread-local variables are "really expensive"
> (and I apologize in advance if I'm mis-remembering this). Now, Andres
> is not a man who accepts a tax on performance of any size without a
> fight, so his "really expensive" might turn out to resemble my "pretty
> cheap." However, if widespread use of TLS is too expensive and we have
> to start rewriting code to not depend on global variables, that's
> going to be more of a problem. If we can get by with doing such
> rewrites only in performance-critical places, it might not still be
> too bad. Personally, I think the degree of dependence that PostgreSQL
> has on global variables is pretty excessive and I don't think that a
> certain amount of refactoring to reduce it would be a bad thing. If it
> turns into an infinite series of hastily-written patches to rejigger
> every source file we have, though, then I'm not really on board with
> that.

Actually, TLS is not more expensive than accessing struct fields (at
least on x86). Consider the following program:

typedef struct {
   int a;
   int b;
   int c;
} ABC;

__thread int a;
__thread int b;
__thread int c;

void use_struct(ABC* abc) {
   abc->a += 1;
   abc->b += 1;
   abc->c += 1;
}

void use_tls(ABC* abc) {
   a += 1;
   b += 1;
   c += 1;
}

Now look at the generated assembler:

use_struct:
    addl    $1, (%rdi)
    addl    $1, 4(%rdi)
    addl    $1, 8(%rdi)
    ret

use_tls:
    addl    $1, %fs:a@tpoff
    addl    $1, %fs:b@tpoff
    addl    $1, %fs:c@tpoff
    ret
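
For comparison, the grouping Andres describes (one variable pointing to a
struct that holds a subsystem's state, rather than many separate variables)
could look like the sketch below. It reuses the ABC struct from the program
above; current_abc, bind_state and use_grouped_tls are hypothetical names
introduced only for illustration, and no claim is made here about the code
any particular compiler generates for it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int a;
    int b;
    int c;
} ABC;

/* Hypothetical: a single thread-local pointer to the grouped state,
 * instead of three independent thread-local ints. */
__thread ABC *current_abc;

/* Bind the calling thread to a state struct. */
void bind_state(ABC *abc)
{
    assert(abc != NULL);
    current_abc = abc;
}

/* Read the thread-local pointer once, then reach every field through it. */
void use_grouped_tls(void)
{
    ABC *abc = current_abc;

    abc->a += 1;
    abc->b += 1;
    abc->c += 1;
}
```

With this layout, a debugger only needs to print one struct to show the
whole state, which is the debugging benefit mentioned in the quoted text.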

> Heikki mentions the idea of having a central Session object and just
> passing that around. I have a hard time believing that's going to work
> out nicely. First, it's not extensible. Right now, if you need a bit
> of additional session-local state, you just declare a variable and
> you're all set. That's not a perfect system and does cause some
> problems, but we can't go from there to a system where it's impossible
> to add session-local state without hacking core. Second, we will be
> sad if session.h ends up #including every other header file that
> defines a data structure anywhere in the backend. Or at least I'll be
> sad. I'm not actually against the idea of having some kind of session
> object that we pass around, but I think it either needs to be limited
> to a relatively small set of well-defined things, or else it needs to
> be design in some kind of extensible way that doesn't require it to
> know the full details of every sort of object that's being used as
> session-local state anywhere in the system. I haven't really seen any
> convincing design ideas around this yet.

There are about 2k static/global variables in Postgres.
It is almost impossible to maintain such a struct.
But a session context may still be needed for other purposes, e.g. if we
want to support a built-in connection pool.

If we are using threads, then every variable needs to be either
thread-local or have access to it synchronized.
But if we want to save the session context, there is no need to
save/restore all of these 2k variables.
We need to capture only those variables whose lifetime exceeds the
transaction boundary.
There are not that many such variables: tens, not hundreds.

The question is how to better handle this "session context".
There are two alternatives:
1. Save/restore this context from/to normal TLS variables.
2. Replace such variables with access through the session context struct.

I prefer 2) because it requires fewer changes in the code.
And the performance overhead of session context store/resume is negligible
when the number of such variables is ~10.
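
The two alternatives can be sketched side by side as follows. This is only
an illustration under assumptions: SessionContext, its two fields, the
tls_* variables and the session_* functions are hypothetical and do not
correspond to existing PostgreSQL symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state whose lifetime exceeds a transaction. */
typedef struct SessionContext {
    int work_mem_kb;
    int statement_timeout_ms;
} SessionContext;

/* Alternative 1: code keeps using ordinary thread-local variables... */
__thread int tls_work_mem_kb;
__thread int tls_statement_timeout_ms;

/* ...their values are copied out when a session is parked... */
void session_save(SessionContext *ctx)
{
    assert(ctx != NULL);
    ctx->work_mem_kb = tls_work_mem_kb;
    ctx->statement_timeout_ms = tls_statement_timeout_ms;
}

/* ...and copied back in when some thread resumes that session. */
void session_restore(const SessionContext *ctx)
{
    assert(ctx != NULL);
    tls_work_mem_kb = ctx->work_mem_kb;
    tls_statement_timeout_ms = ctx->statement_timeout_ms;
}

/* Alternative 2: a single thread-local pointer; code reads and writes the
 * fields through it, and switching sessions just swaps the pointer. */
__thread SessionContext *current_session;

void session_attach(SessionContext *ctx)
{
    assert(ctx != NULL);
    current_session = ctx;
}
```

In the first variant the cost of a session switch grows with the number of
captured variables; in the second it is one pointer assignment, but every
use site has to be rewritten to go through current_session.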

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 13:56:37
Message-ID:	CA+TgmoaVGyHebYLwmuooHC0f58-=-NYQ-Mz4OWX8aeX5VB=W0A@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 7, 2023 at 5:30 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-06-05 17:51:57 +0300, Heikki Linnakangas wrote:
> > If there are no major objections, I'm going to update the developer FAQ,
> > removing the excuses there for why we don't use threads [1].
>
> I think we should do this even if there's no consensus to slowly change to
> threads. There's clearly no consensus on the opposite either.

This is a very fair point.

> One interesting bit around the transition is what tooling we ought to provide
> to detect problems. It could e.g. be reasonably feasible to write something
> checking how many read-write global variables an extension has on linux
> systems.

Yes, this would be great.

> I don't think the control file is the right place - that seems more like
> something that should be signalled via PG_MODULE_MAGIC. We need to check this
> not just during CREATE EXTENSION, but also during loading of libraries - think
> of shared_preload_libraries.

+1.

Yeah, I've had similar thoughts. I'm not exactly sure what the
advantages of such a refactoring might be, but the current structure
feels pretty limiting. It works OK because we don't do anything in the
postmaster other than fork a new backend, but I'm not sure if that's
the best strategy. It means, for example, that if there's a ton of new
connection requests, we're spawning a ton of new processes, which
means that you can put a lot of load on a PostgreSQL instance even if
you can't authenticate. Maybe we'd be better off with a pool of
processes accepting connections; if authentication fails, that
connection goes back into the pool and tries again. If authentication
succeeds, either that process transitions to being a regular backend,
leaving the authentication pool, or perhaps hands off the connection
to a "real backend" at that point and loops around to accept() the
next request.

Whether that's a good idea in detail or not, the point remains that
having the postmaster handle this task is quite limiting. It forces us
to hand off the connection to a new process at the earliest possible
stage, so that the postmaster remains free to handle other duties.
Giving the responsibility to another process would let us make
decisions about where to perform the hand-off based on real
architectural thought rather than being forced to do it a certain way
because nothing else will work.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 14:08:57
Message-ID:	CA+TgmobN11EbzP-Z0YfnvoJDNabqzn-u3JTaoHAPMwLiFWLtaA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 7, 2023 at 5:37 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I think we're starting to hit quite a few limits related to the process model,
> particularly on bigger machines. The overhead of cross-process context
> switches is inherently higher than switching between threads in the same
> process - and my suspicion is that that overhead will continue to
> increase. Once you have a significant number of connections we end up spending
> a *lot* of time in TLB misses, and that's inherent to the process model,
> because you can't share the TLB across processes.

This is a very good point.

Our default posture on this mailing list is to try to maximize use of
OS facilities rather than reimplementing things - well and good. But
if a user writes a query with FOO JOIN BAR ON FOO.X = BAR.X OR FOO.Y =
BAR.Y and then complains that the resulting query plan sucks, we don't
slink off in embarrassment: we tell the user that there's not really
any fast plan for that query and that if they write queries like that
they have to live with the consequences. But the same thing applies
here. To the extent that context switching between more processes is
more expensive than context switching between threads for
hardware-related reasons, that's not something that the OS can fix for
us. If we choose to do the expensive thing then we pay the overhead.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 14:15:02
Message-ID:	CA+TgmoZ5CH5M0L9ks7aCLJ8kPrhoGsZ-Rr83vki4CrGm5yNn5Q@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 7, 2023 at 5:39 PM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
> On 07.06.23 23:30, Andres Freund wrote:
> > Yea, we definitely need the supervisor function in a separate
> > process. Presumably that means we need to split off some of the postmaster
> > responsibilities - e.g. I don't think it'd make sense to handle connection
> > establishment in the supervisor process. I wonder if this is something that
> > could end up being beneficial even in the process world.
>
> Something to think about perhaps ... how would that be different from
> using an existing external supervisor process like systemd or supervisord.

systemd wouldn't start individual PostgreSQL processes, right? If we
want a checkpointer and a WAL writer and a background writer and
whatever, we have to have our own supervisor process to spawn all those
and keep them running. We could remove the logic to do a full system
reset without a postmaster exit in favor of letting systemd restart
everything from scratch, if we wanted to do that. But we'd still need
our own supervisor to start up all of the individual threads/processes
that we need.
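
To make that last point concrete, here is a rough sketch of the restart
loop such a supervisor needs, assuming POSIX fork()/waitpid(). The names
supervise(), worker_fn and flaky_worker() are made up for illustration,
and a real postmaster does far more than this (signal handling, resetting
shared state, and so on).

```c
#define _POSIX_C_SOURCE 200809L   /* for fork() and waitpid() */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*worker_fn) (int attempt);   /* hypothetical worker entry point */

/* Example worker: fails on its first two attempts, then exits cleanly. */
static int
flaky_worker(int attempt)
{
    return attempt < 2 ? 1 : 0;
}

/*
 * Run the worker in a child process and restart it whenever it exits
 * abnormally, up to max_restarts times.  Returns the number of restarts
 * performed, or -1 if fork() or waitpid() fails.
 */
static int
supervise(worker_fn fn, int max_restarts)
{
    int restarts = 0;

    assert(fn != NULL);
    for (;;)
    {
        int   status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(fn(restarts));    /* child: run the worker, then exit */

        if (waitpid(pid, &status, 0) < 0)
            return -1;

        /* Clean exit: nothing left to supervise. */
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;
        /* Give up once the restart budget is used. */
        if (restarts >= max_restarts)
            return restarts;
        restarts++;
    }
}
```

systemd can restart the whole service, but something inside the server
still has to run a loop like this for each internal worker.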

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 14:17:04
Message-ID:	CA+TgmoaUVjy4vhpRWxLXDGKCURepbY89KGBwjsg5qtwTzgPjWw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 7, 2023 at 5:45 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> People have argued that the process model is more robust. But it turns out
> that we have to crash-restart for just about any "bad failure" anyway. It used
> to be (a long time ago) that we didn't, but that was just broken.

How hard have you thought about memory leaks as a failure mode? Or
file descriptor leaks?

Right now, a process needs to release all of its shared resources
before exiting, or trigger a crash-and-restart cycle. But it doesn't
need to release any process-local resources, because the OS will take
care of that. But that wouldn't be true any more, and that seems like
it might require fixing quite a few things.
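
To make the file descriptor case concrete, here is a small sketch using
POSIX threads (leaky_thread() and fd_leaks_across_thread_exit() are
hypothetical names). A descriptor opened by a thread and never closed is
still open in the process after the thread has exited, whereas the same
sloppiness in a backend process gets cleaned up by the OS at process
exit.

```c
#define _POSIX_C_SOURCE 200809L   /* for open(), fcntl(), pthreads */
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Thread body that "forgets" to close the descriptor it opened. */
static void *
leaky_thread(void *arg)
{
    int *fd_out = arg;

    *fd_out = open("/dev/null", O_RDONLY);
    return NULL;                /* no close(): the descriptor leaks */
}

/*
 * Returns 1 if the descriptor opened by the thread is still open in the
 * process after the thread has exited, 0 if not, -1 on setup failure.
 */
static int
fd_leaks_across_thread_exit(void)
{
    pthread_t tid;
    int       fd = -1;
    int       still_open;

    if (pthread_create(&tid, NULL, leaky_thread, &fd) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0 || fd < 0)
        return -1;

    /* fcntl(F_GETFD) fails with EBADF only if fd is not an open descriptor. */
    still_open = (fcntl(fd, F_GETFD) != -1);
    close(fd);                  /* clean up what the thread left behind */
    return still_open;
}
```

Every exit path of a backend thread would need the equivalent of that
final close(), for every kind of process-local resource.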

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Greg Stark <stark(at)mit(dot)edu>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 14:33:26
Message-ID:	CAM-w4HMvXmjz=6r7Yf0yxHX-mzy60DPqoVB8K1b7COEBJKiZLg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, 7 Jun 2023 at 18:09, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Having the same memory mapping between threads allows the
> hardware to share the TLB (on x86 via process context identifiers), which
> isn't realistically possible with different processes.

As a matter of historical interest Solaris actually did implement this
across different processes. It was called by the somewhat unfortunate
name "Intimate Shared Memory". I don't think Linux ever implemented
anything like it but I'm not sure.

I think this was not so much about cache hit rate but about just sheer
wasted memory in page mappings. So I guess hugepages more or less
target the same issues. But I find it interesting that they were
already running into issues like this 20 years ago -- presumably those
issues have only grown.

--
greg

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 14:56:32
Message-ID:	CA+TgmoZhySzy4QuupAMCi3Fs7ZDwwKkUE9VEgxF2nn71Bo-FXQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > That sounds like a bad idea, dynamic shared memory is more expensive
> > to maintain than our static shared memory systems, not in the least
> > because DSM is not guaranteed to share the same addresses in each
> > process' address space.
>
> Then this too needs to be fixed

Honestly, I'm struggling to respond to this non-sarcastically. I mean,
I was the one who implemented DSM. Do you think it works the way that
it works because I considered doing something smart and decided to do
something dumb instead?

Suppose you have two PostgreSQL backends A and B. If we're not running
on Windows, each of these was forked from the postmaster, so things
like the text and data segments and the main shared memory segment are
going to be mapped at the same address in both processes, because they
inherit those mappings from the postmaster. However, additional things
can get mapped into the address space of either process later. This
can happen in a variety of ways. For instance, a shared library can
get loaded into one process and not the other. Or it can get loaded
into both processes but at different addresses - keep in mind that
it's the OS, not PostgreSQL, that decides what address to use when
loading a shared library. Or, if one process allocates a bunch of
memory, then new address space will have to be mapped into that
process to handle those memory allocations and, again, it is the OS
that decides where to put those mappings. So over time the memory
mappings of these two processes can diverge arbitrarily. That means
that if the same DSM has to be mapped into both processes, there is no
guarantee that it can be placed at the same address in both processes.
The address that gets used in one process might not be available in
the other process.
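
The consequence can be seen without involving PostgreSQL at all. The
sketch below maps the same temporary file twice with POSIX mmap() and
lets the OS choose both addresses; the two mappings stand in for two
backends attaching the same segment. rel_ptr and the helper names are
made up for illustration and are not the DSM or DSA API. Because the
mappings land at different addresses, anything stored inside the segment
has to be an offset from the segment base rather than a raw pointer.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno(), ftruncate(), mmap() */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEG_SIZE 4096

typedef size_t rel_ptr;     /* hypothetical: offset from the segment start */

/* Translate a segment-relative offset into an address in one mapping. */
static void *
rel_to_addr(void *base, rel_ptr off)
{
    assert(off < SEG_SIZE);
    return (char *) base + off;
}

/*
 * Map one temporary file twice, store a string through the first mapping
 * and read it back through the second using only its offset.  Returns 1
 * if the mappings are at different addresses and the data is visible
 * through both, 0 if not, -1 on setup failure.
 */
static int
same_segment_two_addresses(void)
{
    FILE   *f = tmpfile();
    char   *a;
    char   *b;
    rel_ptr off = 128;
    int     ok = -1;

    if (f == NULL)
        return -1;
    if (ftruncate(fileno(f), SEG_SIZE) != 0)
    {
        fclose(f);
        return -1;
    }

    a = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
    b = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
    if (a != MAP_FAILED && b != MAP_FAILED)
    {
        strcpy(rel_to_addr(a, off), "hello");
        ok = (a != b) && strcmp(rel_to_addr(b, off), "hello") == 0;
    }

    if (a != MAP_FAILED)
        munmap(a, SEG_SIZE);
    if (b != MAP_FAILED)
        munmap(b, SEG_SIZE);
    fclose(f);
    return ok;
}
```

The offset 128 means the same thing in both mappings; the address
a + 128 does not.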

It's worth pointing out here that there are no portable primitives
available for a process to examine what memory segments are mapped
into its address space. I think it's probably possible on every OS,
but it works differently on different ones. Linux exposes such details
through /proc, for example, but macOS doesn't have /proc. So if we're
using standard, portable primitives, we can't even TRY to put the DSM
at the same address in every process that maps it. But even if we used
non-portable primitives to examine what's mapped into the address
space of every process, it wouldn't solve the problem. Suppose 4
processes want to share a DSM, so they all run around and use
non-portable OS-specific interfaces to figure out where there's a free
chunk of address space large enough to accommodate that DSM and they
all map it there. Hooray! But then say a fifth process comes along and
it ALSO wants to map that DSM, but in that fifth process the address
space that was available in the other four processes has already been
used by something else. Well, now we're screwed.

The fact that DSM is expensive and awkward to use isn't a defect in
the implementation of DSM. It's a consequence of the fact that the
address space mappings in one PostgreSQL backend can be almost
arbitrarily different from the address space mappings in another
PostgreSQL backend. If only there were some kind of OS feature
available that would allow us to set things up so that all of the
PostgreSQL backends shared the same address space mappings!

Oh, right, there is: THREADS.

The fact that we don't use threads is the reason why DSM sucks and has
to suck. In fact it's the reason why DSM has to exist at all. Saying
"fix DSM instead of using threads" is roughly in the same category as
saying "if the peasants are revolting because they have no bread, then
let them eat cake." Both statements evince a complete failure to
understand the actual source of the problem.

With apologies for my grumpiness,

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 15:02:08
Message-ID:	CAMT0RQTCFec0J4hQDcpC0CZ4F-7VWFZXpREgaRU-96bVQ1pj2Q@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 4:56 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > > That sounds like a bad idea, dynamic shared memory is more expensive
> > > to maintain than our static shared memory systems, not in the least
> > > because DSM is not guaranteed to share the same addresses in each
> > > process' address space.
> >
> > Then this too needs to be fixed
>
> Honestly, I'm struggling to respond to this non-sarcastically. I mean,
> I was the one who implemented DSM. Do you think it works the way that
> it works because I considered doing something smart and decided to do
> something dumb instead?

No, I meant that this needs to be fixed at OS level, by being able to
use the same mapping.

We should not shy away from asking the OS people for adding the useful
features still missing.

It was mentioned in the Unconference Kernel Hacker AMA talk and said
kernel hacker works for Oracle, and they also seemed to be needing
this :)

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 15:08:16
Message-ID:	CAEze2Wgv2XK0r0S-Ru15R+DyeTJO9_jo7ZAPdrWqns0ESK6U6w@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 8 Jun 2023 at 14:44, Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> On Thu, Jun 8, 2023 at 2:15 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> >
> > On Thu, 8 Jun 2023 at 11:54, Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > >
> > > This part was touched on in the "AMA with a Linux Kernel Hacker"
> > > Unconference session, where he mentioned that he had proposed a
> > > 'mshare' syscall for this.
> > >
> > > So maybe a more fruitful way to fixing the perceived issues with
> > > process model is to push for small changes in Linux to overcome these
> > > avoiding a wholesale rewrite ?
> >
> > We support not just Linux, but also Windows and several (?) BSDs. I'm
> > not against pushing Linux to make things easier for us, but Linux is
> > an open source project, too, where someone needs to put in time to get
> > the shiny things that you want. And I'd rather see our time spent in
> > PostgreSQL, as Linux is only used by a part of our user base.
>
> Do we have any statistics for the distribution of our user base ?
>
> My gut feeling says that for performance-critical use the non-Linux is
> in low single digits at best.
>
> My fascination for OpenSource started with realisation that instead of
> workarounds you can actually fix the problem at source. So if the
> specific problem is that TLB is not shared then the proper fix is
> making it shared instead of rewriting everything else to get around
> it. None of us is limited to writing code in PostgreSQL only. If the
> easiest and more generic fix can be done in Linux then so be it.

The TLB is a CPU hardware facility, not something that the OS can decide
to share between processes. While sharing (some) OS memory management
facilities across processes might be possible (as you mention, that
mshare syscall would be an example), that doesn't solve the issue of
the hardware not supporting sharing TLB entries across processes. We'd
use less kernel memory for memory management, but the CPU would still
stall on TLB misses every time we switch processes on the CPU (unless
we somehow were able to use non-process-namespaced TLB entries, which
would make our processes not meaningfully different from threads
w.r.t. address space).

> > >
> > > Maybe we can already remove the distinction between static and dynamic
> > > shared memory ?
> >
> > That sounds like a bad idea, dynamic shared memory is more expensive
> > to maintain than our static shared memory systems, not in the least
> > because DSM is not guaranteed to share the same addresses in each
> > process' address space.
>
> Then this too needs to be fixed

That needs kernel facilities in all (most?) supported OSes, and I
think that's much more work than moving to threads.
Allocations from the kernel can land essentially anywhere in the
available address space, so a DSM segment that is allocated in one
backend might overlap with unshared allocations of a different
backend, giving those backends conflicting address-space layouts.
The only way to make that work is to have a shared memory addressing
space in which some backends simply don't have a given allocation
mapped into their local address space; that seems only slightly more
isolated than threads and much more effort to maintain.

> > > Though I already heard some complaints at the conference discussions
> > > that having the dynamic version available has made some developers
> > > sloppy in using it resulting in wastefulness.
> >
> > Do you know any examples of this wastefulness?
>
> No. Just somebody mentioned it in a hallway conversation and the rest
> of the developers present mumbled approvingly :)

The only "wastefulness" that I know of in our use of DSM is the queue,
and that's by design: We need to move data from a backend's private
memory to memory that's accessible to other backends; i.e. shared
memory. You can't do that without copying or exposing your private
memory.

> > > Still we should be focusing our attention at solving the issues and
> > > not at "moving to threads" and hoping this will fix the issues by
> > > itself.
> >
> > I suspect that it is much easier to solve some of the issues when
> > working in a shared address space.
>
> Probably. But it would come at the cost of needing to change a lot of
> other parts of PostgreSQL.
>
> I am not against making code cleaner for potential threaded model
> support. I am just a bit sceptical about the actual switch being easy,
> or doable in the next 10-15 years.

PostgreSQL only has a support cycle of 5 years. 5 years after the last
release of un-threaded PostgreSQL we could drop support for "legacy"
extension models that don't support threading.

> > E.g. resizing shared_buffers is difficult right now due to the use of
> > a static allocation of shared memory, but if we had access to a single
> > shared address space, it'd be easier to do any cleanup necessary for
> > dynamically increasing/decreasing its size.
>
> This again could be done with shared memory mapping + dynamic shared memory.

Yes, but as I said, that's much more difficult than lock and/or atomic
operations on shared-between-backends static variables, because if
these variables aren't in shared memory you need to pass the messages
to update the variables to all backends.
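
As a rough illustration of the threaded side of that comparison (a
sketch only; nbuffers_target and the function names are made up, and
this is not how shared_buffers is managed today): with one address
space, a setting kept in a static atomic variable is visible to every
thread once it has been stored, with no per-backend message.

```c
#define _POSIX_C_SOURCE 200809L   /* for pthreads */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical setting shared by all backend threads. */
static atomic_int nbuffers_target = 1024;

/* A "backend" simply reads the shared variable. */
static void *
backend_thread(void *arg)
{
    int *seen = arg;

    *seen = atomic_load(&nbuffers_target);
    return NULL;
}

/*
 * Store a new target, then start a backend thread and report the value
 * it observed.  Returns -1 if the thread could not be run.
 */
static int
resize_and_observe(int new_target)
{
    pthread_t tid;
    int       seen = -1;

    atomic_store(&nbuffers_target, new_target);
    if (pthread_create(&tid, NULL, backend_thread, &seen) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return seen;
}
```

In the process model the same variable would have to live in shared
memory, or the new value would have to be sent to each backend.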

> > Same with parallel workers - if we have a shared address space, the
> > workers can pass any sized objects around without being required to
> > move the tuples through DSM and waiting for the leader process to
> > empty that buffer when it gets full.
>
> Larger shared memory :)
>
> Same for shared plan cache and shared schema cache.

Shared memory in processes is not free, if only because the TLB gets
saturated much faster.

> > Sure, most of that is probably possible with DSM as well, it's just
> > that I see a lot more issues that you need to take care of when you
> > don't have a shared address space (such as the pointer translation we
> > do in dsa_get_address).
>
> All of the above seem to point to the need of a single thing - having
> an option for shared memory mappings .
>
> So let's focus on fixing things with minimal required change.

That seems logical, but not all kernels support dynamic shared memory
mappings. And, as for your suggested solution, I couldn't find much
info on this mshare syscall (or its successor mmap/VM_SHARED_PT), nor
on whether it would actually fix the TLB issue.

> And this would not have an adverse effect on systems that can not
> share mapping, they just won't become faster. And they are all welcome
> to add the option for shared mappings too if they see enough value in
> it.
>
> It could sound like the same thing as threaded model, but should need
> much less changes and likely no changes for most out-of-tree
> extensions

We can't expect the kernel to fix everything for us - that's what we
build PostgreSQL for. Where possible, we do want to rely on OS
primitives, but I'm not sure that it would be easy to share memory
address mappings across backends, for reasons including the above
("That needs kernel facilities in all [...] more effort to maintain").

Kind regards,

Matthias van de Meent
Neon, Inc.

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Greg Stark <stark(at)mit(dot)edu>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 15:54:01
Message-ID:	20230608155401.f4bi2e3ravbffwu3@awork3.anarazel.de
Lists:	pgsql-hackers

On 2023-06-08 10:33:26 -0400, Greg Stark wrote:
> On Wed, 7 Jun 2023 at 18:09, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Having the same memory mapping between threads allows the
> > hardware to share the TLB (on x86 via process context identifiers), which
> > isn't realistically possible with different processes.
>
> As a matter of historical interest Solaris actually did implement this
> across different processes. It was called by the somewhat unfortunate
> name "Intimate Shared Memory". I don't think Linux ever implemented
> anything like it but I'm not sure.

I don't think it shared the TLB - it did share page tables though.

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 15:55:57
Message-ID:	CAEze2WiM9XTCbp4K5sGt0udzSPqmEmxPVKOv7EagG8u3Vbya6A@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 8 Jun 2023 at 17:02, Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> On Thu, Jun 8, 2023 at 4:56 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > > > That sounds like a bad idea, dynamic shared memory is more expensive
> > > > to maintain than our static shared memory systems, not in the least
> > > > because DSM is not guaranteed to share the same addresses in each
> > > > process' address space.
> > >
> > > Then this too needs to be fixed
> >
> > Honestly, I'm struggling to respond to this non-sarcastically. I mean,
> > I was the one who implemented DSM. Do you think it works the way that
> > it works because I considered doing something smart and decided to do
> > something dumb instead?
>
> No, I meant that this needs to be fixed at OS level, by being able to
> use the same mapping.
>
> We should not shy away from asking the OS people for adding the useful
> features still missing.

While I agree that "sharing page tables across processes" is useful,
it looks like it'd be much more effort to correctly implement for e.g.
DSM than implementing threading.
Konstantin's diff is "only" 20.1k lines [0] added and/or modified,
which is a lot, but it's manageable (13k+ of which are from files that
were auto-generated and then committed, likely accidentally).

> It was mentioned in the Unconference Kernel Hacker AMA talk and said
> kernel hacker works for Oracle, and they also seemed to be needing
> this :)

Though these new kernel features allowing for better performance
(mostly in kernel memory usage, probably) would be nice to have, we
wouldn't get performance benefits for older kernels, benefits which we
would get if we were to implement threading.
I'm not on board with a policy of us twiddling thumbs and waiting for
the OS to fix our architectural performance issues. Sure, the kernel
could optimize for our usage pattern, but I think that's not something
we can (or should) rely on for performance ^1.

Kind regards,

Matthias van de Meent

[0] https://github.com/postgrespro/postgresql.pthreads/compare/801386af...d5933309?w=1
^1 OT: I think the same about us (ab)using the OS page cache, but
that's a tale for a different time and thread.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 15:56:13
Message-ID:	CA+TgmobtDMH6Pzioyw5PXA_z2Nq8whzXY57O=Q_J=kHDY-eZcw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 11:02 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> No, I meant that this needs to be fixed at OS level, by being able to
> use the same mapping.
>
> We should not shy away from asking the OS people for adding the useful
> features still missing.
>
> It was mentioned in the Unconference Kernel Hacker AMA talk and said
> kernel hacker works for Oracle, and they also seemed to be needing
> this :)

Fair enough, but we aspire to work on a bunch of different operating
systems. To make use of an OS facility, we need something that works
on at least Linux, Windows, macOS, and a few different BSD flavors.
It's not as if when the PostgreSQL project asks for a new operating
system facility everyone springs into action to provide it
immediately. And even if they did, and even if they all released an
implementation of whatever we requested next year, it would still be
at least five, more realistically ten, years before systems with those
facilities were ubiquitous. And unless we have truly obscene amounts
of clout in the OS community, it's likely that all of those different
operating systems would implement different things to meet the stated
need, and then we'd have to have a complex bunch of platform-dependent
code in order to keep working on all of those systems.

To me, this is a road to nowhere. I have no problem at all with us
expressing our needs to the OS community, but realistically, any
PostgreSQL feature that depends on an OS feature less than twenty
years old is going to have to be optional, which means that if we want
to do anything about sharing address space mappings in the next few
years, it's going to need to be based on threads.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Jose Luis Tallon <jltallon(at)adv-solutions(dot)net>
Cc:	"Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 15:57:15
Message-ID:	20230608155715.xhcb2bm246wlxlrj@awork3.anarazel.de
Lists:	pgsql-hackers

On 2023-06-08 14:01:16 +0200, Jose Luis Tallon wrote:
> * For "heavyweight" queries, the scalability of "almost independent"
> processes w.r.t. NUMA is just _impossible to achieve_ (locality of
> reference!) with a pure threaded system. When CPU+mem-bound
> (bandwidth-wise), threads add nothing IMO.

I don't think this is true in any sort of way.

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	"Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 16:00:02
Message-ID:	20230608160002.qeollvbvfcrbq7sh@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-08 12:15:58 +0200, Hannu Krosing wrote:
> On Thu, Jun 8, 2023 at 11:54 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> >
> > On Wed, Jun 7, 2023 at 11:37 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > Hi,
> > >
> > > On 2023-06-05 13:40:13 -0400, Jonathan S. Katz wrote:
> > > > 2. While I wouldn't want to necessarily discourage a moonshot effort, I
> > > > would ask if developer time could be better spent on tackling some of the
> > > > other problems around vertical scalability? Per some PGCon discussions,
> > > > there's still room for improvement in how PostgreSQL can best utilize
> > > > resources available on very large "commodity" machines (a 448-core / 24TB RAM
> > > > instance comes to mind).
> > >
> > > I think we're starting to hit quite a few limits related to the process model,
> > > particularly on bigger machines. The overhead of cross-process context
> > > switches is inherently higher than switching between threads in the same
> > > process - and my suspicion is that that overhead will continue to
> > > increase. Once you have a significant number of connections we end up spending
> > > a *lot* of time in TLB misses, and that's inherent to the process model,
> > > because you can't share the TLB across processes.
> >
> >
> > This part was touched on in the "AMA with a Linux Kernel Hacker"
> > Unconference session, where he mentioned that he had proposed a
> > 'mshare' syscall for this.

As-is, that'd just lead to sharing page tables, not the TLB. I don't think you
can currently share the TLB for parts of your address space on x86
hardware. It's possible that something like that gets added to future
hardware, but ...

> Also, the *static* huge pages already let you solve this problem now
> by sharing the page tables

You don't share the page tables with huge pages on Linux.

- Andres

From:	Greg Sabino Mullane <htamfids(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 16:05:21
Message-ID:	CAKAnmmJsZud_t9ZGRKs64HDAEJi0kPTwPJoLqznbiiFm=vGwyQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:

> Do we have any statistics for the distribution of our user base ?
>
> My gut feeling says that for performance-critical use the non-Linux share is
> in the low single digits at best.
>

Stats are probably not possible, but based on years of consulting, as well
as watching places like SO, Slack, IRC, etc. over the years, IMO that's a
very accurate gut feeling. I'd hazard 1% or less for non-Linux systems.

Cheers,
Greg

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 16:59:58
Message-ID:	20230608165958.dbmoiwqupqrmqeng@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-08 16:47:48 +0300, Konstantin Knizhnik wrote:
> Actually TLS is not more expensive than accessing struct fields (at least
> on the x86 platform), consider the following program:

It really depends on the OS and the architecture, not just the
architecture. And even on x86-64 Linux, the fact that you're using the segment
offset in the address calculation means you can't use the more complicated
addressing modes for other reasons. And plenty of instructions, e.g. most (?) SSE
instructions, won't be able to use that kind of addressing directly.

Even just compiling your example, you can see that with gcc -O2 you get
considerably faster code with the non-TLS version.

As a fairly extreme example, here's the mingw -O3 compiled code:

use_struct:
        movq    xmm1, QWORD PTR .LC0[rip]
        movq    xmm0, QWORD PTR [rcx]
        add     DWORD PTR 8[rcx], 1
        paddd   xmm0, xmm1
        movq    QWORD PTR [rcx], xmm0
        ret
use_tls:
        sub     rsp, 40
        lea     rcx, __emutls_v.a[rip]
        call    __emutls_get_address
        lea     rcx, __emutls_v.b[rip]
        add     DWORD PTR [rax], 1
        call    __emutls_get_address
        lea     rcx, __emutls_v.c[rip]
        add     DWORD PTR [rax], 1
        call    __emutls_get_address
        add     DWORD PTR [rax], 1
        add     rsp, 40
        ret

Greetings,

Andres Freund

From:	Ilya Anfimov <ilan(at)tzirechnoy(dot)com>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 17:02:46
Message-ID:	ZIIJtibyFcTAHKyy@azor.tzirechnoy.ru
Lists:	pgsql-hackers

On Wed, Jun 07, 2023 at 10:26:07AM +1200, Thomas Munro wrote:
> On Tue, Jun 6, 2023 at 6:52 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> > If we were starting out today we would probably choose a threaded implementation. But moving to threaded now seems to me like a multi-year-multi-person project with the prospect of years to come chasing bugs and the prospect of fairly modest advantages. The risk to reward doesn't look great.
> >
> > That's my initial reaction. I could be convinced otherwise.
>
> Here is one thing I often think about when contemplating threads.
> Take a look at dsa.c. It calls itself a shared memory allocator, but
> really it has two jobs, the second being to provide software emulation
> of virtual memory. That's behind dshash.c and now the stats system,
> and various parts of the parallel executor code. It's slow and
> complicated, and far from the state of the art. I wrote that code
> (building on allocator code from Robert) with the expectation that it
> was a transitional solution to unblock a bunch of projects. I always
> expected that we'd eventually be deleting it. When I explain that
> subsystem to people who are not steeped in the lore of PostgreSQL, it
> sounds completely absurd. I mean, ... it is, right? My point is

Wouldn't all memory operations require nearly the same
shared memory allocators if someone switched to a threaded
implementation?

> that we're doing pretty unreasonable and inefficient contortions to
> develop new features -- we're not just happily chugging along without
> threads at no cost.
>

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 17:07:48
Message-ID:	CAMT0RQSVMhk3FVDED8GxrKM35dj3z3Zx6TrSV4cZRTrof3xxdg@mail.gmail.com
Lists:	pgsql-hackers

I discovered this thread from a Twitter post "PostgreSQL will finally
be rewritten in Rust" :)

On Mon, Jun 5, 2023 at 5:18 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> > I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
> > so that the whole server runs in a single process, with multiple
> > threads. It has been discussed many times in the past, last thread on
> > pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>
> > I feel that there is now pretty strong consensus that it would be a good
> > thing, more so than before. Lots of work to get there, and lots of
> > details to be hashed out, but no objections to the idea at a high level.
>
> > The purpose of this email is to make that silent consensus explicit. If
> > you have objections to switching from the current multi-process
> > architecture to a single-process, multi-threaded architecture, please
> > speak up.
>
> For the record, I think this will be a disaster. There is far too much
> code that will get broken, largely silently, and much of it is not
> under our control.
>
> regards, tom lane
>
>

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 18:41:00
Message-ID:	20230608184100.xbvis2dtwjrfu754@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-08 17:02:08 +0200, Hannu Krosing wrote:
> On Thu, Jun 8, 2023 at 4:56 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Thu, Jun 8, 2023 at 8:44 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > > > That sounds like a bad idea, dynamic shared memory is more expensive
> > > > to maintain than our static shared memory systems, not least
> > > > because DSM is not guaranteed to share the same addresses in each
> > > > process' address space.
> > >
> > > Then this too needs to be fixed
> >
> > Honestly, I'm struggling to respond to this non-sarcastically. I mean,
> > I was the one who implemented DSM. Do you think it works the way that
> > it works because I considered doing something smart and decided to do
> > something dumb instead?
>
> No, I meant that this needs to be fixed at OS level, by being able to
> use the same mapping.
>
> We should not shy away from asking the OS people for adding the useful
> features still missing.

There's a large part of this that is about hardware, not software. And
honestly, for most of the problems the answer is to just use threads. Adding
complexity to operating systems to make odd architectures like postgres'
better is a pretty dubious proposition.

I don't think we have even remotely enough influence on CPU design to make
e.g. *partial* TLB sharing across processes a thing.

> It was mentioned in the Unconference Kernel Hacker AMA talk and said
> kernel hacker works for Oracle, and they also seemed to be needing
> this :)

The proposals around that don't really help us all that much. Sharing the page
table will be a bit more efficient, but it won't really change anything
dramatically. From what I understand they are primarily interested in
changing properties of a memory mapping across multiple processes, e.g. making
some memory executable and have that reflected in all processes. I don't think
this will help us much.

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc:	Hannu Krosing <hannuk(at)google(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 18:48:48
Message-ID:	20230608184848.4ejjqjeeicgbnbzb@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-08 17:55:57 +0200, Matthias van de Meent wrote:
> While I agree that "sharing page tables across processes" is useful,
> it looks like it'd be much more effort to correctly implement for e.g.
> DSM than implementing threading.
> Konstantin's diff is "only" 20.1k lines [0] added and/or modified,
> which is a lot, but it's manageable (13k+ of which are from files that
> were auto-generated and then committed, likely accidentally).

Honestly, I don't think this patch is in a good enough state to allow a
realistic estimation of the overall work. Making global variables TLS is the
*easy* part. Redesigning postmaster, defining how to deal with extension
libraries, extension compatibility, developing tools to make developing a
threaded postgres feasible, dealing with freeing session lifetime memory
allocations that previously were freed via process exit, making the change
realistically reviewable, portability are all much harder.

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Hannu Krosing <hannuk(at)google(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 18:54:44
Message-ID:	20230608185444.yujqs3ybgjqetoku@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-08 11:56:13 -0400, Robert Haas wrote:
> On Thu, Jun 8, 2023 at 11:02 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > No, I meant that this needs to be fixed at OS level, by being able to
> > use the same mapping.
> >
> > We should not shy away from asking the OS people for adding the useful
> > features still missing.
> >
> > It was mentioned in the Unconference Kernel Hacker AMA talk and said
> > kernel hacker works for Oracle, and they also seemed to be needing
> > this :)
>
> Fair enough, but we aspire to work on a bunch of different operating
> systems. To make use of an OS facility, we need something that works
> on at least Linux, Windows, macOS, and a few different BSD flavors.
> It's not as if when the PostgreSQL project asks for a new operating
> system facility everyone springs into action to provide it
> immediately. And even if they did, and even if they all released an
> implementation of whatever we requested next year, it would still be
> at least five, more realistically ten, years before systems with those
> facilities were ubiquitous.

I'm less concerned about this aspect - most won't have upgraded to a version
of postgres that benefits from threaded postgres in a similar timeframe. And if
the benefits are large enough, people will move. But:

> And unless we have truly obscene amounts of clout in the OS community, it's
> likely that all of those different operating systems would implement
> different things to meet the stated need, and then we'd have to have a
> complex bunch of platform-dependent code in order to keep working on all of
> those systems.

And even more likely, they just won't do anything, because it's a model that
large parts of the industry have decided isn't going anywhere. It'd be one
thing if we had 5 kernel devs that we could deploy to work on this, but we
don't. So we have to convince kernel devs employed by others that somehow this
is an urgent enough thing that they should work on it. The likely, imo
justified, answer is just going to be: Fix your architecture, then we can
talk.

Greetings,

Andres Freund

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Ilya Anfimov <ilan(at)tzirechnoy(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 19:10:35
Message-ID:	CA+hUKGLe8mkq_4AQqndkuaW80GVCVKcpw9KMCookce7Ep==wQA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Jun 9, 2023 at 5:02 AM Ilya Anfimov <ilan(at)tzirechnoy(dot)com> wrote:
> Wouldn't all memory operations require nearly the same
> shared memory allocators if someone switched to a threaded
> implementation?

It's true that we'd need concurrency-aware MemoryContext
implementations (details can be debated), but we wouldn't need that
address translation layer, which adds a measurable cost at every
access.
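
To make the "address translation layer" concrete, here is a toy model of a segment-relative pointer, loosely in the spirit of what dsa.c has to do. It is only a sketch: the names, the 40-bit offset width, and the base addresses are assumptions made up for illustration, not the actual dsa.c code. The point is that each process must look up its own mapping of the segment on every dereference, whereas threads sharing one address space could store a plain address.

```python
# Toy model of a segment-relative pointer. OFFSET_BITS and all names
# below are illustrative assumptions, not PostgreSQL's real definitions.
OFFSET_BITS = 40
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_pointer(segment: int, offset: int) -> int:
    """Pack (segment index, offset) into one integer valid in every process."""
    return (segment << OFFSET_BITS) | offset


def get_address(segment_bases: dict[int, int], pointer: int) -> int:
    """Translate a relative pointer using this process's own segment mapping."""
    segment = pointer >> OFFSET_BITS
    offset = pointer & OFFSET_MASK
    return segment_bases[segment] + offset


# The same two segments happen to be mapped at different base addresses
# in two hypothetical backends.
backend_a = {0: 0x7F00_0000_0000, 1: 0x7F10_0000_0000}
backend_b = {0: 0x7E00_0000_0000, 1: 0x7E80_0000_0000}

ptr = make_pointer(segment=1, offset=0x1234)
addr_a = get_address(backend_a, ptr)  # extra lookup on every access
addr_b = get_address(backend_b, ptr)  # same object, different address
```

In a threaded server the mapping table would be identical for everyone, so the lookup step could disappear and `ptr` could simply be the address.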

From:	Jose Luis Tallon <jltallon(at)adv-solutions(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 19:30:28
Message-ID:	84dfb317-8873-02ae-8461-f0401d76664b@adv-solutions.net
Lists:	pgsql-hackers

On 8/6/23 15:56, Robert Haas wrote:
> Yeah, I've had similar thoughts. I'm not exactly sure what the
> advantages of such a refactoring might be, but the current structure
> feels pretty limiting. It works OK because we don't do anything in the
> postmaster other than fork a new backend, but I'm not sure if that's
> the best strategy. It means, for example, that if there's a ton of new
> connection requests, we're spawning a ton of new processes, which
> means that you can put a lot of load on a PostgreSQL instance even if
> you can't authenticate. Maybe we'd be better off with a pool of
> processes accepting connections; if authentication fails, that
> connection goes back into the pool and tries again.

This. It's limited by connection I/O, hence a perfect use for
threads (minimize per-connection overhead).

IMV, "session state" would be best stored/managed here. Would need a way
to convey it efficiently, though.

> If authentication
> succeeds, either that process transitions to being a regular backend,
> leaving the authentication pool, or perhaps hands off the connection
> to a "real backend" at that point and loops around to accept() the
> next request.

Nicely done by passing the FD around....
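
For readers who have not seen descriptor passing, a minimal sketch follows. It assumes a Unix system and Python 3.9 or later (for `socket.send_fds` / `socket.recv_fds`); the helper names `hand_off` and `take_over` are hypothetical, a socketpair stands in for the channel between an accepting process and a backend, and a pipe stands in for the client connection.

```python
import os
import socket


def hand_off(fd: int, channel: socket.socket) -> None:
    """Send an open file descriptor to whoever holds the other end of channel."""
    socket.send_fds(channel, [b"conn"], [fd])


def take_over(channel: socket.socket) -> int:
    """Receive one file descriptor previously sent with hand_off()."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 16, 1)
    return fds[0]


# Stand-ins: the socketpair plays the acceptor/backend channel,
# the pipe plays an already-accepted client connection.
acceptor_end, backend_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
client_read, client_write = os.pipe()

hand_off(client_read, acceptor_end)
received_fd = take_over(backend_end)  # a new descriptor for the same pipe

os.write(client_write, b"hello")
data = os.read(received_fd, 5)  # the receiver can now serve the "connection"

for fd in (client_read, client_write, received_fd):
    os.close(fd)
acceptor_end.close()
backend_end.close()
```

Both ends live in one process here only to keep the sketch short; the same calls work across processes connected by a Unix-domain socket.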

But at this point, we'd just get a nice reimplementation of a threaded
connection pool inside Postgres :\

> Whether that's a good idea in detail or not, the point remains that
> having the postmaster handle this task is quite limiting. It forces us
> to hand off the connection to a new process at the earliest possible
> stage, so that the postmaster remains free to handle other duties.
> Giving the responsibility to another process would let us make
> decisions about where to perform the hand-off based on real
> architectural thought rather than being forced to do a certain way
> because nothing else will work.

At least "tcop" surely feels like belonging in a separate process ....

J.L.

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Hannu Krosing <hannuk(at)google(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 19:34:49
Message-ID:	CA+hUKGKgtZubfRJzog=ayMysVCG+QQp0DiecAWJC63Py_S5RrA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Jun 9, 2023 at 4:00 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-06-08 12:15:58 +0200, Hannu Krosing wrote:
> > > This part was touched on in the "AMA with a Linux Kernel Hacker"
> > > Unconference session, where he mentioned that he had proposed an
> > > 'mshare' syscall for this.
>
> As-is that'd just lead to sharing page table, not the TLB. I don't think you
> currently do sharing of the TLB for parts of your address space on x86
> hardware. It's possible that something like that gets added to future
> hardware, but ...

I wasn't in Matthew Wilcox's unconference in Ottawa but I found an
older article on LWN:

https://lwn.net/Articles/895217/

For what it's worth, FreeBSD hackers have studied this topic too (and
it's been done in Android and no doubt other systems before):

https://www.cs.rochester.edu/u/sandhya/papers/ispass19.pdf

I've shared that paper on this list before in the context of
super/huge pages and their benefits (to executable code, and to the
buffer pool), but a second topic in that paper is the idea of a shared
page table: "We find that sharing PTPs across different processes can
reduce execution cycles by as much as 6.9%. Moreover, the combined
effects of using superpages to map the main executable and sharing
PTPs for the small shared libraries can reduce execution cycles up to
18.2%." And that's just part of it, because those guys are more
interested in shared code/libraries and such so that's probably not
even getting to the stuff like buffer pool and DSMs that we might tend
to think of first.

I'm pretty sure PostgreSQL (along with other fork-based RDBMSs
mentioned in this thread) must be one of the worst offenders for page
table bloat, simply because we can have a lot of processes and touch a
lot of memory.

I'm no expert in this stuff, but it seems to me that with shared page
table schemes you can avoid wasting huge amounts of RAM on duplicated
page table entries (pages * processes), and with huge/super pages you
can reduce the number of pages, but AFAIK you still can't escape the
TLB shootdown cost, which is all-or-nothing (PCID level at best). The
only way to avoid TLB shootdowns on context switches is to have
*exactly the same memory map*. Or, as Robert succinctly shouted,
"THREADS".

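A back-of-the-envelope version of the "pages * processes" point, as a sketch only. It assumes 8-byte page table entries (as on x86-64), counts only the lowest-level entries, and takes the worst case where every backend touches every page of the shared region; the 64 GiB region and 1000 backends are made-up numbers, and `page_table_bytes` is a hypothetical helper.

```python
PTE_BYTES = 8  # size of one page table entry on x86-64


def page_table_bytes(mapped_bytes: int, page_size: int, processes: int) -> int:
    """Leaf page-table memory if every process maps and touches the whole region."""
    entries_per_process = mapped_bytes // page_size
    return entries_per_process * PTE_BYTES * processes


GiB = 1024 ** 3
shared = 64 * GiB   # hypothetical shared memory region
backends = 1000     # hypothetical number of backend processes

small_pages = page_table_bytes(shared, 4 * 1024, backends)         # 4 KiB pages
huge_pages = page_table_bytes(shared, 2 * 1024 * 1024, backends)   # 2 MiB pages
one_shared_map = page_table_bytes(shared, 4 * 1024, 1)             # a single memory map
```

With these assumptions the 4 KiB case comes to 125 GiB of page tables, huge pages bring it down to 250 MiB, and a single memory map (shared page tables, or threads) needs 128 MiB even with small pages. None of this says anything about the TLB itself, which is the part only an identical memory map addresses.
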
From:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Tristan Partin <tristan(at)neon(dot)tech>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 19:47:04
Message-ID:	20230608194704.rybienokicvzocui@ddolgov.remote.csb
Lists:	pgsql-hackers

> On Mon, Jun 05, 2023 at 06:43:54PM +0300, Heikki Linnakangas wrote:
> On 05/06/2023 11:28, Tristan Partin wrote:
> > > # Exposed PIDs
> > >
> > > We expose backend process PIDs to users in a few places.
> > > pg_stat_activity.pid and pg_terminate_backend(), for example. They need
> > > to be replaced, or we can assign a fake PID to each connection when
> > > running in multi-threaded mode.
> >
> > Would it be possible to just transparently slot in the thread ID
> > instead?
>
> Perhaps. It might break applications that use the PID directly with e.g.
> 'kill <PID>', though.

I think things get more interesting if some external resource
accounting like cgroups is in place. From what I know, cgroup v2 has
only a few controllers that allow threaded granularity, and the memory and io
controllers are not part of this list. Since Postgres is doing quite a
lot of different things, sometimes it makes sense to put different
limitations on different types of activity, e.g. to give more priority
to a certain critical internal job at the expense of slowing down
backends. In the end it might be complicated or not possible to do that
for individual threads. Such cases are probably not very important from
the high level point of view, but could become an argument when deciding
what should be a process and what should be a thread.
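
As a small illustration of the exposed-PID exchange quoted above, the sketch below shows what an external tool sees once backends are threads: every thread reports the same process ID, and only the native thread ID tells them apart. It assumes Python 3.8 or later for `threading.get_native_id()`; `collect_ids` is a hypothetical helper, not anything from PostgreSQL.

```python
import os
import threading


def collect_ids(n_threads: int) -> list[tuple[int, int]]:
    """Run n_threads concurrently and record (process ID, native thread ID) for each."""
    results = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker() -> None:
        ids = (os.getpid(), threading.get_native_id())
        with lock:
            results.append(ids)
        # Keep all threads alive at once so the OS cannot reuse a thread ID.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


ids = collect_ids(4)
pids = {pid for pid, _ in ids}   # one shared process ID
tids = {tid for _, tid in ids}   # one native thread ID per thread
```

Anything that addresses a backend by process ID, whether 'kill <PID>' or per-process accounting, would therefore act on the whole server rather than on one connection.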

From:	Dave Cramer <davecramer(at)postgres(dot)rocks>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 19:59:29
Message-ID:	CADK3HHKZYZfE=4LrzPdtsdVXjsPjmLY-Dbq1mb96ALLyA6BEWg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 8 Jun 2023 at 13:08, Hannu Krosing <hannuk(at)google(dot)com> wrote:

> I discovered this thread from a Twitter post "PostgreSQL will finally
> be rewritten in Rust" :)
>

By the time we got around to finishing this, there would be a better
language to write it in.

Dave

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Hannu Krosing <hannuk(at)google(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 20:26:34
Message-ID:	20230608202634.huwpe2yjfvsqelxj@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-09 07:34:49 +1200, Thomas Munro wrote:
> I wasn't in Matthew Wilcox's unconference in Ottawa but I found an
> older article on LWN:
>
> https://lwn.net/Articles/895217/
>
> For what it's worth, FreeBSD hackers have studied this topic too (and
> it's been done in Android and no doubt other systems before):
>
> https://www.cs.rochester.edu/u/sandhya/papers/ispass19.pdf
>
> I've shared that paper on this list before in the context of
> super/huge pages and their benefits (to executable code, and to the
> buffer pool), but a second topic in that paper is the idea of a shared
> page table: "We find that sharing PTPs across different processes can
> reduce execution cycles by as much as 6.9%. Moreover, the combined
> effects of using superpages to map the main executable and sharing
> PTPs for the small shared libraries can reduce execution cycles up to
> 18.2%." And that's just part of it, because those guys are more
> interested in shared code/libraries and such so that's probably not
> even getting to the stuff like buffer pool and DSMs that we might tend
> to think of first.

I've experimented with using huge pages for executable code on Linux, and the
benefits are quite noticeable:
https://www.postgresql.org/message-id/20221104212126.qfh3yzi7luvyy5d6%40awork3.anarazel.de

I'm a bit dubious that sharing the page table for executable code increases the
benefit that much further in real workloads. I suspect the reason it was
different for the authors of the paper is:

> A fixed number of back-to-back
> transactions are performed on a 5GB database, and we use the
> -C option of pgbench to toggle between reconnecting after
> each transaction (reconnect mode) and using one persistent
> connection per client (persistent connection mode). We use
> the reconnect mode by default unless stated otherwise.

Using -C explains why you'd see a lot of benefit from sharing page tables for
executable code. But I don't think -C is a particularly interesting workload
to optimize for.

> I'm no expert in this stuff, but it seems to me that with shared page
> table schemes you can avoid wasting huge amounts of RAM on duplicated
> page table entries (pages * processes), and with huge/super pages you
> can reduce the number of pages, but AFAIK you still can't escape the
> TLB shootdown cost, which is all-or-nothing (PCID level at best).

Pretty much that. While you can avoid some TLB shootdowns via PCIDs, that only
avoids flushing the TLB, it doesn't help with the TLB hit rate being much
lower due to the number of "redundant" mappings with different PCIDs.

> The only way to avoid TLB shootdowns on context switches is to have *exactly
> the same memory map*. Or, as Robert succinctly shouted, "THREADS".

Greetings,

Andres Freund

From:	Stephan Doliov <stephan(dot)doliov(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-08 23:35:59
Message-ID:	CAFOdmV8_7BdvcdJC3EsiB7ayR0gQFOjtmUmOPTBJ_Y3qCdYN6w@mail.gmail.com
Lists:	pgsql-hackers

This is an interesting message thread. I think in regards to the OP's call
to make PG multi-threaded, there should be a clear and identifiable
performance target and use cases for the target. How much of a performance
boost can be expected, and in which data application context? Will queries
return faster for transactional use cases? analytic use cases? How much
data needs to be stored before one can observe the difference, or better
yet, a difference with a measurable impact on reduced cloud compute costs
as a % of compute cloud costs? I think if you can demonstrate for different
test datasets what those savings amount to, you can find momentum to
pursue it. Beyond that, even with better modern tooling for multi-threaded
development, it's obviously a big lift (may well be worth it!). Some of us
cagey old cats on this list (at least me) still have some work to do to
shed the baggage that previous pain of MT dev has caused us. :-)

Cheers,
Steve

On Thu, Jun 8, 2023 at 1:26 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2023-06-09 07:34:49 +1200, Thomas Munro wrote:
> > I wasn't in Matthew Wilcox's unconference in Ottawa but I found an
> > older article on LWN:
> >
> > https://lwn.net/Articles/895217/
> >
> > For what it's worth, FreeBSD hackers have studied this topic too (and
> > it's been done in Android and no doubt other systems before):
> >
> > https://www.cs.rochester.edu/u/sandhya/papers/ispass19.pdf
> >
> > I've shared that paper on this list before in the context of
> > super/huge pages and their benefits (to executable code, and to the
> > buffer pool), but a second topic in that paper is the idea of a shared
> > page table: "We find that sharing PTPs across different processes can
> > reduce execution cycles by as much as 6.9%. Moreover, the combined
> > effects of using superpages to map the main executable and sharing
> > PTPs for the small shared libraries can reduce execution cycles up to
> > 18.2%." And that's just part of it, because those guys are more
> > interested in shared code/libraries and such so that's probably not
> > even getting to the stuff like buffer pool and DSMs that we might tend
> > to think of first.
>
> I've experimented with using huge pages for executable code on Linux, and
> the benefits are quite noticeable:
>
> https://www.postgresql.org/message-id/20221104212126.qfh3yzi7luvyy5d6%40awork3.anarazel.de
>
> I'm a bit dubious that sharing the page table for executable code increases
> the benefit that much further in real workloads. I suspect the reason it was
> different for the authors of the paper is:
>
> > A fixed number of back-to-back
> > transactions are performed on a 5GB database, and we use the
> > -C option of pgbench to toggle between reconnecting after
> > each transaction (reconnect mode) and using one persistent
> > connection per client (persistent connection mode). We use
> > the reconnect mode by default unless stated otherwise.
>
> Using -C explains why you'd see a lot of benefit from sharing page tables
> for executable code. But I don't think -C is a particularly interesting
> workload to optimize for.
>
>
> > I'm no expert in this stuff, but it seems to me that with shared page
> > table schemes you can avoid wasting huge amounts of RAM on duplicated
> > page table entries (pages * processes), and with huge/super pages you
> > can reduce the number of pages, but AFAIK you still can't escape the
> > TLB shootdown cost, which is all-or-nothing (PCID level at best).
>
> Pretty much that. While you can avoid some TLB shootdowns via PCIDs, that
> only avoids flushing the TLB, it doesn't help with the TLB hit rate being much
> lower due to the number of "redundant" mappings with different PCIDs.
>
>
> > The only way to avoid TLB shootdowns on context switches is to have
> > *exactly the same memory map*. Or, as Robert succinctly shouted, "THREADS".
>
> +1
>
> Greetings,
>
> Andres Freund
>
>
>

From:	Dave Cramer <davecramer(at)postgres(dot)rocks>
To:	Stephan Doliov <stephan(dot)doliov(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-09 15:19:56
Message-ID:	CADK3HH+oqOTYTk9+29O3XFosueragbqv36RupC+mG34XWpX0iw@mail.gmail.com
Lists:	pgsql-hackers

This is somewhat orthogonal to the topic of threading but relevant to the
use of resources.

If we are going to undertake some hard problems perhaps we should be
looking at other problems that solve other long term issues before we
commit to spending resources on changing the process model.

One thing I can think of is upgrading. AFAIK dump and restore is the only
way to change the on disk format.
Presuming that eventually we will be forced to change the on disk format it
would be nice to be able to do so in a manner which does not force long
down times

Dave


From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Dave Cramer <davecramer(at)postgres(dot)rocks>
Cc:	Stephan Doliov <stephan(dot)doliov(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-09 15:53:52
Message-ID:	CAEze2WiNweEqUALCX2-Xb1m+XAakLBiBSztiTScv2OTw4DOW8Q@mail.gmail.com
Lists:	pgsql-hackers

On Fri, 9 Jun 2023 at 17:20, Dave Cramer <davecramer(at)postgres(dot)rocks> wrote:
>
> This is somewhat orthogonal to the topic of threading but relevant to the use of resources.
>
> If we are going to undertake some hard problems perhaps we should be looking at other problems that solve other long term issues before we commit to spending resources on changing the process model.

-1. This and that are orthogonal and effort in one does not need to
block the other. If someone is willing to put in the effort, let them.
Last time I checked we, as a project, are not blocking bugfixes for
new features in MAIN either (or vice versa).

> One thing I can think of is upgrading. AFAIK dump and restore is the only way to change the on disk format.
> Presuming that eventually we will be forced to change the on disk format it would be nice to be able to do so in a manner which does not force long down times

I agree that we should improve our upgrade process (and we had a great
discussion on the topic at the PGCon Unconference last week), but in
my view that's not relevant to this discussion.

Kind regards,

Matthias van de Meent
Neon, Inc.

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Dave Cramer <davecramer(at)postgres(dot)rocks>
Cc:	Stephan Doliov <stephan(dot)doliov(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-09 22:29:24
Message-ID:	ZIOnxKN6L1cRPPO2@tamriel.snowman.net
Lists:	pgsql-hackers

Greetings,

* Dave Cramer (davecramer(at)postgres(dot)rocks) wrote:
> One thing I can think of is upgrading. AFAIK dump and restore is the only
> way to change the on disk format.
> Presuming that eventually we will be forced to change the on disk format it
> would be nice to be able to do so in a manner which does not force long
> down times

There is an ongoing effort moving in this direction. The $subject isn't
great, but this patch set (which we are currently working on
updating...): https://commitfest.postgresql.org/43/3986/ attempts
changing a lot of currently compile-time block-size pieces to be
run-time which would open up the possibility to have a different page
format for, eg, different tablespaces. Possibly even different block
sizes. We'd certainly welcome discussion from others who are
interested.

Thanks,

Stephen

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-09 23:55:16
Message-ID:	ZIO75HcreqMg0iCt@momjian.us
Lists:	pgsql-hackers

On Wed, Jun 7, 2023 at 06:38:38PM +0530, Ashutosh Bapat wrote:
> With multiple processes, we can use all the available cores (at least
> theoretically if all those processes are independent). But is that
> guaranteed with single process multi-thread model? Google didn't throw
> any definitive answer to that. Usually it depends upon the OS and
> architecture.
>
> Maybe a good start is to start using threads instead of parallel
> workers e.g. for parallel vacuum, parallel query and so on while
> leaving the processes for connections and leaders. that itself might
> take significant time. Based on that experience move to a completely
> threaded model. Based on my experience with other similar products, I
> think we will settle on a multi-process multi-thread model.

I think we have a few known problems that we might be able to solve
without threads, and solving them can help us eventually move to threads
if we find it useful:

1) Use threads for background workers rather than processes
2) Allow sessions to be stopped and started by saving their state

Ideally we would solve the problem of making shared structures
resizable, but I am not sure how that can be easily done without
threads.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Thomas Kellerer <shammat(at)gmx(dot)net>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-10 00:23:08
Message-ID:	ZIPCbJaeNHvPTpBe@momjian.us
Lists:	pgsql-hackers

On Thu, Jun 8, 2023 at 11:37:00AM +1200, Thomas Munro wrote:
> It's old, but this describes the 4 main models and which well known
> RDBMSes use them in section 2.3:
>
> https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf
>
> TL;DR DB2 is the winner, it can do process-per-connection,
> thread-per-connection, process-pool or thread-pool.
>
> I understand this thread to be about thread-per-connection (= backend,
> session, socket) for now.

I am quite confused that few people seem to care about which model,
processes or threads, is better for Oracle, and how having both methods
available can be a reasonable solution to maintain. Someone suggested
they abstracted the differences so the maintenance burden was minor, but
that seems very hard to me.

Did these vendors start with processes, add threads, and then find that
threads had downsides so they had to keep both?

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

From:	Dave Cramer <davecramer(at)postgres(dot)rocks>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Stephan Doliov <stephan(dot)doliov(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-10 11:20:53
Message-ID:	CADK3HH+uWLr-C-iOhPs7LSx9M36wjtpA+x=EGDHiMWvBt_eNXQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, 9 Jun 2023 at 18:29, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> Greetings,
>
> * Dave Cramer (davecramer(at)postgres(dot)rocks) wrote:
> > One thing I can think of is upgrading. AFAIK dump and restore is the only
> > way to change the on disk format.
> > Presuming that eventually we will be forced to change the on disk format
> it
> > would be nice to be able to do so in a manner which does not force long
> > down times
>
> There is an ongoing effort moving in this direction. The $subject isn't
> great, but this patch set (which we are currently working on
> updating...): https://commitfest.postgresql.org/43/3986/ attempts
> changing a lot of currently compile-time block-size pieces to be
> run-time which would open up the possibility to have a different page
> format for, eg, different tablespaces. Possibly even different block
> sizes. We'd certainly welcome discussion from others who are
> interested.
>
> Thanks,
>
> Stephen
>

Upgrading was just one example of difficult problems that need to be
addressed.
My thought was that before we commit to something as potentially resource
intensive as changing the threading model we compile a list of other "big
issues" and prioritize.

I realize open source is more of a scratch your itch kind of development
model, but I'm not convinced the random walk that entails is the
appropriate way to move forward. At the very least I'd like us to question
it.
Dave

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-10 18:01:47
Message-ID:	CAMT0RQR+NLJw6hmN88D7RPe=vUEr7DB52dKmS9tWPO_eX2Np0g@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jun 5, 2023 at 4:52 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> If there are no major objections, I'm going to update the developer FAQ,
> removing the excuses there for why we don't use threads [1].

I think it is not wise to start the wholesale removal of the objections there.

But I think it is worthwhile to revisit the section about threads and
maybe split out the historic part which is no longer true, and provide
both pros and cons for these.

I started with this short summary from the discussion in this thread,
feel free to expand, argue, fix :)
* is current excuse
-- is counterargument or ack
----------------
As an example, threads are not yet used instead of multiple processes
for backends because:
* Historically, threads were poorly supported and buggy.
-- yes they were, not relevant now when threads are well-supported and non-buggy

* An error in one backend can corrupt other backends if they're
threads within a single process
-- still valid for silent corruption
-- for detected crash - yes, but we are restarting all backends in
case of crash anyway.

* Speed improvements using threads are small compared to the remaining
backend startup time.
-- we now have some measurements that show significant performance
improvements not related to startup time

* The backend code would be more complex.
-- this is still the case
-- even more worrisome is that all extensions also need to be rewritten
-- and many incompatibilities will be silent and take potentially years to find

* Terminating backend processes allows the OS to cleanly and quickly
free all resources, protecting against memory and file descriptor
leaks and making backend shutdown cheaper and faster
-- still true

* Debugging threaded programs is much harder than debugging worker
processes, and core dumps are much less useful
-- this was countered by claiming that
-- by now we have reasonable debugger support for threads
-- there is no direct debugger support for debugging the exact
system set up like PostgreSQL processes + shared memory

* Sharing of read-only executable mappings and the use of
shared_buffers means processes, like threads, are very memory
efficient
-- this seems to say that the current process model is as good as threads ?
-- there were a few counterarguments
-- per-backend virtual memory mapping can add up to significant
amount of extra RAM usage
-- the discussion did not yet touch various per-backend caches
(pg_catalog cache, statement cache) which are arguably easier to
implement in threaded model
-- TLB reload at each process switch is expensive and would be
mostly avoided in case of threads

* Regular creation and destruction of processes helps protect against
memory fragmentation, which can be hard to manage in long-running
processes
-- probably still true
-------------------------------------

From:	James Addison <jay(at)jp-hosting(dot)net>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-10 22:53:24
Message-ID:	CALDQ5NzJjwGbearspLQ-ewEMe1Z42BRUgWTr7kDyXgBge2TOPQ@mail.gmail.com
Lists:	pgsql-hackers

I don't have an objection, but I do wonder: can one (or perhaps a few)
queries/workloads be provided where threading would be significantly
beneficial?

(some material there could help get people on-board with the idea and
potentially guide many of the smaller questions that arise along the
way)

On Mon, 5 Jun 2023 at 15:52, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
> so that the whole server runs in a single process, with multiple
> threads. It has been discussed many times in the past, last thread on
> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>
> I feel that there is now pretty strong consensus that it would be a good
> thing, more so than before. Lots of work to get there, and lots of
> details to be hashed out, but no objections to the idea at a high level.
>
> The purpose of this email is to make that silent consensus explicit. If
> you have objections to switching from the current multi-process
> architecture to a single-process, multi-threaded architecture, please
> speak up.
>
> If there are no major objections, I'm going to update the developer FAQ,
> removing the excuses there for why we don't use threads [1]. And we can
> start to talk about the path to get there. Below is a list of some
> hurdles and proposed high-level solutions. This isn't an exhaustive
> list, just some of the most obvious problems:
>
> # Transition period
>
> The transition surely cannot be done fully in one release. Even if we
> could pull it off in core, extensions will need more time to adapt.
> There will be a transition period of at least one release, probably
> more, where you can choose multi-process or multi-thread model using a
> GUC. Depending on how it goes, we can document it as experimental at first.
>
> # Thread per connection
>
> To get started, it's most straightforward to have one thread per
> connection, just replacing backend process with a backend thread. In the
> future, we might want to have a thread pool with some kind of a
> scheduler to assign active queries to worker threads. Or multiple
> threads per connection, or spawn additional helper threads for specific
> tasks. But that's future work.
>
> # Global variables
>
> We have a lot of global and static variables:
>
> $ objdump -t bin/postgres | grep -e "\.data" -e "\.bss" | grep -v
> "data.rel.ro" | wc -l
> 1666
>
> Some of them are pointers to shared memory structures and can stay as
> they are. But many of them are per-connection state. The most
> straightforward conversion for those is to turn them into thread-local
> variables, like Konstantin did in [0].
>
> It might be good to have some kind of a Session context struct that we
> pass everywhere, or maybe have a single thread-local variable to hold
> it. Many of the global variables would become fields in the Session. But
> that's future work.
>
> # Extensions
>
> A lot of extensions also contain global variables or other things that
> break in a multi-threaded environment. We need a way to label extensions
> that support multi-threading. And in the future, also extensions that
> *require* a multi-threaded server.
>
> Let's add flags to the control file to mark if the extension is
> thread-safe and/or process-safe. If you try to load an extension that's
> not compatible with the server's mode, throw an error.
>
> We might need new functions in addition _PG_init, called at connection
> startup and shutdown. And background worker API probably needs some changes.
>
> # Exposed PIDs
>
> We expose backend process PIDs to users in a few places.
> pg_stat_activity.pid and pg_terminate_backend(), for example. They need
> to be replaced, or we can assign a fake PID to each connection when
> running in multi-threaded mode.
>
> # Signals
>
> We use signals for communication between backends. SIGURG in latches,
> and SIGUSR1 in procsignal, for example. Those primitives need to be
> rewritten with some other signalling mechanism in multi-threaded mode.
> In principle, it's possible to set per-thread signal handlers, and send
> a signal to a particular thread (pthread_kill), but I think it's better
> to just rewrite them.
>
> We also document that you can send SIGINT, SIGTERM or SIGHUP to an
> individual backend process. I think we need to deprecate that, and maybe
> come up with some convenient replacement. E.g. send a message with
> backend ID to a unix domain socket, and a new pg_kill executable to send
> those messages.
>
> # Restart on crash
>
> If a backend process crashes, postmaster terminates all other backends
> and restarts the system. That's hard (impossible?) to do safely if
> everything runs in one process. We can continue have a separate
> postmaster process that just monitors the main process and restarts it
> on crash.
>
> # Thread-safe libraries
>
> Need to switch to thread-safe versions of library functions, e.g.
> uselocale() instead of setlocale().
>
> The Python interpreter has a Global Interpreter Lock. It's not possible
> to create two completely independent Python interpreters in the same
> process, there will be some lock contention on the GIL. Fortunately, the
> python community just accepted https://peps.python.org/pep-0684/. That's
> exactly what we need: it makes it possible for separate interpreters to
> have their own GILs. It's not clear to me if that's in Python 3.12
> already, or under development for some future version, but by the time
> we make the switch in Postgres, there probably will be a solution in
> cpython.
>
> At a quick glance, I think perl and TCL are fine, you can have multiple
> interpreters in one process. Need to check any other libraries we use.
>
>
> [0]
> https://www.postgresql.org/message-id/flat/9defcb14-a918-13fe-4b80-a0b02ff85527%40postgrespro.ru
>
> [1]
> https://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)
>
>

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-12 04:01:17
Message-ID:	CAFiTN-vJqo4TSBpkQTJqhYz6CL0M=cPhQZUXnop1uDC47s2hBg@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Jun 10, 2023 at 11:32 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> On Mon, Jun 5, 2023 at 4:52 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >
> > If there are no major objections, I'm going to update the developer FAQ,
> > removing the excuses there for why we don't use threads [1].
>
> I think it is not wise to start the wholesale removal of the objections there.
>
> But I think it is worthwhile to revisit the section about threads and
> maybe split out the historic part which is no more true, and provide
> both pros and cons for these.
>
> I started with this short summary from the discussion in this thread,
> feel free to expand, argue, fix :)
> * is current excuse
> -- is counterargument or ack
> ----------------
> As an example, threads are not yet used instead of multiple processes
> for backends because:
> * Historically, threads were poorly supported and buggy.
> -- yes they were, not relevant now when threads are well-supported and non-buggy
>
> * An error in one backend can corrupt other backends if they're
> threads within a single process
> -- still valid for silent corruption
> -- for detected crash - yes, but we are restarting all backends in
> case of crash anyway.
>
> * Speed improvements using threads are small compared to the remaining
> backend startup time.
> -- we now have some measurements that show significant performance
> improvements not related to startup time
>
> * The backend code would be more complex.
> -- this is still the case
> -- even more worrisome is that all extensions also need to be rewritten
> -- and many incompatibilities will be silent and take potentially years to find
>
> * Terminating backend processes allows the OS to cleanly and quickly
> free all resources, protecting against memory and file descriptor
> leaks and making backend shutdown cheaper and faster
> -- still true
>
> * Debugging threaded programs is much harder than debugging worker
> processes, and core dumps are much less useful
> -- this was countered by claiming that
> -- by now we have reasonable debugger support for threads
> -- there is no direct debugger support for debugging the exact
> system set up like PostgreSQL processes + shared memory
>
> * Sharing of read-only executable mappings and the use of
> shared_buffers means processes, like threads, are very memory
> efficient
> -- this seems to say that the current process model is as good as threads ?
> -- there were a few counterarguments
> -- per-backend virtual memory mapping can add up to significant
> amount of extra RAM usage
> -- the discussion did not yet touch various per-backend caches
> (pg_catalog cache, statement cache) which are arguably easier to
> implement in threaded model
> -- TLB reload at each process switch is expensive and would be
> mostly avoided in case of threads

I think it is worth mentioning that parallel worker infrastructure
will be simplified with threaded models e.g. 'parallel query', and
'parallel vacuum'.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Dave Cramer <davecramer(at)postgres(dot)rocks>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Stephan Doliov <stephan(dot)doliov(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-12 11:53:13
Message-ID:	fdf09cde-3def-0079-9d37-b008b7e61d7d@enterprisedb.com
Lists:	pgsql-hackers

On 6/10/23 13:20, Dave Cramer wrote:
>
>
> On Fri, 9 Jun 2023 at 18:29, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> Greetings,
>
> * Dave Cramer (davecramer(at)postgres(dot)rocks) wrote:
> > One thing I can think of is upgrading. AFAIK dump and restore is
> the only
> > way to change the on disk format.
> > Presuming that eventually we will be forced to change the on disk
> format it
> > would be nice to be able to do so in a manner which does not force
> long
> > down times
>
> There is an ongoing effort moving in this direction. The $subject isn't
> great, but this patch set (which we are currently working on
> updating...): https://commitfest.postgresql.org/43/3986/ attempts
> changing a lot of currently compile-time block-size pieces to be
> run-time which would open up the possibility to have a different page
> format for, eg, different tablespaces. Possibly even different block
> sizes. We'd certainly welcome discussion from others who are
> interested.
>
> Thanks,
>
> Stephen
>
>
> Upgrading was just one example of difficult problems that need to be
> addressed. My thought was that before we commit to something as
> potentially resource intensive as changing the threading model we
> compile a list of other "big issues" and prioritize.
>

I doubt anyone expects the community to commit to the threading switch
in this sense - drop everything else and just start working on this
(pretty massive) change. Not going to happen.

> I realize open source is more of a scratch your itch kind of development
> model, but I'm not convinced the random walk that entails is the
> appropriate way to move forward. At the very least I'd like us to
> question it.

I may be missing something, but it's not clear to me whether you argue
for the open source approach or against it. I personally think it's
perfectly fine for people to work on scratching their itch and focus on
stuff that yields value to them (or their customers).

And I think the only way to succeed at the threading switch is within
this very framework - split it into (much) smaller steps that are
beneficial on their own and scratch some other itch.

For example, we have issues with large number of connections and we've
discussed stuff like built-in connection pooling etc. for a very long
time (including this thread). But we have session state in various
places in process private memory, which makes it borderline impossible
and thus we don't have anything built-in. IIUC the threading work would
need to isolate/define the session state anyway, so perhaps it could do it
in a way that'd also work for the connection pooling (with processes)?

Which would mean this particular change is immediately beneficial even
without the threading switch (which I'd expect to take considerable
amount of time).
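
To make the "isolate/define the session state" idea a bit more concrete,
here is a minimal sketch in C. Everything in it is hypothetical and only
for illustration: DemoSession, its fields, and the attach/detach helpers
are not existing PostgreSQL names. The assumption is that per-session
state is gathered into one struct and that an executing worker holds only
a pointer to whichever session it is currently serving.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for state that today lives in scattered globals. */
typedef struct DemoSession
{
    int  session_id;
    char search_path[64];
    long statements_run;
} DemoSession;

/* The worker's only session-related global: a pointer to the session it
 * currently serves. _Thread_local (C11) gives each thread its own pointer;
 * in a process-based worker it behaves like an ordinary global. */
static _Thread_local DemoSession *current_session = NULL;

void
session_init(DemoSession *s, int id, const char *search_path)
{
    s->session_id = id;
    strncpy(s->search_path, search_path, sizeof(s->search_path) - 1);
    s->search_path[sizeof(s->search_path) - 1] = '\0';
    s->statements_run = 0;
}

/* Bind this worker to a session; a worker serves one session at a time. */
void
session_attach(DemoSession *s)
{
    assert(s != NULL && current_session == NULL);
    current_session = s;
}

/* Release the session so that another worker could pick it up later. */
void
session_detach(void)
{
    assert(current_session != NULL);
    current_session = NULL;
}

/* Stand-in for executing a statement: touches only the attached session. */
long
run_statement(void)
{
    assert(current_session != NULL);
    return ++current_session->statements_run;
}
```

With state shaped like this, one worker can serve session A, detach, serve
session B, and later resume A with its counters intact, which is roughly
what a built-in pooler would need. For a process-based pool the struct
would additionally have to live in memory that every worker can reach,
which this sketch does not attempt.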

In a way, I think this "split into independently beneficial steps"
strategy is the only option with a meaningful chance of success.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	"Joel Jacobson" <joel(at)compiler(dot)org>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-12 12:13:48
Message-ID:	97f59427-a974-4dac-99b2-d6e9d240fb56@app.fastmail.com
Lists:	pgsql-hackers

On Mon, Jun 12, 2023, at 13:53, Tomas Vondra wrote:
> In a way, I think this "split into independently beneficial steps"
> strategy is the only option with a meaningful chance of success.

/Joel

From:	Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-12 12:23:14
Message-ID:	CALT9ZEH_ZT6Fv8KFmEPf2qM7g0Y0mdU-J4PQDVRN+vhn0CLSOQ@mail.gmail.com
Lists:	pgsql-hackers

Is the following true or not?

1. If we switch processes to threads but leave the amount of session
local variables unchanged, there would be hardly any performance gain.
2. If we move some backend's local variables into shared memory then
the performance gain would be very near to what we get with threads
having equal amount of session-local variables.

In other words, the overall goal in principle is to gain from less
memory copying wherever it doesn't add the burden of locks for
concurrent variables access?

Regards,
Pavel Borisov,
Supabase

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-12 19:24:30
Message-ID:	20230612192430.kfffuxfh4bzjzgez@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-12 16:23:14 +0400, Pavel Borisov wrote:
> Is the following true or not?
>
> 1. If we switch processes to threads but leave the amount of session
> local variables unchanged, there would be hardly any performance gain.

False.

> 2. If we move some backend's local variables into shared memory then
> the performance gain would be very near to what we get with threads
> having equal amount of session-local variables.

False.

> In other words, the overall goal in principle is to gain from less
> memory copying wherever it doesn't add the burden of locks for
> concurrent variables access?

False.

Those points seem pretty much unrelated to the potential gains from switching
to a threading model. The main advantages are:

1) We'd gain from being able to share state more efficiently (using normal
pointers) and more dynamically (not needing to pre-allocate). That'd remove
a good amount of complexity. As an example, consider the work we need to do
to ferry tuples from one process to another. Even if we just continue to
use shm_mq, in a threading world we could just put a pointer in the queue,
but have the tuple data be shared between the processes etc.

Eventually this could include removing the 1:1 connection<->process/thread
model. That's possible to do with processes as well, but considerably
harder.

2) Making context switches cheaper / sharing more resources at the OS and
hardware level.
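
As a rough illustration of point 1, the sketch below passes tuples from a
worker thread to a leader by enqueueing only their addresses. It is a toy
under stated assumptions, not PostgreSQL code: DemoTuple, PtrQueue and all
function names are hypothetical, and the queue is a plain mutex/condvar
ring buffer rather than shm_mq.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical tuple type standing in for a real heap tuple. */
typedef struct DemoTuple
{
    int  id;
    char payload[64];
} DemoTuple;

#define QUEUE_SLOTS 4

/* Hypothetical bounded queue of pointers (not shm_mq). */
typedef struct PtrQueue
{
    pthread_mutex_t  lock;
    pthread_cond_t   changed;
    const DemoTuple *slots[QUEUE_SLOTS];
    size_t           head;   /* next slot to read */
    size_t           count;  /* number of filled slots */
} PtrQueue;

void
ptr_queue_init(PtrQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    q->head = 0;
    q->count = 0;
}

/* Enqueue just the address; the tuple body is never copied. */
void
ptr_queue_put(PtrQueue *q, const DemoTuple *tup)
{
    assert(tup != NULL);
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_SLOTS)
        pthread_cond_wait(&q->changed, &q->lock);
    q->slots[(q->head + q->count) % QUEUE_SLOTS] = tup;
    q->count++;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}

const DemoTuple *
ptr_queue_get(PtrQueue *q)
{
    const DemoTuple *tup;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->changed, &q->lock);
    tup = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
    return tup;
}

typedef struct WorkerArgs
{
    PtrQueue  *q;
    DemoTuple *tuples;
    int        ntuples;
} WorkerArgs;

/* Worker thread: publishes the address of each tuple it "produced". */
static void *
worker_main(void *arg)
{
    WorkerArgs *wa = arg;

    for (int i = 0; i < wa->ntuples; i++)
        ptr_queue_put(wa->q, &wa->tuples[i]);
    return NULL;
}

/* Leader: counts tuples that arrived at the very address the worker used,
 * i.e. without any copy. Returns -1 if the worker could not be started. */
int
count_zero_copy_tuples(DemoTuple *tuples, int ntuples)
{
    PtrQueue   q;
    WorkerArgs wa = {&q, tuples, ntuples};
    pthread_t  worker;
    int        matched = 0;

    ptr_queue_init(&q);
    if (pthread_create(&worker, NULL, worker_main, &wa) != 0)
        return -1;
    for (int i = 0; i < ntuples; i++)
        if (ptr_queue_get(&q) == &tuples[i])
            matched++;
    pthread_join(worker, NULL);
    return matched;
}
```

Between separate processes the same address is only meaningful if both
sides map the memory at the same location, which is why tuple data
generally has to be copied or carefully placed today; within one address
space the pointer is valid as-is.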

Greetings,

Andres Freund

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-12 21:17:15
Message-ID:	879b3b32-e936-ee00-69d4-1ad6a8ea30e8@iki.fi
Lists:	pgsql-hackers

On 10/06/2023 21:01, Hannu Krosing wrote:
> On Mon, Jun 5, 2023 at 4:52 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>
>> If there are no major objections, I'm going to update the developer FAQ,
>> removing the excuses there for why we don't use threads [1].
>
> I think it is not wise to start the wholesale removal of the objections there.
>
> But I think it is worthwhile to revisit the section about threads and
> maybe split out the historic part which is no more true, and provide
> both pros and cons for these.

> I started with this short summary from the discussion in this thread,
> feel free to expand, argue, fix :)
> * is current excuse
> -- is counterargument or ack

Thanks, that's a good idea.

> * Speed improvements using threads are small compared to the remaining
> backend startup time.
> -- we now have some measurements that show significant performance
> improvements not related to startup time

Also, I don't expect much performance gain directly from switching to
threads. The point is that switching to a multi-threaded model makes
possible, or at least greatly simplifies, a lot of other development.
Which can then help with the backend startup time, among other things.
For example, a shared catalog cache.

> * The backend code would be more complex.
> -- this is still the case

I don't quite buy that. A multi-threaded model isn't inherently more
complex than a multi-process model. Just different. Sure, the transition
period will be more complex, when we need to support both models. But in
the long run, if we can remove the multi-process mode, we can make a lot
of things *simpler*.

> -- even more worrisome is that all extensions also need to be rewritten

"rewritten" is an exaggeration. Yes, extensions will need to adapt, similar
to the core code. But I hope it will be pretty mechanical work, marking
global variables as thread-local and such. Many extensions will work
with little to no changes.
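
A small sketch of what "marking global variables as thread-local" could
look like, assuming a C11 compiler with _Thread_local and POSIX threads.
The variable and function names are hypothetical; they are not taken from
PostgreSQL or from any real extension.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-session counter. In a process-per-backend build this
 * would be a plain "static int"; _Thread_local gives each thread its own
 * zero-initialized copy. */
static _Thread_local int session_query_count = 0;

/* Stand-in for a backend's main loop: bumps the counter nqueries times
 * and writes this thread's private total back through the argument. */
static void *
backend_main(void *arg)
{
    int *inout = arg;
    int  nqueries;

    assert(inout != NULL);
    nqueries = *inout;
    for (int i = 0; i < nqueries; i++)
        session_query_count++;
    *inout = session_query_count;
    return NULL;
}

/* Runs two "backends" concurrently. On entry *a and *b hold the number of
 * queries each should run; on return they hold each thread's own counter.
 * Returns 0 on success, -1 if a thread could not be created. */
int
run_two_backends(int *a, int *b)
{
    pthread_t t1;
    pthread_t t2;

    if (pthread_create(&t1, NULL, backend_main, a) != 0)
        return -1;
    if (pthread_create(&t2, NULL, backend_main, b) != 0)
    {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Because each thread has its own copy, a thread asked to run 3 queries
reports 3 no matter what the other thread does. Without _Thread_local the
two threads would race on one shared counter, which is the kind of silent
incompatibility discussed just below.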

> -- and many incompatibilities will be silent and take potentially years to find

IMO this is the most scary part of all this. I'm optimistic that we can
have enough compiler support and tooling to catch most issues. But we
don't know for sure at this point.

> * Terminating backend processes allows the OS to cleanly and quickly
> free all resources, protecting against memory and file descriptor
> leaks and making backend shutdown cheaper and faster
> -- still true

Yep. I'm not too worried about PostgreSQL code, our memory contexts and
resource owners are very good at stopping leaks. But 3rd party libraries
could pose hard problems. IIRC we still have a leak with the LLVM JIT
code, for example. We should fix that anyway, of course, but the
multi-process model is more forgiving with leaks like that.

--
Heikki Linnakangas
Neon (https://neon.tech)

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-12 22:24:04
Message-ID:	ZIebBIgkZVMd2Fbc@paquier.xyz
Lists:	pgsql-hackers

On Mon, Jun 12, 2023 at 12:24:30PM -0700, Andres Freund wrote:
> Those points seems pretty much unrelated to the potential gains from switching
> to a threading model. The main advantages are:
>
> 1) We'd gain from being able to share state more efficiently (using normal
> pointers) and more dynamically (not needing to pre-allocate). That'd remove
> a good amount of complexity. As an example, consider the work we need to do
> to ferry tuples from one process to another. Even if we just continue to
> use shm_mq, in a threading world we could just put a pointer in the queue,
> but have the tuple data be shared between the processes etc.
>
> Eventually this could include removing the 1:1 connection<->process/thread
> model. That's possible to do with processes as well, but considerably
> harder.
>
> 2) Making context switches cheaper / sharing more resources at the OS and
> hardware level.

Yes. FWIW, while reading the thread, parallel workers struck me as
the first area that would benefit from all that. Could it be easier
to figure out the incremental pieces by working on a new node doing a
Gather based on threads, for instance?
--
Michael

From:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To:	Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-13 06:55:36
Message-ID:	2c2665d2-c513-c12e-9097-9b1805bc2471@garret.ru
Lists:	pgsql-hackers

On 12.06.2023 3:23 PM, Pavel Borisov wrote:
> Is the following true or not?
>
> 1. If we switch processes to threads but leave the amount of session
> local variables unchanged, there would be hardly any performance gain.
> 2. If we move some backend's local variables into shared memory then
> the performance gain would be very near to what we get with threads
> having equal amount of session-local variables.
>
> In other words, the overall goal in principle is to gain from less
> memory copying wherever it doesn't add the burden of locks for
> concurrent variables access?
>
> Regards,
> Pavel Borisov,
> Supabase
>
>
IMHO neither statement is true.
Switching to threads will cause less context switch overhead (because
all threads share the same memory space and so preserve the TLB).
How big will this advantage be? In my prototype I got ~10%. But maybe
it is possible to find workloads where it is larger.

The Postgres backend is "thick" not because of a large number of local
variables.
It is because of local caches: catalog cache, relation cache, prepared
statements cache,...
If they are not rewritten, then a backend may still consume a lot of
memory even if it is a thread rather than a process.
But threads simplify development of global caches, although that can
also be done with DSM.

From:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To:	knizhnik(at)garret(dot)ru
Cc:	pashkin(dot)elfe(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hannuk(at)google(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-13 07:55:12
Message-ID:	20230613.165512.2091685398843624399.horikyota.ntt@gmail.com
Lists:	pgsql-hackers

At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote in
> Postgres backend is "thick" not because of large number of local
> variables.
> It is because of local caches: catalog cache, relation cache, prepared
> statements cache,...
> If they are not rewritten, then backend still may consume a lot of
> memory even if it will be thread rather then process.
> But threads simplify development of global caches, although it can be
> done with DSM.

With the process model, that local stuff is flushed out upon
reconnection. If we switch to the thread model, we will need an
expiration mechanism for that stuff.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc:	pashkin(dot)elfe(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hannuk(at)google(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-13 08:20:56
Message-ID:	5b0e3201-e800-5dfc-a8c8-207a95a91e05@garret.ru
Lists:	pgsql-hackers

On 13.06.2023 10:55 AM, Kyotaro Horiguchi wrote:
> At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote in
>> Postgres backend is "thick" not because of large number of local
>> variables.
>> It is because of local caches: catalog cache, relation cache, prepared
>> statements cache,...
>> If they are not rewritten, then backend still may consume a lot of
>> memory even if it will be thread rather then process.
>> But threads simplify development of global caches, although it can be
>> done with DSM.
> With the process model, that local stuff are flushed out upon
> reconnection. If we switch to the thread model, we will need an
> expiration mechanism for those stuff.

We already have an invalidation mechanism. It will also be used with a
shared cache, but then we do not need to send invalidations to all backends.
I do not completely understand your point.
Right now the caches (for example the catalog cache) are not limited at all.
So if you have a very large database schema, then this cache will consume
a lot of memory (multiplied by the number of backends). The fact that it
is flushed out upon reconnection cannot help much: what if backends are
not going to disconnect?

With a shared cache we will have to address the same problem:
whether the cache should be limited (with some replacement discipline
such as LRU) or unlimited. With a shared cache, the size of the cache is
less critical because it is not multiplied by the number of backends.
So we can assume that the catalog and relation caches should always fit in
memory (otherwise significant rewriting of all Postgres code working
with relations will be needed).

But Postgres also has temporary tables. For them we may need a local
backend cache in any case.
The global temp table patch was not approved, so we still have to deal
with these awful temp tables.

In any case I do not understand why we need an expiration mechanism
for these caches.
If a relation exists, then information about it should be
kept in the cache as long as the relation is alive.
If there is not enough memory to cache information about all relations,
then we may need some replacement algorithm.
But I do not think it makes any sense to remove an item from the
cache just because it is too old.
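
A minimal sketch of the distinction drawn above, assuming a fixed-capacity
cache with LRU replacement: an entry is dropped only when room is needed
for another one, never merely because of its age. ToyCache, CACHE_CAPACITY
and the function names are hypothetical, and a linear scan stands in for a
real hash table:

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_CAPACITY 3    /* hypothetical limit; a real one might be a GUC */

typedef struct
{
    int             used;       /* is this slot occupied? */
    unsigned        key;        /* stands in for a relation OID */
    unsigned long   last_use;   /* logical clock value of last access */
} CacheSlot;

typedef struct
{
    CacheSlot       slots[CACHE_CAPACITY];
    unsigned long   clock;
} ToyCache;

void
toy_cache_init(ToyCache *c)
{
    c->clock = 0;
    for (size_t i = 0; i < CACHE_CAPACITY; i++)
        c->slots[i].used = 0;
}

/* Report whether key is cached, without counting as an access. */
int
toy_cache_contains(const ToyCache *c, unsigned key)
{
    for (size_t i = 0; i < CACHE_CAPACITY; i++)
        if (c->slots[i].used && c->slots[i].key == key)
            return 1;
    return 0;
}

/*
 * Access key. Returns 1 on a hit. On a miss, loads the key into a free
 * slot if there is one, otherwise over the least recently used entry,
 * and returns 0.
 */
int
toy_cache_access(ToyCache *c, unsigned key)
{
    size_t  victim = 0;
    int     have_free = 0;

    c->clock++;
    for (size_t i = 0; i < CACHE_CAPACITY; i++)
    {
        CacheSlot  *s = &c->slots[i];

        if (s->used && s->key == key)
        {
            s->last_use = c->clock;
            return 1;
        }
        if (!s->used)
        {
            if (!have_free)
            {
                victim = i;
                have_free = 1;
            }
        }
        else if (!have_free && s->last_use < c->slots[victim].last_use)
            victim = i;
    }

    c->slots[victim].used = 1;
    c->slots[victim].key = key;
    c->slots[victim].last_use = c->clock;
    return 0;
}
```

An old entry that is never displaced stays cached indefinitely; an
age-based expiration scheme would instead need a timer or sweep that
removes entries even when memory is not under pressure.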

From:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To:	knizhnik(at)garret(dot)ru
Cc:	pashkin(dot)elfe(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hannuk(at)google(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-13 08:46:58
Message-ID:	20230613.174658.548424684295647548.horikyota.ntt@gmail.com
Lists:	pgsql-hackers

At Tue, 13 Jun 2023 11:20:56 +0300, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote in
>
>
> On 13.06.2023 10:55 AM, Kyotaro Horiguchi wrote:
> > At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik
> > <knizhnik(at)garret(dot)ru> wrote in
> >> Postgres backend is "thick" not because of large number of local
> >> variables.
> >> It is because of local caches: catalog cache, relation cache, prepared
> >> statements cache,...
> >> If they are not rewritten, then backend still may consume a lot of
> >> memory even if it will be thread rather then process.
> >> But threads simplify development of global caches, although it can be
> >> done with DSM.
> > With the process model, that local stuff are flushed out upon
> > reconnection. If we switch to the thread model, we will need an
> > expiration mechanism for those stuff.
>
> We already have invalidation mechanism. It will be also used in case
> of shared cache, but we do not need to send invalidations to all
> backends.

Invalidation is not expiration.

> I do not completely understand your point.
> Right now caches (for example catalog cache) is not limited at all.
> So if you have very large database schema, then this cache will
> consume a lot of memory (multiplied by number of
> backends). The fact that it is flushed out upon reconnection can not
> help much: what if backends are not going to disconnect?

Right now, if one out of many backends creates a huge system catalog
cache, it can be cleared upon disconnection. The same client can
repeat this process, but users can ensure such situations don't
persist. However, with the thread model, we won't be able to clear
parts of the cache that aren't required by the active backends
anymore. (Of course with threads, we can avoid duplications, though.)

> In case of shared cache we will have to address the same problem:
> whether this cache should be limited (with some replacement discipline
> as LRU).
> Or it is unlimited. In case of shared cache, size of the cache is less
> critical because it is not multiplied by number of backends.

Yes.

> So we can assume that catalog and relation cache should always fir in
> memory (otherwise significant rewriting of all Postgtres code working
> with relations will be needed).

I'm not sure that is true.. But likely to be?

> But Postgres also have temporary tables. For them we may need local
> backend cache in any case.
> Global temp table patch was not approved so we still have to deal with
> this awful temp tables.
>
> In any case I do not understand why do we need some expiration
> mechanism for this caches.

I don't think it is efficient for PostgreSQL to consume a large
amount of memory for seldom-used content. While we may not need an
expiration mechanism for moderate use cases, I have observed instances
where a single process hogs a significant amount of memory,
particularly for intermittent tasks.

> If there is some relation than information about this relation should
> be kept in the cache as long as this relation is alive.
> If there is not enough memory to cache information about all
> relations, then we may need some replacement algorithm.
> But I do not think that there is any sense to remove some item fro the
> cache just because it is too old.

Ah. I see. I am fine with a replacement mechanism. But the eviction
algorithm seems almost identical to the expiration algorithm. The
algorithm will not be simply driven by object age, but I'm not sure we
need more than access frequency.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Andreas Karlsson <andreas(at)proxel(dot)se>
To:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc:	pashkin(dot)elfe(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hannuk(at)google(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-13 10:05:48
Message-ID:	40b563bb-f4f1-9255-3c66-44c4fbcfd07f@proxel.se
Lists:	pgsql-hackers

On 6/13/23 10:20, Konstantin Knizhnik wrote:
> The fact that it is flushed out upon reconnection can not
> help much: what if backends are not going to disconnect?

This is why many connection pools have a maximum connection lifetime
which can be configured. So in practice flushing all caches on
disconnect helps a lot.

The nice, proper solution might very well be adding a maximum cache size
and replacement, but it obviously makes the cache more complex and adds
a new GUC. Probably worth it, but flushing caches on disconnect is a
simple solution which works well in practice for many but not all workloads.

Andreas

From:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc:	pashkin(dot)elfe(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hannuk(at)google(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-14 05:46:05
Message-ID:	d61b6b69-31b0-e8c3-b44e-122543a4ddb3@garret.ru
Lists:	pgsql-hackers

On 13.06.2023 11:46 AM, Kyotaro Horiguchi wrote:
>> So we can assume that catalog and relation cache should always fit in
>> memory (otherwise significant rewriting of all Postgres code working
>> with relations will be needed).
> I'm not sure that is true.. But likely to be?

Sorry, looks like I was wrong.
Right now access to sys/cat/rel caches is protected by a reference counter.
So we can easily add some replacement algorithm for these caches.
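
A minimal sketch of why a reference count makes replacement safe, assuming
it behaves like a pin count: an entry may be evicted only while nobody
holds a reference to it. PinnedEntry and the entry_* functions are
hypothetical and are not PostgreSQL's actual catcache or relcache code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct
{
    bool        valid;      /* does the entry currently hold cached data? */
    unsigned    key;        /* stands in for a catalog or relation OID */
    int         refcount;   /* number of users currently holding the entry */
} PinnedEntry;

/* Fill the entry for key; nobody holds a reference yet. */
void
entry_load(PinnedEntry *e, unsigned key)
{
    e->valid = true;
    e->key = key;
    e->refcount = 0;
}

/* Take a reference; the entry must not be evicted until it is released. */
void
entry_pin(PinnedEntry *e)
{
    assert(e->valid);
    e->refcount++;
}

/* Release a reference taken with entry_pin(). */
void
entry_unpin(PinnedEntry *e)
{
    assert(e->refcount > 0);
    e->refcount--;
}

/*
 * Evict the entry if it is unused. Returns true if it was evicted, false
 * if it is still referenced (or was already empty) and must be skipped.
 */
bool
entry_try_evict(PinnedEntry *e)
{
    if (!e->valid || e->refcount > 0)
        return false;
    e->valid = false;
    return true;
}
```

A replacement policy would call entry_try_evict() on its chosen victim
and move on to the next candidate whenever it returns false.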

> I don't think it is efficient that PostgreSQL to consume a large
> amount of memory for seldom-used content. While we may not need
> expiration mechanism for moderate use cases, I have observed instances
> where a single process hogs a significant amount of memory,
> particularly for intermittent tasks.

Usually the system catalog is small enough and does not cause any problems
with memory consumption.
But partitioned and temporary tables can cause catalog bloat.
In such cases some eviction mechanism will be really useful.
But I do not think that it is related to using threads instead
of processes.
The question of whether to use a private or shared cache is not directly
related to the threads vs. processes choice.
Yes, threads make implementing a shared cache much easier. But it
can also be done using dynamic shared memory segments. A shared cache
definitely has its pros and cons; first of all, it requires
synchronization, which may have a negative impact on performance.

I have made an attempt to combine both caches: use relatively small
per-backend local cache
and large shared cache.
I wonder what people think about the idea to make backends less thick by
using shared cache.

From:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To:	knizhnik(at)garret(dot)ru
Cc:	pashkin(dot)elfe(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hannuk(at)google(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-14 07:01:33
Message-ID:	20230614.160133.1540361929672513850.horikyota.ntt@gmail.com
Lists:	pgsql-hackers

At Wed, 14 Jun 2023 08:46:05 +0300, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote in
> But I do not think that it is somehow related with using threads
> instead of process.
> The question whether to use private or shared cache is not directly
> related to threads vs. process choice.

Yeah, I unconsciously conflated the two things. We can use per-thread
cache on multithreading.

> Yes, threads makes implementation of shared cache much easier. But it
> can be also done using dynamic
> memory segments, Definitely shared cache has its pros and cons, first
> if all it requires sycnhronization
> which may have negative impact o performance.

True.

> I have made an attempt to combine both caches: use relatively small
> per-backend local cache
> and large shared cache.
> I wonder what people think about the idea to make backends less thick
> by using shared cache.

I remember a relatively old thread about that.

https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A567B9245%40G01JPEXMBKW04

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Andreas Karlsson <andreas(at)proxel(dot)se>
To:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, knizhnik(at)garret(dot)ru
Cc:	pashkin(dot)elfe(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hannuk(at)google(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-14 07:06:05
Message-ID:	fb103da5-930b-f6cb-b6d2-ea77172c18b9@proxel.se
Lists:	pgsql-hackers

On 6/14/23 09:01, Kyotaro Horiguchi wrote:
> At Wed, 14 Jun 2023 08:46:05 +0300, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote in
>> But I do not think that it is somehow related with using threads
>> instead of process.
>> The question whether to use private or shared cache is not directly
>> related to threads vs. process choice.
>
> Yeah, I unconsciously conflated the two things. We can use per-thread
> cache on multithreading.

For sure, and we can drop the cache when dropping the memory context.
And in the first versions of an imagined threaded PostgreSQL I am sure
that is how things will work.

Then later someone will have to investigate which caches are worth
making shared and what the eviction/expiration strategy should be.

Andreas

From:	James Addison <jay(at)jp-hosting(dot)net>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-14 19:15:37
Message-ID:	CALDQ5Nxxj_9Yddo-0XrmxHJdHqaJf4jj=4Y4DQiXwN-Ci5HqDA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, 12 Jun 2023 at 20:24, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-06-12 16:23:14 +0400, Pavel Borisov wrote:
> > Is the following true or not?
> >
> > 1. If we switch processes to threads but leave the amount of session
> > local variables unchanged, there would be hardly any performance gain.
>
> False.
>
>
> > 2. If we move some backend's local variables into shared memory then
> > the performance gain would be very near to what we get with threads
> > having equal amount of session-local variables.
>
> False.
>
>
> > In other words, the overall goal in principle is to gain from less
> > memory copying wherever it doesn't add the burden of locks for
> > concurrent variables access?
>
> False.
>
> Those points seems pretty much unrelated to the potential gains from switching
> to a threading model. The main advantages are:

I think that they're practical performance-related questions about the
benefits of performing a technical migration that could involve
significant development time, take years to complete, and uncover
problems that cause reliability issues for a stable, proven database
management system.

> 1) We'd gain from being able to share state more efficiently (using normal
> pointers) and more dynamically (not needing to pre-allocate). That'd remove
> a good amount of complexity. As an example, consider the work we need to do
> to ferry tuples from one process to another. Even if we just continue to
> use shm_mq, in a threading world we could just put a pointer in the queue,
> but have the tuple data be shared between the processes etc.
>
> Eventually this could include removing the 1:1 connection<->process/thread
> model. That's possible to do with processes as well, but considerably
> harder.

This reads like a code quality argument: that's worthwhile, but I
don't see how it supports your 'False' assertions. Do two queries
running in separate processes spend much time allocating and waiting
on resources that could be shared within a single thread?

> 2) Making context switches cheaper / sharing more resources at the OS and
> hardware level.

That seems valid. Even so, I would expect that for many queries, I/O
access and row processing time are the bulk of the work, and that
context switches to/from other query processes are relatively
negligible.

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc:	knizhnik(at)garret(dot)ru, pashkin(dot)elfe(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-14 19:45:44
Message-ID:	CAMT0RQSR1EPNRhexzijhR0KTcAx1T+YwcTWb+K8e7xSy-Rmz3A@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jun 13, 2023 at 9:55 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote in
> > Postgres backend is "thick" not because of large number of local
> > variables.
> > It is because of local caches: catalog cache, relation cache, prepared
> > statements cache,...
> > If they are not rewritten, then backend still may consume a lot of
> > memory even if it will be thread rather then process.
> > But threads simplify development of global caches, although it can be
> > done with DSM.
>
> With the process model, that local stuff are flushed out upon
> reconnection. If we switch to the thread model, we will need an
> expiration mechanism for those stuff.

The part that can not be so easily solved is that "the local stuff"
can include some leakage that is not directly controlled by us.

I remember a few times when memory leaks in some PostGIS packages
caused slow memory exhaustion and the simple fix was limiting
connection lifetime to something between 15 min and an hour.

The main problem here is that PostGIS uses a few tens of other GPL GIS
related packages which are all changing independently and thus it is
quite hard to be sure that none of these have developed a leak. And
you also likely can not just stop upgrading these as they also contain
security fixes.

I have no idea what the fix could be in case of threaded server.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	James Addison <jay(at)jp-hosting(dot)net>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-14 19:47:49
Message-ID:	CA+TgmoaczpJmZnzFDkkv6md+EcYMo-7jzw5DBrfE9sG11xOxpQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 14, 2023 at 3:16 PM James Addison <jay(at)jp-hosting(dot)net> wrote:
> I think that they're practical performance-related questions about the
> benefits of performing a technical migration that could involve
> significant development time, take years to complete, and uncover
> problems that cause reliability issues for a stable, proven database
> management system.

I don't. I think they're reflecting confusion about what the actual,
practical path forward is.

For a first cut at this, all of our global variables become
thread-local. Every single last one of them. So there's no savings of
the type described in that email. We do each and every thing just as
we do it today, except that it's all in different parts of a single
address space instead of different address spaces with a chunk of
shared memory mapped into each one. Syscaches don't change, catcaches
don't change, memory copying is not reduced, literally nothing
changes. The coding model is just as it is today. Except for
decorating global variables, virtually no backend code needs to notice
or care about the transition. There are a few exceptions. For
instance, TopMemoryContext would need to be deleted explicitly, and
the FD caching stuff would have to be revised, because it uses up all
the FDs that the process can open, and having many threads doing that
in a single process isn't going to work. There's probably some other
things that I'm forgetting, but the typical effect on the average bit
of backend code should be very, very low. If it isn't, we're doing it
wrong.

So, I think saying "oh, this is going to destabilize PostgreSQL for
years" is just fear-mongering. If someone proposes a patch that we
think is going to have that effect, we should (and certainly will)
reject it. But I see no reason why we can't have a good patch for this
where most code changes only in mechanical ways that are easy to
validate.

> This reads like a code quality argument: that's worthwhile, but I
> don't see how it supports your 'False' assertions. Do two queries
> running in separate processes spend much time allocating and waiting
> on resources that could be shared within a single thread?

I don't have any idea what this has to do with what Andres was talking
about, honestly. However, there certainly are cases of the thing
you're talking about here. Having many backends separately open the
same file means we've got a whole bunch of different file descriptors
accessing the same file instead of just one. That does have a
meaningful cost on some workloads. Passing tuples between cooperating
processes that are jointly executing a parallel query is costly in the
current scheme, too. There might be ways to improve on that somewhat
even without threads, but if you don't think that the process model
made getting parallel query working harder and less efficient, I'm
here as the guy who wrote a lot of that code to tell you that it very
much did.
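
A rough sketch of the tuple-passing cost, under loudly stated assumptions:
both queues below are single-threaded toys with no locking, and ToyTuple,
CopyQueue and PtrQueue are hypothetical types, not shm_mq. The first queue
copies tuple bytes in and out, as cooperating processes must; the second
hands over a pointer, which only works when sender and receiver share an
address space:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_SLOTS     4
#define MAX_TUPLE_BYTES 64

typedef struct
{
    size_t  len;
    char    data[MAX_TUPLE_BYTES];
} ToyTuple;

/* Process-style queue: tuple bytes are copied into and out of the queue. */
typedef struct
{
    ToyTuple    slots[QUEUE_SLOTS];
    size_t      head, tail;
    size_t      bytes_copied;
} CopyQueue;

/* Thread-style queue: only a pointer to the tuple is enqueued. */
typedef struct
{
    const ToyTuple *slots[QUEUE_SLOTS];
    size_t          head, tail;
} PtrQueue;

int
copy_queue_send(CopyQueue *q, const ToyTuple *t)
{
    ToyTuple   *slot;

    if (q->tail - q->head == QUEUE_SLOTS || t->len > MAX_TUPLE_BYTES)
        return 0;               /* queue full or tuple too large */
    slot = &q->slots[q->tail % QUEUE_SLOTS];
    slot->len = t->len;
    memcpy(slot->data, t->data, t->len);
    q->bytes_copied += t->len;
    q->tail++;
    return 1;
}

int
copy_queue_receive(CopyQueue *q, ToyTuple *out)
{
    const ToyTuple *slot;

    if (q->head == q->tail)
        return 0;               /* queue empty */
    slot = &q->slots[q->head % QUEUE_SLOTS];
    out->len = slot->len;
    memcpy(out->data, slot->data, slot->len);
    q->bytes_copied += slot->len;
    q->head++;
    return 1;
}

int
ptr_queue_send(PtrQueue *q, const ToyTuple *t)
{
    if (q->tail - q->head == QUEUE_SLOTS)
        return 0;               /* queue full */
    q->slots[q->tail % QUEUE_SLOTS] = t;
    q->tail++;
    return 1;
}

const ToyTuple *
ptr_queue_receive(PtrQueue *q)
{
    const ToyTuple *t;

    if (q->head == q->tail)
        return NULL;            /* queue empty */
    t = q->slots[q->head % QUEUE_SLOTS];
    q->head++;
    return t;
}
```

The pointer variant moves no tuple bytes at all, but a real implementation
would also need synchronization and a rule for who frees the tuple and
when.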

> That seems valid. Even so, I would expect that for many queries, I/O
> access and row processing time is the bulk of the work, and that
> context-switches to/from other query processes is relatively
> negligible.

That's completely true, but there are ALSO many OTHER situations in
which the overhead of frequent context switching is absolutely
crushing. You might as well argue that umbrellas don't need to exist
because there are lots of sunny days.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc:	knizhnik(at)garret(dot)ru, pashkin(dot)elfe(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hannuk(at)google(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-14 19:51:39
Message-ID:	20230614195139.tz7w6j24hxk4gnwc@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-13 16:55:12 +0900, Kyotaro Horiguchi wrote:
> At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote in
> > Postgres backend is "thick" not because of large number of local
> > variables.
> > It is because of local caches: catalog cache, relation cache, prepared
> > statements cache,...
> > If they are not rewritten, then backend still may consume a lot of
> > memory even if it will be thread rather then process.
> > But threads simplify development of global caches, although it can be
> > done with DSM.
>
> With the process model, that local stuff are flushed out upon
> reconnection. If we switch to the thread model, we will need an
> expiration mechanism for those stuff.

Isn't that just doing something like MemoryContextDelete(TopMemoryContext) at
the end of proc_exit() (or its thread equivalent)?

Greetings,

Andres Freund

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, knizhnik(at)garret(dot)ru, pashkin(dot)elfe(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-14 19:56:44
Message-ID:	CA+Tgmoadhk-yHnjrPq34iaZ_UXPLm3QAiqwyyvfYXFoPU3CTHw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 14, 2023 at 3:46 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> I remember a few times when memory leaks in some PostGIS packages
> cause slow memory exhaustion and the simple fix was limiting
> connection lifetime to something between 15 min and an hour.
>
> The main problem here is that PostGIS uses a few tens of other GPL GIS
> related packages which are all changing independently and thus it is
> quite hard to be sure that none of these have developed a leak. And
> you also likely can not just stop upgrading these as they also contain
> security fixes.
>
> I have no idea what the fix could be in case of threaded server.

Presumably, when a thread exits, we
MemoryContextDelete(TopMemoryContext). If the leak is into any memory
context managed by PostgreSQL, this still frees the memory. But it
might not be. Right now, if a library does a malloc() that it doesn't
free() every once in a while, it's no big deal. If it does it too
often, it's a problem now, too. But if it does it only every now and
then, process exit will prevent accumulation over time. In a threaded
model, that isn't true any longer: those allocations will accumulate
until we OOM.
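
A toy sketch of that difference, assuming a context that simply remembers
every chunk it hands out. ToyContext and the toy_* functions are
hypothetical and far simpler than PostgreSQL's real memory contexts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Chunk header; the union keeps the user data suitably aligned. */
typedef union ToyChunk
{
    union ToyChunk *next;
    max_align_t     align;
} ToyChunk;

typedef struct
{
    ToyChunk   *chunks;         /* every chunk allocated in this context */
    size_t      live_chunks;
} ToyContext;

void
toy_context_init(ToyContext *cxt)
{
    cxt->chunks = NULL;
    cxt->live_chunks = 0;
}

/* Allocate size bytes and remember the chunk in the context. */
void *
toy_alloc(ToyContext *cxt, size_t size)
{
    ToyChunk   *chunk = malloc(sizeof(ToyChunk) + size);

    if (chunk == NULL)
        return NULL;
    chunk->next = cxt->chunks;
    cxt->chunks = chunk;
    cxt->live_chunks++;
    return chunk + 1;           /* user data follows the header */
}

/* Free everything allocated through the context, e.g. at thread exit. */
void
toy_context_delete(ToyContext *cxt)
{
    ToyChunk   *chunk = cxt->chunks;

    while (chunk != NULL)
    {
        ToyChunk   *next = chunk->next;

        free(chunk);
        chunk = next;
    }
    cxt->chunks = NULL;
    cxt->live_chunks = 0;
}
```

Anything obtained through toy_alloc() is reclaimed by
toy_context_delete(). A library that calls malloc() directly never
appears on the chunk list, so only process exit would reclaim it.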

And IMHO that's definitely a very significant downside of this
direction. I don't think it should be dispositive because such
problems are, hopefully, fixable, whereas some of the problems caused
by the process model are basically unfixable except by not using it
any more. However, if we lived in a world where both models were
supported and a particular user said, "hey, I'm sticking with the
process model because I don't trust my third-party libraries not to
leak," I would be like "yep, I totally get it."

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	James Addison <jay(at)jp-hosting(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-14 22:14:01
Message-ID:	CALDQ5Nzi7ientQ-mN94A-Eww7vFbgdwB9dn=RPUkGzvSNx=kPg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, 14 Jun 2023 at 20:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Jun 14, 2023 at 3:16 PM James Addison <jay(at)jp-hosting(dot)net> wrote:
> > I think that they're practical performance-related questions about the
> > benefits of performing a technical migration that could involve
> > significant development time, take years to complete, and uncover
> > problems that cause reliability issues for a stable, proven database
> > management system.
>
> I don't. I think they're reflecting confusion about what the actual,
> practical path forward is.

Ok. My concern is that the balance between the downstream ecosystem
impact (people and processes that use PIDs to identify, monitor and
manage query and background processes, for example) compared to the
benefits (performance improvement for some -- but what kind of? --
workloads) seems unclear, and if it's unclear, it's less likely to be
compelling.

Pavel's message and questions seem to poke at some of the potential
limitations of the performance improvements, and Andres' response
mentions reduced complexity and reduced context-switching. Elsewhere
I also see that TLB (translation lookaside buffer?) lookups in
particular should see improvements. Those are good, but somewhat
unquantified.

The benefits are less of an immediate concern if there's going to be a
migration/transition phase where both the process model and the thread
model are available. But again, if the benefits of the threading
model aren't clear, people are unlikely to want to switch, and I don't
think that the cost for people and systems to migrate from tooling and
methods built around processes will be zero. That could lead to a bad
outcome, where the codebase includes both models and yet is unable to
plan to simplify to one.

From:	James Addison <jay(at)jp-hosting(dot)net>
To:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc:	Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-14 22:23:45
Message-ID:	CALDQ5NwfEA+s8O=G-frW6WfRA_yXj_AyhLf2Lb_ZgXWjH3nyEg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>
>
>
> On 12.06.2023 3:23 PM, Pavel Borisov wrote:
> > Is the following true or not?
> >
> > 1. If we switch processes to threads but leave the amount of session
> > local variables unchanged, there would be hardly any performance gain.
> > 2. If we move some backend's local variables into shared memory then
> > the performance gain would be very near to what we get with threads
> > having equal amount of session-local variables.
> >
> > In other words, the overall goal in principle is to gain from less
> > memory copying wherever it doesn't add the burden of locks for
> > concurrent variables access?
> >
> > Regards,
> > Pavel Borisov,
> > Supabase
> >
> >
> IMHO both statements are not true.
> Switching to threads will cause less context switch overhead (because
> all threads are sharing the same memory space and so preserve TLB.
> How big will be this advantage? In my prototype I got ~10%. But may be
> it is possible to fin workloads when it is larger.

Hi Konstantin - do you have code/links that you can share for the
prototype and benchmarks used to gather those results?

From:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To:	James Addison <jay(at)jp-hosting(dot)net>
Cc:	Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-15 07:12:32
Message-ID:	36f61a71-3bbb-b7b0-0d99-db5e69715af7@garret.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15.06.2023 1:23 AM, James Addison wrote:
> On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik<knizhnik(at)garret(dot)ru> wrote:
>>
>>
>> On 12.06.2023 3:23 PM, Pavel Borisov wrote:
>>> Is the following true or not?
>>>
>>> 1. If we switch processes to threads but leave the amount of session
>>> local variables unchanged, there would be hardly any performance gain.
>>> 2. If we move some backend's local variables into shared memory then
>>> the performance gain would be very near to what we get with threads
>>> having equal amount of session-local variables.
>>>
>>> In other words, the overall goal in principle is to gain from less
>>> memory copying wherever it doesn't add the burden of locks for
>>> concurrent variables access?
>>>
>>> Regards,
>>> Pavel Borisov,
>>> Supabase
>>>
>>>
>> IMHO both statements are not true.
>> Switching to threads will cause less context switch overhead (because
>> all threads are sharing the same memory space and so preserve TLB.
>> How big will be this advantage? In my prototype I got ~10%. But may be
>> it is possible to fin workloads when it is larger.
> Hi Konstantin - do you have code/links that you can share for the
> prototype and benchmarks used to gather those results?

Sorry, I have already shared the link:
https://github.com/postgrespro/postgresql.pthreads/

As you can see, the last commit was 6 years ago, when I stopped work on this
project.
Why? I already tried to explain it:
- The benefits from switching to threads were not that large. Maybe I just
failed to find the proper workload, but it was more or less the expected
result, because most of the code was not changed: it uses the same sync
primitives, the same local catalog/relation caches, ...
To take full advantage of the multithreading model it is necessary to rewrite
many components, especially those related to interprocess communication.
But maintaining such a fork of Postgres and keeping it synchronized with
mainline requires too much effort, and I was not able to do it myself.

There are three different but related directions for improving current
Postgres:
1. Replacing processes with threads
2. Builtin connection pooler
3. Lightweight backends (shared catalog/relation/prepared statements caches)

The motivations for such changes are also similar:
1. Increase Postgres scalability
2. Reduce memory consumption
3. Make Postgres better fit cloud and serverless requirements

I am not sure which one should be addressed first, or whether they can be
done together.

Replacing static variables with thread-local ones is the first and maybe the
easiest step.
It requires more or less mechanical changes. The more challenging part is
replacing private per-backend data structures
with shared ones (caches, file descriptors, ...).
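
A minimal sketch of that first step, written in Python rather than PostgreSQL's C purely for brevity. It only illustrates the idea that state which used to be private to a process becomes a per-thread copy; the names `session_state`, `run_backend` and `demo` are hypothetical and are not taken from the prototype.

```python
import threading

# Hypothetical per-backend state. In the process model every backend gets its
# own copy of a static variable for free; in a thread model it has to be made
# thread-local explicitly.
session_state = threading.local()


def run_backend(user, results):
    # Each thread sees only its own attribute values on session_state.
    session_state.current_user = user
    results[threading.current_thread().name] = session_state.current_user


def demo():
    results = {}
    threads = [
        threading.Thread(target=run_backend, args=(u, results), name=f"backend-{u}")
        for u in ("alice", "bob")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In C the corresponding mechanical change would be adding a thread-local storage-class specifier such as C11 `_Thread_local` to the static variable. The harder work described above, sharing caches and file descriptors, is not covered by this sketch.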

From:	James Addison <jay(at)jp-hosting(dot)net>
To:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc:	Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-15 08:41:39
Message-ID:	CALDQ5NwotYMZtXA2z6EkBQ72jVBrfyrn-+a9xU8=w54VBQPOhg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 15 Jun 2023 at 08:12, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>
>
>
> On 15.06.2023 1:23 AM, James Addison wrote:
>
> On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>
>
> On 12.06.2023 3:23 PM, Pavel Borisov wrote:
>
> Is the following true or not?
>
> 1. If we switch processes to threads but leave the amount of session
> local variables unchanged, there would be hardly any performance gain.
> 2. If we move some backend's local variables into shared memory then
> the performance gain would be very near to what we get with threads
> having equal amount of session-local variables.
>
> In other words, the overall goal in principle is to gain from less
> memory copying wherever it doesn't add the burden of locks for
> concurrent variables access?
>
> Regards,
> Pavel Borisov,
> Supabase
>
>
> IMHO both statements are not true.
> Switching to threads will cause less context switch overhead (because
> all threads are sharing the same memory space and so preserve TLB.
> How big will be this advantage? In my prototype I got ~10%. But may be
> it is possible to fin workloads when it is larger.
>
> Hi Konstantin - do you have code/links that you can share for the
> prototype and benchmarks used to gather those results?
>
>
>
> Sorry, I have already shared the link:
> https://github.com/postgrespro/postgresql.pthreads/

Nope, my mistake for not locating the existing link - thank you.

Is there a reason that parser-related files (flex/bison) are added as
part of the changeset? (I'm trying to narrow it down to only the
changes necessary for the functionality. So far it looks mostly
fairly minimal, which is good. The adjustments to progname are
another thing that looks a bit unusual/maybe unnecessary for the
feature.)

> As you can see last commit was 6 years ago when I stopped work on this project.
> Why? I already tried to explain it:
> - benefits from switching to threads were not so large. May be I just failed to fid proper workload, but is was more or less expected result,
> because most of the code was not changed - it uses the same sync primitives, the same local catalog/relation caches,..
> To take all advantage of multithreadig model it is necessary to rewrite many components, especially related with interprocess communication.
> But maintaining such fork of Postgres and synchronize it with mainstream requires too much efforts and I was not able to do it myself.

I get the feeling that there are probably certain query types or
patterns where a significant, order-of-magnitude speedup is possible
with threads - but yep, I haven't seen those described in detail yet
on the mailing list (but as hinted by my not noticing the github link
previously, maybe I'm not following the list closely enough).

What workloads did you try with your version of the project?

> There are three different but related directions of improving current Postgres:
> 1. Replacing processes with threads
> 2. Builtin connection pooler
> 3. Lightweight backends (shared catalog/relation/prepared statements caches)
>
> The motivation for such changes are also similar:
> 1. Increase Postgres scalability
> 2. Reduce memory consumption
> 3. Make Postgres better fir cloud and serverless requirements
>
> I am not sure now which one should be addressed first or them can be done together.
>
> Replacing static variables with thread-local is the first and may be the easiest step.
> It requires more or less mechanical changes. More challenging thing is replacing private per-backend data structures
> with shared ones (caches, file descriptors,...)

Thank you. Personally I think that motivation two (reducing memory
consumption) -- as long as it can be done without detrimentally
affecting functionality or correctness, and without making the code
harder to develop/understand -- could provide benefits for all three
of the motivating cases (and, in fact, for non-cloud/serverful use
cases too).

This is making me wonder about other performance/scalability areas
that might not have been considered due to focus on the details of the
existing codebase, but I'll save that for another thread and will try
to learn more first.

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc:	James Addison <jay(at)jp-hosting(dot)net>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-15 08:50:31
Message-ID:	CAMT0RQT7OKq8uGH6aXYCAZYEPVZHk+sztQ60KmtRjwcgWbEpjQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 15, 2023 at 9:12 AM Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:

> There are three different but related directions of improving current Postgres:
> 1. Replacing processes with threads

Here we could likely start with making parallel query multi-threaded.

This would also remove the big blocker for parallelizing things like
CREATE TABLE AS SELECT ... where we are currently held back by the
restriction that only the leader process can write.

> 2. Builtin connection pooler

Would be definitely a nice thing to have. And we could even start by
integrating a non-threaded pooler like pgbouncer to run as a
postgresql worker process (or two).

> 3. Lightweight backends (shared catalog/relation/prepared statements caches)

Shared prepared statement caches (which of course have to be per-user and
per-database) would give the additional benefit of lightweight connection
poolers not needing to track these. Currently the missing support for
named prepared statements is one of the main hindrances to using
pgbouncer with JDBC in transaction pooling mode (you can use it, but you
have to turn off automatic statement preparation).

>
> The motivation for such changes are also similar:
> 1. Increase Postgres scalability
> 2. Reduce memory consumption
> 3. Make Postgres better fit cloud and serverless requirements

The memory consumption reduction would be a big and clear win for many
workloads.

Also, just moving more things into shared memory will prepare us for
a move to a threaded server (if that eventually happens).

> I am not sure now which one should be addressed first or them can be done together.

Shared caches seem like a guaranteed win at least on memory usage.
There could be performance (and complexity) downsides for specific
workloads, but they would be the same as for the threaded model, so
would also be a good learning opportunity.

> Replacing static variables with thread-local is the first and may be the easiest step.

I think we got our first patch doing this (as part of patches for
running PG threaded on Solaris) quite early in the OSS development;
it could even have been in the last century :)

> It requires more or less mechanical changes. More challenging thing is replacing private per-backend data structures
> with shared ones (caches, file descriptors,...)

Indeed, sharing caches would be also part of the work that is needed
for the sharded model, so anyone feeling strongly about moving to
threads could start with this :)

---
Hannu

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	James Addison <jay(at)jp-hosting(dot)net>
Cc:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-15 09:04:20
Message-ID:	CAMT0RQTRzDgVzj13zryKRCcwt5VvNwWcw_d+-dQQeh7GkF40Fg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 15, 2023 at 10:41 AM James Addison <jay(at)jp-hosting(dot)net> wrote:
>
> This is making me wonder about other performance/scalability areas
> that might not have been considered due to focus on the details of the
> existing codebase, but I'll save that for another thread and will try
> to learn more first.

A gradual move to more shared structures seems to be a way forward

It should get us all the benefits of threading minus the need for TLB
reloading and (in some cases) reduction of per-process virtual memory
mapping tables.

In any case we would need to implement all the locking and parallelism
management of these shared structures that are not there in the
current process architecture.

So a fair bit of work, but also clearly defined benefits:
1) reduced memory usage
2) no need to rebuild caches for each new connection
3) no need to track PREPARE statements inside connection poolers.

There can be extra complexity when different connections use the same
prepared statement name (say "PREP001") for different queries.
For this we will likely need good cooperation with the connection
pooler, where it passes some kind of client connection id along at
transaction start.
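
The name-collision problem can be sketched as follows. This is only an illustration under assumptions: the class `SharedStatementCache`, its methods and the `client_id` parameter are hypothetical, and nothing here reflects an existing PostgreSQL or pgbouncer interface. The point is that the cache key has to include the client connection id passed by the pooler, in addition to database and user, so that two clients can both use "PREP001" for different queries.

```python
import threading


class SharedStatementCache:
    """Hypothetical shared cache of named prepared statements."""

    def __init__(self):
        self._lock = threading.Lock()
        self._statements = {}

    def prepare(self, database, user, client_id, name, query):
        # The statement name alone is not unique across pooled clients,
        # so the client connection id is part of the key.
        key = (database, user, client_id, name)
        with self._lock:
            self._statements[key] = query

    def lookup(self, database, user, client_id, name):
        key = (database, user, client_id, name)
        with self._lock:
            return self._statements.get(key)


cache = SharedStatementCache()
cache.prepare("app", "web", client_id=1, name="PREP001", query="SELECT 1")
cache.prepare("app", "web", client_id=2, name="PREP001", query="SELECT 2")
```

With this keying, a lookup for client 1 and a lookup for client 2 return different queries even though both use the same statement name.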

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	James Addison <jay(at)jp-hosting(dot)net>
Cc:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-15 09:07:30
Message-ID:	CAMT0RQShmBHqVPg++fYk_RqNWRo8JwRsm54ORy5gaityqN_GNw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

One more unexpected benefit of having shared caches would be easing
access to other databases.

If the system caches are there for all databases anyway, then it
becomes much easier to make queries using objects from multiple
databases.

Note that this does not strictly need threads, just shared caches.

On Thu, Jun 15, 2023 at 11:04 AM Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> On Thu, Jun 15, 2023 at 10:41 AM James Addison <jay(at)jp-hosting(dot)net> wrote:
> >
> > This is making me wonder about other performance/scalability areas
> > that might not have been considered due to focus on the details of the
> > existing codebase, but I'll save that for another thread and will try
> > to learn more first.
>
> A gradual move to more shared structures seems to be a way forward
>
> It should get us all the benefits of threading minus the need for TLB
> reloading and (in some cases) reduction of per-process virtual memory
> mapping tables.
>
> In any case we would need to implement all the locking and parallelism
> management of these shared structures that are not there in the
> current process architecture.
>
> So a fair bit of work but also a clearly defined benefits of
> 1) reduced memory usage
> 2) no need to rebuild caches for each new connection
> 3) no need to track PREPARE statements inside connection poolers.
>
> There can be extra complexity when different connections use the same
> prepared statement name (say "PREP001") for different queries.
> For this wel likely will need a good cooperation with connection
> pooler where it passes some kind of client connection id along at the
> transaction start

From:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To:	James Addison <jay(at)jp-hosting(dot)net>
Cc:	Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-15 19:36:30
Message-ID:	67370d03-9244-c7eb-1b87-8052659457ba@garret.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15.06.2023 11:41 AM, James Addison wrote:
> On Thu, 15 Jun 2023 at 08:12, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>>
>>
>> On 15.06.2023 1:23 AM, James Addison wrote:
>>
>> On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>>
>>
>> On 12.06.2023 3:23 PM, Pavel Borisov wrote:
>>
>> Is the following true or not?
>>
>> 1. If we switch processes to threads but leave the amount of session
>> local variables unchanged, there would be hardly any performance gain.
>> 2. If we move some backend's local variables into shared memory then
>> the performance gain would be very near to what we get with threads
>> having equal amount of session-local variables.
>>
>> In other words, the overall goal in principle is to gain from less
>> memory copying wherever it doesn't add the burden of locks for
>> concurrent variables access?
>>
>> Regards,
>> Pavel Borisov,
>> Supabase
>>
>>
>> IMHO both statements are not true.
>> Switching to threads will cause less context switch overhead (because
>> all threads are sharing the same memory space and so preserve TLB.
>> How big will be this advantage? In my prototype I got ~10%. But may be
>> it is possible to fin workloads when it is larger.
>>
>> Hi Konstantin - do you have code/links that you can share for the
>> prototype and benchmarks used to gather those results?
>>
>>
>>
>> Sorry, I have already shared the link:
>> https://github.com/postgrespro/postgresql.pthreads/
> Nope, my mistake for not locating the existing link - thank you.
>
> Is there a reason that parser-related files (flex/bison) are added as
> part of the changeset? (I'm trying to narrow it down to only the
> changes necessary for the functionality. so far it looks mostly
> fairly minimal, which is good. the adjustments to progname are
> another thing that look a bit unusual/maybe unnecessary for the
> feature)

Sorry, absolutely no reason - just my fault.

>> As you can see last commit was 6 years ago when I stopped work on this project.
>> Why? I already tried to explain it:
>> - benefits from switching to threads were not so large. May be I just failed to fid proper workload, but is was more or less expected result,
>> because most of the code was not changed - it uses the same sync primitives, the same local catalog/relation caches,..
>> To take all advantage of multithreadig model it is necessary to rewrite many components, especially related with interprocess communication.
>> But maintaining such fork of Postgres and synchronize it with mainstream requires too much efforts and I was not able to do it myself.
> I get the feeling that there are probably certain query types or
> patterns where a significant, order-of-magnitude speedup is possible
> with threads - but yep, I haven't seen those described in detail yet
> on the mailing list (but as hinted by my not noticing the github link
> previously, maybe I'm not following the list closely enough).
>
> What workloads did you try with your version of the project?

I do not remember precisely now (6 years have passed).
But I definitely tried pgbench, especially read-only pgbench (to be
CPU-bound rather than disk-bound).

From:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To:	Hannu Krosing <hannuk(at)google(dot)com>, James Addison <jay(at)jp-hosting(dot)net>
Cc:	Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-06-15 19:49:23
Message-ID:	361f63ed-2a1a-7b1f-65a2-d7cadee76937@garret.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15.06.2023 12:04 PM, Hannu Krosing wrote:
> So a fair bit of work but also a clearly defined benefits of
> 1) reduced memory usage
> 2) no need to rebuild caches for each new connection
> 3) no need to track PREPARE statements inside connection poolers.

A shared plan cache (not only a prepared statements cache) also opens the
way to more sophisticated query optimizations.
Right now we are not performing some optimizations (like constant
expression folding) just because they increase the processing time of normal
queries.
This is why queries generated by ORMs or wizards, which can contain a
lot of dumb stuff, are not well simplified by Postgres.
With MS-SQL it is quite common that query execution time is much
smaller than query optimization time.
Having a shared plan cache allows us to spend more time on optimization
without the risk of degrading performance.
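
The amortization argument can be sketched like this. It is an illustration only: `SharedPlanCache` and `fold_constants` are hypothetical names, the "plan" is just rewritten query text, and the string replacement stands in for an expensive optimization pass. The sketch shows that with a cache shared by all sessions the expensive step runs once per distinct query rather than once per session.

```python
import threading


class SharedPlanCache:
    """Hypothetical plan cache shared by all backend threads."""

    def __init__(self, optimize):
        self._optimize = optimize
        self._lock = threading.Lock()
        self._plans = {}
        self.optimizer_calls = 0

    def get_plan(self, query_text):
        # Holding the lock while optimizing keeps the sketch simple; a real
        # design would avoid blocking other sessions during planning.
        with self._lock:
            plan = self._plans.get(query_text)
            if plan is None:
                self.optimizer_calls += 1
                plan = self._optimize(query_text)
                self._plans[query_text] = plan
            return plan


def fold_constants(query_text):
    # Stand-in for an expensive optimization pass.
    return query_text.replace("1 + 1", "2")
```

However many sessions ask for the same query text, `optimizer_calls` grows only when a query is seen for the first time, which is what would make spending more time per optimization affordable.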

From:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-07-19 14:46:52
Message-ID:	CAExHW5v9GX86j3kgYrNfk+jiUgxZr7MiJf6g-QycSAF3L2hP4w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I think the planner would also benefit from threads. There are many tasks
in the planner that are independent and can be scheduled using a dependency
graph. They are too small to be parallelized through separate backends
but large enough to be performed by threads. Planning queries
involving partitioned tables takes a long time (seconds), especially when
there are thousands of partitions. That kind of planning would benefit
immensely from threading. Of course we could use backends that
pull tasks from a queue, but sharing the PlannerInfo and its
substructures is easier through the same address space than through
shared memory.
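
A rough sketch of scheduling independent tasks from a dependency graph onto a thread pool. The function `run_tasks` and the task names used with it are hypothetical; this is not how the PostgreSQL planner is structured today. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) and shows the scheduling pattern only, since CPython threads do not give CPU parallelism for pure-Python work.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_tasks(dependencies, work, max_workers=4):
    """Run work(task) for every task, respecting the dependency graph.

    dependencies maps each task to the set of tasks that must finish first.
    Returns the tasks in the order in which they completed.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {}
        while sorter.is_active():
            # Submit every task whose prerequisites are all done.
            for task in sorter.get_ready():
                pending[pool.submit(work, task)] = task
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the worker
                task = pending.pop(future)
                finished.append(task)
                sorter.done(task)
    return finished
```

For example, with `{"plan_join": {"plan_part_1", "plan_part_2"}, "plan_part_1": set(), "plan_part_2": set()}` the two per-partition tasks can run concurrently and the join task runs only after both have completed. All tasks see the same in-memory objects, which is the address-space sharing mentioned above.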

On Sat, Jun 10, 2023 at 5:25 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> On Wed, Jun 7, 2023 at 06:38:38PM +0530, Ashutosh Bapat wrote:
> > With multiple processes, we can use all the available cores (at least
> > theoretically if all those processes are independent). But is that
> > guaranteed with single process multi-thread model? Google didn't throw
> > any definitive answer to that. Usually it depends upon the OS and
> > architecture.
> >
> > Maybe a good start is to start using threads instead of parallel
> > workers e.g. for parallel vacuum, parallel query and so on while
> > leaving the processes for connections and leaders. that itself might
> > take significant time. Based on that experience move to a completely
> > threaded model. Based on my experience with other similar products, I
> > think we will settle on a multi-process multi-thread model.
>
> I think we have a few known problem that we might be able to solve
> without threads, but can help us eventually move to threads if we find
> it useful:
>
> 1) Use threads for background workers rather than processes
> 2) Allow sessions to be stopped and started by saving their state
>
> Ideally we would solve the problem of making shared structures
> resizable, but I am not sure how that can be easily done without
> threads.
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
> EDB https://enterprisedb.com
>
> Only you can decide what is important to you.

--
Best Wishes,
Ashutosh Bapat

From:	David Geier <geidav(dot)pg(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-07-27 13:27:57
Message-ID:	2ce25d83-39ea-569d-b73f-d4e9a3eb9cc1@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 6/7/23 23:37, Andres Freund wrote:
> I think we're starting to hit quite a few limits related to the process model,
> particularly on bigger machines. The overhead of cross-process context
> switches is inherently higher than switching between threads in the same
> process - and my suspicion is that that overhead will continue to
> increase. Once you have a significant number of connections we end up spending
> a *lot* of time in TLB misses, and that's inherent to the process model,
> because you can't share the TLB across processes.

Another problem I haven't seen mentioned yet is the excessive kernel
memory usage because every process has its own set of page table entries
(PTEs). Without huge pages the amount of wasted memory can be huge if
shared buffers are big.

For example with 256 GiB of used shared buffers a single process needs
about 256 MiB for the PTEs (for simplicity I ignored the tree structure
of the page tables and just took the number of 4k pages times 4 bytes
per PTE). With 512 connections, which is not uncommon for machines with
many cores, a total of 128 GiB of memory is just spent on page tables.
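
The arithmetic above can be reproduced with a few lines of Python. The function name `page_table_bytes` is just for illustration, and the estimate uses the same simplification as the text: a flat count of one PTE per mapped page, ignoring the upper levels of the page-table tree. The 2 MiB huge page size in the last line is an assumption added for comparison.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def page_table_bytes(mapped_bytes, page_size, pte_size=4):
    """Flat estimate: one PTE per mapped page, upper table levels ignored."""
    return (mapped_bytes // page_size) * pte_size


# 256 GiB of shared buffers mapped with 4 KiB pages, 4 bytes per PTE.
per_process = page_table_bytes(256 * GIB, 4 * KIB)

# 512 connections, each process with its own page tables.
total = 512 * per_process

# Same mapping with 2 MiB huge pages (assumed size), for comparison.
per_process_huge = page_table_bytes(256 * GIB, 2 * MIB)
```

This gives 256 MiB per process and 128 GiB in total, matching the figures above, versus 512 KiB per process with 2 MiB huge pages. Note that the 4-byte entry size is the simplification used in the text; on x86-64 a page table entry is 8 bytes, so passing `pte_size=8` doubles these numbers.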

We used non-transparent huge pages to work around this limitation but
they come with plenty of provisioning challenges, especially in cloud
infrastructures where different services run next to each other on the
same server. Transparent huge pages have unpredictable performance
disadvantages. Also if some backends only use shared buffers sparsely,
memory is wasted for the remaining, unused range inside the huge page.

--
David Geier
(ServiceNow)

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	James Addison <jay(at)jp-hosting(dot)net>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-07-28 18:10:44
Message-ID:	CAEze2WjJZZ=P_ich+=RiA51APHsNV7zi=-j-24DCPsaV43c+Ow@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 15 Jun 2023 at 11:07, Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> One more unexpected benefit of having shared caches would be easing
> access to other databases.
>
> If the system caches are there for all databases anyway, then it
> becomes much easier to make queries using objects from multiple
> databases.

We have several optimizations in our visibility code that allow us to
remove dead tuples from this database when another database still has
a connection that has an old snapshot in which the deleting
transaction of this database has not yet committed. This is allowed
because we can say with confidence that other databases' connections
will never be able to see this database's tables. If we were to allow
cross-database data access, that would require cross-database snapshot
visibility checks, and that would severely hinder these optimizations.
As an example, it would increase the work we need to do for snapshots:
For the snapshot data of tables that aren't shared catalogs, we only
need to consider our own database's backends for visibility. With
cross-database visibility, we would need to consider all active
backends for all snapshots, and this can be significantly more work.
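
The difference can be illustrated with a simplified model. Everything here is hypothetical: `Backend` and `removal_horizon` are illustration names, not PostgreSQL functions, and the sketch ignores transaction id wraparound and shared catalogs. It only shows why restricting the scan to the current database's backends lets cleanup proceed further than a cross-database scan would.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    database: str
    xmin: int  # oldest transaction id this backend's snapshot may still need


def removal_horizon(backends, database, cross_database=False):
    """Oldest xmin that dead-tuple cleanup in `database` must respect."""
    relevant = [
        b.xmin for b in backends if cross_database or b.database == database
    ]
    return min(relevant) if relevant else None


backends = [Backend("app", 900), Backend("reports", 100)]
own_db_horizon = removal_horizon(backends, "app")
all_db_horizon = removal_horizon(backends, "app", cross_database=True)
```

Here the old snapshot held in the "reports" database does not hold back cleanup in "app" (horizon 900), but it would as soon as cross-database access were possible (horizon 100).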

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-08-11 12:05:17
Message-ID:	CAHyXU0z_miQ8QiE+bOKJSd=0OPSsrAboJov6WbVyJ_V_B1RJWg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jul 27, 2023 at 8:28 AM David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:

> Hi,
>
> On 6/7/23 23:37, Andres Freund wrote:
> > I think we're starting to hit quite a few limits related to the process
> model,
> > particularly on bigger machines. The overhead of cross-process context
> > switches is inherently higher than switching between threads in the same
> > process - and my suspicion is that that overhead will continue to
> > increase. Once you have a significant number of connections we end up
> spending
> > a *lot* of time in TLB misses, and that's inherent to the process model,
> > because you can't share the TLB across processes.
>
> Another problem I haven't seen mentioned yet is the excessive kernel
> memory usage because every process has its own set of page table entries
> (PTEs). Without huge pages the amount of wasted memory can be huge if
> shared buffers are big.

Hm, I noted this upthread, but asking again: would this improve
interactions with the operating system and make OOM-kill
situations less likely? These things are the bane of my existence, and
I'm having a hard time finding a solution that prevents them other than
running pgbouncer and lowering max_connections, which adds complexity. I
suspect I'm not the only one dealing with this. What's really scary about
these situations is that they come without warning. Here's a pretty typical
example per sar -r.

         kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
14:20:02    461612  15803476    97.16         0 11120280 12346980   60.35 10017820 4806356     220
14:30:01    378244  15886844    97.67         0 11239012 12296276   60.10 10003540 4909180     240
14:40:01    308632  15956456    98.10         0 11329516 12295892   60.10 10015044 4981784     200
14:50:01    458956  15806132    97.18         0 11383484 12101652   59.15  9853612 5019916     112
15:00:01  10592736   5672352    34.87         0  4446852  8378324   40.95  1602532 3473020     264  <-- reboot!
15:10:01   9151160   7113928    43.74         0  5298184  8968316   43.83  2714936 3725092     124
15:20:01   8629464   7635624    46.94         0  6016936  8777028   42.90  2881044 4102888     148
15:30:01   8467884   7797204    47.94         0  6285856  8653908   42.30  2830572 4323292     436
15:40:02   8077480   8187608    50.34         0  6828240  8482972   41.46  2885416 4671620     320
15:50:01   7683504   8581584    52.76         0  7226132  8511932   41.60  2998752 4958880     308
16:00:01   7239068   9026020    55.49         0  7649948  8496764   41.53  3032140 5358388     232
16:10:01   7030208   9234880    56.78         0  7899512  8461588   41.36  3108692 5492296     216

Triggering query was heavy (maybe even runaway), server load was minimal
otherwise:

             CPU   %user   %nice %system %iowait  %steal   %idle
14:30:01     all    9.55    0.00    0.63    0.02    0.00   89.81
14:40:01     all    9.95    0.00    0.69    0.02    0.00   89.33
14:50:01     all   10.22    0.00    0.83    0.02    0.00   88.93
15:00:01     all   10.62    0.00    1.63    0.76    0.00   86.99
15:10:01     all    8.55    0.00    0.72    0.12    0.00   90.61

The conjecture here is that lots of idle connections leave the server with
less memory available than it appears to have, and sudden transient demands
can cause it to destabilize.

Just throwing it out there: if it can be shown to help, it may be supportive
of moving forward with something like this, either instead of, or along
with, O_DIRECT or other internalized database memory management
strategies. Lowering context switches, faster page access etc. are of
course nice but would not be a game changer for the workloads we see, which
are pretty varied (OLTP, analytics), although we don't see extremely high
transaction rates.

merlin

From:	Mark Woodward <woodwardm(at)google(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Hannu Krosing <hannuk(at)google(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-08-23 20:42:27
Message-ID:	CAE0x3MkyQF8N5V2-wk81Q_5+JzO7w5mkyFxXHBAj+6NecRNqCQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jun 12, 2023 at 5:17 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 10/06/2023 21:01, Hannu Krosing wrote:
> > On Mon, Jun 5, 2023 at 4:52 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> <<<SNIP>>>

>
> > * The backend code would be more complex.
> > -- this is still the case
>
> I don't quite buy that. A multi-threaded model isn't inherently more
> complex than a multi-process model. Just different. Sure, the transition
> period will be more complex, when we need to support both models. But in
> the long run, if we can remove the multi-process mode, we can make a lot
> of things *simpler*.
>

If I may weigh in here:
Making a previously unthreaded process able to handle multiple threads is
a tedious process.

>
> > -- even more worrisome is that all extensions also need to be rewritten
>
> "rewritten" is an exaggeration. Yes, extensions will need to adapt, similar
> to the core code. But I hope it will be pretty mechanical work, marking
> global variables as thread-local and such. Many extensions will work
> with little to no changes.
>

I can tell you from experience it isn't that easy. In my career I have
taken a few "old" technologies and made them multithreaded, and it is
really a complex and laborious undertaking.
Many operations that you do just fine without threads will break in a
multithreaded system. You need to make sure every function in every library
that you use is "thread safe." Take a file handle: if you read, seek, or
write a file handle you are fine in a single process, but this breaks in a
multithreaded environment if the file handle is shared, because the threads
then share its file position too. That's a very simple example. OpenSSL
operations will almost certainly break and you will need to rewrite your
SSL stuff and protect some things with mutexes. When you fork(), a lot is
essentially duplicated (COW) between the parent and child that will
ultimately be shared in a threaded model. Decades-old assumptions in the
design and architecture will break and you will need to rethink what you
are doing and how it is done. You will need to change file handling to get
beyond the 1024 file descriptor limit in calls like select(). There is a
LOT of this kind of stuff; it is not mechanical. I even call into question
"Many extensions will work with little to no changes" as those too will
need to be audited for thread safety. Think about loading extensions:
extensions are typically not loaded until they are used. In a
multi-threaded model, a shared library will only be loaded once. Think
about memory management: you will have multiple threads fighting over the
global heap as they allocate memory. The list is virtually endless.
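
As a minimal illustration of the file handle point above (a sketch, not
code from PostgreSQL or from this thread): threads that share one
descriptor also share its file position, so a seek followed by a read is a
two-step sequence that another thread can interleave with. A positioned
read takes the offset as an argument and leaves the shared position alone.
The sketch uses Python's os.pread on a POSIX system; the record layout and
the names make_file, read_record and read_all_concurrently are made up for
the example.

```python
import os
import tempfile
import threading

RECORD = 4        # bytes per record (illustrative layout)
N_RECORDS = 64    # number of records in the test file


def make_file():
    """Create a temp file holding the records b'0000', b'0001', ..."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"".join(b"%04d" % i for i in range(N_RECORDS)))
    return fd, path


def read_record(fd, i):
    """Positioned read: the offset is passed explicitly, so the
    descriptor's shared file position is never moved."""
    return os.pread(fd, RECORD, i * RECORD)


def read_all_concurrently(fd, n_threads=8):
    """Read every record through ONE shared descriptor from several threads."""
    results = {}
    lock = threading.Lock()  # protects the results dict, not the file

    def worker(start):
        for i in range(start, N_RECORDS, n_threads):
            data = read_record(fd, i)
            with lock:
                results[i] = data

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling read_all_concurrently(fd) should return every record intact
regardless of thread scheduling. The same loop written as os.lseek followed
by os.read would need a lock around each seek-and-read pair to be safe.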

>
> > -- and many incompatibilities will be silent and take potentially years
> > to find
>
> IMO this is the most scary part of all this. I'm optimistic that we can
> have enough compiler support and tooling to catch most issues. But we
> don't know for sure at this point.
>

We absolutely do not know and it *is* very scary.

>
> > * Terminating backend processes allows the OS to cleanly and quickly
> > free all resources, protecting against memory and file descriptor
> > leaks and making backend shutdown cheaper and faster
> > -- still true
>
> Yep. I'm not too worried about PostgreSQL code, our memory contexts and
> resource owners are very good at stopping leaks. But 3rd party libraries
> could pose hard problems. IIRC we still have a leak with the LLVM JIT
> code, for example. We should fix that anyway, of course, but the
> multi-process model is more forgiving with leaks like that.
>

Again, we believe that this is true.

> --
> Heikki Linnakangas
> Neon (https://neon.tech)

From:	David Geier <geidav(dot)pg(at)gmail(dot)com>
To:	Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-08-25 12:01:23
Message-ID:	c8300886-9353-69de-6b62-861cbc484e1b@gmail.com
Lists:	pgsql-hackers

Hi,

On 8/11/23 14:05, Merlin Moncure wrote:
> On Thu, Jul 27, 2023 at 8:28 AM David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On 6/7/23 23:37, Andres Freund wrote:
> > I think we're starting to hit quite a few limits related to the process
> > model, particularly on bigger machines. The overhead of cross-process
> > context switches is inherently higher than switching between threads in
> > the same process - and my suspicion is that that overhead will continue
> > to increase. Once you have a significant number of connections we end up
> > spending a *lot* of time in TLB misses, and that's inherent to the
> > process model, because you can't share the TLB across processes.
>
> Another problem I haven't seen mentioned yet is the excessive kernel
> memory usage because every process has its own set of page table entries
> (PTEs). Without huge pages the amount of wasted memory can be huge if
> shared buffers are big.
>
>
> Hm, I noted this upthread, but asking again: does this help interactions
> with the operating system and make OOM kill situations less likely?
> These things are the bane of my existence, and I'm having a hard time
> finding a solution that prevents them other than running pgbouncer and
> lowering max_connections, which adds complexity. I suspect I'm not the
> only one dealing with this. What's really scary about these situations is
> that they come without warning. Here's a pretty typical example per
> sar -r.
>
> The conjecture here is that lots of idle connections leave the server
> with less memory available than it appears to have, and sudden transient
> demands can cause it to destabilize.

It does in the sense that your server will have more memory available in
case you have many long living connections around. Every connection has
less kernel memory overhead if you will. Of course even then a runaway
query will be able to invoke the OOM killer. The unfortunate thing with
the OOM killer is that, in my experience, it often kills the
checkpointer. That's because the checkpointer will touch all of shared
buffers over time which makes it likely to get selected by the OOM
killer. Have you tried disabling memory overcommit?
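
To put rough numbers on the per-connection kernel memory overhead
discussed above, here is a back-of-the-envelope sketch. It assumes x86-64
with 8-byte page table entries, counts only the last-level entries, and
takes the worst case in which every backend has touched all of shared
buffers. The 32 GiB shared_buffers and the 1000 backends are hypothetical
figures, not taken from this thread.

```python
KiB = 1024
MiB = 1024 ** 2
GiB = 1024 ** 3


def page_table_bytes(mapped_bytes, page_size, entry_size=8):
    """Bytes of last-level page table entries needed to map mapped_bytes.

    entry_size=8 is an assumption that matches x86-64; upper-level
    tables are ignored because they are comparatively small.
    """
    pages = -(-mapped_bytes // page_size)  # ceiling division
    return pages * entry_size


shared_buffers = 32 * GiB  # hypothetical setting
backends = 1000            # hypothetical number of connections

per_proc_4k = page_table_bytes(shared_buffers, 4 * KiB)
per_proc_2m = page_table_bytes(shared_buffers, 2 * MiB)

print(per_proc_4k / MiB, "MiB per process with 4 kB pages")        # 64.0
print(per_proc_2m / KiB, "KiB per process with 2 MB huge pages")   # 128.0
print(per_proc_4k * backends / GiB, "GiB worst case, all backends")  # 62.5
```

Threads in a single process share one address space and therefore one set
of page tables, so under these assumptions the first figure would be paid
once rather than once per backend.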

--
David Geier
(ServiceNow)

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc:	Merlin Moncure <mmoncure(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Let's make PostgreSQL multi-threaded
Date:	2023-08-25 13:35:00
Message-ID:	ZOiuBDUurMBGrWEw@tamriel.snowman.net
Lists:	pgsql-hackers

Greetings,

* David Geier (geidav(dot)pg(at)gmail(dot)com) wrote:
> On 8/11/23 14:05, Merlin Moncure wrote:
> > Hm, I noted this upthread, but asking again: does this help interactions
> > with the operating system and make OOM kill situations less likely?
> > These things are the bane of my existence, and I'm having a hard time
> > finding a solution that prevents them other than running pgbouncer and
> > lowering max_connections, which adds complexity. I suspect I'm not the
> > only one dealing with this. What's really scary about these situations
> > is that they come without warning. Here's a pretty typical example per
> > sar -r.
> >
> > The conjecture here is that lots of idle connections leave the server
> > with less memory available than it appears to have, and sudden transient
> > demands can cause it to destabilize.
>
> It does in the sense that your server will have more memory available in
> case you have many long living connections around. Every connection has less
> kernel memory overhead if you will. Of course even then a runaway query will
> be able to invoke the OOM killer. The unfortunate thing with the OOM killer
> is that, in my experience, it often kills the checkpointer. That's because
> the checkpointer will touch all of shared buffers over time which makes it
> likely to get selected by the OOM killer. Have you tried disabling memory
> overcommit?

This is getting a bit far afield in terms of this specific thread, but
there's an ongoing effort to give PG administrators knobs to be able to
control how much actual memory is used rather than depending on the
kernel to actually tell us when we're "out" of memory. There'll be new
patches for the September commitfest posted soon. If you're interested
in this issue, it'd be great to get more folks involved in review and
testing.

Thanks!

Stephen