PostgreSQL 9.3 beta: Federated databases and more
Benefits for LWN subscribers The primary benefit from subscribing to LWN is helping to keep us publishing, but, beyond that, subscribers get immediate access to all site content and access to a number of extra site features. Please sign up today! |
In Berkeley, California — the birthplace of PostgreSQL — it's spring: plum and cherry blossoms, courting finches and college students, new plans for the summer, and the first beta release of the database system. Every year, the first beta of the next PostgreSQL version comes out in April or May, for a final release in September. PostgreSQL 9.3 beta 1 was released to the public on May 13th, and contains a couple dozen new features both for database administrators and application developers.
Of course, if you're in the southern hemisphere, it's not spring for you. Nor if you live in the north Midwest of the US or central Canada. Sorry about that; we just track the weather with PostgreSQL, we don't control it.
As usual, there are too many features in the new PostgreSQL to cover them all individually, so we're going to go over just a few of them here: database federation, writable foreign data sources, and the end of System V shared memory dependence.
Federated databases
Sharding and master-slave replication are the most common ways to horizontally scale a relational database, but they are not the only ones. A type of horizontal scaling which has been little explored among open source databases is federated databases. PostgreSQL 9.3 will introduce database federation as a standard feature of the database system, through the postgres_fdw extension, which will ship with the core code.
The idea of federated databases is that each database server can query other database servers on the network, effectively giving the user access to all databases and tables on all servers from any single database client. Tables can even be JOINed in queries with other tables from other servers. Federated databases are basically connected by the database engine at the query executor level. This type of horizontal scaling is mainly good for spreading reads of large amounts of data around several machines, and as such is primarily useful for data warehousing and analytics, rather than for transaction processing.
This concept has been implemented before, and is one of the most distinctive features of IBM's DB2. However, implementations within open source relational databases have not been extensively used. PostgreSQL has had federated databases though Skype's PL/Proxy extension since 2006, but that extension was never integrated into the core, and it forces database access via stored procedures, which didn't work for many users. MySQL introduced the FEDERATED storage engine in version 5.0, but for some reason it seems not to have been widely adopted as part of scalable MySQL solutions.
PostgreSQL introduced the foreign data wrapper (FDW) feature for accessing external data at the query level in version 9.1. That version's FDWs were read-only, however, and queried external data by copying the entire target data source to memory on the querying server, which necessarily limits how big of an external table you could query. Since August of 2010, a development team including major contributor Shigeru Hanada and committer Takahiro Itagaki, worked on making FDWs something more, culminating in the new postgres_fdw extension. Hanada described the use case and development of the feature as follows:
I want to allow creating a cluster which contains multiple PG databases connected loosely with postgres_fdw. Sometimes users want to have partial remote data available on the other database, but Streaming Replication doesn't allow writable stand-by, and contrib/dblink is not easy for application developers. postgres_fdw provides us with a nearly realtime view of remote database(s).
Creating a link to a table on another PostgreSQL server is relatively simple, although it has multiple steps. First, import the postgres_fdw extension:
CREATE EXTENSION postgres_fdw;Then create a server-to-server link:
CREATE SERVER remotesrv foreign data wrapper postgres_fdw OPTIONS ( host '127.0.01', port '5433', dbname 'bench');Create a mapping for local database users to remote database users:
CREATE USER MAPPING FOR current_user SERVER remotesrv OPTIONS ( user 'josh', password 'password' );Finally, link to the tables on the remote server:
CREATE FOREIGN TABLE remoteacct (aid int, bid int, abalance int, filler char(84)) SERVER remotesrv OPTIONS ( table_name 'pgbench_accounts' );
Queries can be run against the tables on the attached server, including writing to it, just as if it were a table on the local server:
EXPLAIN SELECT * FROM remoteacct WHERE bid = 5; INSERT INTO remoteacct ( aid, bid, abalance ) VALUES ( 10000000, 5, 100 );
The postgres_fdw extension permits the PostgreSQL query planner to "push down" query clauses, such as WHERE clauses, to the remote database, allowing for distributed, parallel execution of several portions of the federated query. This eliminates the requirement to copy the entire foreign table into memory, and with some architectures permits execution of federated queries over much larger data sets than could be queried on one server alone. More work needs to be done in this area to make this a full "big data" solution, however; Hanada hopes to add "push down" of joins, sorts, and aggregates in future versions of PostgreSQL.
Linking PostgreSQL to Redis
The improved foreign data wrappers aren't limited to PostgreSQL-to-PostgreSQL connections. FDWs can be used to connect PostgreSQL to any kind of external data, as long as it can be rendered in a tabular format: Oracle database tables, CSV files, process lists, or data from non-relational databases such as Redis and MongoDB, are all possibilities. Now that FDWs are writable as well as readable, PostgreSQL becomes much more useful as a data integration tool for multiple other databases and data sources.
In order to connect to a non-PostgreSQL data source, a developer needs to write a special driver (also called an FDW), which is then installable as a PostgreSQL extension. Existing FDWs will need to be enhanced to support writability. One which is likely to be ready for read-write access when PostgreSQL 9.3 is released is the Redis FDW. Andrew Dunstan, PostgreSQL committer, is working on a new version of the driver, because he uses Redis together with PostgreSQL. Redis is a non-relational, memory-only database which stores hashes, lists, and sets.
Many people use Redis alongside PostgreSQL. It functions well as an application cache, a buffer for rapid-fire writes, or to support queuing. The Redis FDW allows users to pull data directly from Redis into PostgreSQL without going through the application layer, saving both development time and system resources. Dunstan describes the work:
Redis's response times are extremely fast. It isn't just a simple key value store. The values can be structured. That makes it easier to fit them to a PostgreSQL table structure. In particular, a Redis hash is a good fit for a row on a PostgreSQL table, and a Redis set is useful as more or less a table index, and via the keyset structure a substitute for a "where" clause.
The Redis FDW works by mapping each of the Redis data types (scalar, hash, set, list, ordered set) into PostgreSQL tables. Tables can be specified to occupy a subset of the global Redis keyspace, either by the keys having a specified prefix, or by designating that the keys are stored in a named Redis set.
Redis can be attached as a foreign server as well. First you need to install the redis_fdw extension, and create the foreign server, in this case a Redis server on the same machine, and tell PostgreSQL what users are allowed to access it, as you do with a Postgres-to-Postgres link:
CREATE EXTENSION redis_fdw; CREATE SERVER localredis FOREIGN DATA WRAPPER redis_fdw; CREATE USER MAPPING FOR public SERVER localredis;
Redis stores data in five forms: scalars, hashes, sets, lists and sorted sets (zsets). These objects can be mapped to local PostgreSQL tables, either singly or in groups. The code below, for example, makes a two-column table out of all of the lists whose key begins with "jobqueue":
CREATE FOREIGN TABLE jobqueues(key text, list text[]) SERVER localredis OPTIONS (database '0', tabletype 'list', tableprefix 'jobqueue');
The future read-write Redis FDW will permit pushing data to Redis as well as reading it, enabling new ways of using Redis as an integrated cache or queuing store with PostgreSQL. For example, cache invalidation and refresh can be controlled using a database trigger, making invalidation much more discriminating about which data it needs to replace. Or PostgreSQL can automatically push items onto a Redis queue whenever certain events in the database happen.
No more "shmmax"
PostgreSQL's use of SysV shared memory has been a frequent source of irritation for system administrators. Every time an administrator installs PostgreSQL on a brand-new system, they have to edit the sysctl.conf file in order to raise the kernel limits on shared memory by substantially increasing "shmmax" and "shmall"; otherwise, PostgreSQL is limited to using a few megabytes of RAM. As of version 9.3, this headache will go away, thanks to committer Robert Haas:
The changes were isolated to just one file, sysv_shmem.c. I wrote the patch in just one day. The only real hard part was figuring out what shared memory facilities existed that would be portable enough to serve our needs.
We haven't completely given up SysV RAM. PostgreSQL 9.3 will still allocate a small region of System V shared memory, just a few bytes. This region is important because it allows us to protect against the catastrophic situation where two unrelated sets of PostgreSQL processes access the same data directory at the same time. This interlock relies on a feature of System V shared memory that does not seem to be offered by any other commonly-available shared memory facility: the ability to determine the number of other processes which are attached to a given shared memory segment. Until someone devises another, equally bullet-proof way of doing this, we'll probably continue to rely on SysV shared memory at least for this small task.
After debating several other approaches, Haas ended up moving all of PostgreSQL's dedicated memory to mmap(), using anonymous shared memory regions. This avoids shared memory limits, and is a widely supported interface that works on all of the operating systems supported by PostgreSQL, except for Windows. PostgreSQL for Windows, however, has always used a different system of memory allocation. The new mechanism works without users needing to make any changes to their PostgreSQL configurations. The one problem reported so far has been decreased performance on BSD operating systems, due to the loss of special optimizations those operating systems made for SysV shared memory.
Lots more features
As always, this PostgreSQL annual release has a lot more features to offer users. Included in the current beta are:
- A new CREATE MATERIALIZED VIEW declaration, which lets users generate precalculated summary tables using a simple SQL statement.
- Regular dynamic VIEWs (saved queries) are now automatically updatable as well as readable.
- A data page checksum option for users who need to detect disk integrity issues immediately.
- A new JOIN construct, LATERAL JOINs, which allow self-referencing table lists; this is especially useful with functions or foreign data wrappers.
- Additional built-in JSON manipulation functions for the JSON data type.
- Indexed regular expression search to speed up text matching.
- Sub-second "fast failover" option when switching replicated servers, for 99.999% high availability.
Plus many other features, some of them unique to PostgreSQL. The community has documented some of them, so that you can see if there is something for you in this beta.
Inevitably, these features also introduce lots of new bugs, which is why the PostgreSQL team wants you to download the beta and test it with your applications. Like most community-driven open source projects, PostgreSQL relies heavily on end-user testing to find bugs before the final release. The project will release multiple betas over the four month testing period. The final release for version 9.3 is expected to be sometime in September.
[ Josh Berkus is a member of the PostgreSQL Core Team. ]
Index entries for this article | |
---|---|
GuestArticles | Berkus, Josh |
(Log in to post comments)
PostgreSQL 9.3 beta: Federated databases and more
Posted May 14, 2013 22:17 UTC (Tue) by felixfix (subscriber, #242) [Link]
I did not know of the connection between UC Berkeley and PostgreSQL.
Is your name a pseudonym, possibly a pun? Or did you gravitate towards PostgreSQL because of your name?
:-)
PostgreSQL 9.3 beta: Federated databases and more
Posted May 14, 2013 23:14 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 3:08 UTC (Wed) by jberkus (guest, #55561) [Link]
Given my once-radical leftist politics, I was known as "Joshua Bezerkeley" in college.
But no, Berkus is a Ukrainian name, actually.
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 4:34 UTC (Wed) by felixfix (subscriber, #242) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 7:43 UTC (Wed) by ortalo (guest, #4654) [Link]
http://www.postgresql.org/about/history/
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 7:03 UTC (Wed) by ncm (guest, #165) [Link]
About the same time, PG got a 64-bit block CRC using a polynomial extracted from a magnetic-tape format standard. I gather that modern cryptographic hashes can be computed faster, on modern hardware, than CRCs. Maybe it's time to reconsider that choice too?
It's gratifying to look back on decades of monotonic improvement along so many axes and recognize the mature leadership that has made it possible. It could so easily have gone off the rails at every point.
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 18:00 UTC (Wed) by dlang (guest, #313) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 18:30 UTC (Wed) by jberkus (guest, #55561) [Link]
mmapped table files
Posted May 15, 2013 19:03 UTC (Wed) by ncm (guest, #165) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 19:22 UTC (Wed) by nix (subscriber, #2304) [Link]
Also, mmap() implies page faults, which imply TLB shootdowns, which are slower than straight reads into already-allocated buffers as is generally done by read(). Combine that with the fact that EOF is fairly hard to detect, and appending is harder, and...
I wish mmap() was used for everything: it's a lovely unifying interface. But it's also a bit of a pig.
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 19:38 UTC (Wed) by andresfreund (subscriber, #69562) [Link]
The problem is that we cannot influence the order in which the pages are flushed to disk. For crash safety we cannot allow any pages to be written out that have a LSN (Log Sequence Number := Address of the write ahead log record covering the last modification) bigger than the last LSN of the corresponding WAL that has been flushed out.
So we would have to be able to reliably prevent writeout on a page (postgres' ones, by default 8kb) granularity in an efficient manner.
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 20:58 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 21:26 UTC (Wed) by ncm (guest, #165) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 23:17 UTC (Wed) by andresfreund (subscriber, #69562) [Link]
The point is that writeout for file writes needs to have interlock with the writes for the journal. You can only writeout a modified page if its corresponding log entry has already been written out.
Writing out only the journal in an mmap()ed fashion would actually be far easier. But I don't see much benefit in that direction since only small amounts of data (up to maybe 64MB or so has been measured as being benefical) are held in memory for the log. And we frequently write to new files which would always requiring an mmap()/munmap() cycle (which actually sucks for concurrency).
PostgreSQL 9.3 beta: Federated databases and more
Posted May 16, 2013 5:19 UTC (Thu) by ncm (guest, #165) [Link]
You seem to be describing a process more like a traditional store and write-ahead log, where first you write in the log all the changes are planned for the main store, and then lazily update the main store, writing it all again, knowing that if you are interrupted somebody else can replay the rest of the log. But I thought the great advantage of the PG scheme is that you only have to write once.
Maybe only metadata goes to the journal and is then copied out, while bulk data goes directly into unused blocks?
PostgreSQL 9.3 beta: Federated databases and more
Posted May 16, 2013 7:17 UTC (Thu) by andresfreund (subscriber, #69562) [Link]
> You seem to be describing a process more like a traditional store and write-ahead log, where first you write in the log all the changes are planned for the main store, and then lazily update the main store, writing it all again, knowing that if you are interrupted somebody else can replay the rest of the log.
Postgres' implementation is a pretty classical write ahead log scheme that is far more like the second scheme you describe than the first one. And afaik has been since the introduction of crash safety (in 7.0 or so).
> But I thought the great advantage of the PG scheme is that you only have to write once.
Hm. Not sure what that corresponds to then? Postgres' WAL doesn't write full pages (except in some circumstances, but let's leave them out for now), but only a description of the change like 'insert tuple at slot X of page YYY) so amount of data that has to be fsync()ed for commit is reasonably small. Perhaps that is what you were referring to?
PostgreSQL 9.3 beta: Federated databases and more
Posted May 16, 2013 8:10 UTC (Thu) by ncm (guest, #165) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 15, 2013 21:43 UTC (Wed) by andresfreund (subscriber, #69562) [Link]
I don't see how it could be done without destroying either the benefits (fixin memory waste by caching a buffer in pg and in the os) or harming other things. The PG code relies on quickly marking a buffer dirty, requiring to copy it somewhere else for that would be rather expensive.
Calling munmap()/mmap() everytimes that happens would also be prohibitively expensive, especially in concurrent situations, so we cannot just do it for the individual memory areas.
But that doesn't mean there isn't a way. I just don't know of anyone describing a realistic implementation strategy so far.
PostgreSQL 9.3 beta: Federated databases and more
Posted May 16, 2013 0:08 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 16, 2013 0:14 UTC (Thu) by andresfreund (subscriber, #69562) [Link]
Afair there are no checks made against it, so yes. But what would be the point? You need to modify the page first, which makes the write superflous? It doesn't prevent the kernel from writing out the page too early either.
I think I am not following where you are going with this?
PostgreSQL 9.3 beta: Federated databases and more
Posted May 16, 2013 12:56 UTC (Thu) by heijo (guest, #88363) [Link]
I hear they can share memory automatically and efficiently and have been available for more than 20 years.
PostgreSQL 9.3 beta: Federated databases and more
Posted May 16, 2013 17:58 UTC (Thu) by jberkus (guest, #55561) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 18, 2013 2:44 UTC (Sat) by ghane (guest, #1805) [Link]
A programmer had a problem. He thought to himself, "I know, I'll solve it with threads!". has Now problems. two he
PostgreSQL 9.3 beta: Federated databases and more
Posted May 20, 2013 17:49 UTC (Mon) by zlynx (guest, #2285) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 28, 2013 20:57 UTC (Tue) by rpkelly (guest, #74224) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted Jun 1, 2013 9:01 UTC (Sat) by kleptog (subscriber, #1183) [Link]
That said, if you restrict yourself to just the executor you primarily have to deal with the memory allocator and the disk buffers. Is it possible to make that thread-safe? I'm not sure anyone has tried. I think with only a few weeks work you could probably make something functional. However, convincing everyone that the solution is as robust as the current setup is much much harder.
PostgreSQL 9.3 beta: Federated databases and more
Posted May 17, 2013 10:01 UTC (Fri) by ras (subscriber, #33059) [Link]
Memory doesn't have to be shared by all threads living in the same process. There are any number of ways, including explicitly shared and memory mapped files. These boil down to choosing to "not shared by default" instead of the "shared by" default model threads use. Speed of access to the memory is the same. The latter is safer, on multi machines with multi CPU's usually faster because less sharing means less cache thrashing. But there is an extra cost of creating a process which is why it loses on Windows.
PostgreSQL 9.3 beta: Federated databases and more
Posted May 16, 2013 8:03 UTC (Thu) by ptman (subscriber, #57271) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 16, 2013 17:54 UTC (Thu) by jberkus (guest, #55561) [Link]
PostgreSQL 9.3 beta: Federated databases and more
Posted May 16, 2013 18:09 UTC (Thu) by andresfreund (subscriber, #69562) [Link]
I think that ship has sailed and for the on-disk page checksums we are going with the modified FNV.
Explanations about the algorithm:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;...
PostgreSQL 9.3 beta: Federated databases and more
Posted May 17, 2013 7:26 UTC (Fri) by Tobu (subscriber, #24111) [Link]
MurmurHash3 has better throughput and dispersion.