How to build a PostgreSQL server that weighs
less than 20 MB (with Docker and buildroot)
Highlights of this article (TL;DR): we'll show how to use buildroot to create a basic but fully
functional container using less than 4 MB of disk space (uncompressed). Then we will apply the
same technique to obtain a PostgreSQL image which fits in less than 20 MB (not including your
databases, of course). You can play with those containers right away if you want. Just run "docker
run jpetazzo/pglite", and within seconds, you will have a PostgreSQL server running on your
machine!
I like containers, because they are lighter than virtual machines. This means that they will
use less disk space, less memory, and ultimately be cheaper and faster than their heavier
counterparts. They also boot much faster. Great.
But how "lightweight" is "lightweight"? I wanted to know. We all wanted to know. We already
have a small image, docker-ut, using a statically compiled buildbox (it's built using this script). It
uses about 7 MB of disk space, and is only good to run simple shell scripts; but it is fully
functional—and perfect for Docker unit tests.
How can we build something even smaller? And how can we build something more useful (e.g.,
a PostgreSQL server), but with a ridiculously low footprint?
To build really small systems, you have to look at embedded systems. That's where you find
the experts on everything small-footprint and space-efficient. In the world of embedded
systems, sometimes you have to cram a complete system, including Linux kernel, drivers, startup
scripts, essential libraries, web and SSH servers, WiFi access point management code, RADIUS
server, OpenVPN client, BitTorrent downloader, all in 4 MB of flash. Sounds like what we need,
right?
There are many tools out there to build images for embedded systems. We decided to use
buildroot. Quoting buildroot's project page: "Buildroot is a set of Makefiles and patches that
makes it easy to generate a complete embedded Linux system." Let's put it to the test!
The first step is to download and unpack buildroot:
curl http://buildroot.uclibc.org/downloads/buildroot-2013.05.tar.bz2 | tar jx
Buildroot itself is rather small, because it doesn't include the source of all the things that it compiles. It
will download those later. Now let's dive in:
cd buildroot-2013.05/
The first thing is to tell buildroot what we want to build. If you have ever built your own kernel, this step
will look familiar:
make menuconfig
For now, we will change just one thing: tell buildroot that we want to compile for a 64-bit target.
Go to the "Target Architecture" menu, and select x86_64. Then exit (saving along the way). Now
brew a big pot of coffee, and fire up the build:
make
This will take a while (from 10 minutes to a couple of hours, depending on how beefy your
machine is). It takes so long because buildroot first compiles a toolchain. Instead of
using your default compiler and libraries, it will: download and compile a preset version of gcc;
download and compile uclibc (a small-footprint libc); and then use those to compile
everything else. This sounds like a lot of extra work, but it brings two huge advantages:
- if you want to build for a different architecture (e.g., the Raspberry Pi), it will work
exactly the same way;
- it abstracts your local compiler: your version of gcc/clang/other is irrelevant, since your
image will be built by the versions pinned by buildroot anyway (a quick sanity check follows this list).
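As that sanity check, you can confirm that the build really used buildroot's own cross-compiler rather than your system one. The exact path is an assumption on my part (it has moved across buildroot releases), but in 2013-era releases the host tools land under output/host:
ls output/host/usr/bin/ | grep gcc
The gcc you find there should report the version selected by buildroot, not the one installed on your machine.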
At the end of the build, our minimalist container is ready! Let's have a look:
cd output/images
ls -l
You should see a small, lean rootfs.tar file, containing the image to be imported into Docker.
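A quick way to peek inside, before importing anything, is to list the first few entries of the tarball (this is plain tar, nothing Docker-specific):
tar tvf rootfs.tar | head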
But it's not quite ready yet. We need to fix a few things.
- Docker sets the DNS configuration by bind-mounting over /etc/resolv.conf. This
means that /etc/resolv.conf has to be a regular file. By default, buildroot makes it a
symlink. We have to replace that symlink with a file (an empty file will do).
- Likewise, Docker "injects" itself within containers by bind-mounting over /sbin/init.
This means that /sbin/init should be a regular file as well. By default, buildroot makes
it a symlink to busybox. We will change that, too.
- Docker injects itself within containers, and (as I write this) it is dynamically linked.
This means that it requires a couple of libraries to run correctly. We will need to add
those libraries to the container.
(Note: Docker will eventually switch to static linkage, which means that the last step won't be
necessary anymore.)
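Before fixing them, you can see for yourself that those two paths are symlinks; tar prints an "l" type flag and the link target (the exact targets, and whether member names need the leading ./, depend on your buildroot version):
tar tvf rootfs.tar ./etc/resolv.conf ./sbin/init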
We could unpack the tar file, make our changes, and repack; but that would be boring. So instead,
we will be fancy and update the tarball on the fly.
Let's create an extra directory, and populate it with those "additions":
mkdir extra extra/etc extra/sbin extra/lib extra/lib64
touch extra/etc/resolv.conf
touch extra/sbin/init
cp /lib/x86_64-linux-gnu/libpthread.so.0 /lib/x86_64-linux-gnu/libc.so.6 extra/lib
cp /lib64/ld-linux-x86-64.so.2 extra/lib64
The paths to the libraries might be different on your machine. If in doubt, you can run ldd
$(which docker) to see which libraries are used by your local Docker install.
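For instance, on a typical 64-bit Debian/Ubuntu host, the output looks roughly like this (paths, versions, and load addresses will differ on your machine; the addresses are elided here):
$ ldd $(which docker)
        linux-vdso.so.1 => (0x...)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x...)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)
        /lib64/ld-linux-x86-64.so.2 (0x...)
The files to copy are the ones on the right-hand side.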
Then, create a new tarball including those extra files:
cp rootfs.tar fixup.tar
tar rvf fixup.tar -C extra .
Last but not least, the "import" command will bring this image into Docker. We will name it "dietfs":
docker import - dietfs < fixup.tar
We're done! Let's make sure that everything worked properly by creating a new container with this
image:
docker run -t -i dietfs /bin/sh
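Inside that shell, only busybox's applets are available, but that is plenty for a quick smoke test; these commands are my own choice:
uname -a
busybox
exit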
For what it's worth, I put together a small fixup script on Gist, to automate those steps, so you can also
execute it like this:
curl https://gist.github.com/jpetazzo/b932fb0c753e69c73d31/raw > fixup.sh
sh fixup.sh
The result is a rather small image; less than 3.5 MB:
REPOSITORY TAG ID CREATED SIZE
jpetazzo/busybox latest 0c0468ea37af 5 days ago 3.389 MB (virtual 3.389 MB)
Not Bad!
Now, how do we build something more complex, like a PostgreSQL server?
Why PostgreSQL? Two reasons. One: it's awesome. Two: I didn't find a PostgreSQL package
in buildroot, so it was an excellent opportunity to learn how to include something "from scratch",
as opposed to merely ticking a checkbox and recompiling away.
First, we want to create a directory for our new package. From buildroot's top directory:
mkdir package/postgres
Then, we need to put a couple of files in that directory. For your convenience, I stored them on Gist:
curl https://gist.github.com/jpetazzo/5819538/raw/Config.in > package/postgres/Config.in
curl https://gist.github.com/jpetazzo/5819538/raw/postgres.mk > package/postgres/postgres.mk
Let's have a look at those files now. First, Config.in: it is used by make menuconfig to display a
checkbox for our new package (yay!), but also to define some build dependencies. In this case, we need
IPV6 support.
config BR2_PACKAGE_POSTGRES
	bool "postgres"
	depends on BR2_TOOLCHAIN_BUILDROOT_INET_IPV6
	help
	  PostgreSQL server

comment "postgres requires a toolchain with IPV6 support enabled"
	depends on !BR2_TOOLCHAIN_BUILDROOT_INET_IPV6
How does one know which dependencies to use? I confess that I first tried with no dependencies
at all. The build failed, so I had a look at the error messages, saw that it complained about
missing IPV6 headers, and fixed the issue by adding the required dependency.
The other file, postgres.mk, contains the actual build instructions:
#############################################################
#
# postgresql
#
#############################################################
POSTGRES_VERSION = 9.2.4
POSTGRES_SOURCE = postgresql-$(POSTGRES_VERSION).tar.gz
POSTGRES_SITE = http://ftp.postgresql.org/pub/source/v$(POSTGRES_VERSION)/$(POSTGRES_SOURCE)
POSTGRES_CONF_OPT = --with-system-tzdata=/usr/share/zoneinfo
POSTGRES_DEPENDENCIES = readline zlib
$(eval $(autotools-package))
As you can see, it is pretty straightforward. The main thing is to define some variables telling
buildroot where it should fetch the PostgreSQL source code. We don't have to provide actual build
instructions, because PostgreSQL uses autotools. ("This project uses autotools" means that you
typically compile it with "./configure && make && make install"; this probably rings a bell
if you have ever compiled a significant project manually on any kind of UNIX system!)
The build instructions will actually be expanded from the last line. If you want more details
about buildroot's operation, have a look at buildroot's autotools package tutorial.
We can see that postgres.mk also defines more dependencies: readline and zlib. So what's the
difference between CONF_OPT, DEPENDENCIES, and the "depends" previously seen in
Config.in?
- CONF_OPT provides extra flags which will be passed to ./configure. In this case, the
compilation was failing, telling me that I should specify the path to timezone data. I
looked around and figured out the right flag.
- DEPENDENCIES tells buildroot to compile extra libraries before taking care of our
package. Guess what: when I tried to compile, it failed and complained about missing
readline and zlib; so I added those dependencies, and that was it.
- "depends" in Config.in is a toolchain dependency. It is not really a library; it merely
tells buildroot "hey, when you compile uclibc, make sure to include IPV6 support,
will you?". It has a strong implication: when you change the configuration of the
toolchain (C library or compiler), you have to recompile everything: the toolchain and
everything which was compiled with it. This obviously takes longer than just
recompiling a single package. It is done with the command make clean all (see the
contrast below).
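To make the contrast concrete: the per-package target name here follows buildroot's <package>-rebuild convention, which I am assuming holds for this release (double-check it against your buildroot version):
make postgres-rebuild    # rebuild just our package, e.g. after editing postgres.mk
make clean all           # full rebuild after changing the toolchain configuration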
Last but not least, we need to include our Config.in file in the top-level Config.in. The quick
and dirty way is to do this (from buildroot top directory):
echo 'source "package/postgres/Config.in"' >> Config.in
Note: normally, we should do this in a neat submenu section within e.g. package/Config.in.
But this way will save us some hassle navigating through the menus. (A sketch of the tidy
version follows.)
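For the record, the tidier route would be to edit package/Config.in and wrap the entry in a submenu; this is a minimal sketch in standard Kconfig syntax, with the menu title being my own invention:
menu "Custom packages"
	source "package/postgres/Config.in"
endmenu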
Alright, now run make menuconfig again; go to "Toolchain", enable IPV6 support, go back to
the main menu, and enable "postgres". Now recompile everything with make clean all. This
will take a while.
Just like before, we need to "fixup" the resulting image:
cd output/images
curl https://gist.github.com/jpetazzo/b932fb0c753e69c73d31/raw | sh
We now have a Docker image with PostgreSQL in it; but it is not enough. We still need to set up
the image to start PostgreSQL automatically, and even before that, PostgreSQL will have to
initialize its data directory (with initdb). We will use a Dockerfile and a custom script for
that.
What's a Dockerfile? A Dockerfile contains basic instructions telling Docker how to build an
image. When you use Docker for the first time, you will probably use "docker run" and "docker
commit" to create new images; but you should quickly move to Dockerfiles and "docker build",
because they automate those operations and make it easier to share "recipes" to build images.
Let's start with the custom script. We want this script to run automatically within the container
when it starts. Make a new empty directory, and create the following init file in it:
#!/bin/sh
set -e
mkdir /usr/share/zoneinfo /data
chown default /data
head -c 16 /dev/urandom | sha1sum | cut -c1-10 > /pwfile
echo "PG_PASSWORD=$(cat /pwfile)"
su default -s /usr/bin/initdb -- --pgdata=/data --pwfile=/pwfile --username=postgres --auth=trust >/dev/null
echo host all all 0.0.0.0 0.0.0.0 md5 >> /data/pg_hba.conf
exec su default -s /usr/bin/postgres -- -D /data -c 'listen_addresses=*'
PostgreSQL will refuse to run as root, so we use the "default" user (conveniently provided by
buildroot). We create /data to hold the PostgreSQL data files, and assign it to the non-privileged
user. We also generate a random password, save it to /pwfile, and display it (to make it easier to
retrieve later). We can then run initdb to actually create the data files. Then, we extend
pg_hba.conf to authorize connections from the network (by default, only local connections are
allowed). The last step is to actually start the server.
Make sure that the script is executable:
chmod +x init
Now, in the same directory, we will create the following Dockerfile, to actually inject the previous
script in a new image:
from dietfs
add . /
expose 5432
cmd /init
The fixup.sh script has imported our image under the name "dietfs", so our Dockerfile will start
with from dietfs, to tell Docker that we want to use that image as a base. Then, we add all the
files in the current directory to the root of our image. This will also inject the Dockerfile itself,
but we don't care. We expose TCP port 5432, and finally tell Docker that by default, when a
container is created from this image, it should run our /init script. You can read more about the
Dockerfile syntax in Docker's documentation.
The next step is to build the new image using our Dockerfile:
docker build -t pglite .
That's it. You can now start a new PostgreSQL instance:
docker run pglite
The output will include the password, and then the first log messages from the server:
PG_PASSWORD=4e68b1958c
LOG: database system was shut down at 2013-06-20 03:55:50 UTC
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
Weak Password Is Weak! Our password is random, but it only includes hexadecimal digits
(i.e. [0-9a-f]). You can make it better by including base64 in the image, and using base64 instead
of sha1sum (see the sketch below). Alternatively, you can use longer passwords.
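Here is a minimal sketch of the stronger generator, assuming busybox's base64 applet is enabled in your buildroot configuration:
head -c 16 /dev/urandom | base64 | cut -c1-16 > /pwfile
Each character of the password then carries roughly 6 bits of entropy instead of 4.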
Take note of the password. It's OK to hit "Ctrl-C" now: the container will still run in the
background. Let's check which port was allocated for our container. docker ps will show us all
the containers currently running; but to make things even simpler, we will use docker ps -l,
which only shows the latest container.
$ docker ps -l
ID            IMAGE          COMMAND           CREATED             STATUS             PORTS        SIZE
e21ba744ff09  pglite:latest  /bin/sh -c /init  About a minute ago  Up About a minute  49168->5432  23.53 MB (virtual 39.87 MB)
Alright, that's port 49168. Does it really work? Let's check for ourselves! You can try locally if you have a
PostgreSQL client installed on your Docker machine; or from anywhere else (just replace "localhost"
with the hostname or IP address of your Docker machine).
$ psql postgres --host localhost --port 49168 --username postgres
Password for user postgres: 4e68b1958c
psql (9.1.3, server 9.2.4)
WARNING: psql version 9.1, server version 9.2.
Some psql features might not work.
Type "help" for help.
postgres=# \q
$
A small note about sizes: the image takes about 16 MB, but the data files take almost 24 MB.
So the total footprint is really about 40 MB.
What if we want to automate the creation of our PostgreSQL container, to run our own
PostgreSQL-as-a-Service platform? Easy, with just a tiny bit of shell trickery!
CONTAINERID=$(docker run -d pglite)
while ! docker logs $CONTAINERID 2>/dev/null | grep -q ^PG_PASSWORD= ; do sleep 1 ; done
eval $(docker logs $CONTAINERID 2>/dev/null)
PG_PORT=$(docker port $CONTAINERID 5432)
echo "A new PostgreSQL instance is listening on port $PG_PORT. The admin user is postgres, the admin
password is $PG_PASSWORD."
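If you provision instances this way, you will also want to dispose of them eventually; standard Docker commands do the job:
docker kill $CONTAINERID
docker rm $CONTAINERID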
That's it! If you name your image "yourname/pglite" instead of just "pglite", you will be able to "docker
push" it to the Docker Public Registry, and to "docker pull" it from any other Docker host anywhere in
the world. You are one PHP script away from setting up your own PostgreSQL-as-a-Service provider.
About Jérôme Petazzoni
Jérôme is a senior engineer at dotCloud, where he rotates between Ops, Support
and Evangelist duties and has earned the nickname of “master Yoda”. In a
previous life he built and operated large-scale Xen hosting back when EC2 was
just the name of a plane, supervised the deployment of fiber interconnects
through the French subway, built a specialized GIS to visualize fiber
infrastructure, specialized in commando deployments of large-scale computer
systems in bandwidth-constrained environments such as conference centers,
and performed various other feats of technical wizardry. He cares for the servers
powering dotCloud, helps our users feel at home on the platform, and documents
the many ways to use dotCloud in articles, tutorials and sample applications. He’s
also an avid dotCloud power user who has deployed just about anything on
dotCloud – look for one of his many custom services on our GitHub repository.
Connect with Jérôme on Twitter! @jpetazzo