Looking at MySQL 8 with PostgreSQL goggles on

Posted on 2019-02-20 by CYBERTEC Guest

By Kaarel Moppel – Compare MySQL and PostgreSQL – First off – I’m not trying to kindle any flame wars here, just trying to broaden my (your) horizons a bit, gather some ideas (maybe I’m missing out on something cool, it’s the most used Open Source RDBMS after all) and to somewhat compare the two despite being a difficult thing to do correctly / objectively. Also, I’m leaving aside performance comparisons and I’m looking at just the available features, general querying experience and documentation clarity as this is I guess the most important for beginners. So this is just a list of points I made for myself, grouped in no particular order.

Disclaimer: The last time I used MySQL for some personal project is 10 years ago, so basically I’m starting from zero and it only took one and a half days to get to know it – thus if you see that I’ve gotten something screamingly wrong then please do leave a comment and I’ll change it. Also, my bias in this article probably tends to favour Postgres…but I’m pretty sure a MySQL veteran with good knowledge of pros and cons can write up something similar also on Postgres, so my hope is that you can leave this aside and learn a thing or two about either system.

To run MySQL, I used the official Docker image, 8.0.14. Under MySQL the default InnoDB engine is meant.

docker run --rm -p 3306:3306 -e MYSQL_ROOT_PASSWORD=root mysql:8

“mysql” CLI (“psql” equivalent) and general querying experience

* When the server requires a password why doesn’t it just ask for it?

mysql -h 0.0.0.0 -u root   # adding '-p' will fix the error
ERROR 1045 (28000): Access denied for user 'root'@'172.17.0.1' (using password: NO)

* Very poor tab-completion compared to “psql”. Using “mycli” instead makes much more sense. I’m myself 99% of time on CLI-s, so it’s essential.
* Lot less shortcut helpers to list tables, views, functions, etc…
* Can’t set to “extended output” (columns as rows) permanently, only “auto” and “per query”.
* One does not need to specify a DB to connect to – I actually find it positive as it’s easy to forget those database names and when once in, one can call “show databases”.
* No “generate_series” function…might seem like a small thing…but with quite a costly (in time sense) impact when trying to generate some test data. there seems to be an alternative function on github but first you’d need to create a table so not quite the same.
* CLI help has links to web e.g. “help select;” shows “URL: http://dev.mysql.com/doc/refman/8.0/en/select.html” at the end of syntax description. That is great.
* If some SQL script has errors “mysql” immediately stops, whereas “psql” would continue unless a bit cryptic “-v ON_ERROR_STOP=1” flag set. I think “mysql” default behaviour is more correct here.
* No SQL standard “TABLE” syntax support. It’s a nice shortcut so I use it a lot for Postgres when testing out features / looking at config or “system stats” tables.
* MySQL has index / optimizer hints, which might be a good thing to direct some queries in your favour. Postgres has decided not to implement this feature as it can also cause problems when queries are not updated when data magnitudes changes or new/better indexes are added. There’s an extension though for Postgres (as usually).
* Some shorthand type casting (“::” in Postgres) seems to be missing. A small thing again sure, but a lot of small things will make out a big one.
* Some “pgbench” equivalent missing. A tiny and simple tool that I personally appreciate a lot in Postgres, really handy to quickly gauge server performance and OS behaviour under heavy load.

MySQL positive findings

* Much more configuration options (548 vs 282), allowing possibly to get better performance or specific behaviour. A double-edged sword though.
* Threaded implementation, should give better total performance for very large numbers (hundreds) of concurrent users.
* Good JSON handling features, like array range indexers for example: “$[1 to 10]” and JSON Path.
* More performance metrics views/tables in the “performance_schema”. Not sure how useful the information is in there though.
* There is an official clustering product option (commercial)
* Built-in support for tablespace and WAL encryption (needs 3rd party stuff for Postgres).
* MySQL workbench, a GUI tool for queries and DB design, is way more capable (and visually nicer) as “pgadmin3/4”. There’s also a commercial version with even more capabilities (backup automation, auditing).

MySQL downsides

* Seems generally more complex as Postgres for a beginner – quite some options and exceptions for different storage engines like MyISAM. Options are not bad, but remember looking as a beginner here.
* Documentation provides too many details at once, making it hard to follow – moving some corner-case stuff (exceptions about old versions etc) onto separate pages would do a lot of good there. Maybe on the plus side: there physically almost 2x more documentation, so chances are than in case of some weird problems you have higher chances for finding some explanations to it.
* From documentation it seems that besides bugfixes, also features are added to minor MySQL versions quite often…which a Postgres user would find confusing.
* Less compliant with the SQL standard. At least based on sources I found googling: 1, 2, 3.
* Importing and exporting data. There’s something equivalent to COPY but more complex (some specific grant and config setting involved for loading files located on the DB server) so that a separate tool for importing data, called “mysqlimport”. Also found an interesting passage from the docus that points to some implicit change in transaction behaviour, depending in which way you load data:

With LOAD DATA LOCAL INFILE, data-interpretation and duplicate-key errors become warnings and the operation continues because the server has no way to stop transmission of the file in the middle of the operation.

* EXPLAIN provides less value on trying to understand why a query is slow. Also, there is no EXPLAIN ANALYZE – that’s a bit of a bummer as workaround with “trace” is already a bit arcane. “EXPLAIN FORMAT=JSON” provides a bit more detail to estimate the costs though.
* Full-text search is a bit half-baked. Built-in configurations seems to be tuned for English only and there is no stemming (Postgres has 15 biggest western languages covered out of the box).
* Some size limits seem arbitrary (64TB on tablespace size, 512GB on InnoDB log files [WAL I assume]). Postgres leaves those to the OS / FS (a single table/partition size is limited to 32TB though).

PostgreSQL architectural/conceptual/platform advantages

* 100% all ACID, no exceptions. MySQL has gotten a lot better with version 8, but it’s not quite there yet with DDL for example.
* More advanced extension-system. MySQL has a plugins system also though, but not as generic to enable for example stored procedures in Python.
* More different index types available (6 vs 3) – for example it’s possible to index strings also for regex search and there are lossy indexes for Big-data. Also, MySQL doesn’t seem to support partial indexes.
* Simpler standby replica building / management. From PG10+ it’s a single command on replication host side with no special config group setup.
* Synchronous replication support.
* Closer to Oracle in terms of features, SQL standard compatibility and stored procedure capabilities. Also, there are some extensions that add some Oracle string/date processing functions etc.
* Couple of more authentication options available out of the box. MySQL has also LDAP and pluggable authentication though.
* More advanced parallel query execution. Postgres is a couple of years ahead in development here since version 10, MySQL just got the very basic (select count(*) from tbl) support out with the latest 8.0.14.
* JIT (Just-in-time) compilation, e.g. “tailored machine-code” for tuple extraction and filtering. Massive savings for Data Warehouse and other row-intensive type of queries.

MySQL architectural/conceptual/platform advantages

* Multiple storage engines. Something similar in works also for Postgres.
* Less bloat due to use of “UNDO”-based row versioning model. Work-in-progress for Postgres though.
* Threads vs processes should give a boost at high session numbers
* Built-in support for multi-master (Multi-Primary Mode) replication. There are caveats as always (CAP theorem still stands), and very few people needs something like that actually, but definitely reassuring that it’s in the “core” – for Postgres there’s a 3rd-party extension providing similar, but as I’ve understood the plan is to get it into the “core” also.
* Built-in “event scheduling”. Postgres again needs a 3rd party extension or custom C code to employ background workers.
* “REPEATABLE READ” is the default transaction model, providing consistent reads throughout a transaction out of the box, saving novice RDBMS developers possibly from quite some head-scratching.

Things I found weird in MySQL

* A table alias in an aggregate can break a query:

mysql> select count(d.*) from dept_emp de join departments d on d.dept_no = de.dept_no  where emp_no < 10011;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '*) from dept_emp de join departments d on d.dept_no = de.dept_no  where emp_no <' at line 1

* Couldn’t find a list of built-in routines from the system catalog. After some googling found that:
For ease of maintenance, we prefer to document them in only one place: the MySQL manual. Well OK, kind of makes sense, but why not to create some catalog view where one could at least have the function names and do something like “\df *terminate*”. Very handy in Postgres.
* One needs to always specify an index name! I personally leave it to Postgres as life has shown that it’s super hard to enforce a naming policy, even when the team consists of a…single developer (yes, I’m looking at myself).
* TIMESTAMP min value is ‘1970-01-01 00:00:01.000000’ and the more generous DATETIME starts with ’1000-01-01 00:00:00′ but doesn’t know about time zones…
* The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all VARCHAR columns).
* It is not possible to (easily, w/o CASE WHEN workaround) specify if you like your NULL-s first or last, which is very weird…as this is specified in SQL Standard 2003, 15 years ago :/ By the way, default “ASC(ENDING)” mode the behaviour is also contrary to Postgres, which has NULLS LAST. Has to do with that part not specified in the SQL standard.
* No FULL OUTER JOIN. Sure, they’re quite rarely used, but most of the “competitors” have them and shouldn’t be too hard to implement if having LEFT JOIN etc.
* Only “Nested Loop” joins and it’s variations. Postgres has additionally also “Hash” and “Merge” join which help a lot when joining millions of rows.

Things I found very weird in MySQL

* CAST() function does not support all data types :/. For example, “int” is available when declaring tables but:

mysql> select cast('1' as int) x;   -- will work when cast to 'unsigned'
ERROR 1064 (42000): You have an error in your SQL syntax

* Some DDL (e.g. dropping a table) is not transactional! New tables are also immediately visible (empty though) to other transactions, when declared from a not-yet-committed transaction. Not ACID enough MySQL.
* CHECK constraints can be declared but they are silently ignored!
* FOREIGN KEY-s declared with the shorter REFERENCES syntax (at the end of column definitions) are not enforced and there are even no errors when the referenced table/column is missing! One needs to use the longer FOREIGN KEY + REFERENCES syntax.
* “Truncation of excess trailing spaces from values to be inserted into TEXT columns always generates a warning, regardless of the SQL mode.” i.e. data is silently chopped despite of the “STRICT MODE” which is the default. For indexes truncating data would be OK (Postgres does it also), but not for data.

MySQL cool features that I would welcome in Postgres

* Implicit session variables. In PG it’s also possible, but in a tedious way with “set”/set_config() + current_setting() functions. There’s a patch in circulation also for Postgres but not yet in core.

"select @a := 42; select @a;"

* Built-in “spatial” support. MySQL GIS functions fall short of Postgres equivalent PostGIS but having it in “contrib” and officially supported would make it a lot more visible and provide more guarantees for potential developers on lookout for a GIS platform, in result aiding the whole Postgres project.
* Generated columns. In Postgres you need views currently, but some work on that is luckily in progress already.
* Resource groups to prioritize/throttle some workloads (users) within an instance. Currently only CPU can be managed.
* “X Protocol” plugin. A relatively new thing that allows asynchronous calls from a single session!
* Auto-updated TIMESTAMP columns (ON UPDATE CURRENT_TIMESTAMP) when row is changed. In Postgres similar works only on initial INSERT and needs a trigger otherwise
* A single “SHOW STATUS” SQL command that gives a nice overview of global server status for both server events and normal query operations – Connections, Aborted_connects, Innodb_num_open_files, Bytes_received / sent, “admin commands” counter, object create/drop counters,pages_read/written, locks, etc. For Postgres it’s only possible with continuous pg_stat* monitoring and/or continuous log file parsing.
* RESTART (also SHUTDOWN) – a SQL command that stops and restarts the MySQL server. It requires the SHUTDOWN privilege.
* Real clustered (index-organized) tables (PRIMARY KEY implementation). In Postgres clustering is effective only for a short(ish) time.
* There’s a dead simple tool on board that auto-generates SSL certs both for server and clients.
* Fresh 8.0.14 version permits accounts to have dual passwords, designated as primary and secondary passwords. This enables smooth password phaseouts.

My verdict on MySQL 8 vs PostgreSQL 11

First again, the idea of the article is not to bash MySQL – it has shown a lot of progress recently with the latest version 8. Judging by the release notes, a lot of issues got eliminated and cool features (e.g. CTE-s, Window functions) added, making it more enterprise-suitable. There’s also much more activity happening on the source code repository compared to Postgres (according to www.openhub.net), and even if it’s a bit hard to acknowledge for a PostgreSQL consultant – it has much more installations and has very good future prospects to develop further due to solid financial backing, which is a bit of a topic for Postgres as it’s not really owned by any company (which is a good thing in other aspects).

But to somehow sum it up – currently (having a lot-lot more PG knowledge, of course) I would still recommend Postgres for 99% of users needing a relational database for their project. The remaining 1% percent would be then for cases where some global start-up scaling would be required, due to native multi-master support. In other aspects PostgreSQL is bit more lightweight, comprehensible (yes, this means occasionally also less choices) and most importantly provides fewer surprises and doesn’t play with data integrity: it is simply not possible to lose/violate data if you have constraints (checks, foreign keys) set! With MySQL you need to keep your guards up at the developer end…but as we know, people forget and are busy and take shortcuts when under time pressure – something that could bite you hard years after the shortcut was taken. Also, Postgres has more advanced extension possibilities, for example 100+ Foreign Data Wrappers for the weirdest data integration needs.

Hope you found something new and interesting for yourself, thanks for reading!

Check out the latest blogs concerning the important topic of PostgreSQL security in our security blog spot.