Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Schema liberation with JSON and plv8 (and Postgres)

Schema liberation with JSON and plv8 (and Postgres)

Dropbox link: https://www.dropbox.com/s/vc2oheabr5s1x11/schema%20liberation%20with%20json%20and%20plv8.pdf

A few more details: http://www.chesnok.com/daily/2013/05/31/more-about-javascript-and-postgresql/

JSON is the lingua franca of the web. Postgres supports JSON natively and the PLV8 Javascript engine as an extension. Full Javascript is supported inside the database, including direct access to tables and important database features.

And, developer-friendly features make it incredibly easy to transform your existing, normalized schemas into liberated JSON ones!

Tour what putting a document store inside of Postgres looks like, including a look at a production use-case from Mozilla. crash-stats.mozilla.com is backed by a 2 TB Postgres database cluster that's adding about about 5 GB of JSON per week.

All photos of birds were taken in the Amelia Island, FL area during the JSConf Family Reunion in May 2013.

Birds represented:
Baby Snowy Egrets
Great Egret
Yellow throated warbler
Tri-colored Heron
Laughing gull
Roseate spoonbill (endangered)
Black skimmer
Roseate spoonbill flock
Great Egret
Baby great egrets
Crocodile (not a bird)
Wood stork (endangered)
Eggs (we think from a snowy egret)
Green Heron

Selena Deckelmann

May 31, 2013
Tweet

More Decks by Selena Deckelmann

Other Decks in Programming

Transcript

  1. Data in MongoDB has a flexible schema. ... Schema flexibility

    means that you can model your documents in MongoDB so that they can closely resemble and reflect application-level objects. -The MongoDB Documentation
  2. Data in MongoDB has a flexible schema. ... Schema flexibility

    means that you can model your documents in MongoDB so that they can closely resemble and reflect application-level objects. -The MongoDB Documentation
  3. create table birds (id serial, name text); create table locations

    (id serial, state text, city text); create table sightings (id serial, seen_on date DEFAULT now()::date, bird_id int, location_id int); insert into table birds (name) values(‘yellow throated warbler’); insert into table locations (state, city) values(‘FL’, ‘Amelia Island’); insert into sightings (bird_id, location_id) ( with b_id as (select id from birds where name = ‘painted bunting’), with l_id as (select id from location where state = ‘FL’ and city = ‘Amelia Island’) select b_id.id, l_id.id from b_id, l_id );
  4. vs. create table birds (id serial, name text); create table

    locations (id serial, state text, city text); create table sightings (id serial, seen_on date DEFAULT now()::date, bird_id int, location_id int); insert into table birds (name) values(‘yellow throated warbler’); insert into table locations (state, city) values(‘FL’, ‘Amelia Island’); insert into sightings (bird_id, location_id) ( with b_id as (select id from birds where name = ‘painted bunting’), with l_id as (select id from location where state = ‘FL’ and city = ‘Amelia Island’) select b_id.id, l_id.id from b_id, l_id );
  5. {"sightings": [{"name": "yellow throated warbler", "where": "Amelia Island, FL", "when":

    "2013-05-28"}] } vs. create table birds (id serial, name text); create table locations (id serial, state text, city text); create table sightings (id serial, seen_on date DEFAULT now()::date, bird_id int, location_id int); insert into table birds (name) values(‘yellow throated warbler’); insert into table locations (state, city) values(‘FL’, ‘Amelia Island’); insert into sightings (bird_id, location_id) ( with b_id as (select id from birds where name = ‘painted bunting’), with l_id as (select id from location where state = ‘FL’ and city = ‘Amelia Island’) select b_id.id, l_id.id from b_id, l_id );
  6. postgres# \d reports Table "public.reports" Column | Type | Modifiers

    ---------------------+--------------------------+------------------------------------------------------ id | integer | not null default nextval('reports_id_seq'::regclass) client_crash_date | timestamp with time zone | date_processed | timestamp with time zone | uuid | character varying(50) | not null product | character varying(30) | version | character varying(16) | build | character varying(30) | signature | character varying(255) | install_age | integer | last_crash | integer | uptime | integer | cpu_name | character varying(100) | cpu_info | character varying(100) | reason | character varying(255) | address | character varying(20) | os_name | character varying(100) | os_version | character varying(100) | user_id | character varying(50) | started_datetime | timestamp with time zone | completed_datetime | timestamp with time zone | success | boolean | truncated | boolean | processor_notes | text | user_comments | character varying(1024) | app_notes | character varying(1024) | distributor | character varying(20) | distributor_version | character varying(20) | topmost_filenames | text | addons_checked | boolean | flash_version | text | hangid | text | process_type | text | release_channel | text | productid | text | exploitability | text | flash_process_dump | text | url | text | email | text | (38 columns)
  7. postgres# \d reports Table "public.reports" Column | Type | Modifiers

    ---------------------+--------------------------+------------------------------------------------------ id | integer | not null default nextval('reports_id_seq'::regclass) client_crash_date | timestamp with time zone | date_processed | timestamp with time zone | uuid | character varying(50) | not null product | character varying(30) | version | character varying(16) | build | character varying(30) | signature | character varying(255) | install_age | integer | last_crash | integer | uptime | integer | cpu_name | character varying(100) | cpu_info | character varying(100) | reason | character varying(255) | address | character varying(20) | os_name | character varying(100) | os_version | character varying(100) | user_id | character varying(50) | started_datetime | timestamp with time zone | completed_datetime | timestamp with time zone | success | boolean | truncated | boolean | processor_notes | text | user_comments | character varying(1024) | app_notes | character varying(1024) | distributor | character varying(20) | distributor_version | character varying(20) | topmost_filenames | text | addons_checked | boolean | flash_version | text | hangid | text | process_type | text | release_channel | text | productid | text | exploitability | text | flash_process_dump | text | url | text | email | text | (38 columns) Roseate spoonbill says: “Yuck!”
  8. postgres# \d liberated Table "public.liberated" Column | Type | Modifiers

    -------------+---------+----------- row_to_json | json |
  9. postgres# create table liberated as (select row_to_json(row) from (select *

    from reports_20130506) as row); SELECT 2695824 Time: 110470.716 ms
  10. postgres# create table liberated as (select row_to_json(row) from (select *

    from reports_20130506) as row); SELECT 2695824 Time: 110470.716 ms Took about 2 minutes on an older dev system.
  11. postgres# select pg_size_pretty(pg_table_size('repo rts_20130506')) as “original size”; original size ---------------

    2120 MB postgres# select pg_size_pretty(pg_table_size('libe rated')) as “liberated size”; liberated size ---------------- 4076 MB
  12. CREATE SCHEMA liberated; CREATE OR REPLACE FUNCTION public.liberate() RETURNS boolean

    LANGUAGE plv8 IMMUTABLE AS $function$ var tables = plv8.execute( "SELECT relname FROM pg_catalog.pg_class c" + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace" + " WHERE relkind = 'r' AND n.nspname = 'public'"); for (var i = 0; i < tables.length; i++) { plv8.elog(NOTICE, tables[i]['relname']); var code = "CREATE TABLE liberated." + tables[i]['relname'] + " AS ( SELECT row_to_json(" + tables[i]['relname'] + ") AS liberated_json FROM " + tables[i]['relname'] + ")"; plv8.execute(code); } return 1; $function$; https://gist.github.com/selenamarie/5646494
  13. CREATE SCHEMA liberated; CREATE OR REPLACE FUNCTION public.liberate() RETURNS boolean

    LANGUAGE plv8 IMMUTABLE AS $function$ var tables = plv8.execute( "SELECT relname FROM pg_catalog.pg_class c" + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace" + " WHERE relkind = 'r' AND n.nspname = 'public'"); for (var i = 0; i < tables.length; i++) { plv8.elog(NOTICE, tables[i]['relname']); var code = "CREATE TABLE liberated." + tables[i]['relname'] + " AS ( SELECT row_to_json(" + tables[i]['relname'] + ") AS liberated_json FROM " + tables[i]['relname'] + ")"; plv8.execute(code); } return 1; $function$; https://gist.github.com/selenamarie/5646494
  14. CREATE SCHEMA liberated; -- A namespace for our new tables!

    -- a cluster contains databases -- a database contains schemas -- a schema contains tables
  15. CREATE OR REPLACE FUNCTION liberate() RETURNS boolean -- Say what

    we’re returning LANGUAGE plv8 -- Say we’ll be using plv8 AS $function$ -- Function goes here! $function$;
  16. -- Find all the names of -- all postgres tables!

    SELECT relname AS table_name FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE relkind = 'r' -- meaning a table! and n.nspname = 'public';
  17. CREATE SCHEMA liberated; CREATE OR REPLACE FUNCTION public.liberate() RETURNS boolean

    LANGUAGE plv8 IMMUTABLE AS $function$ var tables = plv8.execute( "SELECT relname FROM pg_catalog.pg_class c" + " LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace" + " WHERE relkind = 'r' AND n.nspname = 'public'"); for (var i = 0; i < tables.length; i++) { plv8.elog(NOTICE, tables[i]['relname']); var code = "CREATE TABLE liberated." + tables[i]['relname'] + " AS ( SELECT row_to_json(" + tables[i]['relname'] + ") AS liberated_json FROM " + tables[i]['relname'] + ")"; plv8.execute(code); } return 1; $function$; https://gist.github.com/selenamarie/5646494
  18. postgres# create table birds (sighting json); CREATE TABLE postgres# insert

    into birds VALUES( '{"name": "Great Egret", "where": "Amelia Island, FL", "when":"2013-05-28"}' ); INSERT 0 1 postgres# select * from birds; sighting -------------------------------------------------------------------- {"name": "Great Egret", "where": "Amelia Island, FL", "when": "2013-05-28"} (1 row)
  19. postgres# select json_object_field_text(sighting, 'name') from birds; json_object_field_text ------------------------ Great Egret

    postgres# select json_object_field(sighting, 'name') from birds; json_object_field ------------------- "Great Egret" postgres# select json_object_keys(sighting) from birds; json_object_keys ------------------ name where when And we can create indexes, compare, join against and use JSON columns like any other Postgres column.
  20. create or replace function json_value( j json, key text) returns

    text AS $$ if (typeof j != 'object') return NULL; return JSON.stringify(j[key]); $$ language plv8;
  21. plv8 supports running raw SQL, prepared statements and cursors: rows

    = plv8.execute(...) plan = plv8.prepare(...) rows = plan.execute(...) plan = plv8.prepare(...) cursor = plan.cursor(...) while (row = cursor.fetch()) {}
  22. create or replace function inject_js(src text) returns text as $$

    return eval( "(function() {" + src + "})")(); $$ LANGUAGE plv8;
  23. postgres# \d reports Table "public.reports" Column | Type | Modifiers

    ---------------------+--------------------------+------------------------------------------------------ id | integer | not null default nextval('reports_id_seq'::regclass) client_crash_date | timestamp with time zone | date_processed | timestamp with time zone | uuid | character varying(50) | not null product | character varying(30) | version | character varying(16) | build | character varying(30) | signature | character varying(255) | install_age | integer | last_crash | integer | uptime | integer | cpu_name | character varying(100) | cpu_info | character varying(100) | reason | character varying(255) | address | character varying(20) | os_name | character varying(100) | os_version | character varying(100) | user_id | character varying(50) | started_datetime | timestamp with time zone | completed_datetime | timestamp with time zone | success | boolean | truncated | boolean | processor_notes | text | user_comments | character varying(1024) | app_notes | character varying(1024) | distributor | character varying(20) | distributor_version | character varying(20) | topmost_filenames | text | addons_checked | boolean | flash_version | text | hangid | text | process_type | text | release_channel | text | productid | text | exploitability | text | flash_process_dump | text | url | text | email | text | (38 columns)
  24. postgres# select pg_size_pretty(pg_table_size('reports _20130513')) as reports_20130513_size; reports_20130513_size ----------------------- 2136 MB

    postgres# select pg_size_pretty(pg_table_size('raw_cra shes_20130513')) as raw_crashes_20130513_size; raw_crashes_20130513_size --------------------------- 5246 MB
  25. 1. Use “bulkbag” schema design + schema evolution JSON to

    start, normalize to optimize Why use Postgres?
  26. 1. Use “bulkbag” schema design + schema evolution JSON to

    start, normalize to optimize 2. Easily scale to multi-terabyte DBs For write- or read-heavy loads, non-cloud storage Why use Postgres?
  27. 1. Use “bulkbag” schema design + schema evolution JSON to

    start, normalize to optimize 2. Easily scale to multi-terabyte DBs For write- or read-heavy loads, non-cloud storage 3. Manage your data with a language you love Put the .js for big data jobs with the big data Why use Postgres?
  28. TLDR Datatypes Conditional Indexes Transactional DDL Foreign Data Wrappers Concurrent

    Index Creation Extensions Common Table Expressions Fast Column Addition Listen/Notify Table Inheritance Per Transaction sync replication Window Functions Materialized Views Updatable Views Momentum
  29. liberate your schema! with json and plv8 selena deckelmann @selenamarie

    [email protected] https://code.google.com/p/plv8js/wiki/PLV8 http://www.postgresql.org/developer/beta/ https://gist.github.com/selenamarie/5646494 http://www.pgxn.org/dist/json_enhancements/doc/ json_enhancements.html
  30. Special thanks to: Scott Deckelmann Andrew Dunstan (JSON datatype developer)

    Hitoshi Harada (plv8 author) Will Leinweber (source) Craig Kerstiens (source) reviewers: Brenda Wallace Aurynn Shaw Doug Gorley Andrew Grangaard George London Jacinta Richardson
  31. This work is licensed under the Creative Commons Attribution- ShareAlike

    3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA. Photos from slides 1, 3, 6, 8, 9, 10, 12, 13, 22, 31, 39, 40, 48, 49, 51, 53 © Scott Deckelmann, used with permission.