SlideShare a Scribd company logo
1 of 26
Download to read offline
Accelera'ng 
Local 
Search 
With 
PostgreSQL 
9.1 
Jonathan 
S. 
Katz 
Exco 
Ventures 
PGConf.eu 
2011 
– 
Oct 
21, 
2011
Local 
Search? 
• Not 
necessarily 
loca'on 
based 
on 
places 
• “How 
close 
are 
two 
en''es 
to 
one 
another?” 
• “What 
are 
the 
closest 
en''es 
to 
me?” 
• “What 
are 
my 
nearest 
neighbors?” 
2
Nearest 
Neighbor 
Overview 
• Want 
to 
know 
“how 
similar” 
objects 
are 
rela've 
to 
each 
other 
– What 
are 
the 
top 
“k” 
choices 
near 
me? 
• Need 
to 
define 
a 
“metric” 
for 
similarity 
– “distance” 
3
K-­‐Nearest 
Neighbor 
• Given 
a 
collec'on 
of 
n 
objects 
• When 
trying 
to 
classify 
an 
unknown 
object 
– compute 
the 
distance 
between 
all 
known 
objects 
– find 
the 
k 
(k 
≥ 
1) 
closest 
objects 
to 
the 
unknown 
object 
– classify 
the 
object 
based 
on 
class 
of 
k 
closest 
objects 
• When 
k=1, 
then 
unknown 
object 
is 
given 
same 
classifica'on 
as 
object 
it 
is 
closest 
to 
4
K=1 
Example 
Voronoi 
Diagram 
of 
order 
1 
can 
be 
used 
to 
make 
k=1 
NN 
queries 
5
Just 
a 
bit 
more 
theory… 
• Voronoi 
diagrams 
of 
order-­‐1 
are 
created 
in 
O(n 
log 
n) 
'me 
– Looks 
similar 
to…? 
:-­‐) 
• Queried 
in 
O(1) 
'me 
• Therefore: 
– Pay 
the 
'me 
penalty 
to 
build 
the 
index 
– Query 
against 
index 
quickly 
6
Applica'ons 
• Geoloca'on 
+ 
Op'mizing 
Posi'oning 
• Classifica'on 
• Similarity 
• Recommenda'on 
systems 
• Content-­‐based 
image 
retrieval 
• etc. 
7
So 
what 
about 
PostgreSQL? 
• As 
of 
PostgreSQL 
9.0 
– supports 
geometric 
types 
and 
distances 
• Points, 
circles, 
lines, 
boxes, 
polygons 
• Distance 
operator: 
<-­‐> 
– pg_trgm 
– 
supplied 
module 
for 
determining 
text 
similarity 
• similarity(“abc”, 
“ade”) 
computes 
similarity 
score 
• <-­‐> 
defines 
distance 
(opposite 
if 
similarity), 
not 
defined 
(in 
9.0) 
8
PostgreSQL 
9.1: 
KNN-­‐GiST 
• Let 
n 
= 
size 
of 
a 
table 
• Can 
index 
data 
that 
provides 
a 
“<-­‐>” 
(distance) 
operator 
– Geometric 
– pg_trgm 
• “k” 
= 
LIMIT 
clause 
• Known 
inefficiencies 
when 
k=n 
and 
n 
is 
small 
9
Example: 
pg_trgm 
• Data: 
– List 
of 
1,000,000 
names 
– 
700,000 
unique 
– n 
= 
1,000,000 
• Indexes: 
– CREATE 
INDEX 
names_name_idx 
ON 
names 
(name) 
– CREATE 
INDEX 
trgm_idx 
ON 
names 
USING 
gist 
(name 
gist_trgm_ops) 
• k=10 
• Displaying 
query 
plan 
/ 
execu'on 
'me 
aqer 
10 
runs 
10
pg_trgm: 
9.0 
EXPLAIN 
ANALYZE 
SELECT 
name, 
similarity(name, 
'jon') 
AS 
sim 
FROM 
names 
WHERE 
name 
% 
'jon' 
ORDER 
BY 
sim 
DESC 
LIMIT 
10; 
11
pg_trgm: 
9.0 
Limit 
(cost=2724.95..2724.98 
rows=10 
width=14) 
(actual 
'me=192.793..192.794 
rows=10 
loops=1) 
-­‐> 
Sort 
(cost=2724.95..2727.45 
rows=1000 
width=14) 
(actual 
'me=192.790..192.791 
rows=10 
loops=1) 
Sort 
Key: 
(similarity(name, 
'jon'::text)) 
Sort 
Method: 
top-­‐N 
heapsort 
Memory: 
25kB 
-­‐> 
Bitmap 
Heap 
Scan 
on 
names 
(cost=56.47..2703.34 
rows=1000 
width=14) 
(actual 
'me=188.836..192.499 
rows=865 
loops=1) 
Recheck 
Cond: 
(name 
% 
'jon'::text) 
-­‐> 
Bitmap 
Index 
Scan 
on 
trgm_idx 
(cost=0.00..56.22 
rows=1000 
width=0) 
(actual 
'me=188.652..188.652 
rows=865 
loops=1) 
Index 
Cond: 
(name 
% 
'jon'::text) 
Total 
run'me: 
192.881 
ms 
12
pg_trgm: 
9.1 
EXPLAIN 
ANALYZE 
SELECT 
name, 
similarity(name, 
'jon') 
AS 
sim 
FROM 
names 
WHERE 
name 
% 
'jon' 
ORDER 
BY 
sim 
DESC 
LIMIT 
10; 
13
pg_trgm 
9.1 
Limit 
(cost=2720.91..2720.93 
rows=10 
width=14) 
(actual 
'me=202.022..202.023 
rows=10 
loops=1) 
-­‐> 
Sort 
(cost=2720.91..2723.41 
rows=1000 
width=14) 
(actual 
'me=202.020..202.021 
rows=10 
loops=1) 
Sort 
Key: 
(similarity(name, 
'jon'::text)) 
Sort 
Method: 
top-­‐N 
heapsort 
Memory: 
25kB 
-­‐> 
Bitmap 
Heap 
Scan 
on 
names 
(cost=52.43..2699.30 
rows=1000 
width=14) 
(actual 
'me=198.324..201.719 
rows=865 
loops=1) 
Recheck 
Cond: 
(name 
% 
'jon'::text) 
-­‐> 
Bitmap 
Index 
Scan 
on 
names_trgm_idx 
(cost=0.00..52.18 
rows=1000 
width=0) 
(actual 
'me=198.156..198.156 
rows=865 
loops=1) 
Index 
Cond: 
(name 
% 
'jon'::text) 
Total 
run1me: 
202.113 
ms 
14
Comparable? 
• Seems 
to 
be 
similar 
– Need 
to 
do 
more 
research 
why 
– 
anyone? 
• However, 
9.1 
offers 
improvements 
for 
LIKE/ 
ILIKE 
search 
with 
pg_trgm 
15
LIKE/ILIKE 
EXPLAIN 
ANALYZE 
SELECT 
name 
FROM 
names 
WHERE 
name 
LIKE 
'%ata%n'; 
16
LIKE/ILIKE 
pg_trgm: 
9.0 
vs 
9.1 
Seq 
Scan 
on 
names 
(cost=0.00..18717.00 
rows=99 
width=14) 
(actual 
'me=0.339..205.659 
rows=665 
loops=1) 
Filter: 
(name 
~~ 
'%ata%n'::text) 
Total 
run1me: 
205.743 
ms 
Bitmap 
Heap 
Scan 
on 
names 
(cost=9.45..369.20 
rows=99 
width=14) 
(actual 
'me=122.494..125.967 
rows=665 
loops=1) 
Recheck 
Cond: 
(name 
~~ 
'%ata%n'::text) 
-­‐> 
Bitmap 
Index 
Scan 
on 
names_trgm_idx 
(cost=0.00..9.42 
rows=99 
width=0) 
(actual 
'me=121.972..121.972 
rows=3551 
loops=1) 
Index 
Cond: 
(name 
~~ 
'%ata%n'::text) 
Total 
run1me: 
126.065 
ms 
17
Geometry 
• Data: 
– 2,000,000 
points, 
from 
(0,0) 
-­‐> 
(10000, 
10000) 
• Index: 
– CREATE 
INDEX 
geoloc_coord_idx 
ON 
geoloc 
USING 
gist 
(coord); 
18
Geometry 
EXPLAIN 
ANALYZE 
SELECT 
*, 
coord 
<-­‐> 
point(500,500) 
FROM 
geoloc 
ORDER 
BY 
coord 
<-­‐> 
point(500,500) 
LIMIT 
10; 
19
Geometry: 
9.0 
vs 
9.1 
Limit 
(cost=80958.28..80958.31 
rows=10 
width=20) 
(actual 
'me=1035.313..1035.316 
rows=10 
loops=1) 
-­‐> 
Sort 
(cost=80958.28..85958.28 
rows=2000000 
width=20) 
(actual 
'me=1035.312..1035.314 
rows=10 
loops=1) 
Sort 
Key: 
((coord 
<-­‐> 
'(500,500)'::point)) 
Sort 
Method: 
top-­‐N 
heapsort 
Memory: 
25kB 
-­‐> 
Seq 
Scan 
on 
geoloc 
(cost=0.00..37739.00 
rows=2000000 
width=20) 
(actual 
'me=0.029..569.501 
rows=2000000 
loops=1) 
Total 
run1me: 
1035.349 
ms 
Limit 
(cost=0.00..0.81 
rows=10 
width=20) 
(actual 
'me=0.576..1.255 
rows=10 
loops=1) 
-­‐> 
Index 
Scan 
using 
geoloc_coord_idx 
on 
geoloc 
(cost=0.00..162068.96 
rows=2000000 
width=20) 
(actual 
'me=0.575..1.251 
rows=10 
loops=1) 
Order 
By: 
(coord 
<-­‐> 
'(500,500)'::point) 
Total 
run1me: 
1.391 
ms 
20
Applica'on 
Examples 
• Proximity 
map 
search 
– 
fast! 
21
Drawbacks 
• Performance 
benefits 
are 
limited 
when: 
– LIMIT 
is 
close 
to 
size 
of 
data 
set 
and 
data 
set 
is 
large 
– Data 
set 
is 
small 
• Time 
to 
build 
index 
– High 
transac'on 
table 
22
Conclusions 
• GiST: 
“Generalized 
Search 
Tree” 
– 
index 
is 
there, 
up 
to 
developers 
to 
define 
access 
methods 
of 
data 
types 
– e.g. 
yields 
KNN-­‐GiST 
• Different 
types 
of 
applica'ons 
can 
be 
built 
– 
performance 
enhancements 
• Next 
steps? 
23
My 
Wish 
List 
• Further 
geometric-­‐type 
support 
in 
Postgres 
– N-­‐dimensional 
points 
– ‘=‘ 
operator 
for 
point 
type 
– (PostGIS 
s'll 
champion 
of 
complex 
geometric 
+ 
geographic 
data 
types) 
• Define 
“distance” 
over 
mul'columns 
with 
different 
types? 
– SELECT 
(a.name, 
a.geocode) 
<-­‐> 
(b.name, 
b.geocode) 
FROM 
x 
a, 
x 
b; 
24
References 
• Oleg 
Bartunov 
– 
“Efficient 
K-­‐nearest 
neighbour 
search 
in 
PostgreSQL” 
(h~p://www.sai.msu.su/~megera/ 
postgres/talks/pgday-­‐2010.pdf) 
• Oleg 
Bartunov 
and 
Teodor 
Sigaev 
for 
work 
on 
KNN-­‐ 
GiST 
and 
notes 
on 
pg_trgm 
(h~p:// 
developer.postgresql.org/pgdocs/postgres/ 
pgtrgm.html) 
• Hubert 
“depesz” 
Lubaczewski 
– 
pa~erned 
benchmarks 
off 
of 
his 
work 
-­‐ 
h~p://www.depesz.com/index.php/ 
2010/12/11/wai'ng-­‐for-­‐9-­‐1-­‐knngist/ 
25
Contact 
• jonathan.katz@excoventures.com 
• @jkatz05 
• Feedback 
Please! 
– h~p://2011.pgconf.eu/feedback 
26

More Related Content

What's hot

カスタムプランと汎用プラン
カスタムプランと汎用プランカスタムプランと汎用プラン
カスタムプランと汎用プランMasao Fujii
 
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)Satoshi Yamada
 
SQLAlchemy Core: An Introduction
SQLAlchemy Core: An IntroductionSQLAlchemy Core: An Introduction
SQLAlchemy Core: An IntroductionJason Myers
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQLJoel Brewer
 
joins and subqueries in big data analysis
joins and subqueries in big data analysisjoins and subqueries in big data analysis
joins and subqueries in big data analysisSanSan149
 
MySQL Index Cookbook
MySQL Index CookbookMySQL Index Cookbook
MySQL Index CookbookMYXPLAIN
 
[네이버오픈소스세미나] Contribution, 전쟁의 서막 : Apache OpenWhisk 성능 개선 - 김동경
[네이버오픈소스세미나] Contribution, 전쟁의 서막 : Apache OpenWhisk 성능 개선 - 김동경[네이버오픈소스세미나] Contribution, 전쟁의 서막 : Apache OpenWhisk 성능 개선 - 김동경
[네이버오픈소스세미나] Contribution, 전쟁의 서막 : Apache OpenWhisk 성능 개선 - 김동경NAVER Engineering
 
Introduction to oracle functions
Introduction to oracle functionsIntroduction to oracle functions
Introduction to oracle functionsNitesh Singh
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)Hironobu Suzuki
 
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)NTT DATA Technology & Innovation
 
JSONB型でpostgresをNoSQLっぽく使う
JSONB型でpostgresをNoSQLっぽく使うJSONB型でpostgresをNoSQLっぽく使う
JSONB型でpostgresをNoSQLっぽく使うYuki Takeichi
 
Introduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparoundIntroduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparoundMasahiko Sawada
 
View, Store Procedure & Function and Trigger in MySQL - Thaipt
View, Store Procedure & Function and Trigger in MySQL - ThaiptView, Store Procedure & Function and Trigger in MySQL - Thaipt
View, Store Procedure & Function and Trigger in MySQL - ThaiptFramgia Vietnam
 
PostgreSQLの統計情報について(第26回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLの統計情報について(第26回PostgreSQLアンカンファレンス@オンライン 発表資料)PostgreSQLの統計情報について(第26回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLの統計情報について(第26回PostgreSQLアンカンファレンス@オンライン 発表資料)NTT DATA Technology & Innovation
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesAltinity Ltd
 
Advanced MySQL Query Tuning
Advanced MySQL Query TuningAdvanced MySQL Query Tuning
Advanced MySQL Query TuningAlexander Rubin
 
PostgreSQL実行計画入門@関西PostgreSQL勉強会
PostgreSQL実行計画入門@関西PostgreSQL勉強会PostgreSQL実行計画入門@関西PostgreSQL勉強会
PostgreSQL実行計画入門@関西PostgreSQL勉強会Satoshi Yamada
 
2019年度 若手技術者向け講座 実行計画
2019年度 若手技術者向け講座 実行計画2019年度 若手技術者向け講座 実行計画
2019年度 若手技術者向け講座 実行計画keki3
 

What's hot (20)

カスタムプランと汎用プラン
カスタムプランと汎用プランカスタムプランと汎用プラン
カスタムプランと汎用プラン
 
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)
 
SQLAlchemy Core: An Introduction
SQLAlchemy Core: An IntroductionSQLAlchemy Core: An Introduction
SQLAlchemy Core: An Introduction
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
 
joins and subqueries in big data analysis
joins and subqueries in big data analysisjoins and subqueries in big data analysis
joins and subqueries in big data analysis
 
MySQL Index Cookbook
MySQL Index CookbookMySQL Index Cookbook
MySQL Index Cookbook
 
[네이버오픈소스세미나] Contribution, 전쟁의 서막 : Apache OpenWhisk 성능 개선 - 김동경
[네이버오픈소스세미나] Contribution, 전쟁의 서막 : Apache OpenWhisk 성능 개선 - 김동경[네이버오픈소스세미나] Contribution, 전쟁의 서막 : Apache OpenWhisk 성능 개선 - 김동경
[네이버오픈소스세미나] Contribution, 전쟁의 서막 : Apache OpenWhisk 성능 개선 - 김동경
 
Introduction to oracle functions
Introduction to oracle functionsIntroduction to oracle functions
Introduction to oracle functions
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
 
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
PostgreSQL13でのレプリケーション関連の改善について(第14回PostgreSQLアンカンファレンス@オンライン)
 
JSONB型でpostgresをNoSQLっぽく使う
JSONB型でpostgresをNoSQLっぽく使うJSONB型でpostgresをNoSQLっぽく使う
JSONB型でpostgresをNoSQLっぽく使う
 
Introduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparoundIntroduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparound
 
View, Store Procedure & Function and Trigger in MySQL - Thaipt
View, Store Procedure & Function and Trigger in MySQL - ThaiptView, Store Procedure & Function and Trigger in MySQL - Thaipt
View, Store Procedure & Function and Trigger in MySQL - Thaipt
 
PostgreSQLの統計情報について(第26回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLの統計情報について(第26回PostgreSQLアンカンファレンス@オンライン 発表資料)PostgreSQLの統計情報について(第26回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQLの統計情報について(第26回PostgreSQLアンカンファレンス@オンライン 発表資料)
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
 
Advanced MySQL Query Tuning
Advanced MySQL Query TuningAdvanced MySQL Query Tuning
Advanced MySQL Query Tuning
 
PostgreSQL安定運用のコツ2009 @hbstudy#5
PostgreSQL安定運用のコツ2009 @hbstudy#5PostgreSQL安定運用のコツ2009 @hbstudy#5
PostgreSQL安定運用のコツ2009 @hbstudy#5
 
PostgreSQL実行計画入門@関西PostgreSQL勉強会
PostgreSQL実行計画入門@関西PostgreSQL勉強会PostgreSQL実行計画入門@関西PostgreSQL勉強会
PostgreSQL実行計画入門@関西PostgreSQL勉強会
 
2019年度 若手技術者向け講座 実行計画
2019年度 若手技術者向け講座 実行計画2019年度 若手技術者向け講座 実行計画
2019年度 若手技術者向け講座 実行計画
 

Similar to Accelerating Local Search with PostgreSQL (KNN-Search)

MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool FeaturesMongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Featuresajhannan
 
Dive into EXPLAIN - PostgreSql
Dive into EXPLAIN  - PostgreSqlDive into EXPLAIN  - PostgreSql
Dive into EXPLAIN - PostgreSqlDmytro Shylovskyi
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesJonathan Katz
 
SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practiceJano Suchal
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesMongoDB
 
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenJohn Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenPostgresOpen
 
The state of geo in ElasticSearch
The state of geo in ElasticSearchThe state of geo in ElasticSearch
The state of geo in ElasticSearchFan Robbin
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford MapR Technologies
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised HashingSean Moran
 
k-means Clustering and Custergram with R
k-means Clustering and Custergram with Rk-means Clustering and Custergram with R
k-means Clustering and Custergram with RDr. Volkan OBAN
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analyticsMongoDB
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Sean Moran
 
PostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and OperatorsPostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and OperatorsNicholas Kiraly
 
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Konstantin V. Shvachko
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Ontico
 
Top k string similarity search
Top k string similarity searchTop k string similarity search
Top k string similarity searchChiao-Meng Huang
 
Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...Texas Natural Resources Information System
 

Similar to Accelerating Local Search with PostgreSQL (KNN-Search) (20)

MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool FeaturesMongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
 
Dive into EXPLAIN - PostgreSql
Dive into EXPLAIN  - PostgreSqlDive into EXPLAIN  - PostgreSql
Dive into EXPLAIN - PostgreSql
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data Types
 
ClusterAnalysis
ClusterAnalysisClusterAnalysis
ClusterAnalysis
 
SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practice
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
 
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenJohn Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
 
The state of geo in ElasticSearch
The state of geo in ElasticSearchThe state of geo in ElasticSearch
The state of geo in ElasticSearch
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
 
k-means Clustering and Custergram with R
k-means Clustering and Custergram with Rk-means Clustering and Custergram with R
k-means Clustering and Custergram with R
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
 
PostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and OperatorsPostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and Operators
 
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
 
Top k string similarity search
Top k string similarity searchTop k string similarity search
Top k string similarity search
 
Lecture 3.pdf
Lecture 3.pdfLecture 3.pdf
Lecture 3.pdf
 
Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...
 

More from Jonathan Katz

Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Jonathan Katz
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLJonathan Katz
 
Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Jonathan Katz
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Jonathan Katz
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!Jonathan Katz
 
Get Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMJonathan Katz
 
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMJonathan Katz
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesJonathan Katz
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationJonathan Katz
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesJonathan Katz
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWJonathan Katz
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesJonathan Katz
 
Indexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesIndexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesJonathan Katz
 

More from Jonathan Katz (14)

Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQL
 
Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!
 
Get Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAM
 
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management Application
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & Kubernetes
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
Indexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesIndexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data Types
 

Recently uploaded

COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 

Recently uploaded (20)

COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 

Accelerating Local Search with PostgreSQL (KNN-Search)

  • 1. Accelera'ng Local Search With PostgreSQL 9.1 Jonathan S. Katz Exco Ventures PGConf.eu 2011 – Oct 21, 2011
  • 2. Local Search? • Not necessarily loca'on based on places • “How close are two en''es to one another?” • “What are the closest en''es to me?” • “What are my nearest neighbors?” 2
  • 3. Nearest Neighbor Overview • Want to know “how similar” objects are rela've to each other – What are the top “k” choices near me? • Need to define a “metric” for similarity – “distance” 3
  • 4. K-­‐Nearest Neighbor • Given a collec'on of n objects • When trying to classify an unknown object – compute the distance between all known objects – find the k (k ≥ 1) closest objects to the unknown object – classify the object based on class of k closest objects • When k=1, then unknown object is given same classifica'on as object it is closest to 4
  • 5. K=1 Example Voronoi Diagram of order 1 can be used to make k=1 NN queries 5
  • 6. Just a bit more theory… • Voronoi diagrams of order-­‐1 are created in O(n log n) 'me – Looks similar to…? :-­‐) • Queried in O(1) 'me • Therefore: – Pay the 'me penalty to build the index – Query against index quickly 6
  • 7. Applica'ons • Geoloca'on + Op'mizing Posi'oning • Classifica'on • Similarity • Recommenda'on systems • Content-­‐based image retrieval • etc. 7
  • 8. So what about PostgreSQL? • As of PostgreSQL 9.0 – supports geometric types and distances • Points, circles, lines, boxes, polygons • Distance operator: <-­‐> – pg_trgm – supplied module for determining text similarity • similarity(“abc”, “ade”) computes similarity score • <-­‐> defines distance (opposite if similarity), not defined (in 9.0) 8
  • 9. PostgreSQL 9.1: KNN-­‐GiST • Let n = size of a table • Can index data that provides a “<-­‐>” (distance) operator – Geometric – pg_trgm • “k” = LIMIT clause • Known inefficiencies when k=n and n is small 9
  • 10. Example: pg_trgm • Data: – List of 1,000,000 names – 700,000 unique – n = 1,000,000 • Indexes: – CREATE INDEX names_name_idx ON names (name) – CREATE INDEX trgm_idx ON names USING gist (name gist_trgm_ops) • k=10 • Displaying query plan / execu'on 'me aqer 10 runs 10
  • 11. pg_trgm: 9.0 EXPLAIN ANALYZE SELECT name, similarity(name, 'jon') AS sim FROM names WHERE name % 'jon' ORDER BY sim DESC LIMIT 10; 11
  • 12. pg_trgm: 9.0 Limit (cost=2724.95..2724.98 rows=10 width=14) (actual 'me=192.793..192.794 rows=10 loops=1) -­‐> Sort (cost=2724.95..2727.45 rows=1000 width=14) (actual 'me=192.790..192.791 rows=10 loops=1) Sort Key: (similarity(name, 'jon'::text)) Sort Method: top-­‐N heapsort Memory: 25kB -­‐> Bitmap Heap Scan on names (cost=56.47..2703.34 rows=1000 width=14) (actual 'me=188.836..192.499 rows=865 loops=1) Recheck Cond: (name % 'jon'::text) -­‐> Bitmap Index Scan on trgm_idx (cost=0.00..56.22 rows=1000 width=0) (actual 'me=188.652..188.652 rows=865 loops=1) Index Cond: (name % 'jon'::text) Total run'me: 192.881 ms 12
  • 13. pg_trgm: 9.1 EXPLAIN ANALYZE SELECT name, similarity(name, 'jon') AS sim FROM names WHERE name % 'jon' ORDER BY sim DESC LIMIT 10; 13
  • 14. pg_trgm 9.1 Limit (cost=2720.91..2720.93 rows=10 width=14) (actual 'me=202.022..202.023 rows=10 loops=1) -­‐> Sort (cost=2720.91..2723.41 rows=1000 width=14) (actual 'me=202.020..202.021 rows=10 loops=1) Sort Key: (similarity(name, 'jon'::text)) Sort Method: top-­‐N heapsort Memory: 25kB -­‐> Bitmap Heap Scan on names (cost=52.43..2699.30 rows=1000 width=14) (actual 'me=198.324..201.719 rows=865 loops=1) Recheck Cond: (name % 'jon'::text) -­‐> Bitmap Index Scan on names_trgm_idx (cost=0.00..52.18 rows=1000 width=0) (actual 'me=198.156..198.156 rows=865 loops=1) Index Cond: (name % 'jon'::text) Total run1me: 202.113 ms 14
  • 15. Comparable? • Seems to be similar – Need to do more research why – anyone? • However, 9.1 offers improvements for LIKE/ ILIKE search with pg_trgm 15
  • 16. LIKE/ILIKE EXPLAIN ANALYZE SELECT name FROM names WHERE name LIKE '%ata%n'; 16
  • 17. LIKE/ILIKE pg_trgm: 9.0 vs 9.1 Seq Scan on names (cost=0.00..18717.00 rows=99 width=14) (actual 'me=0.339..205.659 rows=665 loops=1) Filter: (name ~~ '%ata%n'::text) Total run1me: 205.743 ms Bitmap Heap Scan on names (cost=9.45..369.20 rows=99 width=14) (actual 'me=122.494..125.967 rows=665 loops=1) Recheck Cond: (name ~~ '%ata%n'::text) -­‐> Bitmap Index Scan on names_trgm_idx (cost=0.00..9.42 rows=99 width=0) (actual 'me=121.972..121.972 rows=3551 loops=1) Index Cond: (name ~~ '%ata%n'::text) Total run1me: 126.065 ms 17
  • 18. Geometry • Data: – 2,000,000 points, from (0,0) -­‐> (10000, 10000) • Index: – CREATE INDEX geoloc_coord_idx ON geoloc USING gist (coord); 18
  • 19. Geometry EXPLAIN ANALYZE SELECT *, coord <-­‐> point(500,500) FROM geoloc ORDER BY coord <-­‐> point(500,500) LIMIT 10; 19
  • 20. Geometry: 9.0 vs 9.1 Limit (cost=80958.28..80958.31 rows=10 width=20) (actual 'me=1035.313..1035.316 rows=10 loops=1) -­‐> Sort (cost=80958.28..85958.28 rows=2000000 width=20) (actual 'me=1035.312..1035.314 rows=10 loops=1) Sort Key: ((coord <-­‐> '(500,500)'::point)) Sort Method: top-­‐N heapsort Memory: 25kB -­‐> Seq Scan on geoloc (cost=0.00..37739.00 rows=2000000 width=20) (actual 'me=0.029..569.501 rows=2000000 loops=1) Total run1me: 1035.349 ms Limit (cost=0.00..0.81 rows=10 width=20) (actual 'me=0.576..1.255 rows=10 loops=1) -­‐> Index Scan using geoloc_coord_idx on geoloc (cost=0.00..162068.96 rows=2000000 width=20) (actual 'me=0.575..1.251 rows=10 loops=1) Order By: (coord <-­‐> '(500,500)'::point) Total run1me: 1.391 ms 20
  • 21. Applica'on Examples • Proximity map search – fast! 21
  • 22. Drawbacks • Performance benefits are limited when: – LIMIT is close to size of data set and data set is large – Data set is small • Time to build index – High transac'on table 22
  • 23. Conclusions • GiST: “Generalized Search Tree” – index is there, up to developers to define access methods of data types – e.g. yields KNN-­‐GiST • Different types of applica'ons can be built – performance enhancements • Next steps? 23
  • 24. My Wish List • Further geometric-­‐type support in Postgres – N-­‐dimensional points – ‘=‘ operator for point type – (PostGIS s'll champion of complex geometric + geographic data types) • Define “distance” over mul'columns with different types? – SELECT (a.name, a.geocode) <-­‐> (b.name, b.geocode) FROM x a, x b; 24
  • 25. References • Oleg Bartunov – “Efficient K-­‐nearest neighbour search in PostgreSQL” (h~p://www.sai.msu.su/~megera/ postgres/talks/pgday-­‐2010.pdf) • Oleg Bartunov and Teodor Sigaev for work on KNN-­‐ GiST and notes on pg_trgm (h~p:// developer.postgresql.org/pgdocs/postgres/ pgtrgm.html) • Hubert “depesz” Lubaczewski – pa~erned benchmarks off of his work -­‐ h~p://www.depesz.com/index.php/ 2010/12/11/wai'ng-­‐for-­‐9-­‐1-­‐knngist/ 25
  • 26. Contact • jonathan.katz@excoventures.com • @jkatz05 • Feedback Please! – h~p://2011.pgconf.eu/feedback 26