ElasticSearch Percolator Bloat
Jun. 16th, 2017 03:17 pm
Early ElasticSearch History
Back in 2010, Shay Banon created the first version of ElasticSearch.
Over the years the product matured.
In November 2012, the ElasticSearch team received $10M in Series A funding.
Then, in February 2013, they received $24M in Series B funding.
That helped them produce the very robust ElasticSearch 1.0 (2014-02-12) and then ElasticSearch 1.6 (2015-06-09), which we currently use.
$70M bloat
June 2014: $70M in Series C funding.
Shay Banon became CEO and excused himself from active involvement in development and in communicating with customers.
That is where the bloat began.
It looks like the ElasticSearch team decided that, since they had so much money, they could do pretty much whatever they wanted.
So they broke backward compatibility of the percolator by squeezing it into the standard ElasticSearch index format.
What is percolator?
The ElasticSearch percolator performs the reverse of a standard ElasticSearch query.
A standard ElasticSearch query lets our job seekers find matching jobs.
The percolator lets job seekers save their job search query as a job alert.
Then, when a new job is posted in the future (by somebody else), the percolator finds all the job alerts that match it. That allows us to notify the owners of these matching alerts about the new job (within a minute of receiving it).
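To make the "reverse" direction concrete, here is a toy sketch in plain Python. It is not how ElasticSearch implements anything: the JobAlert class, the keyword-only matching and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobAlert:
    """A stored search (hypothetical): every keyword must appear in the job text."""
    owner: str
    keywords: frozenset


def search_jobs(query_keywords, jobs):
    """Standard search: one query in, matching documents (jobs) out."""
    wanted = {k.lower() for k in query_keywords}
    return [job for job in jobs if wanted <= set(job.lower().split())]


def percolate(job, alerts):
    """Reverse search: one document (job) in, owners of matching stored queries out."""
    words = set(job.lower().split())
    return [alert.owner for alert in alerts if alert.keywords <= words]


alerts = [
    JobAlert("alice", frozenset({"python", "remote"})),
    JobAlert("bob", frozenset({"java"})),
    JobAlert("carol", frozenset({"python"})),
]

new_job = "Senior Python developer remote position"
owners_to_notify = percolate(new_job, alerts)
print(owners_to_notify)  # ['alice', 'carol']
```

The same matching rule is used in both functions; only the roles of "stored thing" and "incoming thing" are swapped.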
Differences between standard search query and percolator query
Because of its reverse nature, the percolator functions very differently from a standard search query:
1) A standard search query should normally produce only 10 results (users are unlikely to read more) and support paging.
The percolator always wants all matching alerts (also known as "percolator queries"), not just 10 of them, because every job seeker wants to be notified about new jobs matching their favorite job alert.
2) Standard search ranks results by the quality of the match (and then orders them by descending rank). Such ranking does NOT make sense for the percolator (because every job seeker wants to be notified anyway).
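As a rough sketch of what these two differences mean for a 5.x-style request: as far as I understand the 5.x percolate query, it runs as a regular search, so the default page of 10 hits applies unless "size" is raised, and wrapping it in constant_score skips the ranking. The field name "query", the document type "job" and the max_alerts value below are assumptions for illustration, not our production settings.

```python
def build_percolate_body(job, max_alerts=10000):
    """Build a search body that asks for all matching alerts, unranked.

    Assumptions (hypothetical names): alerts are stored in a field called
    "query" mapped as type "percolator", and jobs use the mapping type "job".
    """
    return {
        # A regular search returns 10 hits by default; an alert system wants all.
        "size": max_alerts,
        "query": {
            # constant_score runs the inner query as a filter, so no relevance
            # score is computed for the matching alerts.
            "constant_score": {
                "filter": {
                    "percolate": {
                        "field": "query",
                        "document_type": "job",
                        "document": job,
                    }
                }
            }
        },
    }


body = build_percolate_body({"title": "Senior Python developer", "location": "remote"})
```

If more alerts can match than a single page allows, the application would also have to page or scroll through the results, which is exactly the kind of work the old percolator never asked for.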
Why use standard search index format for percolator?
So why did the ElasticSearch team decide to break backward compatibility and merge the percolator into the standard search index format?
This is their excuse:
---
https://www.elastic.co/blog/elasticsearch-percolator-continues-to-evolve
Prior to 5.0, all percolator queries need to be executed on this in-memory index in order to verify whether the query matches. So the idea is that the less queries that need to be verified by the in-memory index the faster the percolator executes.
---
On my first reading of that ambiguous claim, I thought that ElasticSearch would be able to automatically detect which percolator queries are OK to skip, and so would effectively improve percolator performance.
What actually happened
We spent a few days setting up a proper experiment and found that the ElasticSearch 5.4 percolator is 3 times slower than the ElasticSearch 1.6 percolator (or, in other words, ElasticSearch percolator performance degrades proportionally to the version number).
The correct interpretation of that "less queries that need to be verified" claim is that an application developer using ElasticSearch 5.4 has the option to tag percolator queries (alerts) and then write code that helps the percolator skip alerts that have no chance of being triggered by the document we percolate.
But the problem is that it is very hard to come up with such an "alert skipping" algorithm. The percolator is valuable in the first place precisely because of its ability to determine which alerts match and which do not!
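A toy illustration of the tagging idea, again in plain Python and not tied to any ElasticSearch API. The country tags, the tuple layout and the keyword matching are assumptions for the sake of the example.

```python
def percolate_with_tags(job_words, job_tags, alerts):
    """Skip alerts whose tag rules them out, then fully check the rest.

    alerts: list of (owner, tag, keywords) tuples (hypothetical layout).
    A tag of None means we could not derive a safe tag for that alert,
    so it can never be skipped.
    Returns (owners_to_notify, number_of_alerts_actually_evaluated).
    """
    candidates = [a for a in alerts if a[1] is None or a[1] in job_tags]
    owners = [owner for owner, _tag, keywords in candidates if keywords <= job_words]
    return owners, len(candidates)


alerts = [
    ("alice", "us", frozenset({"python"})),
    ("bob", "de", frozenset({"python"})),
    ("carol", None, frozenset({"remote"})),  # free-text alert with no usable tag
]

job_words = {"python", "developer", "remote"}
owners, evaluated = percolate_with_tags(job_words, {"us"}, alerts)
print(owners, evaluated)  # ['alice', 'carol'] 2
```

The saving comes only from alerts that carry a reliable tag. Every alert like carol's, where no tag can be derived from the query text, still has to be evaluated in full, and deciding which tags are safe is the part the application developer is left to solve.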
The summary
The $70M Series C funding encouraged the ElasticSearch team to break backward compatibility, produce useless features (such as paging and ranking in the percolator), and degrade performance 3x.
Next: ElasticSearch Percolator Bloat - the Defense
no subject
Date: 2017-06-16 09:44 pm (UTC)
no subject
Date: 2017-06-16 11:30 pm (UTC)
There is also a chance that standard search in ElasticSearch 5.4 is still fast.
We would just have to use the older ElasticSearch 1.6 for percolator queries.
There is also a promising fork: https://github.com/meltwater/elasticsearch-batch-percolator
They promise a 1000x faster percolator.
But we have not yet found out whether it supports our case (elasticsearch-batch-percolator has some limitations on the queries it can process).
Percolator in Elastic vs Manticore Search
Date: 2018-03-19 03:05 pm (UTC)
In short, Manticore PQ provides much higher throughput.
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-19 10:24 pm (UTC)
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-20 01:46 am (UTC)
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-20 06:26 am (UTC)
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-20 07:06 am (UTC)
2) I can't even install it
3) "can no longer use filter queries" (from their doc), but with Manticore you can filter by whatever you want: full-text, attributes, expressions, geo filtering, etc.
4) 1K dps at 225K rules seems worse than 13K dps at 100K rules. Of course it all depends on the rules, docs, hardware, etc., but I wouldn't say their performance is better.
As for Elastic itself, yes, it's much more popular, but it's a different technology: there are people who like native SQL support, who like the performance and lower resource consumption C++ gives compared to Java, who like to be able to index existing data at high speed directly from mysql/postgres/xml instead of inserting every document with a script, etc.
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-20 07:25 am (UTC)
3) I am not sure what filter queries are not available in elasticsearch-batch-percolator, but our full-text search + basic geo-filtering works fine.
4) I do not remember exact numbers, but I think elasticsearch-batch-percolator gave about 40x speed improvement relative to standard ElasticSearch percolator (when we measured it).
Why do you think ElasticSearch is more popular than Sphinx, if Sphinx consumes fewer resources?
https://trends.google.com/trends/explore?date=all&q=Sphinx,elasticsearch
Percolator in Elastic vs Manticore Search
Date: 2018-03-21 03:10 am (UTC)
ElasticSearch is more popular now because they had a chance to raise millions of $$ in funding and hire a bigger team to do more marketing, the ELK thing and so on. And Sphinx didn't.
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-21 06:16 am (UTC)
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-21 06:30 am (UTC)
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-21 09:40 am (UTC)
https://manticoresearch.com/professional-support does not list any prices.
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-21 09:46 am (UTC)
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-21 09:51 am (UTC)
Re: Percolator in Elastic vs Manticore Search
Date: 2018-03-21 09:55 am (UTC)