ElasticSearch: Using Function Scores to Pack Far, Far Too Much Information Into a Single Query

Jason Rosendale
Entelo Engineering
Nov 6, 2017 · 7 min read

Suppose someone wants to search for candidates to fill a new software role. They head on over to Entelo (of course!) and whip up a search query: “senior java developer”.

There are way too many matches for these search criteria, so our user decides to be a little more selective by filtering their search. Specifically, they decide that they want to see only candidates that:

  1. are within 50 miles of San Francisco
  2. have an advanced degree
  3. are significantly more likely than their peers to be looking for a new position
  4. currently work in the advertising industry
  5. have at least 10 years of experience
  6. have COBOL experience
  7. have a known email address
  8. have an exclamation point in their job title!
  9. have made contributions to open source projects
  10. have at some point worked for a company with 10,000+ employees

So they run their query again with these filters and find that they've gone from too many results to far too few. It's clear that they're over-filtering, but there's no indication of what they should change to make sure that they get neither too many nor too few results.

We’re a caring and empathetic engineering team over here at Entelo (of course!), so we wanted to help users in this situation. We decided that whenever over-filtering was detected on a query, we would automatically suggest alternate sets of query filters to the user (for example, “expanding your search radius by 8 miles would give you 1500 more results!”). We want to guarantee two things:

  1. that our suggested improvement would actually return more results than the original search. A user would be disappointed if we suggested that they expand their search radius by 8 miles only to find that they still got zero hits on their search for senior COBOL/Node.js developers in Duluth.
  2. that our suggestion would be the minimal possible change that would increase the number of results. If we cared only about increasing the number of search results we would just suggest that the user remove every single filter. We want our suggestion to preserve their original intent as much as possible.

In order to guarantee both of these objectives we need to know the answer to a lot of questions about the query:

  • How many more results would we get if we didn’t use the nth filter?
  • How many of the candidates matching the search query also matched filter n?
  • Which subsets of these filters could we use and still get more than 10 results?
  • Are there any redundant pairs of filters? For example, it’s possible that everyone with an advanced degree also has a known email, which would make filter #7 unnecessary.
  • Are there any disjoint pairs of filters? It might be the case that nobody in the advertising industry has an exclamation point in their job title, which means that we’d never get any results if we used filters #4 and #8 simultaneously.

In order to answer all of these questions we would need to know how many candidates matched each of the 2¹⁰ possible combinations of filters.

Surprisingly enough, we can do this in a single search!

How to Get 2¹⁰ Filter Counts in a Single Search

ElasticSearch has a “function score” feature that allows us to assign a configurable number of points to a document if it passes a particular filter. We can use this function score to create a bitfield! We define a function score that adds 1 point if the document matches the first filter, another 2 points if it matches the second filter, 4 more points for the third, and in general adding 2ⁿ points if a document matches the nth filter. We then aggregate the results to get the number of documents that attain each score:

{
  "query": {
    "function_score": {
      "query": {
        "query_string": {
          "query": "senior java developer"
        }
      },
      "functions": [
        { "script_score": { "script": "0" } },
        { "filter": { "term": { ... } }, "weight": 1 },
        { "filter": { "geo_distance": { ... } }, "weight": 2 },
        { "filter": { "range": { ... } }, "weight": 4 },
        { "filter": { "term": { ... } }, "weight": 8 },
        { "filter": { "term": { ... } }, "weight": 16 },
        ...
        { "filter": { "script": { ... } }, "weight": 512 }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  },
  "size": 0,
  "aggregations": {
    "top_hits": {
      "terms": {
        "size": 1024,
        "script": "_score.intValue()",
        "lang": "groovy"
      }
    }
  }
}

The "script_score": { "script": "0" } function makes sure that documents matching none of the filters still return a score of 0 instead of being excluded entirely. The results of our query look like this:

{
  "took": 608,
  "timed_out": false,
  "hits": {
    "total": 3814020
  },
  "aggregations": {
    "top_hits": {
      "buckets": [
        { "key": "0", "doc_count": 3639369 },
        { "key": "1", "doc_count": 63897 },
        { "key": "2", "doc_count": 8474 },
        { "key": "3", "doc_count": 2279 },
        { "key": "4", "doc_count": 315 },
        ...
        { "key": "1023", "doc_count": 1 }
      ]
    }
  }
}

Interpreting These Numbers

Remember that we allocated points as follows:

points  filter
     1  within 50 miles of SF
     2  advanced degree
     4  more likely to move positions
     8  work in the advertising industry
    16  10+ years of experience
    32  COBOL experience
    64  known email address
   128  job title with an exclamation point
   256  contributed to open source projects
   512  worked for a large company

To more easily explore this data, suppose we’ve dumped our aggregation results into a Ruby hash called scores with the integer-valued scores as keys and the doc_counts as values.
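To make the later snippets concrete, here is one way such a hash could be built from the parsed search response (a sketch; only a few of the buckets from the response above are reproduced, and the `response` JSON is inlined rather than fetched from Elasticsearch):

```ruby
require "json"

# Build the `scores` hash from the aggregation response shown above.
# Bucket keys arrive as strings, so convert them to integers.
response = JSON.parse(<<~RESPONSE)
  { "aggregations": { "top_hits": { "buckets": [
      { "key": "0",    "doc_count": 3639369 },
      { "key": "1",    "doc_count": 63897 },
      { "key": "1023", "doc_count": 1 }
  ] } } }
RESPONSE

scores = response["aggregations"]["top_hits"]["buckets"]
  .to_h { |bucket| [Integer(bucket["key"]), bucket["doc_count"]] }
```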

Q: How many people did not match any of the filters?
A: The number of documents that had a score of 0.

scores[0]

Q: How many people have contributed to open source projects but meet none of the other criteria?
A: The number of documents that had a score of exactly 256.

scores[256]

Q: How many people have contributed to open source projects, regardless of which other criteria they meet?
A: The number of documents that have 256 as a component of their scores (i.e. scores that have a "1" for their 9th bit):

scores.select { |key, _| (key & 256) != 0 }.values.sum

Q: How many people have an advanced degree and 10+ years of experience?
A: The number of documents that have both 2 and 16 as components of their scores:

scores.select { |key, _| (key & (2 + 16)) == (2 + 16) }.values.sum

Q: How many people have an advanced degree but do *not* have 10+ years of experience?
A: The number of documents that have 2 but not 16 as a component of their scores:

scores.select { |key, _| (key & 2) != 0 && (key & 16) == 0 }.values.sum

Q: There are a lot of people with a score of 790. What filters did they match?
A: A score matches filter n if the nth bit is set to 1.

10.times.select { |i| (790 >> i) & 1 == 1 }  # => [1, 2, 4, 8, 9]
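Putting the points table and the bit arithmetic together, a small helper (hypothetical; the names come from the table above) can translate any score back into the filters it represents:

```ruby
# Filter names in weight order: bit i corresponds to weight 2**i.
FILTERS = [
  "within 50 miles of SF",
  "advanced degree",
  "more likely to move positions",
  "work in the advertising industry",
  "10+ years of experience",
  "COBOL experience",
  "known email address",
  "job title with an exclamation point",
  "contributed to open source projects",
  "worked for a large company"
].freeze

# Return the names of the filters whose bits are set in `score`.
def matched_filters(score)
  FILTERS.select.with_index { |_, i| (score >> i) & 1 == 1 }
end

matched_filters(790)
# => ["advanced degree", "more likely to move positions",
#     "10+ years of experience", "contributed to open source projects",
#     "worked for a large company"]
```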

Using a Function Score and a Relevance Score Simultaneously

One drawback to using function scores in this way is that the boost_mode: replace option suppresses the documents' relevance scores. If a document matches all ten filters, it will always get a score of exactly 1023 and will tie for first place with every other document that matches every filter, regardless of how relevant those documents actually are to the query.

We can't get our relevance score back by changing the boost mode to boost_mode: sum, because the relevance score can be arbitrarily large, which would destroy our ability to interpret the function score as a bitfield. But we can rescale our relevance scores so that they are guaranteed to fall between 0.0 and 0.99. This can be done by replacing the { "script": "0" } function in our query with:

{ "script":"0.99 * _score / (1 + _score)"}

This function maps all positive scores onto the interval between 0 and 0.99. More importantly, if one document has a higher relevance score than a second document then its transformed score will still be higher than the transformed score of the second document.
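A quick sanity check of that transform, written as a standalone Ruby lambda rather than an Elasticsearch script:

```ruby
# The rescaling function from the script above: maps any non-negative
# score into [0, 0.99) and preserves the relative order of scores.
rescale = ->(score) { 0.99 * score / (1 + score) }

rescale.call(0)                    # => 0.0
rescale.call(1)                    # ≈ 0.495
rescale.call(1000)                 # ≈ 0.989, still below 0.99
rescale.call(2) > rescale.call(1)  # ordering is preserved
```

Because the fractional part never reaches 1.0, it can never carry into the integer bits that encode the filters, so the bitfield interpretation survives intact.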

How Are We Using This?

This function score/aggregation technique is fast, but it does definitely add some overhead to the query execution time. Since very few searches on our platform return zero results, it just doesn’t make sense to automatically include this aggregation with every query. Instead, we wait until we see a search that has returned no results and then quickly, behind the scenes, re-perform that search with our function score aggregation. We analyze the results, choose the optimal set of filters, and then do a third search using that optimal filter set. The user then sees the results of the modified query along with an alert letting them know exactly which filters were modified to allow those results to be shown. In production, performing all three of these searches takes slightly less than 3x as long as the original search (presumably because of filter caching).
