Solr DisMax parser and stop words

If you use the DisMax query parser in Solr, you need to be careful about how you index the fields that DisMax searches.

If you mix fields that filter out stop words (plain text) with fields that do not (such as author names), even simple queries can end up with no results.

By default, DisMax returns only documents that contain every word from your query string. If your query contains a stop word, as in “ants of madagascar”, the stop word “of” might not be found in any of the fields: it does not appear in author names, and it is filtered out of the article body at index time. Because the author field does not strip “of” from the query, the word remains a required term, and you get zero results.
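
The failure is easier to see in a toy model. The sketch below is not Solr code; it only imitates the behavior described above, and the field names (body, author), the stop word list, and the sample document are made up for illustration. Each query word becomes one clause that may match in any field, and a clause is dropped only when every field's analyzer discards the word.

```python
# Toy model of the behavior described above; not actual Solr code.
# Field names, stop words and the sample document are hypothetical.
STOP_WORDS = {"of", "the", "a", "in"}


def analyze_text(word):
    """Plain-text analyzer: lowercases and drops stop words."""
    word = word.lower()
    return None if word in STOP_WORDS else word


def analyze_name(word):
    """Name analyzer: lowercases and keeps every word."""
    return word.lower()


ANALYZERS = {"body": analyze_text, "author": analyze_name}


def index_doc(doc):
    """Run each field's text through that field's analyzer."""
    indexed = {}
    for field, text in doc.items():
        tokens = set()
        for word in text.split():
            token = ANALYZERS[field](word)
            if token is not None:
                tokens.add(token)
        indexed[field] = tokens
    return indexed


def build_clauses(query):
    """One clause per query word; keep it if any field still yields a token."""
    clauses = []
    for word in query.split():
        per_field = {}
        for field, analyzer in ANALYZERS.items():
            token = analyzer(word)
            if token is not None:
                per_field[field] = token
        if per_field:
            clauses.append(per_field)
    return clauses


def matches(indexed, query, min_match=None):
    """A clause is satisfied if its token occurs in at least one field."""
    clauses = build_clauses(query)
    required = len(clauses) if min_match is None else min_match
    hits = sum(
        any(token in indexed[field] for field, token in clause.items())
        for clause in clauses
    )
    return hits >= required


doc = index_doc({"body": "The ants of Madagascar", "author": "Jane Doe"})
print(matches(doc, "ants madagascar"))     # True
print(matches(doc, "ants of madagascar"))  # False: "of" survives only as author:of
```

In this model the “of” clause survives because the author analyzer keeps it, yet no document contains “of” in the author field, so requiring all clauses can never succeed.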

Possible workarounds:

  • Relax the Minimum Match (mm) requirement.
    Downside: Lowering mm increases the number of results. An mm of 50% on “ants madagascar” returns every document that contains “ants” plus every document that contains “madagascar”.
  • Do not filter out stop words.
    Downside: Your index can get large, and you might get a large number of less relevant results.
  • Use other indexing schemes like N-Grams.
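
To make the first workaround concrete, here is a rough sketch of how a simple mm value translates into a number of required clauses. It assumes, based on my reading of the Solr documentation, that a percentage is rounded down and that the result is kept between 1 and the clause count; it handles only plain integers and positive percentages, not the negative or conditional forms of mm. The helper names and the field names in qf are hypothetical, while defType, qf, mm and q are standard request parameters.

```python
# Rough sketch, not Solr's implementation. Assumes percentages round down
# and the result is clamped to the range 1..clause_count.
def required_clauses(mm, clause_count):
    """Translate a simple mm value ("2" or "50%") into a clause count."""
    if mm.endswith("%"):
        needed = (clause_count * int(mm[:-1])) // 100
    else:
        needed = int(mm)
    return max(1, min(needed, clause_count))


def dismax_params(query, mm):
    """Request parameters for a DisMax query; field names are hypothetical."""
    return {"defType": "dismax", "qf": "body author", "mm": mm, "q": query}


print(required_clauses("50%", 2))   # 1: either "ants" or "madagascar" is enough
print(required_clauses("100%", 3))  # 3: every clause must match
print(dismax_params("ants of madagascar", "50%"))
```

With three clauses, an mm of 50% requires only one match, so “ants of madagascar” returns results again, at the cost of the broader result set described in the downside above.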

This article explains the details.

Also see this and this discussion.
