In the first two parts (part 1, part 2), we learned how to configure and query Solr to get autocomplete functionality. Today, let's see how to add a dictionary to the suggester and use it to provide autocomplete suggestions.

Component configuration  
Add the following parameter to the component configuration from the previous part:

<str name="sourceLocation">dict.txt</str> 

So our configuration becomes:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_autocomplete</str>
    <str name="sourceLocation">dict.txt</str>
  </lst>
</searchComponent>

With this parameter, we tell the suggest component to use the file named dict.txt, placed in the Solr configuration directory, as its dictionary.

Handler configuration
The handler configuration also needs one additional parameter:

<str name="spellcheck.onlyMorePopular">true</str> 

The complete configuration is:

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.onlyMorePopular">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

This parameter tells Solr that, when more suggestions are found than the configured spellcheck.count, it should return the more popular ones.

We told Solr to use this dictionary, so what does the dictionary look like? Here's an example:

# sample dict 
Hard disk hitachi
Hard disk wd 2.0
Hard disk jjdd 3.0

What is the structure of this dictionary? Each phrase goes on a separate line, and each line may end with the weight of the phrase (the phrase and the weight are separated by a TAB character). The weight is used when spellcheck.onlyMorePopular=true is set; if it is omitted, the default value is 1.0. The dictionary must be stored in UTF-8 encoding. Lines starting with the # character are ignored (comment lines).
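To make the format concrete, here is a minimal Python sketch that reads lines in the layout described above. It is only an illustration of the file format, not the code Solr uses internally; the function name `parse_suggest_dictionary` and the in-memory sample are assumptions made for this example.

```python
import io


def parse_suggest_dictionary(lines, default_weight=1.0):
    """Parse 'phrase<TAB>weight' lines into a list of (phrase, weight) pairs.

    Hypothetical helper: it mirrors the format described in the text,
    not Solr's own dictionary reader.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank lines and comment lines that start with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        # The phrase and the optional weight are separated by a TAB.
        phrase, sep, weight = line.partition("\t")
        if sep and weight.strip():
            entries.append((phrase.strip(), float(weight)))
        else:
            # No weight given: fall back to the default of 1.0.
            entries.append((phrase.strip(), default_weight))
    return entries


# The sample dictionary from the text, with explicit TAB separators.
sample = (
    "# sample dict\n"
    "Hard disk hitachi\n"
    "Hard disk wd\t2.0\n"
    "Hard disk jjdd\t3.0\n"
)
entries = parse_suggest_dictionary(io.StringIO(sample))
print(entries)
```

When reading a real file, opening it with `open(path, encoding="utf-8")` matches the UTF-8 requirement mentioned above.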

This way, we don't need any indexed data: the dictionary is the data.

After rebuilding the suggester, let's take a look at how it works. Enter the command:


The result is:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="Dys">
        <int name="numFound">3</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <str>Hard disk jjdd</str>
          <str>Hard disk wd</str>
          <str>Hard disk hitachi</str>
        </arr>
      </lst>
    </lst>
  </lst>
</response>

As expected, the suggestions are sorted by weight. Note that matching is case-sensitive here (pay attention to the capital first letter).
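The ordering and the case sensitivity can be imitated with a small Python sketch. This is a toy prefix lookup written for illustration only; it is not how TSTLookup is implemented, and the `suggest` function and its `count` parameter are hypothetical names that loosely mirror spellcheck.count.

```python
def suggest(entries, prefix, count=10):
    """Return up to `count` phrases starting with `prefix`, heaviest first.

    Toy model of the behaviour described in the text, not Solr's code.
    """
    # str.startswith is case-sensitive, like the lookup shown above.
    matches = [(phrase, weight) for phrase, weight in entries
               if phrase.startswith(prefix)]
    # Sort by weight, highest first.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [phrase for phrase, _ in matches[:count]]


# Entries from the sample dictionary (missing weight defaults to 1.0).
entries = [
    ("Hard disk hitachi", 1.0),
    ("Hard disk wd", 2.0),
    ("Hard disk jjdd", 3.0),
]

top = suggest(entries, "Har")            # all three, ordered by weight
top_two = suggest(entries, "Har", 2)     # only the two heaviest phrases
lowercase = suggest(entries, "har")      # no match: the prefix case differs
print(top, top_two, lowercase)
```

If case-insensitive suggestions are wanted, one option in this toy model would be to lowercase both the prefix and the phrases before comparing; how to achieve the same in Solr depends on the setup and is not covered here.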

So what is the recommendation? If we have a good dictionary whose weights are based on actual user query behavior, users will certainly like it. Without a good dictionary, it is better not to use this approach.
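One possible way to obtain such weights is to count how often each query appears in a search log and write the counts out in the dictionary format. The sketch below assumes the log is already available as a list of query strings; the function names, the sample queries, and the choice of the raw count as the weight are all assumptions for illustration, not a method prescribed by this series.

```python
from collections import Counter


def build_dictionary_lines(queries):
    """Turn raw query strings into 'phrase<TAB>weight' dictionary lines.

    Hypothetical helper: the weight is simply the number of times
    the query was seen.
    """
    counts = Counter(q.strip() for q in queries if q.strip())
    lines = ["# generated from query log"]
    # most_common() yields phrases from most to least frequent.
    for phrase, hits in counts.most_common():
        lines.append(f"{phrase}\t{float(hits)}")
    return lines


def write_dictionary(path, lines):
    """Write the dictionary as UTF-8, one entry per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")


# Made-up query log used only to demonstrate the output format.
queries = [
    "Hard disk wd", "Hard disk jjdd", "Hard disk wd",
    "Hard disk jjdd", "Hard disk jjdd", "Hard disk hitachi",
]
lines = build_dictionary_lines(queries)
print("\n".join(lines))
```

Queries that contain a TAB or start with # would need to be filtered or cleaned first, since both characters have a special meaning in the dictionary file.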

Next step
In the next part, we'll look at the index structure and size produced by the different suggest approaches.

