Elasticsearch Runtime Fields

Mingyi world 2021-10-14 04:49:58

1. The practical problem

In real projects, we sometimes discover after the data has been imported that a required field is missing. How is this usually handled?

For example, emotion represents a sentiment score with a value range of 0-1000.

Within that range, 300-700 is neutral, 0-300 is negative, and 700-1000 is positive.

The business, however, needs a flag instead: neutral: 0; negative: -1; positive: 1.
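
As a minimal sketch of this rule, the mapping can be written as a plain Python function. The function name emotion_flag is hypothetical, and the boundary handling is an assumption that mirrors the runtime field script in section 4: 300 and 700 count as neutral, 1000 counts as positive, and 0 or anything outside the range gets no flag.

```python
from typing import Optional


def emotion_flag(emotion: int) -> Optional[int]:
    """Map an emotion score to -1 (negative), 0 (neutral) or 1 (positive).

    Boundary handling is an assumption that follows the runtime field
    script in section 4; values it does not cover return None.
    """
    if 0 < emotion < 300:
        return -1
    if 300 <= emotion <= 700:
        return 0
    if 700 < emotion <= 1000:
        return 1
    return None


# The four sample scores indexed in section 2.1.
for score in (558, 125, 900, 600):
    print(score, emotion_flag(score))
```

Writing the rule out this way makes the edge cases explicit before it is translated into a Painless script.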

How can this be done?

The possible solutions are:

  • Option 1: add the field when re-creating the index, clear the existing data, and import the data again.

  • Option 2: add the field when re-creating the index, and write the original index into the new index with reindex.

  • Option 3: define the data preprocessing in advance with an ingest pipeline, then either re-import the data or batch-update it with update_by_query.

  • Option 4: keep the original index and compute the value with a script.

Options 1 and 2 are similar: both add the new field and then import the data again.

Options 3 and 4 are simulated and implemented below.

2. Implementing options 3 and 4

2.1 Option 3: ingest preprocessing

DELETE news_00001
PUT news_00001
{
  "mappings": {
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}
POST news_00001/_bulk
{"index":{"_id":1}}
{"emotion":558}
{"index":{"_id":2}}
{"emotion":125}
{"index":{"_id":3}}
{"emotion":900}
{"index":{"_id":4}}
{"emotion":600}
PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "script": {
        "description": "Set emotion flag param",
        "lang": "painless",
        "source": """
          if (ctx['emotion'] < 300 && ctx['emotion'] > 0)
            ctx['emotion_flag'] = -1;
          if (ctx['emotion'] >= 300 && ctx['emotion'] <= 700)
            ctx['emotion_flag'] = 0;
          if (ctx['emotion'] > 700 && ctx['emotion'] <= 1000)
            ctx['emotion_flag'] = 1;
          """
      }
    }
  ]
}
POST news_00001/_update_by_query?pipeline=my-pipeline
{
  "query": {
    "match_all": {}
  }
}

The core of option 3: an ingest pipeline, my-pipeline, is defined. The pipeline contains the decision logic that sets emotion_flag to a different value for each range of emotion.

This option requires the pipeline to be created in advance. It can then be applied either by setting it as the index's default pipeline (default_pipeline) or by combining it with a batch update.

That gives two concrete implementations:

  • Approach 1: batch update with update_by_query. Updating an index, especially a full update of every document, is expensive.

  • Approach 2: specify the pipeline at write time, so that every document is preprocessed as it is written.
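
Approach 2 can be sketched as the settings request below. The Python code only builds and prints the request; it assumes the index.default_pipeline index setting and the my-pipeline pipeline created above, and it sends nothing to a cluster.

```python
import json

# Assumption: the pipeline "my-pipeline" from the example above already exists.
INDEX = "news_00001"
settings_body = {"index": {"default_pipeline": "my-pipeline"}}

# Console-style request that could be pasted into Kibana Dev Tools.
request = f"PUT {INDEX}/_settings\n{json.dumps(settings_body, indent=2)}"
print(request)
```

A default pipeline applies only to documents written after it is set, so documents already in the index still need approach 1 or a re-import.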

2.2 Option 4: script implementation

POST news_00001/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "emotion_flag": {
      "script": {
        "lang": "painless",
        "source": "if (doc['emotion'].value < 300 && doc['emotion'].value>0) return -1; if (doc['emotion'].value >= 300 && doc['emotion'].value<=700) return 0; if (doc['emotion'].value > 700 && doc['emotion'].value<=1000) return 1;"
      }
    }
  }
}

The core of option 4: the value is computed with script_fields.

With this option the value is only returned in the search results; it cannot be used for anything else, such as aggregations.

Also note that computing fields with script_fields can cause performance problems.

Each of the two options has its own advantages and disadvantages, which raises a further question:

Can we get the data we want without changing the mapping and without re-importing the data?

Earlier versions could not. Versions 7.11 and later offer a new solution: runtime fields.

3. Background on runtime fields

Runtime fields are an enhanced version of the older script fields. They introduce an interesting concept called "schema on read".

Schema on read naturally brings to mind schema on write. Traditional, non-runtime field types are modeled when the data is written; schema on read takes a different route and models the data when it is read.

As a result, a runtime field can be defined in the mapping before indexing, or defined dynamically at query time, and it has almost all the advantages of a regular field.

Once defined in an index mapping or in a query, a runtime field can be used immediately in search requests, aggregations, filtering, and sorting.
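
To illustrate the aggregation case, here is a hedged sketch of a terms aggregation on a runtime field. It assumes a keyword runtime field named emotion_flag_new, as defined in section 4; the aggregation name by_flag is made up for this example, and the Python code only builds and prints the request body.

```python
import json

search_body = {
    "size": 0,  # return only the aggregation, no hits
    "aggs": {
        "by_flag": {  # hypothetical aggregation name
            "terms": {"field": "emotion_flag_new"}
        }
    },
}

print("GET news_00001/_search")
print(json.dumps(search_body, indent=2))
```

This is exactly the kind of request that script_fields cannot serve, because a script field exists only in the returned hits.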

4. Solving the opening problem with runtime fields

4.1 Runtime fields in practice

PUT news_00001/_mapping
{
  "runtime": {
    "emotion_flag_new": {
      "type": "keyword",
      "script": {
        "source": "if (doc['emotion'].value > 0 && doc['emotion'].value < 300) emit('-1'); if (doc['emotion'].value >= 300 && doc['emotion'].value<=700) emit('0'); if (doc['emotion'].value > 700 && doc['emotion'].value<=1000) emit('1');"
      }
    }
  }
}
GET news_00001/_search
{
  "fields" : ["*"]
}

4.2 Runtime fields: core syntax explained

First: PUT news_00001/_mapping updates the mapping of an index that already exists.

That is the update path. Specifying a runtime field while creating the index works on the same principle, as follows:

PUT news_00002
{
  "mappings": {
    "runtime": {
      "emotion_flag_new": {
        "type": "keyword",
        "script": {
          "source": "if (doc['emotion'].value > 0 && doc['emotion'].value < 300) emit('-1'); if (doc['emotion'].value >= 300 && doc['emotion'].value<=700) emit('0'); if (doc['emotion'].value > 700 && doc['emotion'].value<=1000) emit('1');"
        }
      }
    },
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}

Second: what is new?

A field was added; to be exact, a runtime field. Its name is emotion_flag_new, its type is keyword, and its value is computed by the script.

What does the script do?

  • When emotion is greater than 0 and less than 300, emotion_flag_new is set to -1.

  • When emotion is between 300 and 700 (inclusive), emotion_flag_new is set to 0.

  • When emotion is greater than 700 and at most 1000, emotion_flag_new is set to 1.

Third: how is the field searched?

Let's try an ordinary search and look at the results.

First, take a look at the mapping:

{
  "news_00001" : {
    "mappings" : {
      "runtime" : {
        "emotion_flag_new" : {
          "type" : "keyword",
          "script" : {
            "source" : "if (doc['emotion'].value > 0 && doc['emotion'].value < 300) emit('-1'); if (doc['emotion'].value >= 300 && doc['emotion'].value<=700) emit('0'); if (doc['emotion'].value > 700 && doc['emotion'].value<=1000) emit('1');",
            "lang" : "painless"
          }
        }
      },
      "properties" : {
        "emotion" : {
          "type" : "integer"
        }
      }
    }
  }
}

The mapping now has one more field: the runtime field emotion_flag_new.

Run:

GET news_00001/_search

The results are as follows:

[Screenshot: response to the request above]

Run:

GET news_00001/_search
{
  "query": {
    "match": {
      "emotion_flag_new": "-1"
    }
  }
}

The results are as follows:

[Screenshot: response to the request above]

Run:

GET news_00001/_search
{
  "fields" : ["*"],
  "query": {
    "match": {
      "emotion_flag_new": "-1"
    }
  }
}

The results are as follows:

[Screenshot: response to the request above]

4.3 Runtime fields: core syntax explained

Why must "fields": ["*"] be added for the runtime field to appear in the matching results?

Because runtime fields do not appear in _source, whereas the fields API works for all fields.

To return a specific field, write its name; otherwise, write * to return all fields.
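
For instance, a request that returns only the runtime field for the matching documents might look like the sketch below. The Python code simply assembles and prints the request body; turning off _source is an optional choice made here to keep the response small, not something the original example does.

```python
import json

search_body = {
    "_source": False,                # optional: omit the stored source
    "fields": ["emotion_flag_new"],  # return just this runtime field
    "query": {"match": {"emotion_flag_new": "-1"}},
}

print("GET news_00001/_search")
print(json.dumps(search_body, indent=2))
```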

4.4 Can this be done on the original field instead of defining a new one?

The example above already solves the problem, but to push one step further: can the value of the original field emotion be changed at query time?

In practice:

POST news_00001/_search
{
  "runtime_mappings": {
    "emotion": {
      "type": "keyword",
      "script": {
        "source": """
         if(params._source['emotion'] > 0 && params._source['emotion'] < 300) {emit('-1')}
         if(params._source['emotion'] >= 300 && params._source['emotion'] <= 700) {emit('0')}
         if(params._source['emotion'] > 700 && params._source['emotion'] <= 1000) {emit('1')}
        """
      }
    }
  },
  "fields": [
    "emotion"
  ]
}

The response:

[Screenshot: response to the request above]

To explain:

First: in the original mapping, emotion is of type integer.

Second: what we defined is a search-time type. The mapping is unchanged, but at search time the field emotion, with its name kept the same, is treated as type keyword.

This is a very powerful feature!

The early 5.X and 6.X versions lacked this capability, so in practice we handled the problem as follows:

  • Step 1: stop real-time writes.

  • Step 2: create a new index with a new mapping that adds the emotion_flag field.

  • Step 3: resume writes so that new data takes effect; reindex the old data into the new index, combining reindex with an ingest script.

With runtime fields, the days of this cumbersome and painful workaround are gone for good!

5. Where runtime fields apply

One example is logging. Runtime fields are useful when processing log data, especially when the structure of the data is not known in advance.

With runtime fields the index is much smaller, and logs can be ingested faster because the fields do not have to be indexed.

6. Advantages and disadvantages of runtime fields

Advantage 1: flexibility

Runtime fields are very flexible, mainly in that:

  • When needed, you can add runtime fields to the mapping.

  • When they are no longer needed, you can easily delete them.

Deletion works as follows:

PUT news_00001/_mapping
{
 "runtime": {
   "emotion_flag_new": null
 }
}

That is, set the field to null, and it no longer appears in the mapping.

Advantage 2: no need to define before use

Runtime fields can be defined at index time or at query time.

Because runtime fields are not indexed, adding them does not increase the index size; in other words, runtime fields can reduce storage costs.

Advantage 3: preventing mapping explosion

Runtime fields are neither indexed nor stored, which effectively prevents mapping explosion.

The reason is that runtime fields do not count toward the index.mapping.total_fields.limit setting.

Disadvantage 1: querying runtime fields slows down search

Queries on runtime fields can be expensive; in other words, runtime fields slow down search.

7. Recommendations for using runtime fields

  • Weigh the trade-off: runtime fields shorten indexing time and reduce CPU usage at index time, but they slow down queries, because retrieving the data requires additional processing.

  • Use them in combination: use runtime fields together with indexed fields to find the right balance between write speed, flexibility, and search performance.

8. Summary

This article introduced several ways to add a missing field in practice. Most traditional solutions require changing the mapping, rebuilding the index, reindexing the data, and so on, which is relatively complex.

That led to a simpler and faster solution available only from version 7.11 onward: runtime fields.

The key points about runtime fields are:

  • Defining them when the mapping is created;

  • Adding them to an existing mapping by updating it;

  • Using runtime fields at search time to add fields dynamically;

  • Overriding the type of an existing mapped field while keeping the field name the same;

  • Advantages and disadvantages, applicable scenarios, and usage recommendations.

Have you used runtime fields in practice? How did they work out?

Feel free to leave a comment to share feedback and discuss.

