KMP algorithm and optimization (code analysis and solving next array and nextval array)
ZaunEkko 2021-06-13 17:25:35

KMP Algorithm and optimization ( Code analysis and solution next Array and nextval Array )

coming , Here comes the content of data structure and algorithm , This is our specialty , It's all about appetizers , This article , Focus on postgraduate entrance examination 408 Direction , So it's guaranteed that as long as you understand , I will do it , Can this kind of thought still exist ? If you just want to see next Array and nextval The solution of array can jump to the corresponding part directly , The thought summary is very dry ~~

Online next Array version puzzle

Let's summarize , commonly KMP Algorithm next There are two versions of array results , We need to know why there is such a problem , In fact, when the prefix and suffix do not match next The array is 0 Still for 1, Both versions are right, of course , If next The array is 0 Yes, it is , So for the maximum matching length of prefix and suffix, only the value +1 Just follow next An array is 1 It's the same version of , In fact, it's because their source code is different , Or the first subscript of a pattern string is interpreted as 0 perhaps 1, In a word, there is no need to worry about this problem , Just understand the principle ~~

So here , We assume that the maximum matching length of prefix and suffix is 0 when ,next The array value is 1 Version of , Postgraduate entrance examination generally uses this version ( If 0 edition , All content -1 that will do , If you calculate next[5]=6, that -1 Version of next[5] for 5, vice versa )~~

In fact, the above summary is a sentence

next[1]=0,j( Pattern string ) The first subscript of the array is 1, meanwhile , Maximum matching length of prefix and suffix +1 That is to say next The value of the array ,j What it represents is the serial number

408 Against humanity , Generally, the first subscript of an array is 1, On the book in front of the list of learning, we should all see , The first subscript of many arrays in books is for our convenience 1, It's even harder for us to understand the idea , It's anti human , So here is next[1]=0, Maximum matching length of prefix and suffix +1 The version of

The preface and the question lead to

We need to know ,KMP The algorithm is for string matching ~~

for example : A main string "abababcdef" We want to know if we include a pattern string in it "ababc"

The early solution was , Naive pattern matching algorithm , That is, we compare the main string with the pattern string , Different main strings move forward one bit , Start with the next bit and compare with the pattern string , Move one at a time , It's going to be slow , So there are three great gods working together on an algorithm , That's what we call it now KMP Algorithm ~~

Code and understanding

The source code is given here ~~

int Index_KMP(SString S,SString T,intt next[]){
int i = 1,j = 1;// The first subscript of the array is 1
while (i <= S.length && j <= T.length){
if (j == 0 || S.ch[i] == T.ch[j]){// The first subscript of the array is 1,0 Before the first digit of an array , here ++1, Point to the first element of the array
++i;
++j; // Continue to compare the following characters
}
else
j = next[j]; // The pattern string moves right to the subscript , Serial number ( First from 1 Start )
}
if (j > T.length)
return i - T.length; // The match is successful
else
return 0;
}

Next, you can understand the code with me ~~

I don't know how to make a map yet , Here we draw it by hand ~~

 picture

The above is the general situation , So how to understand j=next[1]=0 When? ?

img

Yes , That's the idea of the code , Then we know , The core is to demand next The values of each array , Right , Generally speaking, it means testing us next What is the value of the array ~~

next Solving arrays

Here we need to give the concept first , The prefix of the string and the suffix of the string ~~

The prefix of the string : Contains first character , And does not contain the substring of the last character

Suffix of string : Contains the last character , And does not contain the substring of the first character

When the first j Character matching failed , From before 1~j-1 A string of characters is marked as S, be :next[j]=S The length of the longest prefix equals the length of the prefix +1

meanwhile ,next[1]=0

Such as , Pattern string "ababaa"

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0

When the sixth string fails to match , So we need to be in front of 5 A string of characters S"ababa" In looking for The longest is equal What is the length of the suffix before +1~~

Such as string S The prefix of can be :"a","ab","aba","abab", The prefix does not include only the last digit

strand S The suffix of can be :"a","ba","aba","baba", The suffix does not include the first digit

So the biggest matching string here is "aba" The length is 3, Then we +1, take 4

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0 4

Another example , When the second string fails to match , From before 1 A string of characters S"a" in , We know that prefixes should not have , There should be no suffix , So the maximum matching string should be 0, that +1 Is to take 1~~

In fact, here we can know a rule ,next[1] It must be 0( Source code caused by ),next[2] It must be 1( There must be no maximum matching string )~~

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0 1 4

Another example is , The third string match failed , A string of the first two characters S"ab" In looking for The longest is equal The length of the prefix , After that +1~~

Prefix :"a"

suffix :"b"

So there is no maximum matching string here. The length is 0, Then we +1, take 1

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0 1 1 4

Next you can do it yourself 4 and 5 The situation of ~~

Here's the direct answer ~~

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0 1 1 2 3 4

Isn't that easy ?

thus ,next How to find arrays and kmp The understanding of the code is ok 了 ~~


So next , After understanding the above , Let's think about it KMP The problem with the algorithm ~~

KMP The problem with the algorithm

as follows

Main string :"abcababaa"

Pattern string :"ababaa"

For example, this question

We can easily work out next Array

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0 1 1 2 3 4

At this point, we are the third string matching failure , So our next[3]=1, That's the first character next time "a" And the third character in the main string "c" contrast , But we knew the third character of the pattern string at the beginning "a" and "c" Mismatch , So there's one more meaningless match ? So we'll have kmp An optimization of the algorithm ~~

KMP Optimization of algorithm

We know , The third character of the pattern string "a" Not the third character of the main string "c" Mismatch ,next Arrays need our next[3]=1, That's the first character next time "a" And the third character in the main string "c" contrast , And then the first character of the pattern string "a" Don't "c" matching , It just needs to be next[1]=0, So we're going to leave out the steps , No, you can just let next[3]=0 Well ?

Serial number J 1 2 3 4 5
Pattern string a b a b a
next[j] 0 1 1 2 3
nextval[j] 0 0

So how to save the extra steps ?

This is it. nextval How to find an array ~~

nextval And code understanding

Post the code first

for (int j = 2;j <= T.length;j++){
if (T.ch[next[j]] == T.ch[j])
nextval[j] = nextval[next[j]];
else
nextval[j] = next[j];
}

Such as

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0 1 1 2 3 4
nextval[j] 0

First , for the first time for loop ,j=2, Current serial number b Of next[2] by 1, That is, the character pointed to by the first serial number a,a!= Current serial number b, therefore nextval[2] Keeping the same is equal to next[2]=1

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0 1 1 2 3 4
nextval[j] 0 1

The second time for loop ,j=3, Current serial number a Of next[3] by 1, That is, the character pointed to by the first serial number a,a= Current serial number a, therefore nextval[3] be equal to nextval[1]=0

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0 1 1 2 3 4
nextval[j] 0 1 0

third time for loop ,j=4, Current serial number b Of next[4] by 2, That is, the character pointed to by the second serial number b,b= Current serial number b, therefore nextval[4] be equal to nextval[2]=1

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0 1 1 2 3 4
nextval[j] 0 1 0 1

this is it , You can practice 5 and 6, It is directly given here ~~

Serial number J 1 2 3 4 5 6
Pattern string a b a b a a
next[j] 0 1 1 2 3 4
nextval[j] 0 1 0 1 0 4

thus nextval You should also know how to find arrays , So if you take the postgraduate entrance examination , So is it equal to giving you a share ?


Small exercise

So you try to find the pattern string next and nextval Array bar ~~

Pattern string :"aaaab"

Serial number j 1 2 3 4 5
Pattern string a a a a b
next[j]
nextval[j]

The answer to the little exercise

Serial number j 1 2 3 4 5
Pattern string a a a a b
next[j] 0 1 2 3 4
nextval[j] 0 0 0 0 4
Please bring the original link to reprint ,thank
Similar articles

2021-06-05

2021-06-05

2021-06-06

2021-06-09

2021-06-09

2021-06-09

2021-06-10

2021-06-11

2021-06-13

2021-06-13

2021-06-15

2021-06-16