Through the previous intermittent learning , Find yourself right HMM The understanding of the model is only superficial , Lead to learning CRF And compare it with the maximum entropy model 、HMM、MEMM It's hard to compare , So I spent another two days watching it HMM, I found that I really learned a lot , Now let's summarize , Try to turn what you have seen into your own , Special thanks 52nlp Website http://www.52nlp.cn/ Translated by Cui Xiaoyuan HMM Related information , English learning website http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html, All of them are floating clouds .

Every time you want to learn something new , I always want to know what it is for , And then I want to know how to use it , I think these are the two minimum principles for contacting and accepting new knowledge .HMM It comes from the needs of the real society , It's for people , We are usually used to looking for the change law of a thing in a period of time , And be able to predict the next change under specific circumstances , Such as forecasting the weather and so on . Of course HMM There are other applications , It will be mentioned later .

since HMM Usefulness is practical , Now let's learn about it .

In real life , Sometimes the change of state is fixed , Such as the change law of traffic lights , The next state is known , This model is called deterministic generative model , In most cases, the change of state is uncertain , According to today's weather, we can infer tomorrow's weather , The change in this state is not clear , This pattern is called indefinite generative pattern , Generally, we prefer to study the generation pattern of uncertainty , So we used HMM. Here's a clear concept ,n rank HMM, It means that the current state is only related to the former n The states are related to , What we usually study is 1 rank HMM, It means that the current state is only related to the previous state , Obviously, this is not a realistic assumption , But it's very good for analysis , At the same time, pay attention to HMM It's a model independent of time , This is also a hypothesis that does not conform to reality .（ It can be seen from this that HMM To be improved ）

HMM The core of education is thought , Algorithm and two matrices .

HMM The two matrices of are state transition matrices （ Characterizing the transition probability between States ）、 Confusion matrix （ It represents the emission symbol in a specific state （ Observation value ） Probability ）.

according to HMM Different applications , There are different algorithms to solve .

There are mainly three types of problems .

①     According to what is known HMM Find the probability of an observation sequence — assessment

This kind of problem assumes that there are several HMM Model , Find the probability of a given observation sequence for each model , The model corresponding to the maximum probability is the system most likely to generate the sequence .

application ： Judge the season according to the observation sequence , speech recognition .

Algorithm ——

Exhaustive search, for example ：

For the relationship between algae and weather , We can use the exhaustive search method to get the following state transition diagram （trellis）： In the figure ,sunny,cloudy,rainy It represents three states ,dry,damp,soggy It means the observed value , The connection between each column and adjacent columns is determined by the state transition probability , The observed state and the hidden state of each column are determined by the confusion matrix （ In a certain state , The probability of launching an observation ） decision . If we use the exhaustive method to calculate the probability of a sequence of observed states , It requires the sum of probabilities of all possible weather state sequences , This trellis The Communist Party of China has 3*3=27 A possible sequence .

Pr(dry,damp,soggy | HMM) = Pr(dry,damp,soggy | sunny,sunny,sunny) + Pr(dry,damp,soggy | sunny,sunny ,cloudy) + Pr(dry,damp,soggy | sunny,sunny ,rainy) + . . . . Pr(dry,damp,soggy | rainy,rainy ,rainy)

It can be seen that the computational complexity is very large , Especially when the state space is large , For a long time . We can use The time invariance of probability Solving complexity .

Recursive method （Forward Algorithm） give an example ：

First define Partial probability To arrive at trellis The probability of an intermediate state in . Put the length to T The observation state sequence of is expressed as ： In the calculation trellis The probability of a certain intermediate state in , It is represented by the sum of all possible paths to the state .

For example t=2 Time , Status as cloudy The probability can be calculated by the following path ： use Pt ( j )  It means in time t when   state j Partial probability of . The calculation method is as follows ：

Pt ( j )= Pr( observation | hidden state is j ) * Pr(all paths to state j at time t)

The partial probability representation of the final observed state , The probability of all possible paths that these states pass through . such as ： This means that the sum of the last partial probabilities is trellis The sum of all possible paths in , That is, at present HMM The probability of observing the sequence under the condition .

Our formula for calculating the partial probability is :

Pt ( j )= Pr( observation | hidden state is j ) x Pr(all paths to state j at time t)

But in the initial state , There is no path to these States . Then we use probability multiply associated observation probability Calculation ： In this way, the partial probability of the state at the initial time is only related to its own probability and the probability of observing the state at that time .

What this algorithm finally needs is the sum of probabilities .

For the length of the observation sequence T, The complexity of the exhaustive method is T The exponential level of ; and Forward Algorithm The complexity of is T Linearity of .

Manual calculation process ： The observation sequence is Damp,Soggy,Dry, Calculate the probability of the model , Different models have different probabilities , The highest probability corresponds to the most likely model that can produce the sequence .

State transition matrix A：

 Sunny Cloudy Rainy Sunny 0.5 0.25 0.25 Cloudy 0.375 0.125 0.375 Rainy 0.125 0.675 0.375

Confusion matrix B：

 Damp Soggy Dry Dryish Sunny 0.15 0.05 0.6 0.20 Cloudy 0.25 0.25 0.25 0.25 Rainy 0.35 0.5 0.05 0.10 The first step is to set up the observation sequence , The initial state P(Sunny)=0.63,P(Cloudy)=0.17,P(Rainy)=0.20. The second step , The initial time of calculation is in the State Sunny Next launch symbol Damp Probability Alpha1 = (0.63 * 0.15) = 0.094500005.

Similarly, the third step , The initial time of calculation is in the State Cloudy Next launch symbol Damp Probability Alpha2 = (0.17 * 0.25) = 0.0425.

Step four , The initial time of calculation is in the State Rainy Next launch symbol Damp Probability Alpha3 = (0.2 * 0.35) = 0.07. Step five , Calculate the second moment in the State Sunny Next launch symbol Soggy Probability , It's a two-part probability , The second moment to reach the State Sunny The sum of all path probabilities is the probability of reaching the state , multiply Sunny Launch symbols Soggy Probability 0.05,Alpha4 = (((0.094500005*0.5) + (0.0425*0.375) + (0.07*0.125)) * 0.05) = 0.0035968751.

Similarly, sixth , Seven steps , obtain ：

Alpha5 = (((0.094500005*0.25) + (0.0425*0.125) + (0.07*0.675)) * 0.25) = 0.019046877

Alpha6 = (((0.094500005*0.25) + (0.0425*0.375) + (0.07*0.375)) * 0.5) = 0.03290625 The eighth , Nine , Ten steps ,Alpha7 = (((0.0035968751*0.5) + (0.019046877*0.375) + (0.03290625*0.125)) * 0.6) = 0.007832578

Alpha8 = (((0.0035968751*0.25) + (0.019046877*0.125) + (0.03290625*0.675)) * 0.25) = 0.0063729496

Alpha9 = (((0.0035968751*0.25) + (0.019046877*0.375) + (0.03290625*0.375)) * 0.05) = 0.001019082 The final will be Alpha7,Alpha8,Alpha9 Add to get the probability of the model （ Generating sequences Damp,Soggy,Dry Probability ）.

②     Find the most likely hidden state sequence according to the observation sequence — decode

Most of the time , We all want to be able to do it according to a given HMM Model , According to the observed state sequence, we can find the potential hidden state sequence that produces this sequence .

Examples of exhaustive search methods ： We can enumerate all the possible implicit state sequences , The probability of the observed state sequence corresponding to each combination of hidden state sequences is calculated . The combination with the highest probability corresponds to the most likely combination of hidden state sequences .

Pr(observed sequence | hidden state combination).

For example, in the picture above trellis in , The most likely hidden state sequence is to make probability ：

Pr(dry,damp,soggy | sunny,sunny,sunny), Pr(dry,damp,soggy | sunny,sunny,cloudy), Pr(dry,damp,soggy | sunny,sunny,rainy), . . . . Pr(dry,damp,soggy | rainy,rainy,rainy)  Get the sequence of maximum values .

Similarly, the calculation of this exhaustive method is too large . To solve this problem , We can use and Forward algorithm The same principle -- Time invariance of probability to reduce the amount of computation .

Reduce complexity recursively  ,Viterbi Algorithm ：

In a given observation sequence and HMM Under the model , We find the most likely hidden state sequence in a recursive way . Again, we define partial probability , That is to say trellis The probability of reaching a certain intermediate state in the . And then I'll show you how to start t=1 and t>1 We'll solve this part of the probability respectively . But should pay attention to , The partial probability here is the path with the highest probability to reach an intermediate state, not the sum of all the probabilities .

Partial probability and partial optimal path

Here's how trellis ： about trellis Each intermediate state and end state in , There is an optimal path to it . He might be like this ： Our paths are partially optimal , Every one of them   Part of the optimal path corresponds to an association probability -- Partial probability δ. And Forward algorithm Different Is the most likely to reach that state One The probability of the path .

δ (i,t) It's all sequences in t Always in the state of i The maximum probability of termination . Of course, the path it corresponds to is the partially optimal path . δ(i,t) For each i,t It's all there . So we can be in time T（ The last state of the sequence ） Find the optimal path of the whole sequence .

Calculation stay t = 1 The initial value of the

Because in t=1 There is no partially optimal path , So you can use the initial state π Vectors help compute . This is related to Forward Algorithm identical  .

Calculation stay t > 1  Partial probability of

Again, we only use t-1 Time information to get t The partial probability of the moment . It can be seen from this picture that we have arrived at X The best path for is one of the following ：

(sequence of states), . . ., A, X

(sequence of states), . . ., B, X

(sequence of states), . . ., C, X

We want to find the one with the highest probability . Think back to the hypothesis of Markov first-order model , A state is related to the state of the moment before it .

Pr (most probable path to A) . Pr (X | A) . Pr (observation | X)

So arrive at X The biggest probability of that is ： The first part consists of t-1 The partial probability of the moment is , The second part is state transition probability , The third part is the probability of confusion matrix .

According to the final formula, we can write the solution of probability as ： Consider the following trellis Now we can get the path with the highest probability of reaching each intermediate or terminal state . But we need to take some measures to document this path . This needs to get the previous state of the optimal path in each state record . Write it down as ： such argmax（ Look for the parameter with the largest score ） The operator selects the index that makes the expression in brackets the largest j.

If someone asks , Why not multiply the observation probability factor in the confusion matrix . This is because we are concerned with the optimal path to the current state , Information about the previous state , It has nothing to do with his corresponding state of observation .

Manual calculation process ： Given the observation sequence, find the most likely hidden state sequence . Set the observation sequence . Delta1 = 0.63 * 0.15 = 0.094500005 Delta2 = 0.17 * 0.25 = 0.0425 Delta3 = 0.2 * 0.35 = 0.07 Delta4 = max ((0.094500005*0.5), (0.0425*0.375), (0.07*0.125)) * 0.05 = 0.0023625002, The last moment is most likely to arrive at Sunny The path of state （Sunny-Sunny）, With the time of the launch symbol Soggy Multiplication of probabilities . Delta5 = max ((0.094500005*0.25), (0.0425*0.125), (0.07*0.675)) * 0.25 = 0.011812501, Most likely to arrive at Cloudy The path of state （Rainy-Cloudy）. Delta6 = max ((0.094500005*0.25), (0.0425*0.375), (0.07*0.375)) * 0.5 = 0.0131250005, Most likely to arrive at Rainy The path of state （Rainy-Rainy）. Delta7 = max ((0.0023625002*0.5), (0.011812501*0.375), (0.0131250005*0.125)) * 0.6 = 0.0026578128, Most likely to arrive at Sunny The path of state （Cloudy-Sunny）. Delta8 = max ((0.0023625002*0.25), (0.011812501*0.125), (0.0131250005*0.675)) * 0.25 = 0.0022148439,  Most likely to arrive at Cloudy The path of state （Rainy-Cloudy）. Delta9 = max ((0.0023625002*0.25), (0.011812501*0.375), (0.0131250005*0.375)) * 0.05 = 2.4609375E-4,  Most likely to arrive at Rainy The path of state （Rainy-Rainy）. Each step looks for the most likely state transition , Finally, we can get ： Facing the observation sequence Damp-Soggy-Dry, The most likely state transition is Rainy-Cloudy-Sunny.

viterbi Two advantages of the algorithm  ：

1） And Forward Algorithm is the same , It greatly reduces the computational complexity

2）viterbi According to the input observation sequence ,“ From left to right ” Give the best understanding according to the context . because viterbi All observation sequence factors will be considered before giving the final choice , In this way, the decision-making will not deviate from the correct answer due to the sudden noise . This often happens in real data .

③     From the observation sequence HMM— Study

This is the hardest HMM application . That is, according to the observation sequence and the hidden state it represents , Generate a triple HMM (,A,B). So that this triplet can best describe a phenomenon we see .

We use it forward-backward algorithm To solve the problems that often appear in reality -- Transfer matrix and confusion matrix can not be obtained directly .

This direction plays an important role in speech processing database . Because it can help us in a big state space , Look at the environment with a long sequence to find the right HMM Model parameters ： The initial state 、 Transfer probability 、 Confusion matrix, etc .

http://blog.sina.com.cn/s/blog_953f8a550100zh35.html

http://yexin218.iteye.com/blog/432120

HMM More articles on the model

1. application HTK Build a voice dialing system 3： Create triphones in bound state HMM Model

selected from :http://maotong.blog.hexun.com/6261873_d.html Su Tonghua Artificial Intelligence Laboratory of Harbin Institute of technology 2006 year 10 month 30 Japan Statement : copyright , Please indicate the author and source of reprint The Department ...

2. application HTK Build a voice dialing system 2： Create monophones HMM Model

selected from :http://maotong.blog.hexun.com/6204849_d.html Su Tonghua Artificial Intelligence Laboratory of Harbin Institute of technology 2006 year 10 month 30 Japan Statement : copyright , Please indicate the author and source of reprint The Department ...

3. The hidden Markov model HMM（ One ）HMM Model

The hidden Markov model HMM( One )HMM Model basis The hidden Markov model HMM( Two ) The forward backward algorithm evaluates the probability of the observation sequence The hidden Markov model HMM( 3、 ... and ) Baum - Welch algorithm to solve HMM Parameters (TODO) The hidden Markov model HMM( Four ) Viterbi ...

4. HMM Model learning notes （ Forward algorithm example ）

HMM Algorithm must have been heard many times , Completely confused by the formula . however HMM The basic theory is very simple . because HMM It's one of the Markov chains , It's just that its state can't be directly observed , But it can be reflected indirectly through the observation vector , That is, every observation ...

5. The beauty of Mathematics ——HMM Model （ Two ） Decoding and Forward Algorithm

The last one discussed HMM The basic concept and some properties of ,HMM It's quite common in reality , So it also brings a series of HMM Application problems .HMM The application mainly faces three aspects : forecast . Decoding and learning . This article focuses on prediction . Simply speaking , Prediction is a given HM ...

6. The beauty of Mathematics ——HMM Model （ One ） Introduce

I always wanted to write something about mathematics blog, This is for data mining analysis ,NLP Treatment and so on have a more important role , Before that CSDN I want to write something on it HMM Articles on , I haven't written it yet , I finally took some time to finish it in recent days HMM The article , Sort it out , So there is this department ...

7. Timing analysis ：HMM Model （ The state space ）

About HMM Model : Timing analysis : The hidden Markov model HMM For gesture recognition : In training, each gesture corresponds to one HMM-Model, Take the highest recognition rate HMM that will do .  It's similar to a single-layer network that encapsulates the functions of multi class recognizer . advantage : especially ...

8. HMM Elegant approach to model input data processing From the actual project

This is how I do the actual project : def mining_ue_procedures_behavior(seq, lengths, imsi_list): print("seq 3:", se ...

9. HMM Model learning notes （ viterbi algorithm ）

viterbi algorithm (Viterbi) viterbi algorithm   edit Viterbi algorithm is a kind of dynamic programming algorithm, which is used to find the most likely observation event sequence - Viterbi path - Implicit state sequence , Especially in the context of Markov information sources and hidden Markov models . The term “ Viterbi ...

Random recommendation

1. Xamarin And Microsoft .NET The foundation

Journalism < Microsoft announces its establishment .NET The foundation fully supports open source projects Include C# compiler Roslyn>, I'm very excited to see everyone open to Microsoft . Before that in .NET The community also has a lot of open source projects , Listed 24 This project has been open source for a long time , This time, ...

2. tmp

Hello Hello everyone , This time I'm bringing you Gear VR4 generation , First of all, I have to thank our tiger friends Hide Brotherhood offers Gear Give us an evaluation , thank thank . The outpost I recorded before also said , This time, Gear VR 4 Generation comparison 3 Generational change is not very ...

3. easyui1.32 All kinds of questions

Question 1 scene :tab Switch , Every tab In the use div Let's play one. dataGrid, Default display:none hide , When display:'block' When ,dataGrid It's not complete , Show only one line . resolvent : cut ...

4. mysql Crossover table

Crossover table , But in MySQL It doesn't have this function in , But I see many friends on the Internet want to find a solution , In a broad sense .http://topic.csdn.net/u/20090530/23/0b782674-4b0b-4c ...

5. How to use BMW Multi Tool 7.3 to replace lost key for BMW X1

BMW Multi Tool 7.3 version is a high quality auto key programmer that supports CAS 1, CAS2, CAS3, CA ...

6. JavaScript client JavaScript And Scripted documents

client JavaScript The existence of static HTML Turn it into interactive Web Applications , scripting Web The content of the page is JavaScript Reasons for existence .   A document object model or DOM It's just one. API, It defines how to access ...

7. Chapter 2 Open Book——22

I dropped my head, letting my hair fall to conceal my face. I lowered my head , Let my hair hang down and hide my face . I was sure,though ...

8. K： Implementation of linear table — Linked list

The concept of a single linked list : The linear list stored by chain storage is called chain list , Each node in the linked list contains a data field for storing values of data elements and a pointer field for storing points to logically adjacent nodes . If a node contains only one pointer field , This linked list is called a single linked list . Single linked list features : ...

9. Java Abnormal reflection ：java.lang.NoSuchFieldException

Copyright notice :[ Sharing is also an improvement ] For personal reprint, please indicate the source at the beginning of the text , Without the author's consent, enterprises are prohibited / Organization reprint , It is forbidden to alter the original without permission , Prohibited for commercial purposes . Today, we use reflection to assign values to objects , There is an attribute that always reports an error , The main error messages are as follows : ...

10. Changes update API

/* Update a Modifier header of type 'PRO' (Promotion) */ l_MODIFIER_LIST_rec.active_flag := 'N'; l_M ...