BTM works on much the same principle as LDA. Here is the probabilistic graphical model of BTM:

As the diagram shows, the difference from LDA is that once the topic distribution and the word distributions are fixed, BTM draws two words at a time, whereas LDA draws only one. In the usual dice analogy: first roll the K-sided die to get a topic z, then roll the corresponding V-sided die twice in a row to get a pair of words. This pair of words is called a biterm. In practice, after a document has been segmented into words, a window size is set on the distance between words, and the i-th and j-th words that fall within that window are paired up as a biterm. The code to extract biterms from a document is as follows:

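def build_Biterms(self, sentence):
    """
    Get the biterms of a document.

    :param sentence: word id list; the IDs of the document's words after segmentation
    :return: biterm list
    """
    win = 15  # window size: maximum distance between the two words of a biterm
    biterms = []
    # range() replaces the Python 2-only xrange() so the snippet also runs on Python 3
    for i in range(len(sentence) - 1):
        # pair word i with every later word that lies within the window
        for j in range(i + 1, min(i + win + 1, len(sentence))):
            biterms.append(Biterm(int(sentence[i]), int(sentence[j])))
    return biterms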
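
The method above relies on a Biterm class and an enclosing object that the post does not show. As a rough, self-contained sketch of the same windowing logic, the version below uses a namedtuple as a hypothetical stand-in for Biterm and takes the window size as a parameter (both are assumptions made for illustration, not the original implementation):

```python
from collections import namedtuple

# Hypothetical stand-in for the Biterm class, which the original post does not show.
Biterm = namedtuple("Biterm", ["wi", "wj"])


def build_biterms(word_ids, win=15):
    """Pair each word with every later word at most `win` positions away."""
    biterms = []
    for i in range(len(word_ids) - 1):
        for j in range(i + 1, min(i + win + 1, len(word_ids))):
            biterms.append(Biterm(int(word_ids[i]), int(word_ids[j])))
    return biterms


# A four-word document with a window of 2: each word is paired with
# at most the next two words.
doc = [3, 7, 7, 1]
pairs = build_biterms(doc, win=2)
print(pairs)
```

With a window of 2, the first word is paired with the second and third words but not the fourth. A document with fewer than two words yields no biterms at all.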

BTM estimates a single theta from the entire corpus, which addresses the sparsity problem (the corpus as a whole usually contains plenty of data, even when each document is short). It relaxes the mixture-of-unigrams constraint that a whole document must belong to one topic z (in effect, the constraint is relaxed from the whole document to two words within the window), and it strengthens LDA's assumption that every word has its own z (in BTM, two words within the window are bound into a biterm that shares one z). This assumption is close to human intuition, because within a short span of text the topic usually changes very little.
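
The dice analogy and the single corpus-level theta can be put together in a small simulation of the generative process. This is only a sketch: the function names, the hyperparameter values alpha and beta, and the use of symmetric Dirichlet priors sampled through normalized gamma draws are illustrative assumptions, not code from the original post.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_biterms(num_biterms, K, V, alpha=1.0, beta=0.5, seed=0):
    """Simulate BTM's generative story for a whole corpus of biterms."""
    rng = random.Random(seed)
    # One topic distribution theta for the entire corpus (the K-sided die).
    theta = sample_dirichlet([alpha] * K, rng)
    # One word distribution per topic (the V-sided dice).
    phi = [sample_dirichlet([beta] * V, rng) for _ in range(K)]
    corpus = []
    for _ in range(num_biterms):
        # Roll the K-sided die once to pick the topic of this biterm...
        z = rng.choices(range(K), weights=theta)[0]
        # ...then roll that topic's V-sided die twice to get the two words.
        wi, wj = rng.choices(range(V), weights=phi[z], k=2)
        corpus.append((z, wi, wj))
    return theta, phi, corpus


theta, phi, corpus = generate_biterms(num_biterms=20, K=3, V=5)
print(corpus[:5])
```

Note that theta is drawn once, outside the loop, so every biterm in the corpus shares it, and both words of a biterm are drawn from the same phi[z].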
