# I. Where the logit value comes from

Logistic regression generally deals with a dependent variable that falls into two categories, 0 and 1. The 0/1 outcome is first turned into a probability in [0, 1], then into odds (range [0, +∞)), and finally the log is taken to give the logit value (range (-∞, +∞)).

Odds: odds = P(y=1)/P(y=0)

Logit value: logit = log(odds)
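As a small illustration of the chain from probability to odds to logit, here is a sketch in base R. The probability values are made up for illustration; `qlogis` is the built-in logit function in the `stats` package.

```r
# Illustrative probabilities of y = 1 (made-up values)
p <- c(0.1, 0.5, 0.9)

# odds = P(y=1) / P(y=0), range [0, +Inf)
odds <- p / (1 - p)

# logit = log(odds), range (-Inf, +Inf)
logit <- log(odds)

print(data.frame(p = p, odds = odds, logit = logit))

# Base R's qlogis() computes the same logit directly
all.equal(logit, qlogis(p))
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0; probabilities above 0.5 give positive logits and those below give negative ones.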

What is the sigmoid function?

First, define an intuitive concept: the `odds`.

The odds are p/(1-p), where p is the probability that the event is true and 1-p is the probability that it is false. Taking the log, t = log(p/(1-p)), converts the range to the whole real line. Solving this back for p gives the sigmoid function.

The interesting thing about the sigmoid function is that its input runs from negative infinity to positive infinity while its output lies between 0 and 1. The closer the input is to 0, the faster the output changes. Its derivative is p(1-p), which is a conveniently simple form. (Reference: "Talk about logistic regression")
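Written out, the inversion and the derivative mentioned above take only a few steps (natural log assumed):

$$
t = \log\frac{p}{1-p}
\;\Rightarrow\; e^{t} = \frac{p}{1-p}
\;\Rightarrow\; p = \frac{e^{t}}{1+e^{t}} = \frac{1}{1+e^{-t}}
$$

$$
\frac{dp}{dt} = \frac{e^{-t}}{\left(1+e^{-t}\right)^{2}}
= \frac{1}{1+e^{-t}} \cdot \frac{e^{-t}}{1+e^{-t}}
= p\,(1-p)
$$

The product p(1-p) is largest at p = 0.5, that is at t = 0, where it equals 0.25. This is why the curve changes fastest near 0.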

# II. Logit modeling

Model the logit as the response (logit = Y); once the logit is obtained, the probability can be calculated from it. The logit can be read as economic utility: utility is a continuous variable, so a logit model is equivalent to modeling utility.

So, generally speaking, the coefficients of a logistic regression are coefficients on the logit value, and they have to be converted to obtain a probability.

A simple way to picture it:

The input is x, the output is y, and t is an intermediate variable, with t = w·x + b; w and b are the model parameters. h(t) = 1/(1+exp(-t)) is the probability of belonging to a given class; if it is greater than 0.5 the sample is assigned to that class, i.e. y = 1.

For simplicity, we can treat b as a weight that is always multiplied by a constant input of 1, and so fold b into w. The model then simplifies to h = 1/(1+exp(-w·x)).

This is the formula of logistic regression, and it is simple.
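The model described above, including the trick of folding b into w, can be sketched in a few lines of R. The parameter and input values here are hypothetical and chosen only for illustration; `sigmoid` is a helper defined in the snippet, not a built-in.

```r
# Sigmoid: maps any real t to a probability in (0, 1)
sigmoid <- function(t) 1 / (1 + exp(-t))

# Hypothetical parameters and inputs, for illustration only
w <- c(0.8, -0.5)          # weights
b <- 0.3                   # bias
x <- rbind(c(1.0, 2.0),    # one row per sample
           c(-1.5, 0.5),
           c(3.0, 1.0))

# Explicit-bias form: t = x %*% w + b
h_explicit <- sigmoid(as.vector(x %*% w) + b)

# Folded form: prepend a constant 1 to each sample and put b into w
x_aug <- cbind(1, x)
w_aug <- c(b, w)
h_folded <- sigmoid(as.vector(x_aug %*% w_aug))

# Both forms give the same probabilities
all.equal(h_explicit, h_folded)

# Class label: y = 1 when the probability exceeds 0.5
y_hat <- as.integer(h_folded > 0.5)
print(data.frame(h = h_folded, y_hat = y_hat))
```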

(Reference: "Talk about logistic regression")

# III. Setting the threshold for a logit model

In risk-control modeling, the threshold for a logistic regression is set by the business owner. Typically, high-credit applicants pass automatically, medium-risk applicants are sent for manual review, and high-risk applicants are refused a loan.
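A three-band rule of this kind could look like the sketch below. The cutoff values and the predicted probabilities are hypothetical; in practice the business owner chooses the cutoffs.

```r
# Hypothetical predicted default probabilities for five applicants
p_default <- c(0.02, 0.08, 0.25, 0.40, 0.75)

# Hypothetical cutoffs; in practice the business owner sets these
low_cut  <- 0.10   # below this: approve automatically
high_cut <- 0.50   # at or above this: refuse

decision <- cut(p_default,
                breaks = c(-Inf, low_cut, high_cut, Inf),
                labels = c("auto-approve", "manual review", "refuse"),
                right  = FALSE)   # intervals are [lower, upper)

print(data.frame(p_default, decision))
table(decision)
```

Using two cutoffs rather than a single 0.5 threshold lets the middle band absorb the cases where the model is least certain.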

# IV. Implementation in R

## 1. Logistic regression

Logistic regression is generally fitted with the `glm` function, using `family = binomial(link='logit')`.

    # Fit a logistic regression of y on x1
    lg <- glm(y ~ x1, family = binomial(link = 'logit'))
    summary(lg)

A regression coefficient is read for only two things: its sign and its significance. For example, a coefficient of 0.1 means that each one-unit increase in x raises the logit value by 0.1, which is a positive effect. If you need the probability, it has to be calculated separately.
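To make the interpretation concrete, the sketch below fits a model on simulated data whose true slope on the logit scale is 0.1. The data and parameter values are made up for illustration. It also shows that the change in probability for a one-unit step depends on the starting point, which is why it has to be computed rather than read off the coefficient.

```r
set.seed(1)

# Simulated data: true logit = -0.5 + 0.1 * x1 (made-up values)
n  <- 5000
x1 <- rnorm(n, mean = 0, sd = 5)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.1 * x1))

lg <- glm(y ~ x1, family = binomial(link = "logit"))

# Estimate, standard error, z value and p-value for each coefficient
coef(summary(lg))

# Multiplicative effect on the odds of a one-unit increase in x1
exp(coef(lg)["x1"])

# Effect on the probability when x1 moves from 0 to 1
b <- coef(lg)
p_at_0 <- plogis(b[1] + b[2] * 0)
p_at_1 <- plogis(b[1] + b[2] * 1)
unname(p_at_1 - p_at_0)
```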

## 2. Screening variables with stepwise regression (step)

On top of the logistic regression, we can use stepwise regression to eliminate variables.

    # Stepwise selection in both directions, starting from the fitted model
    lg_ms <- step(lg, direction = "both")
    summary(lg_ms)

## 3. Prediction on the validation set (predict)

    # Predict with the selected model; the result is on the logit scale
    train$lg_p <- predict(lg_ms, train)
    summary(train$lg_p)

By default, `predict` also returns the logit value rather than a probability, so it needs to be converted.

## 4. Calculating the probability

    # Apply the sigmoid to turn logit values into probabilities
    1/(1+exp(-1*train$lg_p))
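The manual conversion can be cross-checked against two built-in routes: `plogis` from the `stats` package, and `predict` with `type = "response"`. The sketch below uses simulated data (made-up values) and fits its own small model so that it runs on its own.

```r
set.seed(2)

# Simulated training data (made-up values)
train <- data.frame(x1 = rnorm(200))
train$y <- rbinom(200, size = 1, prob = plogis(0.5 * train$x1))

lg <- glm(y ~ x1, family = binomial(link = "logit"), data = train)

# Default prediction is on the link (logit) scale
lg_p <- predict(lg, train)

# Three equivalent ways to get probabilities
p_manual   <- 1 / (1 + exp(-lg_p))
p_plogis   <- plogis(lg_p)
p_response <- predict(lg, train, type = "response")

all.equal(p_manual, p_plogis)
all.equal(p_manual, p_response)
```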

## 5. Model validation methods

As a ranking model, it can be validated with the ROC curve/AUC value, the cumulative lift curve, the K-S curve, and the Lorenz curve/Gini coefficient (see the note "Types of risk-control classification models (decision, ranking): comparison and model evaluation system (ROC/Gini/KS/lift)").
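Several of these measures can be computed in base R without extra packages. The sketch below uses simulated labels and scores (made-up values) and assumes the common conventions: AUC via the rank-sum identity, Gini = 2 * AUC - 1, and K-S as the largest gap between the score distributions of the two classes.

```r
set.seed(3)

# Simulated labels and scores (made-up); higher score = more likely y = 1
n     <- 1000
y     <- rbinom(n, size = 1, prob = 0.3)
score <- plogis(rnorm(n, mean = ifelse(y == 1, 1, -1)))

n1 <- sum(y == 1)
n0 <- sum(y == 0)

# AUC via the rank-sum (Mann-Whitney) identity
auc <- (sum(rank(score)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Gini coefficient derived from AUC
gini <- 2 * auc - 1

# K-S statistic: largest gap between the score CDFs of the two classes
cdf1 <- ecdf(score[y == 1])
cdf0 <- ecdf(score[y == 0])
grid <- sort(unique(score))
ks   <- max(abs(cdf1(grid) - cdf0(grid)))

c(AUC = auc, Gini = gini, KS = ks)
```

All three depend only on how the scores order the samples, not on the probability values themselves, which fits the ranking use described above.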

