A linear model expresses the correspondence between the "result" and the "feature set" as a linear combination of features. Because the expressive power of a linear model is limited, in practice the model can only be improved by adding computed features. For example, in advertising CTR applications, besides basic features such as title length, description length, position, ad id, and cookie, there are also many combined features (for example, "position-cookie" captures a user's preference for ad positions). In fact, many search-engine advertising systems today use a Logistic Regression model (linear), and one of the most important jobs of the modeling team is feature engineering.
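As a rough illustration of how a combined feature such as "position-cookie" enters a linear CTR model, here is a minimal sketch. The feature naming scheme, the helper names `cross_features` and `predict_ctr`, and all weights are hypothetical values chosen for the example, not taken from any real advertising system.

```python
import math

def cross_features(sample, pairs):
    """Return sparse binary feature names: one per base feature, plus one per requested cross."""
    feats = {f"{name}={value}" for name, value in sample.items()}
    for a, b in pairs:
        # A cross feature fires only for this exact combination of the two values.
        feats.add(f"{a}_x_{b}={sample[a]}|{sample[b]}")
    return feats

def predict_ctr(feats, weights, bias=0.0):
    """Logistic regression over sparse binary features: sigmoid(bias + sum of active weights)."""
    z = bias + sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical sample and hand-picked weights, for illustration only.
sample = {"position": 1, "cookie": "u42", "ad_id": "a7"}
feats = cross_features(sample, pairs=[("position", "cookie")])
weights = {"position=1": 0.3, "position_x_cookie=1|u42": 0.8}
p = predict_ctr(feats, weights, bias=-2.0)
```

The model itself stays linear in its inputs; the nonlinearity (this user's preference for this position) is pushed into the hand-built cross feature, which is where the feature engineering effort goes.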

The idea behind the linear model is "simple model + complex features": this combination is used to describe complex nonlinear scenarios. Because the model structure is simple, the computational cost of training and prediction is relatively small. However, feature selection is labor-intensive and requires the people involved to have a deep understanding of the business.

Another modeling idea is "complex model + simple features". That is, the importance of feature engineering is reduced, and a complex nonlinear model is used to learn the relationships between features and so increase expressive power. The deep neural network is one such nonlinear model.

The figure above shows a deep neural network with one input layer, one output layer, and two hidden layers. The model has 9 nodes in total.
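A forward pass through a network of this shape can be sketched as follows. The text only gives the total of 9 nodes, so the 4-2-2-1 layer split, the node numbering 0 to 8, the sigmoid activation, the omission of bias terms, and the constant initial weights are all assumptions made for this example.

```python
import math

# Assumed layout: input nodes 0-3, hidden nodes 4-5 and 6-7, output node 8 (9 nodes in total).
LAYERS = [[0, 1, 2, 3], [4, 5], [6, 7], [8]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W):
    """Return the activation of every node; W[(i, j)] is the weight on the edge from node i to node j."""
    a = dict(zip(LAYERS[0], x))  # input nodes simply carry the input values
    for prev, cur in zip(LAYERS, LAYERS[1:]):
        for j in cur:
            a[j] = sigmoid(sum(W[(i, j)] * a[i] for i in prev))
    return a

# Fully connect adjacent layers with a constant weight, purely for illustration.
W = {(i, j): 0.1 for prev, cur in zip(LAYERS, LAYERS[1:]) for i in prev for j in cur}
acts = forward([1.0, 0.0, 1.0, 0.0], W)
```

Under this numbering, `W[(0, 4)]` is the weight on the edge from input node 0 to the first hidden node.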

Neural networks are introduced in detail in many references. Taking the figure above as an example, we focus here on the derivation of the backpropagation algorithm.

Backpropagation is very similar to gradient descent: in essence, it computes the partial derivative with respect to each parameter and then finds the next search point along the direction given by those partial derivatives. Take \${W_{04}}\$ as an example:

Combining the derivation above gives the gradient direction of \${W_{04}}\$:
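One possible worked form of this gradient is sketched below. It rests on assumptions the text does not state: a 4-2-2-1 layout with input nodes 0 to 3, hidden nodes 4, 5 and 6, 7, and output node 8; sigmoid activations; no bias terms; a squared-error loss E for a single example with target y; and a learning rate written as eta. With a different loss or activation the individual factors change, but the chain-rule structure stays the same.

```latex
\begin{aligned}
z_j &= \sum_i W_{ij}\, a_i, \qquad a_j = \sigma(z_j) = \frac{1}{1 + e^{-z_j}}, \qquad E = \tfrac{1}{2}(a_8 - y)^2 \\
\delta_8 &= \frac{\partial E}{\partial z_8} = (a_8 - y)\, a_8 (1 - a_8) \\
\delta_k &= \frac{\partial E}{\partial z_k} = a_k (1 - a_k)\, W_{k8}\, \delta_8, \qquad k \in \{6, 7\} \\
\delta_4 &= \frac{\partial E}{\partial z_4} = a_4 (1 - a_4) \sum_{k \in \{6, 7\}} W_{4k}\, \delta_k \\
\frac{\partial E}{\partial W_{04}} &= \frac{\partial E}{\partial z_4}\, \frac{\partial z_4}{\partial W_{04}} = \delta_4\, a_0 \\
W_{04} &\leftarrow W_{04} - \eta\, \frac{\partial E}{\partial W_{04}}
\end{aligned}
```

Here a_0 is simply the value of input node 0, so the gradient of \${W_{04}}\$ is the error signal propagated back to node 4 multiplied by the input that the weight acts on.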

The rest of the iterative process is the same as in gradient descent.
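To make the whole loop concrete, here is a minimal sketch of backpropagation plus one gradient descent step, with a numerical check of the gradient for the weight from node 0 to node 4. The 4-2-2-1 layout, sigmoid activations, squared-error loss, missing bias terms, initial weights, learning rate, and the sample input are all assumptions chosen for illustration.

```python
import math

# Assumed layout: input nodes 0-3, hidden nodes 4-5 and 6-7, output node 8.
LAYERS = [[0, 1, 2, 3], [4, 5], [6, 7], [8]]
OUT = 8

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W):
    """Activations of all nodes; W[(i, j)] is the weight from node i to node j."""
    a = dict(zip(LAYERS[0], x))
    for prev, cur in zip(LAYERS, LAYERS[1:]):
        for j in cur:
            a[j] = sigmoid(sum(W[(i, j)] * a[i] for i in prev))
    return a

def loss(x, y, W):
    """Squared error for a single example (an assumed choice of loss)."""
    return 0.5 * (forward(x, W)[OUT] - y) ** 2

def backprop(x, y, W):
    """Return dE/dW for every weight by propagating error terms backwards."""
    a = forward(x, W)
    delta = {OUT: (a[OUT] - y) * a[OUT] * (1.0 - a[OUT])}
    # Walk the hidden layers from the last one back to the first.
    for idx in range(len(LAYERS) - 2, 0, -1):
        for j in LAYERS[idx]:
            back = sum(W[(j, k)] * delta[k] for k in LAYERS[idx + 1])
            delta[j] = a[j] * (1.0 - a[j]) * back
    return {(i, j): a[i] * delta[j] for (i, j) in W}

def sgd_step(x, y, W, lr=0.5):
    """One gradient descent update on all weights."""
    g = backprop(x, y, W)
    return {k: W[k] - lr * g[k] for k in W}

# Deterministic, arbitrary initial weights and a single made-up training example.
W = {(i, j): 0.1 * (i + 1) - 0.05 * j
     for prev, cur in zip(LAYERS, LAYERS[1:]) for i in prev for j in cur}
x, y = [1.0, 0.5, -1.0, 2.0], 1.0

grad = backprop(x, y, W)

# Central-difference check of the analytic gradient for the weight from node 0 to node 4.
eps = 1e-6
W_plus, W_minus = dict(W), dict(W)
W_plus[(0, 4)] += eps
W_minus[(0, 4)] -= eps
numeric = (loss(x, y, W_plus) - loss(x, y, W_minus)) / (2 * eps)

W_next = sgd_step(x, y, W)
```

Comparing `grad[(0, 4)]` with `numeric` is a common way to catch mistakes in a hand-written derivation before trusting it in training.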

It is worth noting that although a DNN places relatively low demands on feature engineering, it is more expensive to train, its weights are hard to interpret, and it is not easy to debug. Therefore, for a new application, a better approach is to start with a linear model such as Logistic Regression and, once iteration has matured, try a DNN model.

