[Converge] Gradient Descent [several solvers]
[Converge] Weight Initialisers
[Converge] Backpropagation Algorithm [BP implementation details]
[Converge] Feature Selection in deep-learning training [the effect of feature correlation]
[Converge] Training Neural Networks [cs231n lectures 5 & 6, recommended]
[Converge] Batch Normalisation

 SGD (stochastic gradient descent)
 L-BFGS (limited-memory BFGS), a variant of the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
 CG (conjugate gradient method)

One disadvantage of gradient descent in deep networks is that the per-iteration weight changes are very small, so it easily converges to a local optimum.
Another disadvantage is that gradient descent cannot handle error functions with ill-conditioned curvature (such as the Rosenbrock function).
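A minimal sketch (not from the original post) of this second point: fixed-step gradient descent on the Rosenbrock function. Its narrow, curved "banana valley" is badly conditioned, so plain gradient descent makes painfully slow progress toward the optimum at (1, 1); the step size and iteration count below are illustrative choices.

```python
def rosenbrock(x, y):
    # f(x, y) = (1 - x)^2 + 100 (y - x^2)^2, minimum 0 at (1, 1)
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rosenbrock_grad(x, y):
    dx = -2 * (1 - x) - 400 * x * (y - x ** 2)
    dy = 200 * (y - x ** 2)
    return dx, dy

def gradient_descent(steps=2000, lr=5e-4):
    x, y = 0.0, 0.0                  # start away from the optimum (1, 1)
    for _ in range(steps):
        dx, dy = rosenbrock_grad(x, y)
        x -= lr * dx
        y -= lr * dy
    return x, y

x, y = gradient_descent()
# The loss drops from its starting value but is still clearly nonzero:
# progress stalls as the iterates creep along the curved valley floor.
print(x, y, rosenbrock(x, y))
```

Second-order methods such as L-BFGS and CG use curvature information precisely to avoid this zig-zagging behaviour.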
Convolutional neural network structure changes ——Maxout Networks, Network In Network, Global Average Pooling
Along the way, get familiar with the related concepts.
Reference material
[1] Maxout Networks
[2] http://www.jianshu.com/p/96791a306ea5
[3] Deep learning: Forty-five (a simple understanding of maxout)
[4] Paper notes: 《Maxout Networks》 && 《Network In Network》
[5] Fully convolutional networks for semantic segmentation
[6] http://blog.csdn.net/u010402786/article/details/50499864
[7] Deep learning (twenty-six): Network In Network learning notes
[8] Network in Network
[9] Improving neural networks by preventing co-adaptation of feature detectors
1、Maxout Network
The paper puts forward one key idea: a linear transformation followed by a max operation can fit any convex function, including common activation functions (such as ReLU).
(1)
If the activation function is sigmoid, then during forward propagation the output of hidden-layer node i is:

h_i(x) = sigmoid(x^T W…i + b_i)

Here W is 2-dimensional; W…i means taking the i-th column (corresponding to the i-th output node), and the ellipsis before the subscript i stands for all rows of column i.
(2)
If maxout is used as the activation function, the output of hidden-layer node i becomes:

h_i(x) = max_{j ∈ [1, k]} z_ij, where z_ij = x^T W…ij + b_ij

Here W is 3-dimensional, of size d*m*k, where:
 d is the number of input-layer nodes,
 m is the number of hidden-layer nodes,
 k is the number of intermediate nodes each hidden node expands into. All k intermediate nodes produce linear outputs, and each maxout node outputs the maximum over its k intermediate nodes.
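The forward pass just described can be sketched in a few lines of numpy. This is a minimal illustration with assumed small sizes (d=4, m=3, k=2), not the paper's code:

```python
import numpy as np

d, m, k = 4, 3, 2                       # inputs, hidden nodes, pieces per node
rng = np.random.default_rng(0)
W = rng.normal(size=(d, m, k))          # one d-dimensional linear map per (node, piece)
b = rng.normal(size=(m, k))
x = rng.normal(size=(d,))

# z[i, j] = x . W[:, i, j] + b[i, j]  -- the k linear intermediate outputs per node
z = np.einsum('d,dmk->mk', x, W) + b
# h[i] = max_j z[i, j]                -- each maxout node keeps only the maximum
h = z.max(axis=1)
print(h.shape)                          # one output per hidden node: (3,)
```

Note the ReLU special case: with k=2, one piece fixed to the zero map and the other to w^T x + b, the maxout node computes max(0, w^T x + b), which is exactly ReLU.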
A page from a Japanese ppt on maxout illustrates this:
The figure means that the hidden node in the purple circle is expanded into 5 yellow nodes, and the max is taken over them. Maxout's fitting capacity is very strong; it can fit any convex function.
From left to right, the panels fit ReLU, abs, and a quadratic curve respectively.
The authors also prove this conclusion mathematically: just 2 maxout nodes can fit any continuous function (as the difference of two convex functions), provided the number of intermediate nodes can be arbitrarily large; see the figure below and the paper for details [1].
A strong assumption of maxout is that the output lies in a convex set of the input space... does this hold? Although ReLU is a special case of maxout, in practice we cannot expect to recover exactly ReLU; rather, we learn the nonlinear transformation as a combination of multiple linear transformations plus a max operation.
Jeff: Does it have real practical value? Is it still worth using? It doesn't feel like a major improvement; a basic understanding is enough.
2、Network In Network
Several concepts from this paper, including 1*1 convolution and global average pooling, have become standard components of network design; the paper has a distinctive point of view.
Look at NIN first. Originally, an 11*11*3*96 layer (11*11 convolution kernels, 96 output maps) outputs 96 values for one patch, i.e. the 96 channels at the same pixel of the output feature maps. NIN adds an extra MLP layer: those 96 values are fully connected, and 96 values are output again.
The clever part is that this new MLP layer is equivalent to a 1*1 convolution layer.
This makes designing network structures very convenient: just add a 1*1 convolution layer after the original convolution layer, without changing the output size.
Note that every convolution layer is followed by a ReLU. So in effect the network gets deeper, and I believe this extra depth is the main factor improving the results.
Its significance: it combines the effects of different feature extractors, saving network capacity while preserving accuracy, so it makes sense as a way to simplify the network.
[Explained with examples; see the original]
This establishes the concept that a fully connected layer can be converted into a 1*1 convolution. This idea is used in many later networks, such as FCN [5].
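The equivalence can be checked directly in numpy. This is a minimal sketch with hypothetical sizes (96 channels, a 5*5 spatial grid): applying the same fully connected layer at every pixel gives exactly the same result as one 1*1 convolution.

```python
import numpy as np

c_in, c_out, H, W_ = 96, 96, 5, 5
rng = np.random.default_rng(1)
fmap = rng.normal(size=(c_in, H, W_))     # input feature maps
weight = rng.normal(size=(c_out, c_in))   # FC weight shared across all pixels

# (a) "MLP" view: the same fully connected layer applied at every pixel
fc_out = np.empty((c_out, H, W_))
for i in range(H):
    for j in range(W_):
        fc_out[:, i, j] = weight @ fmap[:, i, j]

# (b) 1*1 convolution view: one contraction over the channel dimension
conv_out = np.tensordot(weight, fmap, axes=([1], [0]))

print(np.allclose(fc_out, conv_out))      # the two computations agree
```

The spatial size is untouched in both views, which is why a 1*1 convolution can be inserted after any convolution layer without changing the output size.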
3、Global Average Pooling
GoogLeNet also uses Global Average Pooling, which was in fact inspired by Network In Network.
Global Average Pooling is typically used at the end of a network to replace the fully connected (FC) layers. Why replace FC? Networks such as AlexNet and VGG insert FC layers between the convolutions and the softmax, and in practice this has several drawbacks:
(1) The number of parameters is very large; sometimes over 80–90% of a network's parameters sit in the last few FC layers.
(2) FC layers overfit easily; much of the overfitting in CNNs comes from the final FC layers, which have too many parameters and no suitable regularizer. Overfitting weakens the model's generalization ability.
(3) A practically important point the paper does not mention: FC layers require fixed input and output sizes, so the image must have a given size. Real images come in many sizes, which makes FC inconvenient.
The authors propose Global Average Pooling instead. It is simple: for each individual feature map, take the average over the whole map. Make the number of output nodes equal to the number of categories, so a softmax can be attached directly.
The authors point out the benefits of Global Average Pooling:
 Because it forces the number of final feature maps to equal the number of categories, the feature maps can be interpreted as category confidence maps.
 It has no parameters, so it cannot overfit.
 Averaging over the whole plane exploits spatial information, making it more robust to spatial translations of the image.
For example: if the last layer outputs 10 feature maps of size 6*6, global average pooling computes the mean over all pixels of each feature map and outputs one value per map. The 10 feature maps thus yield 10 values, which form a 1*10 vector, i.e. a feature vector that can be fed directly into the softmax for classification.
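The example above is two lines of numpy. A minimal sketch, with random values standing in for real feature maps:

```python
import numpy as np

rng = np.random.default_rng(2)
fmaps = rng.normal(size=(10, 6, 6))   # 10 feature maps of 6*6, one per category

gap = fmaps.mean(axis=(1, 2))         # average all 36 pixels of each map -> (10,)
probs = np.exp(gap - gap.max())       # numerically stable softmax over the
probs /= probs.sum()                  # 10 category scores

print(gap.shape, probs.sum())
```

Note that the pooled vector has length 10 regardless of the spatial size of the maps, which is exactly why GAP removes the fixed-input-size restriction of FC layers.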
From: https://alexisbcook.github.io/2017/globalaveragepoolinglayersforobjectlocalization/
In mid-2016, researchers at MIT demonstrated that CNNs with GAP layers (a.k.a. GAP-CNNs) that have been trained for a classification task can also be used for object localization.
That is, a GAP-CNN not only tells us what object is contained in the image, it also tells us where the object is in the image, with no additional work on our part! The localization is expressed as a heat map (referred to as a class activation map), where the color-coding scheme identifies regions that are relatively important for the GAP-CNN to perform the object identification task.

 Unlike Dropout, this does not randomly zero the outputs of hidden-layer nodes;
 instead, each input weight connected to a node is zeroed with probability 1-p.
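A minimal numpy sketch of this weight-level zeroing (the scheme known elsewhere as DropConnect), with assumed small sizes, contrasted against Dropout's output-level zeroing only in the comments:

```python
import numpy as np

p = 0.8                               # keep probability for each weight
rng = np.random.default_rng(3)
W = rng.normal(size=(5, 4))           # weights into a layer of 5 nodes
x = rng.normal(size=(4,))

# Dropout would zero entries of the layer *output* W @ x; here we instead
# zero individual *weights*: each entry of W is kept with probability p.
mask = rng.random(size=W.shape) < p
h = (W * mask) @ x                    # dropped weights contribute nothing
print(h.shape)
```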