Reprinting is welcome; please credit the source: this article is from Bin's column, blog.csdn.net/xbinworld.

Technical discussion QQ group: 433250724. Anyone interested in algorithms and related technology is welcome to join.

Parameter initialization is a very important aspect of neural networks, and of deep learning algorithms in general. The traditional approach initializes parameters randomly from a Gaussian distribution, or even sets them directly to 1 or 0. This is simple and brute-force, but the results are often mediocre. The discussion in this article draws on a thread from Stack Exchange [1], elaborated with my own understanding.

First, consider why the choice of initial parameters matters so much in a neural network (to simplify the problem, think of a basic DNN). Take the sigmoid activation (logistic neurons) as an example: as the absolute value of the input x grows, the function flattens out and saturates, and its derivative tends to 0. At x = 2 the derivative is about 1/10, while at x = 10 it is only about 1/22000. In other words, when the input to the activation function is 10 rather than 2, the neuron learns roughly 2200 times more slowly!
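A quick numerical check of these figures, as a minimal numpy sketch (using the identity sigmoid'(x) = sigmoid(x)(1 - sigmoid(x))):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(2.0))                       # ~0.105, about 1/10
print(sigmoid_grad(10.0))                      # ~4.5e-5, about 1/22000
print(sigmoid_grad(2.0) / sigmoid_grad(10.0))  # ~2300 (the "2200x" above uses the rounded values)
```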

To make the network learn faster, we want the derivative of the sigmoid to be reasonably large. Numerically, this means keeping the sigmoid's input roughly within [-4, 4] (see above; it need not be exactly this range). Recall that the input of a neuron $j$ is the weighted sum of the previous layer's outputs, $x_j = \sum_i a_i w_i + b_j$. We can therefore control the range of the initial weight values so that the neuron's input falls in the range we need.

One relatively simple and effective approach is to initialize the weights uniformly at random from the interval

$$\left(-\frac{1}{\sqrt{d}},\ \frac{1}{\sqrt{d}}\right),$$

where $d$ is the number of inputs to the neuron.
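As a minimal sketch of this rule in numpy (the layer dimensions d and m here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_uniform_sqrt_d(d, m):
    """Weights for a layer with d inputs and m outputs, uniform on (-1/sqrt(d), 1/sqrt(d))."""
    bound = 1.0 / np.sqrt(d)
    return rng.uniform(-bound, bound, size=(d, m))

W = init_uniform_sqrt_d(d=256, m=128)
```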

To see why this choice is reasonable, let us briefly review some basics:

1. For a random variable $X$ uniformly distributed on $U(a, b)$: expectation $E(X) = (a+b)/2$, variance $D(X) = (b-a)^2/12$.

2. If random variables $X$ and $Y$ are independent, then $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$; if $X$ and $Y$ are independent and both have mean 0, then $\mathrm{Var}(XY) = \mathrm{Var}(X)\,\mathrm{Var}(Y)$.

Therefore, if we assume each input $x_i$ to the neuron has mean 0 and standard deviation 1, then

$$\mathrm{Var}(w_i) = \frac{(2/\sqrt{d})^2}{12} = \frac{1}{3d}$$
$$\mathrm{Var}\left(\sum_{i=1}^{d} w_i x_i\right) = d \cdot \mathrm{Var}(w_i x_i) = d \cdot \mathrm{Var}(w_i)\,\mathrm{Var}(x_i) = \frac{1}{3}$$

In other words, the weighted sum of $d$ input signals, with weights drawn uniformly from $(-1/\sqrt{d}, 1/\sqrt{d})$, has mean 0 and variance 1/3 (approximately normal by the central limit theorem), regardless of $d$. With a standard deviation of about 0.58, the probability that the neuron's input falls outside [-4, 4] is vanishingly small.
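This is easy to verify empirically; a minimal simulation (the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000
bound = 1.0 / np.sqrt(d)

# 5000 neurons, each with d inputs of mean 0 / std 1 and uniform weights
x = rng.normal(0.0, 1.0, size=(5000, d))
w = rng.uniform(-bound, bound, size=(5000, d))
z = (w * x).sum(axis=1)              # each neuron's weighted input sum

print(z.var())                       # ~0.333, independent of d
print(np.mean(np.abs(z) > 4.0))      # ~0.0: essentially never outside [-4, 4]
```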


More generally, writing $\langle \cdot \rangle$ for expectation:

$$\left\langle \sum_{i=1}^{d} w_i x_i \right\rangle = \sum_{i=1}^{d} \langle w_i \rangle \langle x_i \rangle = 0$$
$$\left\langle \left( \sum_{i=1}^{d} w_i x_i \right)^{2} \right\rangle = \sum_{i=1}^{d} \langle w_i^2 \rangle \langle x_i^2 \rangle = \sigma^2 d$$

where $\sigma^2$ is the variance of each weight (with unit-variance inputs). Keeping the pre-activation variance $\sigma^2 d$ constant as $d$ grows is exactly why the width of the initialization interval must shrink like $1/\sqrt{d}$.

Another, more recent initialization method

According to Glorot & Bengio (2010) [4], initialize the weights uniformly within the interval $[-b, b]$. For hyperbolic tangent units,

$$b = \sqrt{\frac{6}{H_k + H_{k+1}}},$$

where $H_k$ and $H_{k+1}$ are the sizes of the layers before and after the weight matrix. For sigmoid units, the bound is scaled up by a factor of 4:

$$b = 4\sqrt{\frac{6}{H_k + H_{k+1}}}.$$
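A minimal sketch of this scheme (the function and argument names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(h_in, h_out, sigmoid_units=False):
    """Glorot & Bengio (2010) uniform init; the extra factor of 4 is for sigmoid units."""
    b = np.sqrt(6.0 / (h_in + h_out))
    if sigmoid_units:
        b *= 4.0
    return rng.uniform(-b, b, size=(h_in, h_out))

W_tanh = glorot_uniform(784, 256)                      # tanh units
W_sigm = glorot_uniform(784, 256, sigmoid_units=True)  # sigmoid units
```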

Initialization methods for other scenarios [2]:

  • For RBMs, a zero-mean Gaussian with a small standard deviation, around 0.1 or 0.01, works well for initializing the weights (Hinton, 2010); a sketch follows this list.

  • Orthogonal random matrix initialization: draw `W = np.random.randn(ndim, ndim)`, compute `u, s, v = np.linalg.svd(W)`, and use `u` as the initialization matrix (see the sketch below).
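Runnable versions of both recipes, as a minimal numpy sketch (function names and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbm_gaussian_init(n_visible, n_hidden, std=0.01):
    """Zero-mean Gaussian with a small std (0.1 or 0.01), as suggested for RBMs."""
    return rng.normal(0.0, std, size=(n_visible, n_hidden))

def orthogonal_init(ndim):
    """SVD of a random Gaussian matrix; the factor u is exactly orthogonal (u @ u.T = I)."""
    W = rng.standard_normal((ndim, ndim))
    u, s, v = np.linalg.svd(W)
    return u

W_rbm = rbm_gaussian_init(784, 500)
W_orth = orthogonal_init(256)
```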


References

[1] http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network

[2] Bengio, Yoshua. “Practical recommendations for gradient-based training of deep architectures.” Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 437-478.

[3] LeCun, Yann, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. "Efficient BackProp." Neural Networks: Tricks of the Trade. Springer, 1998.

[4] Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” International conference on artificial intelligence and statistics. 2010.
