Technical discussion QQ group: 433250724. Students interested in algorithms and technology are welcome to join.

Parameter initialization is an important aspect of neural networks and deep learning algorithms. Traditional approaches draw the initial parameters randomly from a Gaussian distribution, or even set them all directly to 0 or 1. Such brute-force methods are simple, but the results are often mediocre. This article is based on a discussion thread from abroad, elaborated with my own understanding.

First, let's consider why the choice of initial parameters matters so much in neural networks (to simplify the problem, we use the most basic DNN). Take the sigmoid activation function (logistic neuron) as an example: as the absolute value of x grows, the function flattens out and saturates, and its derivative tends to 0. At x = 2 the derivative is about 1/10, while at x = 10 it has dropped to roughly 1/22000; in other words, when the input to the activation function is 10 rather than 2, the network learns about 2200 times more slowly! To make the network learn faster, we want the derivative of the sigmoid to be large, which numerically means keeping its input roughly within [-4, 4] (this bound need not be precise). Recall that the input of a neuron j is the weighted sum of the previous layer's outputs: $x_j = \sum_i a_i w_{ij} + b_j$. We can therefore control the range of the initial weights so that the neuron's input falls within the range we need.
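To make the saturation effect concrete, here is a minimal Python check of the sigmoid's derivative at x = 2 and x = 10 (the function names are my own):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # Derivative of the sigmoid: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_prime(2))    # ≈ 0.105, roughly 1/10
print(sigmoid_prime(10))   # ≈ 4.5e-05, roughly 1/22000
print(sigmoid_prime(2) / sigmoid_prime(10))  # ratio on the order of a few thousand
```

The gradient signal flowing back through a saturated unit is scaled by this derivative, which is why inputs far outside [-4, 4] make learning crawl.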

## A simple and effective approach: uniform random initialization

Initialize each weight uniformly at random from the interval $\left(-\frac{1}{\sqrt{d}}, \frac{1}{\sqrt{d}}\right)$, where $d$ is the number of inputs to the neuron.

To see why this choice is reasonable, let's briefly review some basics:

1. For a random variable following the uniform distribution U(a, b), the mean and variance are E(X) = (a + b)/2 and D(X) = (b - a)²/12.

2. If random variables X and Y are independent, then Var(X + Y) = Var(X) + Var(Y); if X and Y are independent and both have zero mean, then Var(XY) = Var(X) · Var(Y).
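A quick numerical sanity check of these two identities (the particular distributions below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent, zero-mean random variables.
x = rng.uniform(-1, 1, n)   # Var = (1 - (-1))^2 / 12 = 1/3
y = rng.normal(0, 2, n)     # Var = 4

print(np.var(x + y))   # ≈ 1/3 + 4 ≈ 4.33  (sum rule)
print(np.var(x * y))   # ≈ (1/3) * 4 ≈ 1.33 (product rule, zero means)
```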

Therefore, if we assume each input $x_i$ of the neuron has mean 0 and standard deviation 1, then

$$\mathrm{Var}(w_i) = \frac{(2/\sqrt{d})^2}{12} = \frac{1}{3d}$$

$$\mathrm{Var}\left(\sum_{i=1}^{d} w_i x_i\right) = d \cdot \mathrm{Var}(w_i x_i) = d \cdot \mathrm{Var}(w_i)\,\mathrm{Var}(x_i) = \frac{1}{3}$$

In other words, the weighted sum of $d$ input signals, with weights drawn uniformly from $\left(-\frac{1}{\sqrt{d}}, \frac{1}{\sqrt{d}}\right)$, approximately follows (by the central limit theorem) a normal distribution with mean 0 and variance 1/3, independent of $d$. So the probability that the neuron's input falls outside the interval [-4, 4] is very small.
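This claim is easy to verify empirically. The following sketch simulates many neurons, each with $d$ standard-normal inputs and weights drawn from the interval above (the values of `d` and `n` are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100                       # number of inputs per neuron
n = 10_000                    # number of simulated neurons
bound = 1.0 / np.sqrt(d)

w = rng.uniform(-bound, bound, (n, d))   # weights ~ U(-1/sqrt(d), 1/sqrt(d))
x = rng.normal(0.0, 1.0, (n, d))         # inputs: mean 0, std 1
z = (w * x).sum(axis=1)                  # neuron pre-activations

print(np.var(z))                         # ≈ 1/3, independent of d
print(np.mean(np.abs(z) > 4))            # fraction outside [-4, 4]: essentially 0
```

With a standard deviation of about 0.58, an input of magnitude 4 is roughly a 7-sigma event, so saturation at initialization is extremely unlikely.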

A more general form can be written as:

$$\left\langle \sum_{i=1}^{d} w_i x_i \right\rangle = \sum_{i=1}^{d} \langle w_i \rangle \langle x_i \rangle = 0$$

$$\left\langle \left( \sum_{i=1}^{d} w_i x_i \right)^2 \right\rangle = \sum_{i=1}^{d} \langle w_i^2 \rangle \langle x_i^2 \rangle = \sigma^2 d$$

where $\sigma^2$ is the variance of each weight.

## Another, more recent initialization method

According to Glorot & Bengio (2010), initialize the weights uniformly within the interval $[-b, b]$, where, for hyperbolic tangent units,

$$b = \sqrt{\frac{6}{H_k + H_{k+1}}},$$

with $H_k$ and $H_{k+1}$ the sizes of the layers before and after the weight matrix. For sigmoid units, sample from Uniform$[-b, b]$ with

$$b = 4\sqrt{\frac{6}{H_k + H_{k+1}}}.$$
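As a sketch, this "Xavier"/Glorot uniform initialization can be implemented as follows (the function name and the `sigmoid` flag are my own; the 4x factor for sigmoid units follows the formulas above):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, sigmoid=False, rng=None):
    """Sample a (fan_in, fan_out) weight matrix from U(-b, b),
    where b = sqrt(6 / (fan_in + fan_out)), scaled 4x for sigmoid units."""
    rng = rng or np.random.default_rng()
    b = np.sqrt(6.0 / (fan_in + fan_out))
    if sigmoid:
        b *= 4.0
    return rng.uniform(-b, b, (fan_in, fan_out))

# Example: a tanh layer mapping 784 inputs to 256 hidden units.
W = glorot_uniform(784, 256, rng=np.random.default_rng(0))
print(W.shape)          # (784, 256)
print(np.abs(W).max())  # bounded by sqrt(6/1040) ≈ 0.076
```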

## Initialization methods for other scenarios

• For RBMs, a zero-mean Gaussian with a small standard deviation, around 0.1 or 0.01, works well for initializing the weights (Hinton, 2010).

• Orthogonal random matrix initialization: draw `W = np.random.randn(ndim, ndim)`, compute `u, s, v = np.linalg.svd(W)`, then use `u` as your initialization matrix.
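Spelled out as a runnable sketch (the helper name `orthogonal_init` is my own), the key property is that the returned matrix is orthogonal, so repeated multiplication neither shrinks nor blows up the signal:

```python
import numpy as np

def orthogonal_init(ndim, rng=None):
    """Orthogonal initialization: take the U factor of the SVD
    of a random Gaussian matrix."""
    rng = rng or np.random.default_rng()
    W = rng.standard_normal((ndim, ndim))
    u, s, v = np.linalg.svd(W)
    return u  # u is orthogonal: u @ u.T = I

W = orthogonal_init(64)
print(np.allclose(W @ W.T, np.eye(64)))  # True
```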

## Reference material

 Bengio, Yoshua. “Practical recommendations for gradient-based training of deep architectures.” Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 437-478.

 LeCun, Y., Bottou, L., Orr, G. B., and Muller, K. (1998a). Efficient backprop. In Neural Networks, Tricks of the Trade.

 Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” International conference on artificial intelligence and statistics. 2010.
