Reprints are welcome; please credit the source: this article is from Bin's column, blog.csdn.net/xbinworld.

Technical discussion QQ group: 433250724. Students interested in algorithms and technology are welcome to join.

Parameter initialization is a very important aspect of neural networks and deep learning algorithms in general. The traditional approach initializes the parameters randomly from a Gaussian distribution, or even sets them directly to 1 or 0. This brute-force approach is simple, but the results are often mediocre. This article is based on a discussion thread [1], elaborated with my own understanding.

First, let us consider why the choice of initial parameters matters so much in neural network algorithms (to simplify the problem, we use the most basic DNN). Take the sigmoid function (logistic neurons) as an example: as the absolute value of x grows, the function flattens out and saturates, and its derivative tends to 0. For instance, at x=2 the derivative is about 1/10, while at x=10 it has already dropped to about 1/22000. In other words, when the input to the activation function is 10 instead of 2, the neural network learns about 2200 times more slowly!
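The numbers above are easy to verify. A minimal sketch (the helper names `sigmoid` and `sigmoid_grad` are my own, not from the original post) that evaluates the sigmoid derivative at x=2 and x=10:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(2.0))   # about 1/10
print(sigmoid_grad(10.0))  # about 1/22000
```

The ratio of the two gradients is indeed on the order of a couple of thousand, which is the learning-speed gap the text describes.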

To make the neural network learn faster, we want the derivative of the sigmoid activation to be large. Numerically, this roughly means keeping the sigmoid's input within [-4, 4], as seen above (it does not have to be exactly this range). We know that the input of a neuron j is a weighted sum of the outputs of the previous layer: x_j = Σ_i a_i·w_ij + b_j. Therefore, by controlling the range of the initial weight parameters, we can make the neuron's input fall within the range we need.

**One relatively simple and effective approach is to initialize the weight parameters uniformly at random from the interval (−1/√d, 1/√d), where d is the number of inputs to the neuron.**
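This rule is a one-liner in code. A minimal sketch (the function name `init_weights` is my own):

```python
import math
import random

def init_weights(d, seed=0):
    """Draw d weights uniformly at random from (-1/sqrt(d), 1/sqrt(d)),
    where d is the number of inputs to the neuron."""
    rng = random.Random(seed)
    bound = 1.0 / math.sqrt(d)
    return [rng.uniform(-bound, bound) for _ in range(d)]

w = init_weights(100)  # bound = 1/sqrt(100) = 0.1
print(min(w), max(w))  # both within (-0.1, 0.1)
```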

To show why this choice is reasonable, let us briefly review some basics:

**1. For a random variable X uniformly distributed as U(a, b): expectation E(X) = (a+b)/2, variance D(X) = (b−a)²/12. In particular, for U(−1/√d, 1/√d) the mean is 0 and the variance is (2/√d)²/12 = 1/(3d).**

**2. If random variables X and Y are independent, then Var(X+Y) = Var(X) + Var(Y); if X and Y are independent and both have mean 0, then Var(X·Y) = Var(X)·Var(Y).**

Therefore, if we restrict the inputs of a neuron (x_i) to have mean 0 and standard deviation 1, then

Var(Σ_i w_i·x_i) = Σ_i Var(w_i·x_i) = Σ_i Var(w_i)·Var(x_i) = d · (1/(3d)) · 1 = 1/3.

**In other words, the weighted sum of d input signals, with weights drawn uniformly from (−1/√d, 1/√d), has mean 0 and variance 1/3, independent of d, and is approximately normally distributed. With a standard deviation of about 0.58, the probability that the neuron's input falls outside the interval [-4, 4] is very small.**
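The claim that the variance is 1/3 regardless of d can be checked with a quick Monte Carlo estimate. A minimal sketch (the helper name `weighted_sum_variance` is my own; inputs are drawn as standard normals to satisfy the mean-0, std-1 assumption):

```python
import math
import random

def weighted_sum_variance(d, trials=5000, seed=0):
    """Monte Carlo estimate of Var(sum_i w_i * x_i), where
    x_i ~ N(0, 1) and w_i ~ U(-1/sqrt(d), 1/sqrt(d))."""
    rng = random.Random(seed)
    bound = 1.0 / math.sqrt(d)
    sums = []
    for _ in range(trials):
        s = sum(rng.uniform(-bound, bound) * rng.gauss(0.0, 1.0)
                for _ in range(d))
        sums.append(s)
    mean = sum(sums) / trials
    return sum((s - mean) ** 2 for s in sums) / trials

for d in (10, 100):
    print(d, weighted_sum_variance(d))  # both close to 1/3
```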

A more general form: for a neuron j with d_j inputs, draw each weight w_ij from U(−1/√(d_j), 1/√(d_j)); that is, the interval scales with that neuron's own fan-in.

**Another, more recent initialization method**

According to Glorot & Bengio (2010) [4], initialize the weights uniformly within the interval [−b, b], where H_k and H_{k+1} are the sizes of the layers before and after the weight matrix. For hyperbolic tangent units, sample from Uniform[−b, b] with b = √(6 / (H_k + H_{k+1})); for sigmoid units, the recommended bound is four times larger: b = 4·√(6 / (H_k + H_{k+1})).
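A minimal sketch of this Glorot-style uniform initialization (the function name `glorot_uniform` and the list-of-lists layout are my own; the bounds follow the formulas above):

```python
import math
import random

def glorot_uniform(fan_in, fan_out, sigmoid=False, seed=0):
    """Glorot & Bengio (2010): sample weights from Uniform[-b, b] with
    b = sqrt(6 / (fan_in + fan_out)) for tanh units; for sigmoid units
    the recommended bound is 4 times larger."""
    rng = random.Random(seed)
    b = math.sqrt(6.0 / (fan_in + fan_out))
    if sigmoid:
        b *= 4.0
    return [[rng.uniform(-b, b) for _ in range(fan_out)]
            for _ in range(fan_in)]

W = glorot_uniform(784, 256)  # tanh bound: sqrt(6/1040), about 0.076
print(max(abs(v) for row in W for v in row))
```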

**Initialization methods for other settings [2]**

In the case of RBMs, a zero-mean Gaussian with a small standard deviation, around 0.1 or 0.01, works well for initializing the weights (Hinton, 2010).
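A minimal sketch of this RBM recipe (the function name `rbm_init` and the list-of-lists layout are my own):

```python
import random

def rbm_init(n_visible, n_hidden, std=0.01, seed=0):
    """Zero-mean Gaussian weights with a small standard deviation
    (e.g. 0.01), as recommended for RBMs by Hinton (2010)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(n_hidden)]
            for _ in range(n_visible)]

W = rbm_init(6, 4)
print(W[0])  # small values scattered around 0
```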

Orthogonal random matrix initialization: draw `W = np.random.randn(ndim, ndim)`, compute `u, s, v = np.linalg.svd(W)`, then use `u` as your initialization matrix.
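The one-liner above can be packaged as a runnable sketch (the function name `orthogonal_init` is my own; the modern `default_rng` generator replaces the older `np.random.randn` call, which behaves the same way here):

```python
import numpy as np

def orthogonal_init(ndim, seed=0):
    """Orthogonal initialization: take the SVD of a random Gaussian
    matrix and keep the left singular vectors U, which form an
    orthogonal matrix."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((ndim, ndim))
    u, s, v = np.linalg.svd(W)
    return u

W = orthogonal_init(64)
print(np.allclose(W @ W.T, np.eye(64)))  # True: rows are orthonormal
```

Because `W @ W.T` is the identity, the matrix preserves the norm of any vector it multiplies, which helps keep activations from exploding or vanishing across layers.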

## Reference material

[1] http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network

[2] Bengio, Yoshua. “Practical recommendations for gradient-based training of deep architectures.” Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 437-478.

[3] LeCun, Y., Bottou, L., Orr, G. B., and Müller, K.-R. (1998). Efficient BackProp. In Neural Networks: Tricks of the Trade. Springer.

[4] Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” International conference on artificial intelligence and statistics. 2010.
