- I. Brief introduction
- I-A. Potential of DL for the physical layer
- I-B. Historical context and related work
- II. Deep learning basics
- III. Examples of machine learning applications for the physical layer
- IV. Discussion and open research challenges
- Source download (404)
- The paper presents a complete transmitter and receiver for a given channel. The key idea is to represent the transmitter, the channel, and the receiver as a single deep neural network that can be trained as an autoencoder. The advantage is that this approach can be applied to channel models and loss functions for which no optimal solution is known.
- The concept is extended to networks of multiple transmitter-receiver pairs, in which all transmitter and receiver implementations can be jointly optimized for one or more common or individual performance metrics.
- Neural networks for signal transformation tasks can be integrated into the end-to-end training process.
- These results reflect the trend seen wherever deep learning has been applied: the learned features eventually outperform and replace long-used expert features.
- I-A: Discusses the potential value of DL for the physical layer
- I-B: Related work on DL
- II: Background on DL
- III: Introduces several DL applications for information transmission
- IV: Discusses open questions and key areas for future research
- V: Summary
- 2.1 DL models require neither a tractable mathematical model nor a specific hardware configuration, and can achieve better performance.
- 2.2 A DL-based end-to-end communication system does not split the system into independent modules as before; this offers a simple way to optimize performance end to end.
- 2.3 Learned algorithms can be executed faster and at lower energy cost.
- 2.4 Massively parallel processing architectures with distributed memory.
- There are two main ways to apply DL to the physical layer: use DL to improve/augment parts of existing algorithms, or replace them entirely.
What the layers of a neural network are
What the common activation functions are
What the common loss functions are
- Many tools make it quick to build neural networks, such as Keras (see the sketch below)
- "Width" describes the number of output activations per layer, or the average over all layers
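As a hedged illustration of how quickly a small feedforward network can be assembled in Keras (layer sizes and the toy dataset are illustrative, not from the paper):

```python
# A minimal feedforward network in Keras; sizes are illustrative only.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(16,)),             # 16 input features
    layers.Dense(32, activation="relu"),   # one hidden layer ("width" 32)
    layers.Dense(4, activation="softmax")  # 4-class output
])
model.compile(optimizer="sgd", loss="categorical_crossentropy")

# train on random toy data just to show the API
X = np.random.randn(256, 16).astype("float32")
y = np.eye(4)[np.random.randint(0, 4, 256)]
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```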
Main points of this section:
- How an end-to-end autoencoder-based communication system is built and trained with the SGD algorithm
- Extension of this concept to multiple transmitters and receivers
- Introduction of radio transformer networks (RTNs) to improve performance on fading channels
- Demonstration of CNNs applied to modulation classification on raw RF time-series data
A simple end-to-end transmitter-channel-receiver model
An autoencoder-based communication system over a Gaussian channel
- Transmitter: a deep-learning mapping from s to x, essentially a nonlinear compression and reconstruction of the input; it covers everything from generating the signal to sending it. s is an M-dimensional one-hot vector.
- Receiver: also a feedforward neural network, with a final softmax layer for classification. The output ŝ is the message with the highest probability.
- The end-to-end autoencoder can be trained with the SGD algorithm using an appropriate cross-entropy loss function, as sketched below.
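A minimal sketch of such an autoencoder, assuming TensorFlow/Keras; the layer sizes, dataset size, and epoch count are illustrative choices, not the paper's exact configuration:

```python
# Sketch of a communications autoencoder; hyperparameters are illustrative.
# GaussianNoise is only active during training, so a separate noise step
# would be needed for evaluation.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

M = 16                                                # messages (k = log2(M) = 4 bits)
n = 7                                                 # channel uses per message
R = np.log2(M) / n                                    # rate in bits/channel use
EbN0 = 10 ** (7.0 / 10)                               # train at Eb/N0 = 7 dB
noise_std = np.sqrt(1 / (2 * R * EbN0))               # per-dimension noise std

inp = layers.Input(shape=(M,))                        # one-hot message s
t = layers.Dense(M, activation="relu")(inp)           # transmitter
t = layers.Dense(n)(t)
x = layers.Lambda(                                    # fixed energy constraint
    lambda v: np.sqrt(n) * tf.math.l2_normalize(v, axis=1))(t)
y = layers.GaussianNoise(noise_std)(x)                # AWGN channel
r = layers.Dense(M, activation="relu")(y)             # receiver
s_hat = layers.Dense(M, activation="softmax")(r)      # probabilities over messages

autoencoder = Model(inp, s_hat)
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                    loss="categorical_crossentropy")

s = np.eye(M)[np.random.randint(0, M, 20000)]         # random one-hot messages
autoencoder.fit(s, s, epochs=20, batch_size=256, verbose=0)
```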
The baseline is a communication system using BPSK modulation and a Hamming code with either binary hard-decision decoding or maximum-likelihood decoding (MLD). Figure (a) below shows the block error rate (BLER) of the autoencoder against several benchmark communication schemes (with a fixed energy constraint: Hamming (7,4) code vs. autoencoder (7,4)).
- The results show that the autoencoder has learned the encoder and decoder functions without any prior knowledge, and its performance matches that of the Hamming code with MLD.
- Experiments show that SGD converges to a better global solution when two transmitter layers are used instead of one. By adding this dimension to the search space, such solutions are more likely to appear as saddle points during optimization, which in practice reduces the chance of converging to a suboptimal local minimum.
- Training used Adam with a learning rate of 0.001 at a fixed Eb/N0 = 7 dB (Eb is the average signal energy per bit, N0 the noise power spectral density). We observed that increasing the batch size while decreasing the learning rate during training helps improve accuracy.
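As a hedged note, the usual convention for converting a fixed training Eb/N0 into the per-dimension noise variance at rate R = k/n is:

```latex
\sigma^2 = \left( 2\,\frac{k}{n}\,\frac{E_b}{N_0} \right)^{-1},
\qquad\text{e.g. } (n,k) = (7,4),\ E_b/N_0 = 7\,\mathrm{dB}:\quad
\sigma^2 = \left( 2 \cdot \tfrac{4}{7} \cdot 10^{0.7} \right)^{-1} \approx 0.175 .
```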
Figure (b) below shows the BLER of the autoencoder against benchmark schemes for (8,8) and (2,2) communication systems.
Experiments show that the (2,2) autoencoder achieves the same BLER as uncoded BPSK, while the (8,8) autoencoder outperforms the latter over the full range of Eb/N0. This means the autoencoder has learned a joint coding and modulation scheme, thereby obtaining a coding gain.
The layout of the autoencoder
Learned signal constellations x (quadrature phase-shift keying (QPSK) constellations)
A simple (2,2) system is shown, which converges rapidly to the classical QPSK constellation.
A model of the two-user Gaussian interference channel
How to train two coupled autoencoders with conflicting objectives (two approaches):
- Approach one: minimize a weighted sum of the two losses (see the sketch after this list)
- Approach two: ???
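A minimal sketch of approach one, assuming TensorFlow; `alpha`, `s1`/`p1`, and `s2`/`p2` are hypothetical names for the trade-off weight and the two users' one-hot messages and receiver outputs:

```python
# Weighted-sum loss for two coupled autoencoders; names are illustrative.
import tensorflow as tf

alpha = 0.5  # trade-off weight between the two users (could vary over training)
cce = tf.keras.losses.CategoricalCrossentropy()

def joint_loss(s1, p1, s2, p2):
    # per-user cross-entropy between transmitted one-hot messages (s1, s2)
    # and the corresponding receiver softmax outputs (p1, p2)
    return alpha * cce(s1, p1) + (1.0 - alpha) * cce(s2, p2)
```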
BLER on the two-user interference channel achieved by the autoencoder and by time-sharing autoencoders with QAM of different parameters
Experimental analysis: the NN layout of both autoencoders is given in Table IV above, with n replaced by 2n. An average power constraint was used to compete with higher-order modulation schemes. As a benchmark, uncoded 2^(2k/n)-QAM was chosen, which achieves the same rate when used together with time-sharing (TS) between the two transmitters. While the autoencoder and time-sharing have the same BLER for (1,1) and (2,2), the former achieves a real gain of about 0.7 dB for (4,4) and about 1 dB for (4,8) at a BLER of 10^-3. The reasons are similar to those explained in Section III-A.
Experiments show that the transmitters have learned to use binary phase-shift keying (BPSK) constellations in orthogonal directions, which achieves the same performance as time-sharing with QPSK. For (2,2), however, the learned constellations are no longer orthogonal and can be interpreted as a form of superposition coding. The constellations of the two transmitters resemble ellipses with orthogonal axes and different focal distances. The effect is more pronounced for (4,8) than for (4,4) because the number of constellation points increases.
A radio transformer network (RTN) is used to augment signal processing in the receiver. It consists of three main parts (a hedged sketch follows the list):
- Part one: a learned parameter estimator (takes the received vector y as input and outputs parameters w)
- Part two: a parametric transform (applies a deterministic function to y, parameterized by w and suited to the propagation phenomenon)
- Part three: a learned discriminative network (produces the normalized output)
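A minimal sketch of an RTN receiver for phase-offset compensation, assuming TensorFlow/Keras; the single-parameter de-rotation transform and all layer sizes are illustrative choices, not the paper's exact setup:

```python
# RTN receiver sketch: estimator -> parametric transform -> discriminator.
import tensorflow as tf
from tensorflow.keras import layers, Model

M, n = 16, 8                                   # messages; complex channel uses

y = layers.Input(shape=(2 * n,))               # received IQ samples [I; Q]

# 1) learned parameter estimator: y -> w (here a single phase offset phi)
h = layers.Dense(32, activation="tanh")(y)
phi = layers.Dense(1)(h)

# 2) deterministic parametric transform: de-rotate y by phi
def derotate(args):
    v, ang = args
    i, q = v[:, :n], v[:, n:]
    c, s = tf.cos(ang), tf.sin(ang)
    return tf.concat([c * i + s * q, -s * i + c * q], axis=1)

y_canonical = layers.Lambda(derotate)([y, phi])

# 3) learned discriminative network: canonicalized signal -> message probs
d = layers.Dense(64, activation="relu")(y_canonical)
s_hat = layers.Dense(M, activation="softmax")(d)

rtn_receiver = Model(y, s_hat)  # phi is trained end to end, not supervised
```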
Principle: the parameter estimator is improved by optimizing the end-to-end objective, not by directly optimizing the parameter estimates themselves. An RTN simplifies the target manifold by incorporating domain knowledge, similar to the role of convolutional layers in imparting translation invariance where appropriate. This leads to a simpler search space and improved generalization. The autoencoder and RTN described above can be extended with minor modifications to operate directly on IQ samples rather than symbols, so that pulse shaping, timing, frequency, and phase offset compensation can be handled effectively.
Analysis of experimental results: there are two advantages.
- First: in a multipath fading channel, comparing BLER with and without an RTN shows that adding the RTN reduces the BLER.
- Second: training converges faster with an RTN.
- Drawback: after enlarging the encoder and decoder networks and increasing the number of training iterations, the performance gap shrinks.
The structure of the CNN applied to modulation classification is as follows (a hedged sketch is given below):
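Since the original architecture table is not reproduced here, this is a hedged Keras sketch of a small two-convolutional-layer CNN over 2x128 raw IQ samples; filter counts and layer sizes are illustrative:

```python
# Small CNN for modulation classification on raw IQ samples; sizes are
# illustrative, not the paper's exact configuration.
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(2, 128, 1))                  # 2 x 128 IQ window
x = layers.Conv2D(64, (1, 3), activation="relu")(inp)  # per-rail temporal features
x = layers.Dropout(0.5)(x)                             # regularization
x = layers.Conv2D(16, (2, 3), activation="relu")(x)    # mix I and Q rails
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)
out = layers.Dense(10, activation="softmax")(x)        # 10 modulation classes

cnn = Model(inp, out)
cnn.compile(optimizer="adam", loss="categorical_crossentropy",
            metrics=["accuracy"])
```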
The figure below compares the CNN's classification accuracy against extreme gradient boosting with 1000 estimators and against a single scikit-learn decision tree.
The image below shows the CNN's confusion matrix at SNR = 10 dB, revealing confusion between QAM16 and QAM64, and between wideband FM (WBFM) and double-sideband AM (AM-DSB).
Analysis of experimental results: the short-time nature of these examples places this task at the difficult end of the modulation classification spectrum, since expert features cannot be computed with high stability over long periods. In the low-to-medium SNR range, the CNN outperforms the boosted feature-based classifier by about 4 dB, with similar performance at high SNR. The single decision tree performs about 6 dB worse than the CNN at low SNR, and about 3.5% worse at high SNR.
A. Data sets and challenges
B. Data representation, loss functions, and training SNR
C. Complex-valued neural networks
D. ML-augmented signal processing
E. System identification for end-to-end learning
Unfortunately, after reading the whole paper, I eagerly went looking for the source code, only to find it is gone (404). Heartbreaking...