An Introduction to Deep Learning for the Physical Layer
BetterBench 2021-06-04 21:15:20

1 Brief introduction

1.1 Main contributions

  • A complete transmitter and receiver for a given channel is introduced. The key idea is to represent the transmitter, the channel, and the receiver as a single deep neural network that can be trained as an autoencoder. The advantage is that this approach can be applied to channel models and loss functions for which no optimal solution is known.
  • This concept is extended to adversarial networks of multiple transmitter-receiver pairs, so that all transmitter and receiver implementations can be jointly optimized for one or more common or individual performance metrics.
  • Neural networks can integrate expert domain knowledge into the end-to-end training process for signal-transformation tasks.
  • The experimental results reflect a continuing trend of deep learning across many fields: the features learned here eventually outperform and replace long-used expert features.

1.2 Outline of the paper

  • I-A: Discusses the potential value of DL in the physical layer
  • I-B: Historical context and work related to DL
  • II: Background on DL
  • III: Introduces several DL applications in information transmission
  • IV: Discusses open questions and key areas of future research
  • V: Summary

2 I-A Potential of DL for the physical layer

  • 2.1 A DL model needs no tractable mathematical model or specific hardware configuration, and can achieve better performance
  • 2.2 A DL end-to-end communication system does not divide the system into several independent modules as before, which provides a simple way to optimize overall performance
  • 2.3 Learned algorithms can be executed faster and with lower energy consumption
  • 2.4 Massively parallel processing architectures with distributed memory

3 I-B Historical context and related work

  • There are two main approaches for applying DL to the physical layer: use DL to improve or augment parts of existing algorithms, or replace them entirely.


4 II Background on deep learning

4.1 Basic introduction

  • The layers of a neural network (figure omitted)

  • Common activation functions (figure omitted)

  • Common loss functions (figure omitted)
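As a concrete complement to the loss-function figure, here is a minimal NumPy sketch of the softmax activation and the categorical cross-entropy loss used later by the autoencoder's receiver (the example scores are made up):

```python
import numpy as np

def softmax(z):
    """Softmax activation: maps raw scores to a probability distribution."""
    e = np.exp(z - np.max(z))          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(p, label):
    """Categorical cross-entropy for one example with a one-hot label."""
    return -np.log(p[label])

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, cross_entropy(p, 0))
```

The loss is small when the probability assigned to the true label is high, which is exactly what SGD pushes the network toward.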

4.2 A. Convolutional layers

4.2 B. Deep learning libraries

  • Many tools are available for quickly building neural networks, such as Keras

4.3 C. Network dimensions and training

  • "Width" is used to describe the number of output activations per layer, or the average over all layers
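A small sketch of this definition; the layer sizes and input dimension below are made up for illustration, and the parameter count shows how width and depth determine network size:

```python
# "Width" = output activations per layer; sketched on a made-up feedforward
# net with a 7-dimensional input (all sizes are illustrative).
layer_widths = [64, 32, 16]                       # output activations per layer
avg_width = sum(layer_widths) / len(layer_widths)

sizes = [7] + layer_widths                        # input dim + layer widths
n_params = sum(a * b + b for a, b in zip(sizes, sizes[1:]))  # weights + biases
print(avg_width, n_params)
```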


5 III Examples of machine learning applications for the physical layer

The main points of this section:

  • Introduces how to implement an end-to-end autoencoder communication system trained with the SGD algorithm
  • Extends this concept to multiple transmitters and receivers
  • Introduces RTNs to improve performance over fading channels
  • Demonstrates the application of CNNs to modulation classification on raw RF time-series data

5.1 A. Autoencoders for end-to-end communications systems

  • A simple end-to-end transmitter-channel-receiver model (figure omitted)

  • A communication system over a Gaussian channel represented as an autoencoder (figure omitted)

  • Autoencoder

    • Transmitter: uses a deep network to map the message s to the transmitted signal x, essentially a nonlinear compression and reconstruction of the input. This covers the process from generating the signal to sending it; s is an M-dimensional one-hot vector.
    • Receiver: also a feedforward neural network, whose last layer uses a softmax activation for classification. The output is the estimate of s, chosen as the index with the highest probability.
    • The end-to-end autoencoder can be trained with the SGD algorithm, using the categorical cross-entropy loss function.
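The pipeline above can be sketched as a single untrained forward pass in NumPy for a (7,4) system; the layer sizes, activations, and weight initializations here are illustrative, not the paper's exact layout:

```python
import numpy as np

rng = np.random.default_rng(0)

M, n = 16, 7        # M messages (k = log2(M) = 4 bits) over n channel uses
batch = 32

# --- transmitter: one-hot message -> dense(tanh) -> dense -> energy norm
s = rng.integers(0, M, size=batch)
one_hot = np.eye(M)[s]                             # (batch, M)

W1 = rng.normal(0, 0.1, (M, M)); b1 = np.zeros(M)
W2 = rng.normal(0, 0.1, (M, n)); b2 = np.zeros(n)

h = np.tanh(one_hot @ W1 + b1)
x = h @ W2 + b2
x = np.sqrt(n) * x / np.linalg.norm(x, axis=1, keepdims=True)  # ||x||^2 = n

# --- AWGN channel at a fixed Eb/N0 of 7 dB, as in the training setup
EbN0 = 10 ** (7 / 10)
R = np.log2(M) / n                                 # rate in bits per channel use
sigma = np.sqrt(1 / (2 * R * EbN0))                # per-component noise std
y = x + sigma * rng.normal(size=x.shape)

# --- receiver: dense(relu) -> dense -> softmax over the M messages
W3 = rng.normal(0, 0.1, (n, M)); b3 = np.zeros(M)
W4 = rng.normal(0, 0.1, (M, M)); b4 = np.zeros(M)

g = np.maximum(0, y @ W3 + b3)
logits = g @ W4 + b4
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)                  # softmax probabilities

loss = -np.mean(np.log(p[np.arange(batch), s]))    # categorical cross-entropy
print(loss)
```

Training would repeatedly backpropagate this loss through both networks and the (differentiable) channel model with SGD or Adam.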
  • The baseline is a communication system based on BPSK modulation and a Hamming code combined with binary hard-decision decoding or maximum-likelihood decoding (MLD). Figure (a) shows the block error rate of the autoencoder against several benchmark communication schemes (a Hamming (7,4) code with a fixed energy constraint versus an autoencoder (7,4)). (figure omitted)

    • The results show that the autoencoder has learned the encoder and decoder functions without any prior knowledge, matching the performance of the Hamming code with MLD.
    • Experiments show that with SGD, using two transmitter layers instead of one converges to a better global solution. By adding this dimension to the parameter search space, such solutions are more likely to appear as saddle points during optimization, which in practice reduces the chance of converging to a suboptimal local minimum.
    • Training used Adam with a fixed learning rate of 0.001 at Eb/N0 = 7 dB (Eb is the average signal energy per bit, N0 the noise power spectral density). We observed that increasing the batch size while reducing the learning rate during training helps improve accuracy.
  • Figure (b) shows the block error rate of the autoencoder against several benchmark communication schemes (an (8,8) and a (2,2) communications system). (figure omitted)

Experiments show that the (2,2) autoencoder achieves the same BLER as uncoded BPSK, but the (8,8) autoencoder is better than its uncoded counterpart over the whole Eb/N0 range. This means the autoencoder has learned some joint coding and modulation scheme, thereby obtaining a coding gain.

  • The layout of the autoencoder network (figure omitted)

  • The learned signal constellations x (quadrature phase-shift keying (QPSK) constellations) (figure omitted)

This shows a simple (2,2) system; the system converges rapidly to the classical quadrature phase-shift keying (QPSK) constellation.
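For reference, the QPSK constellation the system converges to can be written out directly; the Gray labeling and 45-degree rotation below are a conventional choice, not the paper's exact learned rotation:

```python
import numpy as np

# The four QPSK points, normalized to unit average energy.
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))

avg_energy = np.mean(np.abs(qpsk) ** 2)
d_min = min(abs(a - b) for i, a in enumerate(qpsk) for b in qpsk[i + 1:])
print(avg_energy, d_min)
```

The learned constellation is equivalent to this one up to an arbitrary rotation, since rotations do not change pairwise distances or average energy.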

5.2 B. Autoencoders for multiple transmitters and receivers

  • A model of the two-user Gaussian interference channel (figure omitted)

  • How to train two coupled autoencoders with conflicting objectives (two approaches)

  • First approach: minimize a weighted sum of the two losses

  • Second approach: ???
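The first approach amounts to a convex combination of the two users' losses; a minimal sketch (the weight `alpha` is a hypothetical trade-off knob, and the loss values are made up):

```python
def weighted_sum_loss(loss1, loss2, alpha):
    """Joint objective for two coupled autoencoders: a convex combination
    of the two users' cross-entropy losses (alpha in [0, 1])."""
    return alpha * loss1 + (1.0 - alpha) * loss2

# alpha = 0.5 treats both transmitter-receiver pairs symmetrically
print(weighted_sum_loss(2.0, 4.0, 0.5))
```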

  • BLER of the two-user interference channel achieved by the autoencoder, compared with time-sharing between QAM schemes of different orders (figure omitted)

Experimental analysis: the NN layouts of the two autoencoders are provided in Table IV above, with n replaced by 2n. An average power constraint is used so the autoencoder can compete with higher-order modulation schemes. As a comparison benchmark, uncoded 2^(2k/n)-QAM is selected, which achieves the same rate when used together with time-sharing (TS) between the two transmitters. Although the autoencoder and time-sharing have the same BLER for (1,1) and (2,2), at a BLER of 10^-3 the former achieves a real gain of about 0.7 dB for (4,4) and about 1 dB for (4,8). The reasons are similar to those explained in Section III-A. (figure omitted)

Experiments show that the transmitters have learned to use binary phase-shift keying (BPSK) constellations in orthogonal directions. This achieves the same performance as time-sharing with QPSK. However, for (2,2), the learned constellations are no longer orthogonal and can be interpreted as some form of superposition coding. The constellations of the two transmitters resemble ellipses with orthogonal axes and different focal lengths. The effect is more pronounced for (4,8) than for (4,4), because the number of constellation points has increased.

5.3 C. Radio transformer networks for augmented signal processing algorithms

RTNs are used to augment signal processing in the receiver. An RTN consists of three parts:

  • First part: a learned parameter estimator (takes the input vector y, outputs parameters w)
  • Second part: a parametric transform (applies a deterministic function to y, parameterized by w and suited to the propagation phenomenon)
  • Third part: a learned discriminative network (produces the normalized output) (figure omitted)

Principle: the RTN optimizes through the parameter estimation rather than optimizing the parameters directly. RTNs simplify the target manifold by incorporating domain knowledge, similar to the role of convolutional layers in enforcing translation invariance where appropriate. This leads to a simpler search space and improved generalization. The autoencoder and RTN described above can be extended with minor modifications to operate directly on IQ samples rather than symbols, so that pulse shaping, timing, and frequency and phase offset compensation can be handled effectively.
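The parametric transform can be illustrated with phase-offset compensation, one of the propagation effects mentioned above; in a real RTN the parameter w would come from the learned estimator network, while here the true offset is plugged in just to show the transform itself (all values are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Received IQ samples with an unknown phase offset, standing in for the
# propagation effect an RTN would compensate.
phase = 0.6
clean = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=64) / np.sqrt(2)
y = clean * np.exp(1j * phase)

def parametric_transform(y, w):
    """The deterministic transform t(y; w): de-rotate by the estimated phase."""
    return y * np.exp(-1j * w)

corrected = parametric_transform(y, phase)
print(np.max(np.abs(corrected - clean)))
```

Because the transform is deterministic and differentiable, gradients flow through it back into the estimator during end-to-end training.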

Analysis of experimental results: there are two advantages.

  • First: over a multipath fading channel, comparing the BLER with and without the RTN, adding the RTN reduces the BLER (figure omitted)

  • Second: training converges faster with the RTN (figure omitted)

Drawback: after enlarging the encoder and decoder networks and increasing the number of training iterations, the performance gap shrinks.

5.4 D. CNNs for classification tasks

  • The structure of the CNN applied to modulation classification is as follows (figure omitted)

  • Classification accuracy of the CNN compared with extreme gradient boosting using 1000 estimators and with a single scikit-learn decision tree (figure omitted)

  • The confusion matrix of the CNN at SNR = 10 dB, revealing confusion between QAM16 and QAM64, and between wideband FM (WBFM) and double-sideband AM (AM-DSB) (figure omitted)

  • Analysis of experimental results: the short duration of these examples places this task at the difficult end of the modulation-classification spectrum, because expert features cannot be computed with high stability over long observation windows. In the low-to-medium SNR range, the CNN classifier outperforms the boosting-based feature classifier by about 4 dB, with similar performance at high SNR. The single tree performs about 6 dB worse than the CNN at the same SNR, and about 3.5% worse at high SNR.
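The core building block of such a classifier, a 1-D convolution over raw IQ samples, can be sketched in NumPy; the window length, filter count, and filter width below are made up for illustration and are not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy input: a window of 128 raw IQ samples as a 2 x 128 array (I and Q rows),
# mirroring the raw time-series input of the classifier.
iq = rng.normal(size=(2, 128))

def conv1d_relu(x, kernels):
    """Valid 1-D convolution over time followed by ReLU; each filter spans
    both the I and Q rows, so it sees the full complex sample."""
    n_filt, _, k = kernels.shape
    t_out = x.shape[1] - k + 1
    out = np.empty((n_filt, t_out))
    for f in range(n_filt):
        for t in range(t_out):
            out[f, t] = np.sum(kernels[f] * x[:, t:t + k])
    return np.maximum(out, 0.0)

filters = rng.normal(0.0, 0.1, size=(16, 2, 8))   # 16 filters of width 8
features = conv1d_relu(iq, filters)
print(features.shape)
```

In the full classifier, stacked layers like this feed dense layers and a softmax over the modulation classes.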


6 IV Open problems and key areas of future research

  • A. Data sets and challenges
  • B. Data representation, loss functions, and training SNR
  • C. Complex-valued neural networks
  • D. ML-augmented signal processing
  • E. System identification for end-to-end learning

7 Source download (404)

Unfortunately, after reading the whole paper and eagerly going to fetch the code, it turns out the code is gone. Heartbreaking...
