PyTorch Basic Practice
PyTorch Basics
In this book, we use PyTorch extensively to implement our deep learning models. PyTorch is an open-source, community-driven deep learning framework. Unlike Theano, Caffe, and TensorFlow, PyTorch implements tape-based automatic differentiation, which allows us to define and execute computational graphs dynamically. This is extremely helpful for debugging and for constructing sophisticated models with minimal effort.
Dynamic versus static computational graphs: static frameworks like Theano, Caffe, and TensorFlow require the computational graph to be first declared, compiled, and then executed. Although this leads to extremely efficient implementations (very useful in production and mobile settings), it can become quite cumbersome during research and development.
Modern frameworks like Chainer, DyNet, and PyTorch implement dynamic computational graphs to allow for a more flexible, imperative style of development, without needing to compile the model before every execution.
Dynamic computational graphs are especially useful in modeling NLP tasks, for which each input could potentially result in a different graph structure.
PyTorch is an optimized tensor manipulation library that offers an array of packages for deep learning.
At the core of the library is the tensor, a mathematical object holding some multidimensional data.
A tensor of order zero is just a number, or a scalar.
A tensor of order one (a first-order tensor) is an array of numbers, or a vector. Similarly, a second-order tensor is an array of vectors, or a matrix.
Therefore, a tensor can be generalized as an n-dimensional array of scalars.
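To make the order (rank) terminology concrete, here is a small sketch, assuming only that PyTorch is installed, that builds a scalar, a vector, and a matrix and checks the number of dimensions of each:

```python
import torch

scalar = torch.tensor(7.0)           # 0th-order tensor: a single number
vector = torch.tensor([1.0, 2.0])    # 1st-order tensor: an array of numbers
matrix = torch.ones(2, 3)            # 2nd-order tensor: an array of vectors (a matrix)

print(scalar.dim(), vector.dim(), matrix.dim())  # 0 1 2
```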
In the following sections, we will use PyTorch to learn the following:
- Creating tensors
- Operations with tensors
- Indexing, slicing, and joining with tensors
- Computing gradients with tensors
- Using CUDA tensors with GPUs
In the rest of this section, we will first use PyTorch to familiarize ourselves with various PyTorch operations. We recommend that you have PyTorch installed and a Python 3.5+ notebook ready at this point, and that you follow along with the examples in this section. We also recommend that you work through the exercises later in the section.
Installing PyTorch
The first step is to install PyTorch on your machine by choosing your system preferences at pytorch.org. Choose your operating system and then the package manager (we recommend conda or pip), followed by the version of Python that you are using (we recommend 3.5+). That will generate the command for you to execute to install PyTorch. As of this writing, the install command for the conda environment is as follows:
conda install pytorch torchvision -c pytorch
Note: If you have a CUDA-enabled graphics processor unit (GPU), you should also choose the appropriate version of CUDA. For additional details, follow the installation instructions on pytorch.org.
See also: the latest PyTorch installation tutorial (2021-07-27).
Creating Tensors
First, we define a helper function, describe(x), that will summarize various properties of a tensor x, such as the type of the tensor, the dimensions of the tensor, and the contents of the tensor:
Input[0]:
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))
PyTorch allows us to create tensors in many different ways using the torch package. One way to create a tensor is to initialize a random one by specifying its dimensions, as shown in Example 1-3.
Example 1-3: Creating a tensor in PyTorch with torch.Tensor
Input[0]:
import torch
describe(torch.Tensor(2, 3))
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 3.2018e-05,  4.5747e-41,  2.5058e+25],
        [ 3.0813e-41,  4.4842e-44,  0.0000e+00]])
We can also create a tensor by randomly initializing it with values from a uniform distribution on the interval [0, 1) or from the standard normal distribution (randomly initialized tensors, say from the uniform distribution, are important, as you will see in Chapters 3 and 4), as shown in Example 1-4.
Example 1-4: Creating a randomly initialized tensor
Input[0]:
import torch
describe(torch.rand(2, 3)) # uniform random
describe(torch.randn(2, 3)) # random normal
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0242, 0.6630, 0.9787],
[ 0.1037, 0.3920, 0.6084]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[-0.1330, -2.9222, -1.3649],
        [ 2.3648,  1.1561,  1.5042]])
We can also create tensors all filled with the same scalar. For creating a tensor of zeros or ones, there are built-in functions, and for filling a tensor with a specific value, we can use the fill_() method.
Any PyTorch method with an underscore (_) refers to an in-place operation; that is, it modifies the content in place without creating a new object, as shown in Example 1-5.
Example 1-5: Creating a filled tensor
Input[0]:
import torch
describe(torch.zeros(2, 3))
x = torch.ones(2, 3)
describe(x)
x.fill_(5)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 0., 0.],
[ 0., 0., 0.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1., 1., 1.],
[ 1., 1., 1.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 5., 5., 5.],
[ 5., 5., 5.]])
Example 1-6 demonstrates how to create a tensor declaratively by using a Python list.
Example 1-6: Creating and initializing a tensor from a list
Input[0]:
x = torch.Tensor([[1, 2, 3],
                  [4, 5, 6]])
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1., 2., 3.],
[ 4., 5., 6.]])
The values can either come from a list, as in the preceding example, or from a NumPy array. And, of course, we can always go from a PyTorch tensor back to a NumPy array as well.
Notice that the type of this tensor is a DoubleTensor instead of the default FloatTensor. This corresponds with the data type of the NumPy random matrix, float64, as presented in Example 1-7.
Example 1-7: Creating and initializing a tensor from NumPy
Input[0]:
import torch
import numpy as np
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))
Output[0]:
Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.8360, 0.8836, 0.0545],
[ 0.6928, 0.2333, 0.7984]], dtype=torch.float64)
The ability to convert between NumPy arrays and PyTorch tensors becomes important when working with legacy libraries that use NumPy-formatted numerical values.
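Going the other way is just as easy via the .numpy() method. One detail worth knowing (a sketch; verify on your PyTorch version) is that torch.from_numpy() and .numpy() share the same underlying memory rather than copying it, so an in-place change on one side is visible on the other:

```python
import numpy as np
import torch

npy = np.zeros((2, 3))
t = torch.from_numpy(npy)     # tensor sharing memory with npy
back = t.numpy()              # back to a NumPy array, still shared

t.fill_(5)                    # in-place change on the tensor side
print(npy[0, 0], back[0, 0])  # 5.0 5.0 -- both views see the change
```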
Tensor Types and Size
Each tensor has an associated type and size. The default tensor type when you use the torch.Tensor constructor is torch.FloatTensor. However, a tensor can be cast to a different type (float, long, double, etc.) either by specifying it at initialization or later by using one of the typecasting methods. There are two ways to specify the initialization type: one is to directly call the constructor of a specific tensor type, such as FloatTensor or LongTensor; the other is to use the special method torch.tensor() and provide the dtype, as shown in Example 1-8.
Example 1-8: Tensor properties
Input[0]:
x = torch.FloatTensor([[1, 2, 3],
[4, 5, 6]])
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1., 2., 3.],
[ 4., 5., 6.]])
Input[1]:
x = x.long()
describe(x)
Output[1]:
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1, 2, 3],
[ 4, 5, 6]])
Input[2]:
x = torch.tensor([[1, 2, 3],
[4, 5, 6]], dtype=torch.int64)
describe(x)
Output[2]:
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1, 2, 3],
[ 4, 5, 6]])
Input[3]:
x = x.float()
describe(x)
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1., 2., 3.],
[ 4., 5., 6.]])
We use the shape property and the size() method of a tensor object to access the measurements of its dimensions. The two ways of accessing these measurements are mostly synonymous. Inspecting the shape of a tensor is an indispensable tool when debugging PyTorch code.
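For instance, a minimal sketch showing that shape and size() report the same information:

```python
import torch

x = torch.rand(2, 3)
print(x.shape)               # torch.Size([2, 3])
print(x.size())              # torch.Size([2, 3]) -- same information
print(x.size(0), x.size(1))  # 2 3 -- size() can also query a single dimension
```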
Tensor Operations
After you have created your tensors, you can operate on them as you would with traditional programming language types, using operators such as +, -, *, and /. Instead of the operators, we can also use functions like .add(), as shown in Example 1-9, which correspond to the symbolic operators.
Example 1-9: Tensor operations: addition
Input[0]:
import torch
x = torch.randn(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0461,  0.4024, -1.0115],
        [ 0.2167, -0.6123,  0.5036]])
Input[1]:
describe(torch.add(x, x))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0923,  0.8048, -2.0231],
        [ 0.4335, -1.2245,  1.0072]])
Input[2]:
describe(x + x)
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0923,  0.8048, -2.0231],
        [ 0.4335, -1.2245,  1.0072]])
There are also operations that you can apply to a specific dimension of a tensor. As you might have already noticed, for a 2D tensor we represent rows as dimension 0 and columns as dimension 1, as illustrated in Example 1-10. (Note: in recent PyTorch versions, torch.arange(6) returns an integer tensor; the float outputs in this and the following examples were produced on an older release. Use torch.arange(6).float() to reproduce them.)
Example 1-10: Dimension-based tensor operations
Input[0]:
import torch
x = torch.arange(6)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([6])
Values:
tensor([ 0., 1., 2., 3., 4., 5.])
Input[1]:
x = x.view(2, 3)
describe(x)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 1., 2.],
[ 3., 4., 5.]])
Input[2]:
describe(torch.sum(x, dim=0))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([3])
Values:
tensor([ 3., 5., 7.])
Input[3]:
describe(torch.sum(x, dim=1))
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values:
tensor([ 3., 12.])
Input[4]:
describe(torch.transpose(x, 0, 1))
Output[4]:
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values:
tensor([[ 0., 3.],
[ 1., 4.],
[ 2., 5.]])
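One caveat worth flagging when combining these operations (a sketch of behavior you can verify on your version): transpose() returns a non-contiguous view, and view() requires contiguous memory, so chaining them may require an explicit .contiguous() call:

```python
import torch

x = torch.arange(6).view(2, 3)
xt = torch.transpose(x, 0, 1)    # shape (3, 2), but a non-contiguous view

print(xt.is_contiguous())        # False -- xt.view(-1) would raise an error here
flat = xt.contiguous().view(-1)  # works after making a contiguous copy
print(flat.tolist())             # [0, 3, 1, 4, 2, 5]
```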
Often, we need to do more complex operations that involve a combination of indexing, slicing, joining, and mutations. Like NumPy and other numeric libraries, PyTorch has built-in functions to make such tensor manipulations very simple.
Indexing, Slicing, and Joining
If you are a NumPy user, PyTorch's indexing and slicing scheme, shown in Example 1-11, might be very familiar to you.
Example 1-11: Slicing and indexing a tensor
Input[0]:
import torch
x = torch.arange(6).view(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 1., 2.],
[ 3., 4., 5.]])
Input[1]:
describe(x[:1, :2])
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([1, 2])
Values:
tensor([[ 0., 1.]])
Input[2]:
describe(x[0, 1])
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values:
1.0
Example 1-12 demonstrates that PyTorch also has functions for complex indexing and slicing operations, in which you might be interested in accessing noncontiguous locations of a tensor efficiently.
Example 1-12: Complex indexing: noncontiguous indexing of a tensor
Input[0]:
indices = torch.LongTensor([0, 2])
describe(torch.index_select(x, dim=1, index=indices))
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 0., 2.],
[ 3., 5.]])
Input[1]:
indices = torch.LongTensor([0, 0])
describe(torch.index_select(x, dim=0, index=indices))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 1., 2.],
[ 0., 1., 2.]])
Input[2]:
row_indices = torch.arange(2).long()
col_indices = torch.LongTensor([0, 1])
describe(x[row_indices, col_indices])
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values:
tensor([ 0., 4.])
Notice that the indices are a LongTensor; this is a requirement for indexing using PyTorch functions. We can also join tensors using built-in concatenation functions, as shown in Example 1-13, by specifying the tensors and the dimension.
Example 1-13: Concatenating tensors
Input[0]:
import torch
x = torch.arange(6).view(2,3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 1., 2.],
[ 3., 4., 5.]])
Input[1]:
describe(torch.cat([x, x], dim=0))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([4, 3])
Values:
tensor([[ 0., 1., 2.],
[ 3., 4., 5.],
[ 0., 1., 2.],
[ 3., 4., 5.]])
Input[2]:
describe(torch.cat([x, x], dim=1))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 6])
Values:
tensor([[ 0., 1., 2., 0., 1., 2.],
[ 3., 4., 5., 3., 4., 5.]])
Input[3]:
describe(torch.stack([x, x]))
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2, 3])
Values:
tensor([[[ 0., 1., 2.],
[ 3., 4., 5.]],
[[ 0., 1., 2.],
[ 3., 4., 5.]]])
PyTorch also implements highly efficient linear algebra operations on tensors, such as multiplication, inverse, and trace, as shown in Example 1-14.
Example 1-14: Linear algebra on tensors: multiplication
Input[0]:
import torch
x1 = torch.arange(6).view(2, 3)
describe(x1)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 1., 2.],
[ 3., 4., 5.]])
Input[1]:
x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values:
tensor([[ 1., 2.],
[ 1., 2.],
[ 1., 2.]])
Input[2]:
describe(torch.mm(x1, x2))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 3., 6.],
[ 12., 24.]])
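Note that torch.mm() is strictly for 2D matrices. For batched or broadcasting multiplication, torch.matmul() (or the @ operator) is the more general entry point; a small sketch of the difference:

```python
import torch

a = torch.ones(2, 3)
b = torch.ones(3, 4)
print(torch.mm(a, b).shape)  # torch.Size([2, 4])
print((a @ b).shape)         # same result via the @ operator

# matmul iterates over the leading batch dimension; mm would fail here
batched = torch.matmul(torch.ones(5, 2, 3), torch.ones(5, 3, 4))
print(batched.shape)         # torch.Size([5, 2, 4])
```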
So far, we have looked at ways of creating and manipulating constant PyTorch tensor objects. Just as a programming language (such as Python) has variables that encapsulate a piece of data along with additional information about that data (such as the memory address where it is stored), PyTorch tensors handle the bookkeeping needed for building computational graphs for machine learning simply by enabling a Boolean flag at instantiation time.
Tensors and Computational Graphs
The PyTorch tensor class encapsulates the data (the tensor itself) and a range of operations, such as algebraic operations, indexing, and reshaping operations.
However, as shown in Example 1-15, when the requires_grad Boolean flag is set to True on a tensor, bookkeeping operations are enabled that can track the gradient at the tensor as well as the gradient function, both of which are needed to facilitate the gradient-based learning discussed in "The Supervised Learning Paradigm."
Example 1-15: Creating tensors for gradient bookkeeping
Input[0]:
import torch
x = torch.ones(2, 2, requires_grad=True)
describe(x)
print(x.grad is None)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 1., 1.],
[ 1., 1.]])
True
Input[1]:
y = (x + 2) * (x + 5) + 3
describe(y)
print(x.grad is None)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 21., 21.],
[ 21., 21.]])
True
Input[2]:
z = y.mean()
describe(z)
z.backward()
print(x.grad is None)
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values:
21.0
False
When you create a tensor with requires_grad=True, you are requiring PyTorch to manage the bookkeeping information needed to compute gradients.
First, PyTorch keeps track of the values of the forward pass. Then, at the end of the computations, a single scalar is used to compute a backward pass.
The backward pass is initiated by calling the backward() method on a tensor resulting from the evaluation of a loss function. The backward pass computes a gradient value for each tensor object that participated in the forward pass.
In general, a gradient is a value that represents the slope of a function's output with respect to the function's input.
In the computational-graph setting, gradients exist for each parameter in the model and can be thought of as the parameter's contribution to the error signal. In PyTorch, you can access the gradients for the nodes in the computational graph by using the .grad member variable. Optimizers use the .grad variable to update the values of the parameters.
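As a minimal sketch of how the .grad variable feeds a parameter update (plain gradient descent written out by hand here, not PyTorch's optimizer API):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
loss = (w - 5.0) ** 2    # a simple quadratic loss
loss.backward()          # populates w.grad with d(loss)/dw = 2 * (w - 5) = -6

with torch.no_grad():    # parameter updates happen outside the graph
    w -= 0.1 * w.grad    # one gradient-descent step: 2.0 - 0.1 * (-6.0)
    w.grad.zero_()       # clear the gradient for the next iteration

print(w.item())          # approximately 2.6
```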
CUDA Tensors
So far, we have been allocating our tensors on the CPU memory. When doing linear algebra operations, it might make sense to utilize a GPU, if you have one.
To use a GPU, you need to first allocate the tensor on the GPU's memory. Access to the GPUs is via a specialized API called CUDA.
The CUDA API was created by NVIDIA and is limited to use on NVIDIA GPUs. PyTorch offers CUDA tensor objects that are indistinguishable in use from the regular CPU-bound tensors, except for the way they are allocated internally.
PyTorch makes it very easy to create these CUDA tensors (Example 1-16), transferring the tensor from the CPU to the GPU while maintaining its underlying type. The preferred method in PyTorch is to be device agnostic and write code that works whether it is on the GPU or the CPU.
In the following code snippet, we first check whether a GPU is available by using torch.cuda.is_available(), and then retrieve the device name with torch.device. Then, all future tensors are instantiated and moved to the target device by using the .to(device) method.
Example 1-16: Creating CUDA tensors
Input[0]:
import torch
print (torch.cuda.is_available())
Output[0]:
True
Input[1]:
# preferred method: device agnostic tensor instantiation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print (device)
Output[1]:
cuda
Input[2]:
x = torch.rand(3, 3).to(device)
describe(x)
Output[2]:
Type: torch.cuda.FloatTensor
Shape/size: torch.Size([3, 3])
Values:
tensor([[ 0.9149, 0.3993, 0.1100],
[ 0.2541, 0.4333, 0.4451],
[ 0.4966, 0.7865, 0.6604]], device='cuda:0')
To operate on CUDA and non-CUDA objects, we need to ensure that they are on the same device. If we don't, the computations will break, as shown in the following code snippet.
This situation can arise, for example, when computing monitoring metrics that aren't part of the computational graph. When operating on two tensor objects, make sure they're both on the same device, as shown in Example 1-17.
Example 1-17: Mixing CUDA tensors with CPU-bound tensors
Input[0]
y = torch.rand(3, 3)
x + y
Output[0]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
      1 y = torch.rand(3, 3)
----> 2 x + y
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #3 'other'
Input[1]
cpu_device = torch.device("cpu")
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y
Output[1]
tensor([[ 0.7159, 1.0685, 1.3509],
[ 0.3912, 0.2838, 1.3202],
[ 0.2967, 0.0420, 0.6559]])
Keep in mind that it is expensive to move data back and forth from the GPU. Therefore, the typical procedure involves doing many of the parallelizable computations on the GPU and then transferring just the final result back to the CPU. This will allow you to fully utilize the GPUs. If you have several CUDA-visible devices (i.e., multiple GPUs), the best practice is to use the CUDA_VISIBLE_DEVICES environment variable when executing the program, as shown here:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
We do not cover parallelism and multi-GPU training as a part of this book, but they are essential in scaling experiments and sometimes even in training large models. We recommend that you refer to the PyTorch documentation and discussion forums for additional help and support on this topic.
Exercises
The best way to master a topic is to solve problems. Here are some warm-up exercises. Many of the problems will require going through the official documentation [1] and finding helpful functions.
1. Create a 2D tensor and then add a dimension of size 1 inserted at dimension 0.
2. Remove the extra dimension you just added to the previous tensor.
3. Create a random tensor of shape 5x3 in the interval [3, 7).
4. Create a tensor with values from a normal distribution (mean=0, std=1).
5. Retrieve the indexes of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).
6. Create a random tensor of size (3,1) and then horizontally stack four copies together.
7. Return the batch matrix-matrix product of two 3-dimensional matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).
8. Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).
Solutions
1. a = torch.rand(3, 3)
   a.unsqueeze(0)
2. a.squeeze(0)
3. 3 + torch.rand(5, 3) * (7 - 3)
4. a = torch.rand(3, 3)
   a.normal_()
5. a = torch.Tensor([1, 1, 1, 0, 1])
   torch.nonzero(a)
6. a = torch.rand(3, 1)
   a.expand(3, 4)
7. a = torch.rand(3, 4, 5)
   b = torch.rand(3, 5, 4)
   torch.bmm(a, b)
8. a = torch.rand(3, 4, 5)
   b = torch.rand(5, 4)
   torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))
Summary
In this chapter, we introduced the main topics of this book, natural language processing (NLP) and deep learning, and developed a detailed understanding of the supervised learning paradigm.
You should now be familiar with, or at least aware of, various terms such as observations, targets, models, parameters, predictions, loss functions, representations, learning/training, and inference. You also saw how to encode inputs (observations and targets) for learning tasks using one-hot encoding.
We also examined count-based representations such as TF and TF-IDF. We then looked at what computational graphs are, the difference between static and dynamic computational graphs, and PyTorch's tensor manipulation operations. Chapter 2 provides an overview of traditional NLP. These two chapters should lay down the necessary foundation for you if you're new to the book's subject matter and prepare you for the rest of the book.
Key point: TF-IDF
Term frequency (TF) = (number of times a term appears in a document) / (total number of terms in the document)
Inverse document frequency (IDF) = log((total number of documents in the corpus) / (number of documents containing the term + 1))
TF should be easy to understand: it simply counts term frequency. IDF measures how common a term is. To compute IDF, we need a corpus prepared in advance to model the language environment: the more common a term is, the larger the denominator in the formula, and the closer the inverse document frequency gets to 0. The +1 in the denominator avoids division by zero.
TF-IDF = term frequency (TF) × inverse document frequency (IDF)
TF-IDF serves the purpose of extracting keywords from a document very well.
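The formulas above can be sketched directly in plain Python (a toy example with a made-up three-document corpus; production libraries such as scikit-learn use slightly different smoothing conventions):

```python
import math

# hypothetical toy corpus: three tokenized documents
docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "bird", "flew"]]

def tf(term, doc):
    return doc.count(term) / len(doc)        # term frequency within one document

def idf(term, corpus):
    df = sum(term in doc for doc in corpus)  # number of documents containing the term
    return math.log(len(corpus) / (df + 1))  # +1 avoids a zero denominator

def tfidf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

# "cat" appears in 1 of 3 documents: tf = 1/3, idf = log(3/2)
print(tfidf("cat", docs[0], docs))
```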
Quoted from the book for learning purposes only, not for commercial use. I recommend that you read the book itself and learn along. Thanks, and keep at it!