# [Natural Language Processing] Introduction to PyTorch (essential basics)

ZSYL 2021-10-14 06:36:24

# PyTorch Basics

Throughout this book, we use PyTorch extensively to implement our deep learning models. PyTorch is an open-source, community-driven deep learning framework. Unlike Theano, Caffe, and TensorFlow, PyTorch implements tape-based automatic differentiation, which allows us to define and execute computational graphs dynamically. This is enormously helpful for debugging and for building complex models with minimal effort.

**Dynamic versus static computational graphs.** Static frameworks like Theano, Caffe, and TensorFlow require the computational graph to first be declared, compiled, and then executed. Although this leads to extremely efficient implementations (very useful in production and mobile settings), it can become quite cumbersome during research and development.

Modern frameworks like Chainer, DyNet, and PyTorch implement dynamic computational graphs, which support a more flexible, imperative style of development without requiring the model to be compiled before each execution.

Dynamic computational graphs are especially useful for modeling NLP tasks, where each input can potentially lead to a different graph structure.

PyTorch is an optimized tensor-manipulation library that offers an array of packages for deep learning.

At the core of the library is the tensor, a mathematical object holding some multidimensional data.

A tensor of order zero is just a number, or a scalar.

A first-order tensor is an array of numbers, or a vector. Similarly, a second-order tensor is an array of vectors, or a matrix.

Therefore, a tensor can be generalized as an `n`-dimensional array of scalars.
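This hierarchy can be checked directly in PyTorch with the `dim()` method (a minimal sketch; the variable names are ours):

```python
import torch

scalar = torch.tensor(3.14)           # 0th-order tensor: a single number
vector = torch.tensor([1.0, 2.0])     # 1st-order tensor: an array of numbers
matrix = torch.tensor([[1.0, 2.0],    # 2nd-order tensor: an array of vectors
                       [3.0, 4.0]])

print(scalar.dim(), vector.dim(), matrix.dim())  # 0 1 2
```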

In the following sections, we use PyTorch to learn the following:

• Creating tensors
• Operations with tensors
• Indexing, slicing, and joining with tensors
• Computing gradients with tensors
• Using CUDA tensors with GPUs

In the rest of this section, we first use PyTorch to get familiar with various PyTorch operations. We recommend that you have PyTorch installed and a Python 3.5+ notebook ready, and that you follow along with the examples in this section. We also recommend completing the exercises later in the section.

## Installing PyTorch

The first step is to install PyTorch on your machine by choosing your system preferences at pytorch.org. Choose your operating system, then the package manager (we recommend `conda` or `pip`), then the version of Python that you are using (we recommend 3.5+). This generates the command for you to execute to install PyTorch. As of this writing, the install command for the conda environment is:

```
conda install pytorch torchvision -c pytorch
```

Note: If you have a CUDA-enabled graphics processor unit (GPU), you should also choose the appropriate CUDA version. For additional details, follow the installation instructions on pytorch.org.

See also: the latest PyTorch installation tutorial (2021-07-27).

## Creating Tensors

First, we define a helper function, `describe(x)`, that summarizes various properties of a tensor `x`, such as the type of the tensor, the dimensions of the tensor, and the contents of the tensor:

```
Input[0]:
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))
```

PyTorch allows us to create tensors in many different ways using the `torch` package. One way to create a tensor is to initialize a random one by specifying its dimensions, as shown in Example 1-3.

Example 1-3: Creating a tensor with `torch.Tensor` in PyTorch

```
Input[0]:
import torch
describe(torch.Tensor(2, 3))
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 3.2018e-05,  4.5747e-41,  2.5058e+25],
        [ 3.0813e-41,  4.4842e-44,  0.0000e+00]])
```

We can also create a tensor by randomly initializing it with values from a uniform distribution on the interval `[0, 1)` or from the standard normal distribution (randomly initialized tensors, say from the uniform distribution, are important, as you will see in Chapters 3 and 4), as shown in Example 1-4.

Example 1-4: Creating randomly initialized tensors

```
Input[0]:
import torch
describe(torch.rand(2, 3))   # uniform random
describe(torch.randn(2, 3))  # random normal
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0242,  0.6630,  0.9787],
        [ 0.1037,  0.3920,  0.6084]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[-0.1330, -2.9222, -1.3649],
        [ 2.3648,  1.1561,  1.5042]])
```

We can also create tensors all filled with the same scalar. For creating a tensor of zeros or ones, we have built-in functions, and for filling a tensor with a specific value, we can use the `fill_()` method.

Any PyTorch method with an underscore (`_`) refers to an in-place operation; that is, it modifies the content in place without creating a new object, as shown in Example 1-5.

Example 1-5: Creating a filled tensor

```
Input[0]:
import torch
describe(torch.zeros(2, 3))
x = torch.ones(2, 3)
describe(x)
x.fill_(5)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 5.,  5.,  5.],
        [ 5.,  5.,  5.]])
```
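To make the in-place convention concrete, compare `add()` with `add_()` (a small sketch of ours, not from the book):

```python
import torch

x = torch.ones(2, 3)
y = x.add(1)   # out-of-place: returns a new tensor, leaving x unchanged
x.add_(1)      # in-place: modifies x itself (note the trailing underscore)

# both now hold all 2s, but they are distinct objects
print(torch.equal(x, y), x is y)  # True False
```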

Example 1-6 demonstrates how to create a tensor declaratively by using a Python list.

Example 1-6: Creating and initializing a tensor from a list

```
Input[0]:
x = torch.Tensor([[1, 2, 3],
                  [4, 5, 6]])
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.]])
```

The values can either come from a list, as in the preceding example, or from a NumPy array. And, of course, we can always go from a PyTorch tensor to a NumPy array as well.

Notice that the type of this tensor is a `DoubleTensor` instead of the default `FloatTensor`. This corresponds with the data type of the NumPy random matrix, `float64`, as presented in Example 1-7.

Example 1-7: Creating and initializing a tensor from NumPy

```
Input[0]:
import torch
import numpy as np
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))
Output[0]:
Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.8360,  0.8836,  0.0545],
        [ 0.6928,  0.2333,  0.7984]], dtype=torch.float64)
```

The ability to convert between NumPy arrays and PyTorch tensors becomes important when working with legacy libraries that use NumPy-formatted numerical values.
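Note that `torch.from_numpy()` shares memory with the source array, and the tensor's `.numpy()` method converts back. A quick sketch of ours shows the shared-memory behavior:

```python
import torch
import numpy as np

npy = np.random.rand(2, 3)    # NumPy defaults to float64
t = torch.from_numpy(npy)     # a DoubleTensor sharing memory with npy
back = t.numpy()              # back to a NumPy array, still sharing memory

npy[0, 0] = -1.0              # mutating the array is visible through the tensor
print(t[0, 0].item())         # -1.0
```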

## Tensor Types and Size

Each tensor has an associated type and size. The default tensor type when you use the `torch.Tensor` constructor is `torch.FloatTensor`. However, you can specify the type at initialization, or cast the tensor to another type later (`float`, `long`, `double`, etc.) using one of the type-conversion methods. There are two ways to specify the initialization type: either by directly calling the constructor of a specific tensor type, such as `FloatTensor` or `LongTensor`, or by using the special method `torch.tensor()` and providing the `dtype`, as shown in Example 1-8.

Example 1-8: Tensor properties

```
Input[0]:
x = torch.FloatTensor([[1, 2, 3],
                       [4, 5, 6]])
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.]])
Input[1]:
x = x.long()
describe(x)
Output[1]:
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1,  2,  3],
        [ 4,  5,  6]])
Input[2]:
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.int64)
describe(x)
Output[2]:
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1,  2,  3],
        [ 4,  5,  6]])
Input[3]:
x = x.float()
describe(x)
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.]])
```

We use the `shape` property and the `size()` method of a tensor object to access the measurements of its dimensions. The two ways of accessing these measurements are mostly synonymous. Inspecting the shape of a tensor is an indispensable tool in debugging PyTorch code.
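For example (a minimal sketch of ours):

```python
import torch

x = torch.rand(2, 3)
print(x.shape)    # torch.Size([2, 3]) -- the shape attribute
print(x.size())   # torch.Size([2, 3]) -- the size() method, same result
print(x.size(1))  # 3 -- the size of a single dimension
```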

## Tensor Operations

After you have created your tensors, you can operate on them as you would with traditional programming language types, using operators like `+`, `-`, `*`, and `/`. Instead of the operators, you can also use functions like `.add()`, as shown in Example 1-9, that correspond to the symbolic operators.

Example 1-9: Tensor operations: addition

```
Input[0]:
import torch
x = torch.randn(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0461,  0.4024, -1.0115],
        [ 0.2167, -0.6123,  0.5036]])
Input[1]:
describe(torch.add(x, x))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0923,  0.8048, -2.0231],
        [ 0.4335, -1.2245,  1.0072]])
Input[2]:
describe(x + x)
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0923,  0.8048, -2.0231],
        [ 0.4335, -1.2245,  1.0072]])
```

There are also operations that you can apply to a specific dimension of a tensor. As you might have already noticed, for a 2D tensor we represent rows as dimension 0 and columns as dimension 1, as illustrated in Example 1-10.

Example 1-10: Dimension-based tensor operations

```
Input[0]:
import torch
x = torch.arange(6)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([6])
Values:
tensor([ 0.,  1.,  2.,  3.,  4.,  5.])
Input[1]:
x = x.view(2, 3)
describe(x)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[2]:
describe(torch.sum(x, dim=0))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([3])
Values:
tensor([ 3.,  5.,  7.])
Input[3]:
describe(torch.sum(x, dim=1))
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values:
tensor([  3.,  12.])
Input[4]:
describe(torch.transpose(x, 0, 1))
Output[4]:
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values:
tensor([[ 0.,  3.],
        [ 1.,  4.],
        [ 2.,  5.]])
```

Often, we need to do more complex operations that involve a combination of indexing, slicing, joining, and mutations. Like NumPy and other numeric libraries, PyTorch has built-in functions that make such tensor manipulations very simple.

## Indexing, Slicing, and Joining

If you are a NumPy user, PyTorch's indexing and slicing scheme, shown in Example 1-11, might feel very familiar to you.

Example 1-11: Slicing and indexing a tensor

```
Input[0]:
import torch
x = torch.arange(6).view(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[1]:
describe(x[:1, :2])
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([1, 2])
Values:
tensor([[ 0.,  1.]])
Input[2]:
describe(x[0, 1])
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values:
1.0
```

Example 1-12 demonstrates that PyTorch also has functions for complex indexing and slicing operations, which you might find interesting for efficiently accessing noncontiguous locations of a tensor.

Example 1-12: Complex indexing: noncontiguous indexing of a tensor

```
Input[0]:
indices = torch.LongTensor([0, 2])
describe(torch.index_select(x, dim=1, index=indices))
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 0.,  2.],
        [ 3.,  5.]])
Input[1]:
indices = torch.LongTensor([0, 0])
describe(torch.index_select(x, dim=0, index=indices))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 0.,  1.,  2.]])
Input[2]:
row_indices = torch.arange(2).long()
col_indices = torch.LongTensor([0, 1])
describe(x[row_indices, col_indices])
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values:
tensor([ 0.,  4.])
```

Notice that the indices are a `LongTensor`; this is a requirement for indexing using PyTorch functions. We can also join tensors using the built-in concatenation functions, as shown in Example 1-13, by specifying the tensors and the dimension.

Example 1-13: Concatenating tensors

```
Input[0]:
import torch
x = torch.arange(6).view(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[1]:
describe(torch.cat([x, x], dim=0))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([4, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.],
        [ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[2]:
describe(torch.cat([x, x], dim=1))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 6])
Values:
tensor([[ 0.,  1.,  2.,  0.,  1.,  2.],
        [ 3.,  4.,  5.,  3.,  4.,  5.]])
Input[3]:
describe(torch.stack([x, x]))
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2, 3])
Values:
tensor([[[ 0.,  1.,  2.],
         [ 3.,  4.,  5.]],
        [[ 0.,  1.,  2.],
         [ 3.,  4.,  5.]]])
```

PyTorch also implements highly efficient linear algebra operations on tensors, such as multiplication, inverse, and trace, as shown in Example 1-14.

Example 1-14: Linear algebra on tensors: multiplication

```
Input[0]:
import torch
x1 = torch.arange(6).view(2, 3)
describe(x1)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[1]:
x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values:
tensor([[ 1.,  2.],
        [ 1.,  2.],
        [ 1.,  2.]])
Input[2]:
describe(torch.mm(x1, x2))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[  3.,   6.],
        [ 12.,  24.]])
```
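The inverse and trace mentioned above work similarly. Here is a small sketch of ours using a square matrix (the example above covers only multiplication):

```python
import torch

a = torch.tensor([[2.0, 0.0],
                  [0.0, 4.0]])

inv = torch.inverse(a)   # matrix inverse (square, invertible matrices only)
tr = torch.trace(a)      # sum of the diagonal elements: 2 + 4

print(tr.item())         # 6.0
print(torch.mm(a, inv))  # approximately the identity matrix
```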

So far, we have looked at ways to create and manipulate constant PyTorch tensor objects. Just as a programming language (such as Python) has variables that encapsulate a piece of data along with additional information about that data (such as the memory address where it is stored), PyTorch tensors handle the bookkeeping needed for building computational graphs for machine learning simply by enabling a Boolean flag at instantiation time.

## Tensors and Computational Graphs

The PyTorch tensor class encapsulates the data (the tensor itself) and a range of operations, such as algebraic operations, indexing, and reshaping operations.

However, as shown in Example 1-15, when the `requires_grad` Boolean flag is set to `True` on a tensor, bookkeeping operations are enabled that can track the gradient at the tensor as well as the gradient function, both of which are needed to facilitate the gradient-based learning discussed in the supervised learning paradigm.

Example 1-15: Creating tensors for gradient bookkeeping

```
Input[0]:
import torch
x = torch.ones(2, 2, requires_grad=True)
describe(x)
print(x.grad is None)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
True
Input[1]:
y = (x + 2) * (x + 5) + 3
describe(y)
print(x.grad is None)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 21.,  21.],
        [ 21.,  21.]])
True
Input[2]:
z = y.mean()
describe(z)
z.backward()
print(x.grad is None)
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values:
21.0
False
```

When you create a tensor with `requires_grad=True`, you are requiring PyTorch to manage the bookkeeping information needed to compute gradients.

First, PyTorch keeps track of the values of the forward pass. Then, at the end of the computation, a single scalar is used to compute a backward pass.

The backward pass is initiated by calling the `backward()` method on a tensor resulting from the evaluation of a loss function. The backward pass computes a gradient value for each tensor object that participated in the forward pass.

In general, a gradient is a value that represents the slope of a function's output with respect to the function's input.

In the computational-graph setting, gradients exist for each parameter in the model and can be thought of as the parameter's contribution to the error signal. In PyTorch, you can access the gradients for the nodes in the computational graph by using the `.grad` member variable. Optimizers use the `.grad` variable to update the values of the parameters.
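Continuing the function from Example 1-15, we can inspect `.grad` after the backward pass (a sketch of ours; analytically, `z = mean((x + 2)(x + 5) + 3)` has gradient `(2x + 7) / 4`, which is `2.25` at `x = 1`):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = (x + 2) * (x + 5) + 3
z = y.mean()
z.backward()    # populates x.grad

# each element's gradient is (2x + 7) / 4 = 2.25 at x = 1
print(x.grad)
```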

So far, we have been allocating our tensors on CPU memory. When doing linear algebra operations, it might make sense to utilize a GPU, if you have one.

To use a GPU, you need to first allocate the tensor on the GPU's memory. Access to the GPUs is via a specialized API called CUDA.

The CUDA API was created by NVIDIA and is limited to use on NVIDIA GPUs only. PyTorch offers CUDA tensor objects that are indistinguishable in use from the regular CPU-bound tensors, except for the way they are allocated internally.

## CUDA Tensors

PyTorch makes it very easy to create these CUDA tensors (Example 1-16), transferring the tensor from the CPU to the GPU while maintaining its underlying type. The preferred method in PyTorch is to be device agnostic and write code that works whether it's on the GPU or the CPU.

In the following code snippet, we first check whether a GPU is available by using `torch.cuda.is_available()`, and retrieve the device name with `torch.device`. Then, all future tensors are instantiated and moved to the target device by using the `.to(device)` method.

Example 1-16: Creating CUDA tensors

```
Input[0]:
import torch
print(torch.cuda.is_available())
Output[0]:
True
Input[1]:
# preferred method: device-agnostic tensor instantiation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
Output[1]:
cuda
Input[2]:
x = torch.rand(3, 3).to(device)
describe(x)
Output[2]:
Type: torch.cuda.FloatTensor
Shape/size: torch.Size([3, 3])
Values:
tensor([[ 0.9149,  0.3993,  0.1100],
        [ 0.2541,  0.4333,  0.4451],
        [ 0.4966,  0.7865,  0.6604]], device='cuda:0')
```

To operate on CUDA and non-CUDA objects, we need to ensure that they are on the same device. If we don't, the computation will break, as shown in the following code snippet.

This situation arises, for example, when computing monitoring metrics that aren't part of the computational graph. When operating on two tensor objects, make sure they're both on the same device, as shown in Example 1-17.

Example 1-17: Mixing CUDA tensors with CPU-bound tensors

```
Input[0]:
y = torch.rand(3, 3)
x + y
Output[0]:
----------------------------------------------------------------------
RuntimeError                       Traceback (most recent call last)
      1 y = torch.rand(3, 3)
----> 2 x + y
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #3 'other'
Input[1]:
cpu_device = torch.device("cpu")
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y
Output[1]:
tensor([[ 0.7159,  1.0685,  1.3509],
        [ 0.3912,  0.2838,  1.3202],
        [ 0.2967,  0.0420,  0.6559]])
```

Keep in mind that it is expensive to move data back and forth from the GPU. Therefore, the typical procedure involves performing many parallelizable computations on the GPU and then transferring just the final result back to the CPU. This will allow you to fully utilize the GPUs. If you have several CUDA-visible devices, the best practice is to use the `CUDA_VISIBLE_DEVICES` environment variable when executing a program, as shown here:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
```

We don't cover parallelism and multi-GPU training in this book, but they are essential in scaling experiments, and sometimes even to train large models. We recommend that you refer to the PyTorch documentation and discussion forums for additional help and support on this topic.

# Exercises

The best way to master a topic is to solve problems. Here are some warm-up exercises. Many of the problems will require going through the official documentation [1] and finding helpful functions.

1. Create a 2D tensor and then add a dimension of size 1 at dimension 0.

2. Remove the extra dimension you just added to the previous tensor.

3. Create a random tensor of shape 5x3 in the interval [3, 7)

4. Create a tensor with values from a normal distribution (mean=0, std=1).

5. Retrieve the indexes of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).

6. Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

7. Return the batch matrix-matrix product of two 3-dimensional matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).

8. Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).

# Solutions

1. `a = torch.rand(3, 3); a.unsqueeze(0)`

2. `a.squeeze(0)`

3. `3 + torch.rand(5, 3) * (7 - 3)`

4. `a = torch.rand(3, 3); a.normal_()`

5. `a = torch.Tensor([1, 1, 1, 0, 1]); torch.nonzero(a)`

6. `a = torch.rand(3, 1); a.expand(3, 4)`

7. `a = torch.rand(3, 4, 5); b = torch.rand(3, 5, 4); torch.bmm(a, b)`

8. `a = torch.rand(3, 4, 5); b = torch.rand(5, 4); torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))`

# Summary

In this chapter, we introduced the main topics of this book, natural language processing (NLP) and deep learning, and developed a detailed understanding of the supervised learning paradigm.

By the end of this chapter, you should now be familiar with, or at least aware of, various terms such as observations, targets, models, parameters, predictions, loss functions, representations, learning/training, and inference. You also saw how to encode inputs (observations and targets) for learning tasks using one-hot encoding.

We also examined count-based representations like TF and TF-IDF. We began our journey into PyTorch by first exploring what computational graphs are, then considering static versus dynamic computational graphs and taking a tour of PyTorch's tensor manipulation operations. Chapter 2 provides an overview of traditional NLP. These two chapters should lay down the necessary foundation for you if you're new to the book's subject matter, and prepare you for the rest of the chapters.

The key concept here is TF-IDF:

`Term frequency (TF) = number of occurrences of the term in the document / total number of terms in the document`

`Inverse document frequency (IDF) = log(total number of documents in the corpus / (number of documents containing the term + 1))`

TF is easy to understand: it simply counts term frequency. IDF measures how common a term is across documents. To compute IDF, we need a corpus prepared in advance to stand in for the language environment: the more common a term, the larger the denominator in the formula, and the closer its inverse document frequency gets to 0. The `+1` in the denominator avoids division by zero.

`TF-IDF = term frequency (TF) × inverse document frequency (IDF)`

TF-IDF serves the purpose of extracting keywords from a document very well.
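These formulas translate directly into a few lines of Python (a minimal sketch; the toy corpus and function names are ours, and we apply the same `+1` smoothing as the IDF formula above):

```python
import math

corpus = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat ate the fish",
]

def tf(term, doc):
    # fraction of the document's terms that are this term
    words = doc.split()
    return words.count(term) / len(words)

def idf(term, docs):
    # log of (corpus size / (document frequency + 1)), matching the formula above
    containing = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / (containing + 1))

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "the" appears in every document, so its IDF (and TF-IDF) is low;
# "fish" appears in only one document, so it scores higher there
print(tf_idf("the", corpus[2], corpus))
print(tf_idf("fish", corpus[2], corpus))
```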

This material is quoted from the book for learning purposes only, not for commercial use. I recommend you read the book so we can learn together!

Keep it up!

Thank you!

Keep striving!