1. Preface

IA32 machine code and assembly code look very different from the original C code, because some processor state that is hidden from C programmers becomes visible. Examples are the program counter (PC), which holds the memory address of the next instruction to execute, and the eight registers. One more thing to note: assembly code comes in two formats, ATT and Intel. ATT format is the default for GCC, objdump, and similar tools, and CSAPP uses it throughout. Intel format usually appears in Intel's IA32 architecture documents and Microsoft's Windows technical documentation. The main differences between the two are:
  • Intel format omits the instruction suffix that indicates operand size, for example mov instead of the ATT movl.
  • Intel format omits the % before register names, for example esp instead of the ATT %esp.
  • Intel format describes memory locations differently, for example DWORD PTR [ebp+8] instead of the ATT 8(%ebp).
  • Intel format lists operands in the opposite order from ATT. In ATT format the last operand is always the destination, for example movl %eax, (%edx).
In addition, as a legacy of the 16-bit processor architecture, today's instructions still use "word" to mean 2 bytes (16 bits) and "double word" to mean 4 bytes. Instructions therefore use the suffixes b, w, and l to indicate 1-, 2-, and 4-byte operands, for example the three versions of the data movement instruction: movb, movw, movl.

This chapter studies the machine-level representation of programs and teaches how to read low-level code. Why is reverse engineering so hard? Because source code and compiled code often do not correspond one-to-one. The compiler introduces new variables that do not exist in the source code, and, to save registers, it often maps multiple values to the same register. For a loop, some clues come from observing how the registers are initialized before the loop, how they are updated and tested inside the loop, and how they are used after it.


2. Registers and addressing

In the Chapter 1 notes we saw that a large part of program execution time is spent moving data around. So the processor supports not only 1-, 2-, and 4-byte register operands but also multiple addressing modes, as shown in the table in the right half of the figure below. This lets us flexibly load data from memory into registers, or store register values to memory.



The table may look overwhelming, but the most general form is the last one: Imm(Eb, Ei, s) = Imm + R[Eb] + R[Ei]*s, where R[X] is the value of register X. Four parameters to control one address seems almost too flexible, so let's consider where it is used. Ignoring Imm for now, the most typical use is accessing an element of an array. Given int x[4], Eb holds the starting address of the array (that is, x), Ei holds the index of the element, and s is the size of the array's element type. To access x[3], the address is (x, 3, sizeof(int)) = x + 3*4. In C this is written *(x+3): C scales pointer arithmetic by the size of the pointed-to type (the compiler generates the correct code), so we never have to multiply by sizeof(int) ourselves. What does adding Imm make possible? Quite simply, accessing an element of an array inside a struct. As shown in the figure below, a single instruction can access an element of an array in a structure.
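The address formula can be checked by hand in C. The sketch below is only an illustration: struct record and the helper names are hypothetical, and it assumes a flat address space in which a pointer converted to uintptr_t behaves like a byte address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, used only for illustration. */
struct record {
    int id;
    int values[4];
};

/* Imm + R[Eb] + R[Ei]*s, computed as a plain byte address. */
uintptr_t effective_address(uintptr_t imm, uintptr_t base,
                            uintptr_t index, uintptr_t scale)
{
    return imm + base + index * scale;
}

/* (x, i, sizeof(int)): element i of a plain int array, Imm = 0. */
uintptr_t array_elem_address(const int *x, size_t i)
{
    return effective_address(0, (uintptr_t)x, i, sizeof(int));
}

/* Imm is the offset of the array inside the struct. */
uintptr_t struct_elem_address(const struct record *r, size_t i)
{
    return effective_address(offsetof(struct record, values),
                             (uintptr_t)r, i, sizeof(int));
}
```

For an int x[4], array_elem_address(x, 3) should equal the address the compiler uses for &x[3], and struct_elem_address(&r, 2) should equal &r.values[2].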




3. Commonly used instructions

Here are some of the most common assembly instructions and their meanings :
  • mov: Data movement. IA32 imposes a restriction: the two operands of a move instruction cannot both be memory locations. So copying data from one memory location to another takes two instructions.
  • leal: Load effective address. leal Imm(%a, %b, s), %x looks like a mov from memory, but it sets %x to Imm+%a+s*%b rather than to M[Imm+%a+s*%b]. This gives two very useful applications: 1) Taking an address. Suppose %eax holds the address of a. Then int x=a compiles to movl (%eax), %edx, while int *x=&a compiles to leal (%eax), %edx. So leal does not store the value of a (that is, (%eax)) into x (that is, %edx); it only stores the address of a (which is simply %eax). 2) Simple arithmetic. The second application that comes naturally to mind is using a single leal to compress simple arithmetic: when %edx holds x, leal 7(%edx, %edx, 4) computes 5x+7.
  • jmp: Jump directly to a label, or jump indirectly to an address given by a register or memory operand. In assembly language a direct jump target is usually written as a symbolic label, which the assembler or linker then encodes. The most common encoding is PC-relative: a 1-, 2-, or 4-byte offset holds the difference between the target address and the address of the instruction immediately following the jmp, as shown in the figure below. Why the address of the instruction after the jmp rather than the jmp itself? The reason is historical: early processor implementations updated the program counter as their first step and only then executed the current instruction. So while an instruction executes, the PC already points to the next instruction, and the jump offset is relative to that next instruction.
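Two of the points above reduce to simple arithmetic, sketched here in C. The function names are hypothetical, and the addresses in the usage note are made-up values, not taken from a real disassembly.

```c
#include <stdint.h>

/* What leal 7(%edx, %edx, 4) computes when %edx holds x: 7 + x + 4*x. */
int32_t lea_5x_plus_7(int32_t x)
{
    return 7 + x + x * 4;
}

/* Target of a PC-relative jump: the address of the instruction that
 * follows the jmp, plus the signed offset stored in the encoding. */
uint32_t jump_target(uint32_t jmp_addr, uint32_t jmp_len, int32_t offset)
{
    uint32_t next_instr = jmp_addr + jmp_len;
    /* Unsigned arithmetic wraps, so a negative offset jumps backward. */
    return next_instr + (uint32_t)offset;
}
```

Under these assumptions, a 2-byte jmp at address 0x8 with offset 0x0d targets 0xa + 0xd = 0x17, and a 2-byte jmp at 0x15 with offset -13 targets 0x17 - 13 = 0xa.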



4. What happens when you type cast

When a signed integer is converted to an unsigned integer, we might expect the compiler to turn negative numbers into 0, leave positive numbers unchanged, and clamp a value that exceeds the maximum to TMax. In fact, conversion between integers of the same length is just a copy of the bits; nothing else is done. When both a length conversion and a type conversion are needed, C performs the length conversion first. After the length conversion both integers have the same length, so all we need to understand is how extension and truncation between integers of different lengths work:
  • Extension: unsigned values are zero-extended, that is, the high-order bits are filled with zeros. Signed values are sign-extended, that is, the high-order bits are filled with copies of the most significant bit, the sign bit.
  • Truncation: simply discard the high-order bits and keep the low-order ones. At the register level this amounts to using only the low part of the register, for example %al, the low byte of %eax. On a little-endian machine the bytes that are kept are also the ones at the lowest addresses.


Signed integers are encoded in two's complement on most machines, and sign-extending a two's complement number does not change its value, as proved in Chapter 2. This is the magic of two's complement: 0 has a unique representation, and the value is preserved under sign extension. The key point is what happens to a negative w-bit number when one more high-order 1 is added: the new sign bit contributes -2^w and the old sign bit now contributes +2^(w-1), so together they give -2^w + 2^(w-1) = -2^(w-1), exactly what the old sign bit contributed before the extension.
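The conversion rules above can be observed directly in C. This is a minimal sketch using the fixed-width types from <stdint.h>; the function names are hypothetical, and truncation is shown on unsigned values only because narrowing to a signed type that cannot hold the value is implementation-defined.

```c
#include <stdint.h>

/* Same length: the bit pattern is copied unchanged. */
uint32_t same_size_cast(int32_t v)  { return (uint32_t)v; }

/* Signed widening: high-order bits are filled with the sign bit. */
int32_t sign_extend(int16_t v)      { return (int32_t)v; }

/* Unsigned widening: high-order bits are filled with zeros. */
uint32_t zero_extend(uint16_t v)    { return (uint32_t)v; }

/* Length and type change together: the result is the same as
 * (uint32_t)(int32_t)v, i.e. widen first, then reinterpret. */
uint32_t widen_then_retype(int16_t v) { return (uint32_t)v; }

/* Truncation: the high-order 16 bits are dropped. */
uint16_t truncate_low16(uint32_t v) { return (uint16_t)v; }
```

For example, same_size_cast(-1) gives 0xFFFFFFFF, widen_then_retype(-12345) gives 4294954951 (that is, 2^32 - 12345), and truncate_low16(0x12345678) gives 0x5678.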



5. Why logical operations short-circuit

The Chapter 2 notes listed two differences between bitwise operations and logical operations. The first is that a logical operation sees only TRUE and FALSE: anything that is not 0 is treated as TRUE. The second is that logical operations short-circuit. So why do they short-circuit? Because logical operations are implemented with jumps. In assembly, each part of a conditional expression is tested in turn, and as soon as one part decides the result, the code jumps straight away. A logical operation decides where execution goes next, rather than producing a final value the way a bitwise operation does, so it can be implemented with jumps, and that is where the short-circuit behavior in the high-level language comes from.
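To make the connection concrete, here is a sketch of a && expression next to a hand-written goto version that mirrors the jump structure a compiler might produce. Both function names are hypothetical, and the goto form is an illustration, not actual compiler output.

```c
#include <stddef.h>

/* The second operand is evaluated only if the first one is true,
 * so a NULL pointer is never dereferenced. */
int is_positive_at(const int *p)
{
    return p != NULL && *p > 0;
}

/* The same logic written with explicit jumps: each test either
 * falls through to the next test or jumps straight to the exit. */
int is_positive_at_goto(const int *p)
{
    int result = 0;
    if (p == NULL)
        goto done;      /* first test failed: skip the dereference */
    if (*p <= 0)
        goto done;      /* second test failed */
    result = 1;
done:
    return result;
}
```

Calling either function with NULL returns 0 without touching memory, which is exactly the short-circuit guarantee.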




6. Local variables are actually in registers

Local variables are in fact stored directly in registers, and most of the time they stay there and never land in memory. Take the function swap_add() in section 7: while it runs, its stack frame (memory) holds no local variables at all. All of the function's local variables and logic live in registers and are executed in the ALU.

Local variables are stored in memory (on the stack) in the following cases:
  • When there are not enough registers to hold all the local variables. After all, there are only eight registers.
  • When a local variable is an array or a struct, and therefore must be accessed through a pointer.
  • When the address-of operator & is applied to a local variable, so a memory address must be generated for it.

7. Runtime code and stack

Let's look at an example of a function call to learn how the underlying code works.


The code for caller() is as follows:


The code for swap_add() is as follows:


The code generated by the compiler follows fixed conventions, so the jumps made during a function call do not cause problems such as one function overwriting another's data, and the program runs correctly.
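For reference, here is a C sketch of the two functions, modeled on the swap_add example in CSAPP. Treat the exact constants as an assumption, since the listings are not reproduced in these notes.

```c
/* x and y are plain locals whose address is never taken,
 * so they can live entirely in registers. */
int swap_add(int *xp, int *yp)
{
    int x = *xp;
    int y = *yp;

    *xp = y;
    *yp = x;
    return x + y;
}

/* arg1 and arg2 have their addresses taken with &, so the
 * compiler must give them slots in caller's stack frame. */
int caller(void)
{
    int arg1 = 534;
    int arg2 = 1057;
    int sum = swap_add(&arg1, &arg2);
    int diff = arg1 - arg2;

    return sum * diff;
}
```

This ties back to section 6: swap_add needs no stack storage for its locals, while caller must place arg1 and arg2 on the stack because pointers to them are passed to the callee.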



8. The nature of the pointer

You may have heard before that a pointer is essentially a memory address. I had heard it too, but without any real insight; studying the low-level details makes it click. As the figure below shows, dereferencing a pointer is a very natural operation, because most of the time it is impossible to fit all the data a variable represents into one register, as with arrays or structures. If a register could hold a whole array or structure, we would of course have no need for pointers. So, naturally, we first load the starting memory address of the data (this is the pointer!) into a register, and then access the memory location that the register points to.
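A small C sketch of this idea follows. The struct and function names are hypothetical; the point is only that the functions receive an address and then read memory through it.

```c
#include <stddef.h>

/* Hypothetical structure, too large to fit in one 32-bit register. */
struct point3 {
    int x, y, z;
};

/* Only the address of the struct is passed; each field is read
 * from memory at a fixed offset from that address. */
int sum_fields(const struct point3 *p)
{
    return p->x + p->y + p->z;
}

/* The array is passed as the address of its first element;
 * *(a + i) reads the element i positions past that address. */
int sum_array(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += *(a + i);
    return total;
}
```

In both functions the argument that ends up in a register is just an address, and every data access goes through it to memory.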
