In the field of network congestion control there is a well-known algorithm called the Nagle algorithm, named after its inventor, John Nagle. In 1984 he first used it to solve network congestion problems at Ford Aerospace (RFC 896). The problem is this: if an application produces data 1 byte at a time and each byte is sent to the remote server as its own packet, the network is easily overloaded by the sheer number of packets. For example, when a user connects to a remote server with Telnet, every keystroke produces 1 byte of data and triggers a packet. In this typical case, a packet carrying only 1 byte of useful data incurs an extra 40 bytes of headers (a 20-byte IP header plus a 20-byte TCP header). This extremely low payload utilization is the small-packet problem described in RFC 896, and it is commonly grouped under the name silly window syndrome (Silly Window Syndrome). On a lightly loaded network this may be acceptable, but on a heavily loaded network it can easily lead to congestion and collapse.
To address this, the Nagle algorithm works as follows. If the sender wants to send small amounts of data many times over (in general, a packet shorter than the MSS is a small packet and a packet whose length equals the MSS is a large packet; for finer comparisons there is also the medium packet, which is longer than a small packet but still shorter than one MSS), it sends the first packet right away, then buffers the small pieces of data that arrive afterwards instead of sending them immediately. The buffered data is combined into a larger packet and sent only when one of several conditions is met: the receiver's ACK for the previous segment arrives, the current data is urgent data, or enough data has accumulated (for example, the buffered data reaches the maximum segment length). For the exact conditions, let's look at the kernel implementation:
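To put a number on the header overhead described above, here is a small sketch in C. The function name `payload_efficiency` is made up for illustration, and the 20-byte IP and TCP header sizes are the option-free minimums quoted in the text.

```c
#include <stddef.h>

/* Minimum header sizes without options, as quoted in the text. */
enum { IP_HEADER_BYTES = 20, TCP_HEADER_BYTES = 20 };

/* Hypothetical helper: fraction of the bytes in a packet that is payload. */
double payload_efficiency(size_t payload_bytes)
{
    size_t total = payload_bytes + IP_HEADER_BYTES + TCP_HEADER_BYTES;
    return (double)payload_bytes / (double)total;
}
```

For a single Telnet keystroke, `payload_efficiency(1)` is 1/41, roughly 2.4%. For a 1460-byte payload it is 1460/1500, roughly 97.3%.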
1383:        Filename : \linux-3.4.4\net\ipv4\tcp_output.c
1384:        /* Return 0, if packet can be sent now without violation Nagle's rules:
1385:         * 1. It is full sized.
1386:         * 2. Or it contains FIN. (already checked by caller)
1387:         * 3. Or TCP_CORK is not set, and TCP_NODELAY is set.
1388:         * 4. Or TCP_CORK is not set, and all sent packets are ACKed.
1389:         *    With Minshall's modification: all sent small packets are ACKed.
1390:         */
1391:        static inline int tcp_nagle_check(const struct tcp_sock *tp,
1392:                                          const struct sk_buff *skb,
1393:                                          unsigned mss_now, int nonagle)
1394:        {
1395:                return skb->len < mss_now &&
1396:                        ((nonagle & TCP_NAGLE_CORK) ||
1397:                         (!nonagle && tp->packets_out && tcp_minshall_check(tp)));
1398:        }
1399:        
1400:        /* Return non-zero if the Nagle test allows this packet to be
1401:         * sent now.
1402:         */
1403:        static inline int tcp_nagle_test(const struct tcp_sock *tp, const struct sk_buff *skb,
1404:                                         unsigned int cur_mss, int nonagle)
1405:        {
1406:                /* Nagle rule does not apply to frames, which sit in the middle of the
1407:                 * write_queue (they have no chances to get new data).
1408:                 *
1409:                 * This is implemented in the callers, where they modify the 'nonagle'
1410:                 * argument based upon the location of SKB in the send queue.
1411:                 */
1412:                if (nonagle & TCP_NAGLE_PUSH)
1413:                        return 1;
1414:        
1415:                /* Don't use the nagle rule for urgent data (or for the final FIN).
1416:                 * Nagle can be ignored during F-RTO too (see RFC4138).
1417:                 */
1418:                if (tcp_urg_mode(tp) || (tp->frto_counter == 2) ||
1419:                    (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
1420:                        return 1;
1421:        
1422:                if (!tcp_nagle_check(tp, skb, cur_mss, nonagle))
1423:                        return 1;
1424:        
1425:                return 0;
1426:        }
This piece of Linux kernel code is easy to read because it is well commented. Start from tcp_nagle_test(). Line 1412 checks the nonagle argument directly: if the caller has set the TCP_NAGLE_PUSH flag, because Nagle has been disabled explicitly, because the cork has been pulled (see TCP_CORK in the next section), or because this is explicitly the last packet of the connection (for example, data sent just before close()), the function returns 1 so that the packet is sent immediately. Lines 1418-1420 handle special packets: urgent data, the final packet carrying the FIN flag, and packets sent while F-RTO is in progress. Line 1422 calls tcp_nagle_check() to make the actual decision.
The header comment of tcp_nagle_check() is somewhat confusing, so let's go through the expression term by term. First, note that when this function returns 1 the packet is not sent immediately. In the implementation, skb->len < mss_now is true when the packet's data length is less than the current MSS. nonagle & TCP_NAGLE_CORK is true when the socket has been corked or the caller has indicated that more data will follow right away (which the kernel expresses as MSG_MORE). !nonagle is true when the Nagle algorithm is enabled. tp->packets_out is true when some sent packets have not yet been acknowledged. tcp_minshall_check(tp) is an improvement to the Nagle algorithm; for now treat it as equivalent to the previous condition, and we will come back to it later. Putting these together: if the data length is less than the current MSS && ((corked or more data is coming) || (Nagle is enabled && some sent packets have not yet been acknowledged)), the data is buffered instead of being sent immediately.
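The combined condition can be restated as a small user-space sketch. This is not kernel code: `struct nagle_state`, its fields, and `nagle_should_defer` are hypothetical names chosen for illustration. The sketch covers only the tcp_nagle_check() part of the decision and leaves out the PUSH, urgent-data, FIN, and F-RTO shortcuts handled in tcp_nagle_test().

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel state; all names are illustrative. */
struct nagle_state {
    bool corked;           /* TCP_CORK set, or MSG_MORE passed by the caller */
    bool nodelay;          /* TCP_NODELAY set, i.e. Nagle disabled           */
    unsigned packets_out;  /* segments sent but not yet acknowledged         */
    bool small_unacked;    /* outcome of the Minshall check                  */
};

/* Returns true when a segment of `len` bytes should be held back. */
bool nagle_should_defer(const struct nagle_state *s, size_t len, size_t mss)
{
    if (len >= mss)
        return false;   /* full-sized segments are never delayed     */
    if (s->corked)
        return true;    /* corked: keep waiting for more data        */
    /* Plain Nagle: hold small data while earlier data is unacknowledged. */
    return !s->nodelay && s->packets_out > 0 && s->small_unacked;
}
```

As in the kernel expression, the cork test comes first, so a corked socket holds back small segments even when TCP_NODELAY is also set.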
 
In the left figure (the desktop machine is the sender, also called the client; the server host is the receiver, also called the server), the Nagle algorithm is not enabled. Packets handed down by the client's application layer are sent to the network immediately (leaving aside the limits imposed by the send and receive windows, here and below), regardless of their size, so several packets of the same connection may be in the network at once. In the right figure, the Nagle algorithm is enabled: until the server's ACK for the first packet arrives, data handed down by the client's application layer is buffered, and it is sent only after the ACK is received (this is the case the figure shows; the other cases were described above). As a result, the total number of packets drops from 3 to 2, which reduces the network load, and both client and server have only two packets to process, which reduces CPU and other resource consumption.
The Nagle algorithm improves network utilization and reduces the host resources (on client or server) spent handling packets in some scenarios, and works well there, but in other scenarios its drawbacks outweigh its benefits. To explain why, we need to introduce another concept: delayed acknowledgment (Delayed ACK). Delayed ACK is another optimization for improving network utilization, but it targets ACK packets. In TCP, under normal circumstances, the receiver sends an ACK back to the sender for every packet it receives (as in the figure above). The optimization is to delay the ACK so that it is sent together with a data packet or a window update packet (RFC 1122). These packets, of course, travel from the receiver to the sender (receiver and sender being relative roles):
 
The left figure shows the ordinary case. The right figure shows two cases of the delayed ACK mechanism (only these two are drawn: an ACK carried by reverse-direction data, and an ACK sent on timeout). The ACK for data packet A is carried by packet a, which the receiver sends back to the sender, while the ACK for packet a is sent only after a timeout expires. In addition, although RFC 1122 requires the delay to be less than 500 milliseconds, in practice the maximum is usually 200 milliseconds. This does not mean every timeout waits the full 200 milliseconds: the timer may already have been running for a while when the data arrives, so 200 milliseconds is the worst case and the average wait is about 100 milliseconds. For example, linux 3.4.4 has a TCP_DELACK_MAX macro that defines the maximum timeout:
115:        Filename : \linux-3.4.4\include\net\tcp.h
116:        #define TCP_DELACK_MAX        ((unsigned)(HZ/5))        /* maximal time to delay before sending an ACK */
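The macro is expressed in timer ticks, so HZ/5 is one fifth of a second whatever the tick rate. A minimal sketch of that arithmetic follows, assuming the common tick rates 100, 250, and 1000; the helper names `delack_max_ticks` and `ticks_to_ms` are made up for illustration and are not kernel functions.

```c
#include <assert.h>

/* Mirrors the macro: the maximum ACK delay expressed in timer ticks. */
unsigned delack_max_ticks(unsigned hz)
{
    return hz / 5;
}

/* Converts a tick count to milliseconds for a given tick rate. */
unsigned ticks_to_ms(unsigned ticks, unsigned hz)
{
    assert(hz > 0);  /* a tick rate of zero makes no sense */
    return ticks * 1000u / hz;
}
```

With HZ set to 100, 250, or 1000 the macro evaluates to 20, 50, or 200 ticks, and each of those converts back to 200 milliseconds.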
Now back to the interaction between the Nagle algorithm and delayed ACK, again by example. Suppose the sender has a piece of data to send whose length is less than two maximum-size packets. Under the Nagle algorithm, after the sender sends the first packet, the remaining data is not enough to form a packet that can be sent immediately (its length is less than the MSS), so the sender waits until it receives the receiver's ACK for the first packet or the application layer hands down more data (only the first condition, receiving the ACK, is considered here). On the receiving side, because of the delayed ACK mechanism, the ACK is not sent immediately. Instead the receiver waits until one of the following happens (see the kernel function tcp_send_delayed_ack() for details; the cases involved are complicated and not closely related to this discussion, so we skip them and rely only on RFC 1122): 1. it receives a second full-sized packet from the sender; 2. the wait times out (for example, after 200 milliseconds). Of course, if the receiver has data to send in the reverse direction, that data can carry the ACK. In the worst case, though, the second packet at the sender has to wait 200 milliseconds before it is sent to the network. In applications such as HTTP, data flows essentially in one direction at any given moment, so the worst case is quite likely. Moreover, the second packet often marks the successful end of a request or response, so if both the request and the response have to wait for a timeout, the delay grows by 400 milliseconds.
The shortcoming of the Nagle algorithm in this scenario, and an improvement to it, are described in detail in the document http://tools.ietf.org/id/draft-minshall-nagle-01.txt. The linux kernel applies this improvement too; it is the function tcp_minshall_check(), which was not explained earlier:
1376:        Filename : \linux-3.4.4\net\ipv4\tcp_output.c
1377:        /* Minshall's variant of the Nagle send check. */
1378:        static inline int tcp_minshall_check(const struct tcp_sock *tp)
1379:        {
1380:                return after(tp->snd_sml, tp->snd_una) &&
1381:                        !after(tp->snd_sml, tp->snd_nxt);
1382:        }
The function is named after the person who proposed the improvement. Its implementation is very simple, but to understand it you need to know what these fields mean (RFC 793, RFC 1122): tp->snd_nxt is the next byte to send (as a sequence number, likewise below); tp->snd_una is the next byte awaiting acknowledgment, and if it equals tp->snd_nxt, all sent data has been acknowledged; tp->snd_sml is the last byte of the most recently sent small packet (note: not necessarily acknowledged). The figure below illustrates this:
 
To summarize, Minshall's improvement to the Nagle algorithm can be put in one sentence: when deciding whether the current packet can be sent, only check whether the most recently sent small packet has been acknowledged. (The other conditions, such as whether the packet length has reached the MSS, are unchanged; assume here that the decision comes down to this check.) If it has, the tcp_minshall_check(tp) function above returns false, so tcp_nagle_check() returns 0 and the packet can be sent (upper part of the previous figure); otherwise the packet waits (lower part of the previous figure). The reasoning is simple: since every small packet sent so far has been acknowledged, the connection currently has no small packet on the network, so sending one small packet does no harm. More importantly, doing so shortens the delay and improves bandwidth utilization.
So in the earlier example, because the first packet is a large one, whether or not its ACK has arrived has no effect on the decision to send the second packet. At that point all small packets count as acknowledged (in fact, because no small packet has been sent yet), so the second packet can be sent directly without waiting.
The traditional Nagle algorithm can be viewed as a stop-and-wait protocol at packet granularity: it does not send a second packet until the previous one has been acknowledged, unless it is forced to. The improved Nagle algorithm is a compromise: if the unacknowledged packet is not a small packet, the second packet can be sent, but within one RTT the connection still has at most one small packet on the network (because a second small packet is not sent while the previous small packet remains unacknowledged). However, the improved algorithm is at a disadvantage in some special cases. In the situation below (3 data blocks arrive one after another and no more data follows), the traditional Nagle algorithm produces only one small packet while the improved algorithm produces 2 (the second one is sent only after a delayed-ACK timeout). The impact is not especially large, which is why it counts as a compromise:
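The Minshall check can be tried out in user space with a short sketch. The kernel's after() compares 32-bit sequence numbers in a way that survives wraparound; `seq_after` below follows the same idea. `struct send_state` and `small_segment_in_flight` are hypothetical names for illustration, with fields named after the kernel fields they stand in for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe test: is seq1 later than seq2 in 32-bit sequence space? */
bool seq_after(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq2 - seq1) < 0;
}

/* Illustrative subset of the sender state used by the check. */
struct send_state {
    uint32_t snd_una;  /* next byte awaiting acknowledgment             */
    uint32_t snd_nxt;  /* next byte to send                             */
    uint32_t snd_sml;  /* last byte of the most recently sent small seg */
};

/* True while the most recently sent small segment is still unacknowledged. */
bool small_segment_in_flight(const struct send_state *s)
{
    return seq_after(s->snd_sml, s->snd_una) &&
           !seq_after(s->snd_sml, s->snd_nxt);
}
```

In the example above, a full-sized first packet is outstanding but no small packet has been sent, so snd_sml has not moved past snd_una. The check returns false and the small second packet may go out at once.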
 
The Nagle algorithm in TCP is enabled by default, but it does not suit every situation. It is a good fit for remote-login applications such as telnet or rlogin (that is what it was designed for), but some application scenarios need it turned off. The page at http://www.isi.edu/lsam/publicat ... ractions/node2.html describes a problem that shows up when Apache handles HTTP persistent connections (Keep-Alive, Persistent-Connection): the odd/short-final-segment problem (The Odd/Short-Final-Segment Problem). The two parts of the name go together: an odd number of packets has already been sent, and a final packet is still waiting to be sent. (Here the final packet does not mean the packet carrying the FIN flag; it means the last packet of an HTTP request or response.) Let's look at the details, taking 3 packets plus 1 final packet as an example. Here is one possible sequence:
 
The last small packet contains the final bytes of the whole response, so it is the final packet. If the HTTP connection is non-persistent, the final packet is sent immediately when the connection is closed, and there is no problem. If the HTTP connection is persistent, however, and requests are handled strictly in sequence as Request/Response, Request/Response, ... (that is, without pipelining; pipelining is supported only by HTTP 1.1, quite a few old but still widely used browser versions do not support it, and nginx's pipelining support is currently weak, since it processes the next request only after the previous one has been completely handled), then the final packet is held back by the Nagle algorithm and cannot be sent in time. Specifically, the client will not issue a new request until the previous exchange has finished, so it has no data on which to piggyback an ACK and falls back to delayed ACK; the server, not having received the client's ACK for the preceding packet, cannot send the final packet. As a result, the n-th request/response cannot finish, and the client cannot send the data of its (n+1)-th request.
 
Because of this problem, nginx turns off the Nagle algorithm automatically in this situation. Let's look at the nginx code:
2436:        Filename : src\http\ngx_http_request.c (nginx source tree)
2437:        static void
2438:        ngx_http_set_keepalive(ngx_http_request_t *r)
2439:        {
2440:        …
2623:            if (tcp_nodelay
2624:                && clcf->tcp_nodelay
2625:                && c->tcp_nodelay == NGX_TCP_NODELAY_UNSET)
2626:            {
2627:                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, c->log, 0, "tcp_nodelay");
2628:        
2629:                if (setsockopt(c->fd, IPPROTO_TCP, TCP_NODELAY,
2630:                               (const void *) &tcp_nodelay, sizeof(int))
2631:                    == -1)
2632:                {
2633:        …
2646:                c->tcp_nodelay = NGX_TCP_NODELAY_SET;
2647:            }
This nginx function is reached when the current connection is a persistent connection. The local variable tcp_nodelay on line 2623 reflects the TCP_CORK option, which is controlled by the configuration directive tcp_nopush and is off by default. On linux, nginx treats the TCP_NODELAY and TCP_CORK options as completely mutually exclusive (in fact they can be used together, as the next section explains). When TCP_CORK is disabled, the local variable tcp_nodelay is 1 (this variable shows that, in nginx's use of the two options, TCP_CORK has higher priority than TCP_NODELAY). clcf->tcp_nodelay holds the value of the tcp_nodelay configuration directive, which corresponds to the TCP_NODELAY option and is 1 by default. c->tcp_nodelay records whether TCP_NODELAY has already been set on this socket; the first time execution reaches this point it is normally NGX_TCP_NODELAY_UNSET (unless the connection is not an IP connection, for example), because this is the only place where TCP_NODELAY is set. Taken together, when the whole condition is true, line 2629 sets TCP_NODELAY on the socket to disable the Nagle algorithm (and the field c->tcp_nodelay is assigned NGX_TCP_NODELAY_SET to record that the option has been set on this socket). The final response data is then sent immediately, which solves the problem described above.
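Outside nginx, the same setsockopt() call can be wrapped in a few lines of C. This is a minimal sketch for a POSIX system; the wrapper names `set_tcp_nodelay` and `get_tcp_nodelay` are made up for illustration, while setsockopt(), getsockopt(), IPPROTO_TCP, and TCP_NODELAY are the standard socket API.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enables (on = 1) or disables (on = 0) TCP_NODELAY; returns 0 or -1. */
int set_tcp_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Reads the option back; returns 1 if set, 0 if clear, -1 on error. */
int get_tcp_nodelay(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) == -1)
        return -1;
    return value != 0;
}
```

A server would typically call `set_tcp_nodelay(fd, 1)` once on each accepted connection, much as nginx does here the first time a connection enters the keep-alive path.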

http://lenky.info/ebook/
