MySQL deep dive: analyzing Performance Schema memory management

Ali Technology 2021-10-14 07:57:22



One   Introduction


MySQL Performance Schema (PFS) is a powerful performance monitoring and diagnostics tool provided by MySQL. It offers a special way to inspect the server's internal execution at run time. PFS collects information by monitoring events registered inside the server. In theory, an event can be any execution activity or resource use inside the server, such as a function call, a wait on a system call, a parsing or sorting stage of an SQL query, or memory consumption.

PFS stores the collected performance data in the performance_schema storage engine. This is an in-memory table engine, so all collected diagnostic information is kept in memory. Collecting and storing this information adds overhead. To affect the workload as little as possible, PFS must also take care with its own performance and memory management.

This article reads the source code of the PFS engine's memory management to explain how PFS allocates and releases memory. It analyzes some of the resulting problems in depth and proposes some ideas for improvement. The source analysis is based on MySQL 8.0.24.

Two   Memory management model


PFS memory management has several key features:

  • Memory is allocated in units of pages, and one page stores multiple records

  • Some pages are pre-allocated at server startup and more are added on demand at run time, but pages are only ever added, never reclaimed

  • Allocating and releasing records is lock-free

1  Core data structure


PFS_buffer_scalable_container is the core data structure of PFS memory management. Its overall structure is shown below:

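(Figure: overall structure of PFS_buffer_scalable_container: a container holding multiple pages, each page holding a fixed number of records.)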


A container holds multiple pages, and each page holds a fixed number of records. Each record corresponds to one event object, such as PFS_thread. The number of records per page is fixed, but the number of pages grows as the load increases.
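A minimal C++ sketch of this layout may help. It shows a container that owns a fixed-size array of page pointers, where every page holds the same number of records. The names ScalableContainer, Page, Record and add_page() are hypothetical. They only approximate PFS_buffer_scalable_container, and the real class differs in many details.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical, simplified model of the container layout; not MySQL's code.
struct Record {
  int payload = 0;  // stands in for an event object such as PFS_thread
};

template <std::size_t RecordsPerPage>
struct Page {
  std::array<Record, RecordsPerPage> records{};  // fixed number of records per page
};

template <std::size_t RecordsPerPage, std::size_t MaxPages>
class ScalableContainer {
 public:
  ~ScalableContainer() {
    for (auto &slot : pages_) delete slot.load();  // deleting nullptr is a no-op
  }

  // Appends one page unless MaxPages is reached. Pages are never removed
  // while the container lives, mirroring the add-only behavior described above.
  bool add_page() {
    std::lock_guard<std::mutex> guard(grow_mutex_);
    const std::size_t n = page_count_.load();
    if (n >= MaxPages) return false;
    pages_[n].store(new Page<RecordsPerPage>());
    page_count_.store(n + 1);
    return true;
  }

  std::size_t page_count() const { return page_count_.load(); }
  std::size_t record_capacity() const { return page_count() * RecordsPerPage; }

 private:
  std::array<std::atomic<Page<RecordsPerPage> *>, MaxPages> pages_{};  // all null at start
  std::atomic<std::size_t> page_count_{0};
  std::mutex grow_mutex_;
};
```

For example, ScalableContainer<4, 2> can grow to two pages, giving a capacity of 8 records. After that, add_page() returns false.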

2  Page selection strategy during allocation



The key data structures related to memory allocation are as follows:

PFS_PAGE_SIZE   // size of each page (records per page); 256 by default in global_thread_container
PFS_PAGE_COUNT  // maximum number of pages; 256 by default in global_thread_container

class PFS_buffer_scalable_container {
  PFS_cacheline_atomic_size_t m_monotonic;            // monotonically increasing atomic counter, used to pick a page without locking
  PFS_cacheline_atomic_size_t m_max_page_index;       // bound of the pages allocated so far; allocate() uses it as the current page count
  size_t m_max_page_count;                            // maximum number of pages; no new page is allocated beyond this
  std::atomic<array_type *> m_pages[PFS_PAGE_COUNT];  // page array
  native_mutex_t m_critical_section;                  // lock needed to create a new page
};

m_pages is an array. Each page may have free records, or all of its records may be busy. MySQL uses a fairly simple strategy: it tries the pages one by one in round-robin order, looking for a free record, until an allocation succeeds. If a full round over all pages still fails, it creates a new page to expand the container, until the page-count limit is reached.

The round-robin scan does not always start from the first page. It starts from the position recorded in the atomic variable m_monotonic, which is incremented by 1 every time an allocation attempt in a page fails.

The simplified core code is as follows:

value_type *allocate(pfs_dirty_state *dirty_state) {
  current_page_count = m_max_page_index.m_size_t.load();
  monotonic = m_monotonic.m_size_t.load();
  monotonic_max = monotonic + current_page_count;

  while (monotonic < monotonic_max) {
    index = monotonic % current_page_count;
    array = m_pages[index].load();
    pfs = array->allocate(dirty_state);
    if (pfs) {
      // Allocation succeeded, return
      return pfs;
    } else {
      // Allocation failed, try the next page.
      // Because m_monotonic is incremented concurrently, the local monotonic value
      // does not grow linearly: it may jump from 1 straight to 3 or more.
      // So this while loop does not strictly visit each page in turn; it may jump
      // around, and under concurrency the threads scan the pages collectively.
      // This algorithm has a problem: some pages can be skipped, which increases
      // the risk of expanding with a new page. More on that later.
      monotonic = m_monotonic.m_size_t++;
    }
  }

  // Every page failed; if the limit has not been reached, start adding pages
  while (current_page_count < m_max_page_count) {
    // Under concurrency, this lock keeps threads from creating the same new page
    // at once; it is also the only lock in PFS memory allocation.
    native_mutex_lock(&m_critical_section);
    // Lock acquired; if array is no longer null, another thread already created the page
    array = m_pages[current_page_count].load();
    if (array == nullptr) {
      // This thread is responsible for creating the page
      m_allocator->alloc_array(array);
      m_pages[current_page_count].store(array);
      ++m_max_page_index.m_size_t;
    }
    native_mutex_unlock(&m_critical_section);

    // Try again to allocate from the new page
    pfs = array->allocate(dirty_state);
    if (pfs) {
      // Allocation succeeded, return
      return pfs;
    }
    // Allocation failed; move on to the next page slot, up to the limit
    current_page_count++;
  }

  // All pages are full and the limit has been reached
  return nullptr;
}
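A single-file model of the allocate() logic above can be useful for experiments. It is a sketch under several assumptions. A page is modeled as an array of atomic<bool> flags instead of pfs_lock records, std::mutex stands in for native_mutex_t, and the names Container, Page and Slot are hypothetical. The scan keeps the postfix update of the shared counter used above. After a failed attempt on a freshly created page, the growth loop moves on to the next page slot.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical model of the page-selection logic; names are illustrative.
template <std::size_t RecordsPerPage>
struct Page {
  std::array<std::atomic<bool>, RecordsPerPage> used{};  // all false at start

  // Claims the first unused slot; returns its index, or -1 if the page is full.
  int allocate() {
    for (std::size_t i = 0; i < RecordsPerPage; ++i) {
      bool expected = false;
      if (used[i].compare_exchange_strong(expected, true)) return static_cast<int>(i);
    }
    return -1;
  }
};

template <std::size_t RecordsPerPage, std::size_t MaxPages>
class Container {
 public:
  struct Slot {
    std::size_t page;
    int record;
  };

  ~Container() {
    for (auto &p : pages_) delete p.load();
  }

  // Returns true and fills `out` on success; false once every page is full
  // and MaxPages pages exist.
  bool allocate(Slot &out) {
    std::size_t page_count = max_page_index_.load();
    if (page_count > 0) {
      std::size_t monotonic = monotonic_.load();
      const std::size_t monotonic_max = monotonic + page_count;
      while (monotonic < monotonic_max) {
        const std::size_t index = monotonic % page_count;
        const int r = pages_[index].load()->allocate();
        if (r >= 0) {
          out = Slot{index, r};
          return true;
        }
        monotonic = monotonic_++;  // shared counter; postfix returns the old value
      }
    }
    // Scan failed: create pages one at a time until the limit.
    while (page_count < MaxPages) {
      Page<RecordsPerPage> *page = nullptr;
      {
        std::lock_guard<std::mutex> guard(grow_mutex_);
        page = pages_[page_count].load();
        if (page == nullptr) {  // nobody has created this page yet
          page = new Page<RecordsPerPage>();
          pages_[page_count].store(page);
          ++max_page_index_;
        }
      }
      const int r = page->allocate();
      if (r >= 0) {
        out = Slot{page_count, r};
        return true;
      }
      ++page_count;  // that page is already full: move to the next slot
    }
    return false;
  }

  std::size_t page_count() const { return max_page_index_.load(); }

 private:
  std::array<std::atomic<Page<RecordsPerPage> *>, MaxPages> pages_{};
  std::atomic<std::size_t> max_page_index_{0};
  std::atomic<std::size_t> monotonic_{0};
  std::mutex grow_mutex_;
};
```

When run single-threaded, Container<2, 3> hands out six records across three pages and then reports failure. The skipping problem only appears when several threads share monotonic_.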

Let's analyze the problem with the round-robin page strategy in detail. Because m_monotonic is incremented concurrently, some pages can be skipped during the scan, which increases the risk of expanding with a new page.

An extreme example makes the problem easier to explain. Suppose there are currently 4 pages. Pages 1 and 4 are full and have no available record, while pages 2 and 3 have available records.

Now suppose 4 threads issue Allocate requests concurrently and all read m_monotonic = 0:

monotonic = m_monotonic.m_size_t.load();

All threads then try to allocate a record from page 1 and fail (page 1 has no available record), so each one increments the shared counter to try the next page:

monotonic = m_monotonic.m_size_t++;

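This is where the problem appears. m_monotonic.m_size_t++ is a postfix increment on an atomic, so it returns the value the counter held before the increment. The four increments are serialized, so the threads receive 0, 1, 2 and 3, and the shared counter reaches 4. Each thread, however, keeps the scan window [0, 4) that it computed from its first load.

One possible outcome: the thread that received 1 tries page 2, and the thread that received 2 tries page 3, and both succeed. The thread that received 0 retries page 1 and fails again; its next increment returns 4, which ends its loop. The thread that received 3 tries page 4 and fails; its next increment returns 5, which also ends its loop. These two threads give up after looking only at pages 1 and 4 and enter the path that creates a new page, even though pages 2 and 3 still have free records.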
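The skipping can be reproduced without real threads by replaying one interleaving step by step. The sketch makes three assumptions. First, the schedule is round-robin, with each thread making one attempt per round. Second, page indexes are 0-based, so page 0 here is page 1 in the text. Third, increments follow the postfix semantics of m_monotonic.m_size_t++, because std::atomic's postfix operator++ returns the value held before the increment. replay_example() is a hypothetical helper, not MySQL code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Deterministic, single-threaded replay of one interleaving of the page scan.
// Pages 0 and 3 are full; pages 1 and 2 have room (assumed never to run out here).
// Returns the ids of the threads that left the scan loop without a record.
std::vector<int> replay_example() {
  constexpr std::size_t kPages = 4;
  constexpr int kThreads = 4;
  const std::array<bool, kPages> has_room = {false, true, true, false};

  std::size_t shared = 0;                         // plays the role of m_monotonic
  std::array<std::size_t, kThreads> local{};      // each thread's `monotonic`
  std::array<std::size_t, kThreads> local_max{};  // each thread's `monotonic_max`
  std::array<bool, kThreads> done{};
  std::vector<int> fell_through;

  // Every thread loads m_monotonic (== 0) before anyone increments it.
  for (int t = 0; t < kThreads; ++t) {
    local[t] = shared;
    local_max[t] = local[t] + kPages;
  }

  // Round-robin schedule: each unfinished thread makes one attempt per round.
  bool any_running = true;
  while (any_running) {
    any_running = false;
    for (int t = 0; t < kThreads; ++t) {
      if (done[t]) continue;
      if (has_room[local[t] % kPages]) {  // array->allocate() succeeds
        done[t] = true;
        continue;
      }
      local[t] = shared++;             // postfix, like m_monotonic.m_size_t++
      if (local[t] >= local_max[t]) {  // while condition fails: go create a page
        done[t] = true;
        fell_through.push_back(t);
      } else {
        any_running = true;
      }
    }
  }
  return fell_through;
}
```

Under this schedule, threads 0 and 3 leave the scan after looking only at pages 0 and 3. Both head for the page-creation path while pages 1 and 2 still have room.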

Although this example is extreme, MySQL serves many concurrent threads, and those threads request PFS memory at the same time. Under these conditions, skipping some pages should still happen quite easily.

3  Record selection strategy within a page


PFS_buffer_default_array is the class that manages the group of records in each page.

The key data structure is as follows:

class PFS_buffer_default_array {
  PFS_cacheline_atomic_size_t m_monotonic;  // monotonically increasing atomic counter, used to pick a free record
  size_t m_max;                             // maximum number of records
  T *m_ptr;                                 // PFS objects backing the records, e.g. PFS_thread
};

Each page is in fact a fixed-length array. Each record has 3 states: FREE, DIRTY and ALLOCATED. FREE means the record is idle and can be used. ALLOCATED means it has been allocated successfully. DIRTY is an intermediate state, meaning the record has been claimed but its allocation is not yet complete.

Selecting a record is essentially a round-robin search for a FREE record, which the thread then claims by changing its state.

The simplified core code is as follows:

value_type *allocate(pfs_dirty_state *dirty_state) {
  // Start the round-robin search at the position recorded in m_monotonic
  monotonic = m_monotonic.m_size_t++;
  monotonic_max = monotonic + m_max;

  while (monotonic < monotonic_max) {
    index = monotonic % m_max;
    pfs = m_ptr + index;
    // m_lock is a pfs_lock structure; it maintains the three states free/dirty/allocated.
    // How the state transitions are made atomic is described in detail later.
    if (pfs->m_lock.free_to_dirty(dirty_state)) {
      return pfs;
    }
    // The current record is not free; increment the atomic counter and try the next one
    monotonic = m_monotonic.m_size_t++;
  }

  // No free record was found in this page
  return nullptr;
}

The main flow of selecting a record is basically the same as selecting a page. The difference is that the number of records in a page is fixed, so there is no expansion logic.
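The record-level scan can be sketched the same way. This is a simplified, hypothetical model; RecordArray is not a MySQL name. Each record's state is a std::atomic<uint32_t>, and a compare-and-swap performs the FREE to DIRTY claim. The DIRTY to ALLOCATED step is collapsed into an immediate store. The sketch assumes PFS_LOCK_ALLOCATED has the value 2, so that it fits in a 2-bit state field.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed state values; ALLOCATED must fit in the low 2 bits.
constexpr std::uint32_t kFree = 0;
constexpr std::uint32_t kDirty = 1;
constexpr std::uint32_t kAllocated = 2;

// Hypothetical model of a page's record array scanned from a shared counter.
template <std::size_t N>
class RecordArray {
 public:
  // Returns the index of the claimed record, or -1 if no FREE record was found.
  int allocate() {
    std::size_t monotonic = monotonic_++;  // postfix: starting position for this call
    const std::size_t monotonic_max = monotonic + N;
    while (monotonic < monotonic_max) {
      const std::size_t index = monotonic % N;
      std::uint32_t expected = kFree;
      // FREE -> DIRTY with CAS: only one thread can win a given record.
      if (state_[index].compare_exchange_strong(expected, kDirty)) {
        state_[index].store(kAllocated);  // simplified DIRTY -> ALLOCATED
        return static_cast<int>(index);
      }
      monotonic = monotonic_++;  // shared counter, as in the loop above
    }
    return -1;
  }

  // Simplified ALLOCATED -> FREE.
  void deallocate(std::size_t index) { state_[index].store(kFree); }

 private:
  std::array<std::atomic<std::uint32_t>, N> state_{};  // all FREE at start
  std::atomic<std::size_t> monotonic_{0};
};
```

Each call takes its starting position from the shared counter with a postfix increment, so consecutive single-threaded calls walk the array in order. With many threads, the same counter can make a call give up before it has looked at every record.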

Naturally, the same selection strategy has the same problem. The m_monotonic increments here are also performed concurrently by multiple threads, so under high concurrency some records are skipped. As a result, a page may fail to hand out a record even though it still has FREE records.

So even when a page itself is not skipped, records inside it can still be skipped and go unselected. Worse, this further aggravates memory growth.

4  pfs_lock


Each record has a pfs_lock. It maintains the record's allocation state within the page (free/dirty/allocated) as well as version information.

Key data structure:

struct pfs_lock {
  std::atomic<uint32> m_version_state;
};

pfs_lock uses one 32-bit unsigned integer to store both the version and the state, in the following format:

(Figure: layout of m_version_state: the high 30 bits hold the version, the low 2 bits hold the state.)


state
The low 2 bits indicate the allocation state.

state PFS_LOCK_FREE      = 0x00
state PFS_LOCK_DIRTY     = 0x01
state PFS_LOCK_ALLOCATED = 0x02

version

The high 30 bits hold the version. The initial version is 0, and it is incremented by 1 on every successful allocation, so the version counts how many times the record has been allocated.
The state transition code is the main thing to look at:

// The 3 macros below are used for bit operations, to read or update the state or the version
#define VERSION_MASK 0xFFFFFFFC
#define STATE_MASK 0x00000003
#define VERSION_INC 4

bool free_to_dirty(pfs_dirty_state *copy_ptr) {
  uint32 old_val = m_version_state.load();

  // Check whether the current state is FREE; if not, fail immediately
  if ((old_val & STATE_MASK) != PFS_LOCK_FREE) {
    return false;
  }

  uint32 new_val = (old_val & VERSION_MASK) + PFS_LOCK_DIRTY;

  // The current state is free; try to change it to dirty. atomic_compare_exchange_strong
  // acts as an optimistic lock: several threads may try to modify the atomic variable
  // at the same time, but only 1 of them succeeds.
  bool pass = atomic_compare_exchange_strong(&m_version_state, &old_val, new_val);

  if (pass) {
    // free to dirty succeeded
    copy_ptr->m_version_state = new_val;
  }

  return pass;
}

void dirty_to_allocated(const pfs_dirty_state *copy) {
  /* Make sure the record was DIRTY. */
  assert((copy->m_version_state & STATE_MASK) == PFS_LOCK_DIRTY);
  /* Increment the version, set the ALLOCATED state */
  uint32 new_val = (copy->m_version_state & VERSION_MASK) + VERSION_INC + PFS_LOCK_ALLOCATED;

  m_version_state.store(new_val);
}
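The encoding can be checked with a small standalone model of this state machine. PfsLockModel is hypothetical. It copies the masks quoted above, models pfs_dirty_state as a plain uint32_t, and assumes PFS_LOCK_ALLOCATED == 0x02, because the value must fit in the 2-bit field selected by STATE_MASK. It also includes allocated_to_free(), the release path shown in the next section.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Simplified, hypothetical re-creation of the pfs_lock state machine.
constexpr std::uint32_t kVersionMask = 0xFFFFFFFC;  // high 30 bits: version
constexpr std::uint32_t kStateMask = 0x00000003;    // low 2 bits: state
constexpr std::uint32_t kVersionInc = 4;            // +1 in the version field
constexpr std::uint32_t kLockFree = 0x00;
constexpr std::uint32_t kLockDirty = 0x01;
constexpr std::uint32_t kLockAllocated = 0x02;      // assumed value

struct PfsLockModel {
  std::atomic<std::uint32_t> version_state{0};

  std::uint32_t state() const { return version_state.load() & kStateMask; }
  std::uint32_t version() const { return (version_state.load() & kVersionMask) >> 2; }

  // FREE -> DIRTY; on success stores the new word in *copy (like pfs_dirty_state).
  bool free_to_dirty(std::uint32_t *copy) {
    std::uint32_t old_val = version_state.load();
    if ((old_val & kStateMask) != kLockFree) return false;
    const std::uint32_t new_val = (old_val & kVersionMask) + kLockDirty;
    const bool pass = version_state.compare_exchange_strong(old_val, new_val);
    if (pass) *copy = new_val;
    return pass;
  }

  // DIRTY -> ALLOCATED, bumping the version by one.
  void dirty_to_allocated(std::uint32_t copy) {
    assert((copy & kStateMask) == kLockDirty);
    version_state.store((copy & kVersionMask) + kVersionInc + kLockAllocated);
  }

  // ALLOCATED -> FREE, keeping the version.
  void allocated_to_free() {
    const std::uint32_t copy = version_state.load();
    assert((copy & kStateMask) == kLockAllocated);
    version_state.store((copy & kVersionMask) + kLockFree);
  }
};
```

The arithmetic works as follows under the assumed constant. Starting from 0, free_to_dirty() produces 0x01. dirty_to_allocated() then produces (0x01 & VERSION_MASK) + 4 + 2 = 0x06, which is version 1 in state ALLOCATED. allocated_to_free() produces 0x04, which is version 1 in state FREE.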

The state transitions are easy to understand. dirty_to_allocated and allocated_to_free have simpler logic, because concurrent writers can only race on a record while its state is free. Once the state becomes dirty, the record is effectively owned by one thread, and other threads no longer try to operate on it.

The version is incremented when the state becomes PFS_LOCK_ALLOCATED.


5  PFS memory release


Releasing PFS memory is relatively simple. Every record remembers the container and page it belongs to, so the deallocate interface only has to set the record's state back to free.

Underneath, it calls into pfs_lock to update the state:

struct pfs_lock {
  void allocated_to_free(void) {
    /*
      If this record is not in the ALLOCATED state and the caller is trying to free it,
      this is a bug: the caller is confused, and potentially damaging data owned by
      another thread or object.
    */
    uint32 copy = copy_version_state();
    /* Make sure the record was ALLOCATED. */
    assert(((copy & STATE_MASK) == PFS_LOCK_ALLOCATED));
    /* Keep the same version, set the FREE state */
    uint32 new_val = (copy & VERSION_MASK) + PFS_LOCK_FREE;

    m_version_state.store(new_val);
  }
};
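The following minimal sketch shows why release needs no search. Each record keeps a back-pointer to its page, so freeing it is a single atomic store on that record. PageModel, RecordModel and deallocate() are hypothetical names. Allocation is collapsed into a direct FREE to ALLOCATED compare-and-swap, and the state values assume PFS_LOCK_FREE == 0 and PFS_LOCK_ALLOCATED == 2.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed state values (FREE and ALLOCATED only; DIRTY is skipped here).
constexpr std::uint32_t kFree = 0;
constexpr std::uint32_t kAllocated = 2;

struct PageModel;

struct RecordModel {
  std::atomic<std::uint32_t> state{kFree};
  PageModel *page = nullptr;  // back-pointer to the owning page, set on allocation
};

struct PageModel {
  std::array<RecordModel, 4> records;

  // Claims the first FREE record and records which page it came from.
  RecordModel *allocate() {
    for (auto &r : records) {
      std::uint32_t expected = kFree;
      if (r.state.compare_exchange_strong(expected, kAllocated)) {
        r.page = this;
        return &r;
      }
    }
    return nullptr;
  }
};

// Release needs only the record: its page is reached through the back-pointer,
// and the state goes back to FREE with one atomic store (no lock, no search).
PageModel *deallocate(RecordModel *rec) {
  assert(rec->state.load() == kAllocated);
  PageModel *owner = rec->page;
  rec->state.store(kFree);
  return owner;
}
```

Once freed, the record is immediately visible as FREE to the next scan of its page.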

Three   Optimization of memory allocation


As analyzed above, both pages and records can be skipped during the round-robin scan. An allocation can therefore fail even though the cache still contains free records, which leads to creating more pages and occupying more memory. The main problem is that this memory is never released once allocated.

To raise the PFS memory hit rate and avoid these problems as far as possible, here are some ideas:

while (monotonic < monotonic_max) {
  index = monotonic % current_page_count;
  array = m_pages[index].load();
  pfs = array->allocate(dirty_state);
  if (pfs) {
    // Record the index of the successful allocation
    m_monotonic.m_size_t.store(index);
    return pfs;
  } else {
    // Increment the local variable, so concurrent increments can no longer cause pages to be skipped
    monotonic++;
  }
}
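The proposed loop can be exercised in isolation. In this sketch, StrictScan and PageModel are hypothetical. A page is reduced to a count of free records, and the page state is not thread-safe, because only the scan order is being modeled. Since monotonic is a local variable, one call examines each page at most once, and the shared hint only records where the last success happened.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct PageModel {
  std::size_t free_records;  // stands in for the page's FREE records
};

class StrictScan {
 public:
  explicit StrictScan(std::vector<PageModel> pages) : pages_(std::move(pages)) {}

  // Returns the page index that supplied a record, or -1 if all pages are full.
  // If `visited` is non-null, it receives the number of pages this call examined.
  long allocate(std::size_t *visited = nullptr) {
    const std::size_t page_count = pages_.size();
    std::size_t monotonic = hint_.load();
    const std::size_t monotonic_max = monotonic + page_count;
    std::size_t seen = 0;
    while (monotonic < monotonic_max) {
      const std::size_t index = monotonic % page_count;
      ++seen;
      if (pages_[index].free_records > 0) {  // stands in for array->allocate()
        --pages_[index].free_records;
        hint_.store(index);  // the next search starts at this page
        if (visited != nullptr) *visited = seen;
        return static_cast<long>(index);
      }
      ++monotonic;  // local increment: this call never skips a page
    }
    if (visited != nullptr) *visited = seen;
    return -1;
  }

 private:
  std::vector<PageModel> pages_;      // not thread-safe; only the scan order is modeled
  std::atomic<std::size_t> hint_{0};  // plays the role of m_monotonic
};
```

Take pages holding {0, 1, 1, 0} free records. The first call examines pages 0 and 1 and takes a record from page 1. The second call starts at page 1 and succeeds on page 2. The third call examines all four pages before reporting failure.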

Another point: if every search starts from the last successful position, concurrent allocations will inevitably collide, because all threads start from the same place. Adding some randomness to the starting position avoids a large number of these conflicts.

To summarize:

  1. Each Allocate starts searching from the index of the most recent successful allocation, or from a random position

  2. Each Allocate strictly visits every page or record in turn
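Both summary points can be captured in two small helpers. This is a hypothetical sketch, and neither function exists in MySQL. random_start() gives each thread its own std::mt19937_64, seeded from std::random_device, so that concurrent callers spread out instead of all starting at the shared hint. scan_order() lists the indexes that a strict round-robin visits from a given start.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// 1. Random starting page, using a per-thread generator. page_count must be > 0.
std::size_t random_start(std::size_t page_count) {
  thread_local std::mt19937_64 rng{std::random_device{}()};
  std::uniform_int_distribution<std::size_t> dist(0, page_count - 1);
  return dist(rng);
}

// 2. Strict round-robin: visit every page exactly once, starting at `start`.
std::vector<std::size_t> scan_order(std::size_t start, std::size_t page_count) {
  std::vector<std::size_t> order;
  order.reserve(page_count);
  for (std::size_t i = 0; i < page_count; ++i) order.push_back((start + i) % page_count);
  return order;
}
```

A random start trades cache-friendly locality for fewer collisions on the same page. A caller would typically combine the two helpers, for example by iterating scan_order(random_start(n), n).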


Four   Optimization of memory release


The biggest problem with PFS memory release is that memory, once created, is never released until shutdown. Under a hot-spot workload, a large amount of page memory may be allocated during the business peak and still be held during off-peak periods.

Periodically detecting and reclaiming memory without hurting allocation efficiency requires a fairly complex lock-free reclamation mechanism.

The main points to consider are:

  1. Memory must be released in units of pages: every record in a page being released must be free, and a page that is about to be freed must not be allocated from again

  2. Allocation is random, so even when plenty of memory is reclaimable overall, every page may still contain a few busy records; this situation needs to be handled well

  3. How to choose the release threshold, while avoiding frequent allocate-and-release cycles

To optimize PFS memory release, PolarDB has developed a feature that periodically reclaims PFS memory. Due to space limitations, it will be introduced in a later article.

Five   About us


PolarDB is a cloud-native distributed relational database developed in-house by Alibaba. In 2020 it entered the Leaders quadrant of Gartner's global database assessment, and it won the 2020 first prize for scientific and technological progress awarded by the Chinese Institute of Electronics. Built on a cloud-native distributed database architecture, PolarDB provides large-scale online transaction processing and can also process complex queries in parallel. It has reached an internationally leading level among cloud-native distributed databases and has been widely recognized by the market. Within Alibaba Group, PolarDB fully supported the 2020 Double 11 shopping festival on Tmall and set a new database peak of 140 million TPS. If you share this ambition, please send your resume to zetao.wzt@alibaba-inc.com. We look forward to building a world-class, next-generation cloud-native distributed relational database with you.



