Suppose you need to store a large number of URLs and look rows up by URL. A B-Tree index on the URL column itself takes a lot of space, because URLs are long. A typical lookup looks like this:

SELECT id FROM url WHERE url="http://www.baidu.com";

If you drop the original index on the url column and add an indexed url_crc column that stores the CRC32 hash of the URL, you can query like this instead:

SELECT id FROM url WHERE url='http://www.baidu.com' AND url_crc=CRC32('http://www.baidu.com');

This performs very well, because the MySQL optimizer uses the small, highly selective index on the url_crc column to do the lookup. Even when several rows share the same index value, the lookup is still fast: a quick integer comparison on the hash value finds the matching index entries, and the corresponding rows are then checked one by one against the full URL. The alternative, indexing the complete URL string, is much slower.
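The schema change described above might look like the following sketch. It assumes the existing index on the url column is named idx_url and calls the new index idx_url_crc; both names are hypothetical and do not come from the original text.

```sql
-- Sketch only: idx_url and idx_url_crc are hypothetical index names.
-- Drop the large index on the URL string and add a compact hash column
-- with its own index.
ALTER TABLE url
  DROP INDEX idx_url,
  ADD COLUMN url_crc INT UNSIGNED NOT NULL DEFAULT 0,  -- CRC32() yields an unsigned 32-bit value
  ADD INDEX idx_url_crc (url_crc);

-- Backfill the hash for rows that already exist.
UPDATE url SET url_crc = CRC32(url);

-- Check that the lookup uses the new index.
EXPLAIN SELECT id FROM url
WHERE url_crc = CRC32('http://www.baidu.com')
  AND url = 'http://www.baidu.com';
```

INT UNSIGNED is chosen because a signed INT cannot hold the upper half of the CRC32() range.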

The drawback of this approach is that the hash values have to be maintained. You can do that by hand in the application, or with triggers. If you take this approach, do not use SHA1() or MD5() as the hash function. Both return very long strings, which waste space and are slower to compare. SHA1() and MD5() are cryptographic hash functions designed to make collisions practically impossible, which is more than this use case needs. A simple hash function has an acceptable collision rate while giving better performance.
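Maintaining the hash with triggers could be sketched as follows. The table url_hash_demo, the trigger names, and the second example URL are hypothetical, chosen only so the example is self-contained.

```sql
-- Sketch only: url_hash_demo and the trigger names are hypothetical.
CREATE TABLE url_hash_demo (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  url     VARCHAR(255) NOT NULL,
  url_crc INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  KEY idx_url_crc (url_crc)
);

-- Recompute the hash whenever a row is inserted or updated.
CREATE TRIGGER url_hash_demo_ins BEFORE INSERT ON url_hash_demo
FOR EACH ROW SET NEW.url_crc = CRC32(NEW.url);

CREATE TRIGGER url_hash_demo_upd BEFORE UPDATE ON url_hash_demo
FOR EACH ROW SET NEW.url_crc = CRC32(NEW.url);

-- The application never supplies url_crc; the triggers fill it in.
INSERT INTO url_hash_demo (url) VALUES ('http://www.baidu.com');
UPDATE url_hash_demo SET url = 'http://www.mysql.com' WHERE id = 1;

-- Both columns should be equal if the triggers fired.
SELECT url_crc, CRC32(url) AS expected FROM url_hash_demo;
```

Because each trigger body is a single statement, no BEGIN ... END block or DELIMITER change is needed.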

If the table is very large, CRC32() will produce many collisions, and you can consider implementing a simple 64-bit hash function of your own. It should return an integer, not a string. One simple option is to use part of the value returned by MD5() as the custom hash. This is slower than a purpose-written hash function, but it is the easiest to implement:

SELECT CONV(RIGHT(MD5('http://www.baidu.com'),16),16,10) AS HASH64;
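CONV() returns a string, so one way to get the integer result the text asks for is to wrap the expression in a stored function with an explicit cast. This is a minimal sketch; the function name hash64 is hypothetical.

```sql
-- Sketch only: hash64 is a hypothetical function name.
-- Take the last 16 hex digits (64 bits) of the MD5 digest and
-- return them as an unsigned integer.
CREATE FUNCTION hash64(s TEXT)
RETURNS BIGINT UNSIGNED
DETERMINISTIC
RETURN CAST(CONV(RIGHT(MD5(s), 16), 16, 10) AS UNSIGNED);

SELECT hash64('http://www.baidu.com') AS url_hash;
```

A column that stores this value would need to be BIGINT UNSIGNED to hold the full 64-bit range.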

Handling hash collisions. When you query through a hash index, the WHERE clause must also include the literal value you are searching for:

SELECT id FROM url WHERE url_crc=CRC32('http://www.baidu.com') AND url='http://www.baidu.com';

If there is a hash collision, that is, another string with exactly the same hash value, the following statement does not work correctly, because it also returns the rows of the colliding string:

SELECT id FROM url WHERE url_crc=CRC32('http://www.baidu.com');

Because of the so-called birthday paradox, the probability of a hash collision grows much faster than you might expect. CRC32() returns a 32-bit integer, so the probability of at least one collision already reaches 1% at roughly 9,300 rows, and by 93,000 rows a collision is more likely than not. For example, loading the word list in '/usr/share/dict/words' into a table and computing CRC32() for each word gives 98,569 rows, and there is already one collision among them. To avoid problems with collisions, the WHERE clause must include both the hash value and the actual column value. If you do not need exact results, for example when you only want an approximate row count, you can leave out the column value and query by the CRC32() hash alone. You can also use FNV64() as the hash function. It is not built into MySQL (it is distributed with Percona Server as a user-defined function); it returns a 64-bit value, is very fast, and collides far less often than CRC32().
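The collision estimate can be checked with the usual birthday approximation, p ≈ 1 - exp(-n² / (2 · 2³²)) for n values of a 32-bit hash. The sketch below evaluates it in MySQL and then shows how collisions could be found in the url table used in the queries above. The result should be roughly 0.01 for 9,300 rows and roughly 0.63 for 93,000 rows.

```sql
-- Approximate probability of at least one collision among n values
-- of a 32-bit hash: p = 1 - exp(-n^2 / (2 * 2^32)).
SELECT n, 1 - EXP(-POW(n, 2) / (2 * POW(2, 32))) AS p_collision
FROM (SELECT 9300 AS n UNION ALL SELECT 93000 UNION ALL SELECT 98569) AS sizes;

-- List hash values shared by more than one distinct URL.
SELECT url_crc, COUNT(DISTINCT url) AS urls
FROM url
GROUP BY url_crc
HAVING COUNT(DISTINCT url) > 1;

-- Approximate count by hash alone; may include rows of colliding URLs.
SELECT COUNT(*) FROM url WHERE url_crc = CRC32('http://www.baidu.com');
```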
