This paper is suitable for relational database , Such as Oracle、MySQL.
State owned Legalists have family rules , There are development standards for development , So what are the design criteria for database design ？
Blogger recently searched out a better article in database design , Reprint it to you , I hope that's helpful ！
1. The relationship between the original document and the entity
It can be one-on-one 、 One to many 、 Many to many The relationship between . In general , They are one-on-one relationships ： That is, an original document corresponds to and only corresponds to one entity . In special circumstances , They can be one to many or many to one relationships , That is, one original document corresponds to multiple entities , Or multiple original documents corresponding to one entity . The entity here can be understood as a basic table . After making clear the corresponding relationship , It's good for us to design the input interface .
〖 example 1〗： An employee resume , In human resource information system , There are three basic tables ： Basic information form of employees 、 Social relations table 、 Resume form . This is it. “ One original document corresponds to multiple entities ” Typical example .
2. Primary and foreign keys
generally speaking , An entity cannot have neither a primary key nor a foreign key . stay E—R In the figure , An entity in the part of a leaf , You can define a primary key , You can also not define a primary key ( Because it has no offspring ), But you have to have a foreign key ( Because it has a father ).
Design of primary key and foreign key , In the design of global database , Occupy an important position . When the design of the global database is completed , An American database design expert said ：“ key , There are keys everywhere , Besides the keys , nothing in the world ”, That's his database design experience , It also reflects his understanding of the core of information systems ( Data model ) A highly abstract idea of . because ： Primary keys are highly abstract entities , Pairing of primary and foreign keys , Represents the connection between entities .
3. The properties of the basic table
Basic table and intermediate table 、 Temporary tables are different , Because it has the following four characteristics ：
(1) Atomicity . The fields in the basic table are not decomposable .
(2) Primitiveness . The records in the basic table are raw data （ Basic data ） The record of .
(3) deductively . From the basic table and code table data , You can derive all the output data .
(4) stability . The structure of the basic table is relatively stable , The records in the table should be kept for a long time .
After understanding the nature of the basic table , When designing a database , You can combine the basic table with the intermediate table 、 Temporary tables distinguish .
4. Paradigm standards
The relationship between the underlying table and its fields , Try to satisfy the third paradigm . however , Database design to meet the third paradigm , It's often not the best design . In order to improve the efficiency of the database , It's often necessary to lower the paradigm standards ： Add redundancy appropriately , To achieve the purpose of exchanging space for time .
〖 example 2〗： There's a basic list of goods , As shown in the table 1 Shown .“ amount of money ” The existence of this field , Indicates that the design of the table does not satisfy the third normal form ,
because “ amount of money ” Can be “ The unit price ” multiply “ Number ” obtain , explain “ amount of money ” It's a redundant field . however , increase “ amount of money ” This redundant field , It can improve the speed of query statistics , This is how space is exchanged for time .
stay Rose 2002 in , There are two types of rules ： Data columns and computed columns .“ amount of money ” Such columns are called “ Calculated column ”, and “ The unit price ” and “ Number ” Such columns are called “ Data columns ”.
surface 1 Table structure of commodity table
Name of commodity Product model The unit price Number amount of money
The TV 29 " 2,500 40 100,000
5. To understand the three paradigms in a popular way
To understand the three paradigms in a popular way , Great for database design . In database design , In order to better apply the three paradigms , It has to be understood in a popular way
Three paradigms ( Popular understanding is sufficient understanding , It's not the most scientific and accurate understanding )：
First normal form ：1NF It's an atomic constraint on properties , Properties are required to be atomic , It can't be broken down ;
Second normal form ：2NF It's a constraint on the uniqueness of records , The record is required to have a unique identification , That is, the uniqueness of the entity ;
Third normal form ：3NF It's a constraint on field redundancy , That is, any field cannot be derived from other fields , It requires that the fields are not redundant .
No redundant database design can be done . however , No redundant database is not necessarily the best database , Sometimes in order to improve operation efficiency , We must lower the standard of paradigm , Keep redundant data in place . The specific way is ： Follow the third paradigm in conceptual data model design , The work of reducing the standard of normal form should be considered in the design of physical data model . Lowering the paradigm is adding fields , Allow redundancy .
6. We should be good at identifying and correctly handling many to many relationships
If there is a many to many relationship between two entities , This relationship should be eliminated . The way to eliminate it is , Add a third entity between the two . such , It turns out that a many to many relationship , Now it's two one to many relationships . To reasonably allocate the attributes of the original two entities to the three entities . The third entity here , In essence, it's a more complicated relationship , It corresponds to a basic table . In general , Database design tools don't recognize many to many relationships , But can handle many to many relationships .
〖 example 3〗： stay “ Library Information System ” in ,“ The book ” Is an entity ,“ readers ” It's also an entity . The relationship between the two entities , It's a typical many to many relationship ： A book can be borrowed by multiple readers at different times , A reader can borrow more than one book . So , To add a third entity between the two , The entity is named “ Borrow and return books ”, Its property is ： Borrowing and returning time 、 Loan return sign (0 To borrow a book ,1 To return a book ), in addition , It should also have two foreign keys (“ The book ” Primary key of ,“ readers ” Primary key of ), Make it compatible with “ The book ” and “ readers ” Connect .
The book 1 and The entity is named “ Borrow and return books ” n
readers 1 and The entity is named “ Borrow and return books ” n
7. Primary key PK Method of value taking
PK Is a table join tool for programmers , It could be a string of numbers with no physical meaning , Add automatically by program 1 To achieve . It can also be a physically meaningful field name or a combination of field names . But the former is better than the latter . When PK When it is a combination of field names , It is recommended not to have too many fields , More than that, not only does the index take up a lot of space , And it's slow .
8. Correct understanding of data redundancy
The repetition of primary and foreign keys in multiple tables , It's not data redundancy , The concept must be clear , In fact, many people don't know . The repetition of non key fields , It's data redundancy ！ And it's a kind of low-level redundancy , The redundancy of repetition . Advanced redundancy is not a repetition of fields , It's the derivation of the field .
〖 example 4〗： In commodities “ The unit price 、 Number 、 amount of money ” Three fields ,“ amount of money ” Is the “ The unit price ” multiply “ Number ” derived , It's redundancy , And it's a kind of advanced redundancy . The purpose of redundancy is to improve processing speed . Only low level redundancy can increase data inconsistency , Because of the same data , can
From different times 、 place 、 The character is entered multiple times . therefore , We Advanced redundancy is advocated ( Derived redundancy ), Against low-level redundancy ( Repetitive redundancy ).
9. E--R There is no standard answer to the picture
Information systems E--R There is no standard answer to the picture , Because its design and painting are not unique , As long as it covers the business scope and functional content of the system requirements ,
It's feasible . On the contrary, we should modify E--R chart . Although it doesn't have the only standard answer , It doesn't mean you can design at will . well E—R The standard of the graph is ： The structure is clear 、 The relevance is simple 、 The number of entities is moderate 、 The distribution of attributes is reasonable 、 No low-level redundancy .
10 . View technology is very useful in database design
And the basic table 、 Code table 、 The middle table is different , A view is a virtual table , It depends on the real table of the data source . Views are for programmers to use the database A window , It is a form of base table data synthesis , It's a way of data processing , Is a means of user data confidentiality . For complicated processing 、 Improve the speed of operation and save storage space , The definition depth of a view should not exceed three levels . If the three-level view is not enough , The temporary table should be defined on the view , Redefine the view on the temporary table . In this way, we repeatedly define , The depth of the view is unlimited .
For some and national politics 、 economic 、 technology 、 Information systems related to military and security interests , Views are more important . After the physical design of the basic tables of these systems is completed , Immediately create the first level view on the base table , The number and structure of this layer , The number and structure of the basic table are exactly the same . And stipulate , All the programmers , Only operate on the view . Only database administrators , With multiple people in charge “ Security key ”, In order to operate directly on the basic table . Please think about it ： Why is that ？
11. In the middle of table 、 Reports and temporary tables
An intermediate table is a table that holds statistics , It's for the data warehouse 、 Designed to output reports or query results , Sometimes it doesn't have a primary key or a foreign key ( Except for data warehouse ). Temporary tables are designed by programmers , Keep temporary records , For personal use . The base table and the intermediate table are composed of DBA maintain , Temporary tables are maintained automatically by programmers themselves .
12. Integrity constraints are manifested in three aspects
Domain integrity ： use Check To implement constraints , In the database design tool , When defining the value range of a field , There is one Check Button , It defines the value city of a field .
Referential integrity ： use PK、FK、 Table level triggers to implement .
User defined integrity ： It's business rules , Using stored procedures and triggers to implement .
13. The way to prevent database design from patching is “ Three less principles ”
(1) The less the number of tables in a database, the better . Only the number of tables is missing , To explain the systematic E--R The pictures are few and fine , Remove redundant entities that are repetitive , A high degree of abstraction of the objective world has been formed , The system data integration is carried out , Prevent patching design ;
(2) The less the number of fields in a table that combines primary keys, the better . Because of the role of the primary key , One is to build a primary key index , The second is the foreign key as a sub table , Therefore, the number of fields in the combined primary key is less , Not only does it save running time , And it saves index storage space ;
(3) The fewer fields in a table, the better . Only the number of fields is less , In order to show that there is no data duplication in the system , And there's very little data redundancy , What's more important is to urge readers to learn “ Column to row ”, This prevents the fields in the sub table from being pulled into the main table , Leave a lot of empty fields in the main table . So-called “ Column to row ”, It is to pull out part of the main table , In addition, create a separate sub table . This method is very simple , Some people are just not used to 、 Don't adopt 、 Don't execute .
The practical principle of database design is ： Find the right balance between data redundancy and processing speed .“ Three little ” It's a whole concept , Comprehensive view , One principle cannot be isolated . The principle is relative , Not absolutely .“ Three more ” The principle must be wrong . Just imagine ： If you cover the same function of the system , A hundred entities ( A thousand attributes ) Of E--R chart , Definitely more than 200 entities ( A total of 2000 attributes ) Of E--R chart , It's much better .
promote “ Three little ” principle , Is to ask the reader to learn to use the database design technology to carry on the system data integration . The step of data integration is to integrate file system into application database , Set the application database into a topic database , Set the topic database into a global comprehensive database . The higher the degree of integration , The more data is shared , The less information islands there are , The whole enterprise information system E—R The number of entities in the graph 、 Number of primary keys 、 The less the number of attributes .
promote “ Three little ” The purpose of the principle , Is to prevent readers from using Patching Technology , Constantly add, delete and modify the database , The enterprise database has become a random design of database tables “ rubbish dump ”, Or database table “ The courtyard ”, Finally, the basic tables in the database are created 、 Code table 、 In the middle of table 、 The temporary watch is a mess , Countless （ That is to dynamically create tables and increase the number of tables ）, The information system of enterprises and institutions can not be maintained and paralyzed .
“ Three more ” Principles anyone can do , The principle is “ Patch method ” The fallacy of designing databases .“ Three little ” Principle is the principle of few but fine , It requires a high level of database design skills and art , Not everyone can do it , Because the principle is to stop using “ Patch method ” The theoretical basis of designing database .
14. How to improve the efficiency of database operation
Under the given system hardware and system software conditions , The way to improve the efficiency of database system is ：
(1) In the physical design of the database , Lower the paradigm , Add redundancy , Use fewer triggers , Multi purpose stored procedures .
(2) When the calculation is very complicated 、 And when the number of records is very large ( For example, 10 million ), Complex computing needs to be outside the database first , Use as a file system C++ After the language calculation is completed , Finally, the database is added to the table . This is the experience of telecom billing system design .
(3) Found too many records in a table , For example, more than 10 million , The table should be split horizontally . The method of horizontal segmentation is , With the table primary key PK A certain value of is the boundary , Divide the record level of the table into two tables （ That is, you can maintain the table Too many rows in table Manually split into two Build two tables union The view of is transparent to the program ）. If you find too many fields in a table , For example, more than 80 , be Split the table vertically , Decompose the original table into two tables .
(4) For database management system DBMS System optimization , That is to optimize various system parameters , Such as the number of buffers .
(5) Using data oriented SQL Language programming when , Try to optimize the algorithm .
All in all , To improve the efficiency of database operation , Must be optimized from the database system level 、 Database design level optimization 、 Program implementation level optimization , Work on these three levels at the same time .
These 14 techniques , It's a lot of people in a lot of database analysis and design practice , Gradually summed up . For the use of these experiences , The reader can't be too rigid , Rote learning , But digest and understand , Seeking truth from facts , Master flexibly . And gradually do ： Develop in application , Apply in development .
Reprinted from ：http://www.javaeye.com/topic/28161