Author | Yu Yang
In day-to-day development, the code asset problems we face fall into two categories: code quality issues and code security vulnerabilities.
1. Code quality issues
Code quality is an old topic. Everyone knows it matters, yet few know how to promote and maintain it as an asset shared by the whole team. On the one hand, developers rushing to ship features on time may neglect quality control; on the other, developers have different coding habits and different ways of understanding a program.
Over the long run, declining code quality tends to be self-reinforcing: business pressure lowers quality, lower quality reduces development efficiency, and reduced efficiency increases business pressure further, creating a vicious circle.
2. Code security issues
Security problems often hide in code written without enough security awareness and in open-source dependencies that have never been scanned or maintained, which makes them hard to catch in day-to-day development and code review.
Code security can be analyzed from two angles:
- Coding security issues, i.e., violations of secure-coding specifications. Keeping non-conforming code out of the enterprise codebase reduces the risk of privacy data leaks, injection attacks, and security-policy flaws.
- Dependency security issues, i.e., vulnerabilities introduced by open-source third-party components. According to Synopsys's 2020 open source security report, more than 99% of organizations use open-source technology. Open-source components bring real advantages: technical exchange, collaboration on the shoulders of giants, lower development costs, faster iteration, no need to reinvent the wheel, and better software quality. Alongside that convenience, however, they carry substantial security risk: according to the audit, 75% of codebases contain security vulnerabilities, 49% contain high-risk issues, and 82% still use components that are more than four years out of date.
Code security therefore calls for two measures. First, gatekeeping at check-in: secure-coding checks and blocking rules configured according to business scenarios and specifications. Second, regular maintenance: detecting and fixing newly disclosed vulnerabilities promptly.
Five common code detection tools recommended by Alibaba
1. Code quality detection
- Java coding guideline detection
At Alibaba, historical barriers and differing business styles meant that project structures varied widely, code styles diverged, and specifications were inconsistent. Communication was costly, collaboration inefficient, and maintenance expensive. At its present scale the group needs specialized teams developing iteratively and intensively rather than repeatedly reinventing the wheel, and a truly professional team must share unified development conventions, which stand for efficiency, consensus, passion, and sustainability.
Against this background, Alibaba produced the Alibaba Java Development Manual, the development specification its Java engineers follow internally. It covers programming conventions, unit testing, exception handling and logging, MySQL, project structure, security, and more. It distills the experience of nearly 10,000 Alibaba Java engineers and has been tested and refined in many large-scale, front-line projects.
Traffic regulations appear to restrict the freedom to drive, but in fact they protect everyone's safety. Imagine roads with no speed limits, no traffic lights, and no rule to drive on the right: who would dare to go out? Likewise, in software, development specifications do not stifle creativity or elegance in code; they curb excessive individualism and promote a reasonable degree of standardization, so that everyone works together in a generally accepted way.
The goals of the code specification are therefore:
1. Efficient code: unified standards improve communication and R&D efficiency.
2. Quality code: preventing problems before they occur raises quality awareness and maintainability and lowers the failure rate.
3. Code with craftsmanship: pursuing excellence to polish high-quality code.
The specification is deeply embedded in Alibaba's development activities through an IDE plugin, pipeline integration, code review integration, and other tools. The cloud code hosting platform Codeup also has built-in Java coding guideline detection, giving developers fast, convenient checks when they submit and review code.
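To illustrate what rule-based guideline detection does, here is a minimal line-oriented sketch in Python. It checks three rules paraphrased from the Alibaba Java Development Manual (uppercase `L` for long literals, no `Executors` factory methods for thread pools, UpperCamelCase class names). The regexes, rule IDs, and the `check_java` function are our own illustrative assumptions, not the IDE plugin or Codeup implementation; real checkers generally work on a parsed syntax tree rather than raw text.

```python
import re
from dataclasses import dataclass


@dataclass
class Violation:
    line_no: int
    rule: str
    text: str


# Illustrative rules paraphrased from the Alibaba Java Development Manual.
# The rule IDs and regexes are assumptions made for this sketch.
RULES = [
    # A long literal such as "2l" should be written "2L" (lowercase l looks like 1).
    ("long-suffix", re.compile(r"\b\d+l\b")),
    # Thread pools should be built with ThreadPoolExecutor, not Executors.newXxx(...).
    ("executors", re.compile(r"\bExecutors\.new\w+\(")),
    # Class names should be UpperCamelCase.
    ("class-name", re.compile(r"\bclass\s+[a-z]\w*")),
]


def check_java(source: str) -> list[Violation]:
    """Return every (line, rule) hit; a single line can violate several rules."""
    violations = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern in RULES:
            if pattern.search(line):
                violations.append(Violation(line_no, rule_id, line.strip()))
    return violations


sample = """public class myService {
    private long count = 2l;
    ExecutorService pool = Executors.newFixedThreadPool(4);
}"""

for v in check_java(sample):
    print(v.line_no, v.rule, "|", v.text)
```

Each line of `sample` trips exactly one rule, which is the kind of per-line feedback an IDE plugin or a pipeline check reports.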
- Intelligent code patch recommendation
Defect detection and patch recommendation have been hard problems in software engineering for decades and are among the topics researchers and front-line developers care about most. The defects meant here are not network vulnerabilities or system faults but defects hidden in the code itself. Helping developers identify and fix them can greatly improve software quality.
Building on popular defect detection methods from industry and academia, and analyzing and working around their limitations, algorithm engineers on Alibaba's Codeup team proposed a new algorithm that analyzes code defects more accurately and efficiently and recommends fixes. The work was accepted at the International Conference on Software Engineering (ICSE). The algorithm works in four steps:
1. Use commit messages to find fix-type commits, keeping only those that touch fewer than 5 files (commits that touch many files may dilute the fix behavior). This step relies heavily on good commit habits, so we encourage developers to commit thoughtfully and write clear messages.
2. From these fix commits, extract the deleted and added content at file level, i.e., Defect and Patch pairs (DP pairs). This step inevitably introduces a lot of noise.
3. Cluster the buggy and patch code simultaneously with an improved DBSCAN method, grouping similar defects with similar patches (clustering can also be done at the fragment level). Clustering removes much of the noise left by the previous step, and mistakes that recur across historical commits are valuable references.
4. Summarize the defect code and patch code with a self-developed template extraction method, adapting the context to different variables.
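Steps 1 to 3 above can be sketched in Python to show the shape of the pipeline: filter fix commits, turn each touched file into a DP pair, and cluster pairs by the distance of their defect and patch sides together. This is not the Codeup algorithm: the fix keywords, the token-set Jaccard distance, and the `eps`/`min_pts` values are assumptions, and the clustering is plain DBSCAN rather than the improved variant described above.

```python
import difflib
import re
from dataclasses import dataclass

FIX_WORDS = re.compile(r"\b(fix|fixes|fixed|bug|repair)\b", re.IGNORECASE)  # assumed keywords
MAX_FILES = 5
TOKEN = re.compile(r"\w+|[^\w\s]")


@dataclass
class Commit:
    message: str
    files: dict[str, tuple[str, str]]  # path -> (old source, new source)


def is_fix_commit(c: Commit) -> bool:
    # Step 1: the message looks like a fix and fewer than MAX_FILES files changed.
    return bool(FIX_WORDS.search(c.message)) and len(c.files) < MAX_FILES


def dp_pairs(c: Commit) -> list[tuple[str, str]]:
    # Step 2: per file, deleted lines form the defect side, added lines the patch side.
    pairs = []
    for old, new in c.files.values():
        deleted, added = [], []
        for line in difflib.ndiff(old.splitlines(), new.splitlines()):
            if line.startswith("- "):
                deleted.append(line[2:].strip())
            elif line.startswith("+ "):
                added.append(line[2:].strip())
        if deleted or added:
            pairs.append(("\n".join(deleted), "\n".join(added)))
    return pairs


def jaccard(a: str, b: str) -> float:
    ta, tb = set(TOKEN.findall(a)), set(TOKEN.findall(b))
    return 0.0 if not (ta or tb) else 1.0 - len(ta & tb) / len(ta | tb)


def dp_distance(p, q) -> float:
    # Step 3 compares defect and patch together, so a cluster shares a similar bug AND fix.
    return (jaccard(p[0], q[0]) + jaccard(p[1], q[1])) / 2


def dbscan(items, eps, min_pts, dist):
    """Plain DBSCAN (not the improved variant); returns a cluster id per item, -1 = noise."""
    labels = [None] * len(items)  # None = unvisited
    cluster = 0
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = [k for k in range(len(items)) if dist(items[j], items[k]) <= eps]
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical commit history for illustration.
commits = [
    Commit("Fix NPE in user lookup", {"A.java": ("String n = user.getName();",
                                                 "String n = user == null ? null : user.getName();")}),
    Commit("fix null check", {"B.java": ("String n = order.getName();",
                                         "String n = order == null ? null : order.getName();")}),
    Commit("Add feature X", {"C.java": ("", "int x = 1;")}),
    Commit("Fix resource leak", {"D.java": ("in.read();", "try (in) { in.read(); }")}),
]
pairs = [p for c in commits if is_fix_commit(c) for p in dp_pairs(c)]
print(dbscan(pairs, eps=0.3, min_pts=2, dist=dp_distance))
```

The two null-check fixes land in one cluster while the lone resource-leak fix is treated as noise, which is how clustering filters one-off changes out of the mined DP pairs.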
The patch recommendation service is currently applied to automatic code scanning of merge requests. During code review it detects code fragments that can be improved and suggests optimizations, turning the manual experience accumulated in past reviews into a continuous improvement of enterprise code quality.
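Step 4 and the merge-request scanning described above can be illustrated together: abstract a DP pair into a template by replacing variable names with placeholders, then check whether new code in a merge request matches the defect template and, if so, instantiate the patch template with the new code's own variable names. The `KEEP` keyword list, the rule that member names after `.` stay concrete, and all helper names are simplifying assumptions for this sketch, not Codeup's template extraction method.

```python
import re

TOKEN = re.compile(r"\w+|\s+|[^\w\s]")  # words, whitespace runs, single punctuation
IDENT = re.compile(r"[A-Za-z_]\w*")
KEEP = {"String", "int", "long", "null", "new", "return", "if", "try"}  # assumed keyword/type list


def abstract(code, mapping, grow=True):
    """Replace variable-like identifiers with $V0, $V1, ... (in order of first use).

    Identifiers right after '.' are member names and stay concrete.
    With grow=False, only identifiers already in `mapping` are replaced.
    """
    out, prev = [], ""
    for tok in TOKEN.findall(code):
        if (IDENT.fullmatch(tok) and tok not in KEEP and prev != "."
                and (grow or tok in mapping)):
            mapping.setdefault(tok, f"$V{len(mapping)}")
            tok = mapping[tok]
        out.append(tok)
        if not tok.isspace():
            prev = tok
    return out


def make_template(defect, patch):
    mapping = {}
    defect_tpl = abstract(defect, mapping)
    return defect_tpl, abstract(patch, mapping, grow=False)


def significant(tokens):
    return [t for t in tokens if not t.isspace()]


def recommend(code, template):
    """Return the patch rewritten for `code`, or None if `code` does not match the defect."""
    defect_tpl, patch_tpl = template
    bindings = {}
    if significant(abstract(code, bindings)) != significant(defect_tpl):
        return None
    placeholder_to_name = {ph: name for name, ph in bindings.items()}
    return "".join(placeholder_to_name.get(t, t) for t in patch_tpl)


template = make_template("String n = user.getName();",
                         "String n = user == null ? null : user.getName();")
print(recommend("String title = book.getName();", template))
```

Because matching ignores whitespace and variable names but keeps member names, `book.getName()` is recognized as the same defect as `user.getName()`, while a call to a different method is not.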
2. Code security detection
- Sensitive information detection
In recent years, the industry has seen several incidents in which sensitive information (API keys, database credentials, OAuth tokens, etc.) was unknowingly leaked through various sites, creating security risks and even direct financial losses for enterprises.
We face a similar problem in our own practice: hard-coded secrets are very common, and there was no effective mechanism for identifying them. Developers and enterprise managers therefore urgently need a stable, reliable method and system for detecting sensitive information. Our research found that most existing detection tools rely only on rule matching or on information entropy, so their recall or precision falls short of expectations. Building on rule matching and information entropy and adding contextual semantics, we therefore created SecretRadar, a sensitive information detection tool based on a multi-layer detection model.
SecretRadar is implemented in three layers. The first layer uses traditional rule matching. Rules offer good precision and extensibility, but they depend heavily on fixed lengths, prefixes, and variable names, so they cannot cope with the varied coding styles of different developers and easily miss secrets. For cases that fixed rules struggle to capture, the second layer uses an information entropy algorithm, which measures how random the content of a code line is and recognizes randomly generated keys and random identity information well. Entropy has its own limitation: as recall rises, false positives rise too. The third layer therefore filters the results with template clustering and contextual semantic analysis: common keywords are extracted by aggregating the entropy results, and combining the context semantics with the current syntax structure improves the model's precision.
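The three layers can be sketched in Python. This is a simplified stand-in, not SecretRadar itself: layer 1 uses two well-known patterns (AWS access key IDs and PEM private-key headers), layer 2 computes Shannon entropy over quoted literals of at least 16 characters with an assumed threshold of 3.5 bits per character, and layer 3 is reduced to a keyword check on the variable name instead of template clustering and semantic analysis.

```python
import math
import re
from collections import Counter

# Layer 1: fixed rules (illustrative; a real rule set is much larger).
RULES = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}
# Candidates for layer 2: name = "literal" or name: 'literal', 16+ characters.
ASSIGNED_LITERAL = re.compile(r"""(\w+)\s*[:=]\s*["']([^"']{16,})["']""")
# Layer 3 stand-in (assumed): variable-name hints that make a random string look like a secret.
SECRET_HINTS = ("key", "secret", "token", "passwd", "password", "credential")


def shannon_entropy(s: str) -> float:
    """Bits per character: -sum(p * log2(p)) over the character frequencies."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def scan_line(line: str, threshold: float = 3.5) -> list[tuple[str, str]]:
    # The 3.5 bits/char threshold is an assumed value for this sketch.
    findings = [("rule", name) for name, rx in RULES.items() if rx.search(line)]
    for m in ASSIGNED_LITERAL.finditer(line):
        var, value = m.group(1), m.group(2)
        if shannon_entropy(value) < threshold:
            continue  # too regular to be a generated key
        if any(hint in var.lower() for hint in SECRET_HINTS):
            findings.append(("entropy", var))
    return findings


for line in ['String apiToken = "x9Fq2LmZ7pR4tW8vK3nB";',
             'String requestId = "x9Fq2LmZ7pR4tW8vK3nB";',
             'String banner = "aaaaaaaaaaaaaaaaaaaa";',
             'aws = "AKIAIOSFODNN7EXAMPLE"']:
    print(line, "->", scan_line(line))
```

The `requestId` line shows why entropy alone over-reports: the value is just as random as the token's, and only the context layer tells them apart.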
The sensitive information detection tool serves not only our internal developers: on the Yunxiao (Alibaba Cloud DevOps) platform it also supports more than 20,000 code repositories and more than 3,000 enterprises, and has helped developers fix more than 9 million hard-coding problems.
- Source code vulnerability detection
For source code vulnerability detection, Alibaba uses the Sourcebrella Pinpoint detection engine, mainly to detect injection risks and security-policy risks.
The Sourcebrella engine is the result of roughly ten years of research by the Prism research group at the Hong Kong University of Science and Technology. It absorbs a decade of worldwide research on software verification, improves and extends it, and is an independently designed software verification system with leading technology. Its main approach is to translate programs into mathematical expressions such as first-order logic and linear algebra and then deduce the causes of defects through formal verification. So far, four papers on its core technology have been published, one at PLDI and three at ICSE; interested readers can find them listed at the end of this article.
The engine can find defects that have stayed hidden for 10 years in large, highly active open-source projects; taking MySQL as an example, it found defects that other tools on the market could not detect. It can analyze a large open-source project of 2 million lines in 1.5 hours while keeping the false positive rate at around 15%. For complex, large-scale projects, its scanning efficiency and false positive rate are also industry-leading.
Source code vulnerability detection integrates the security analysis capability of the Sourcebrella engine and balances precision, speed, and depth to produce better results. Its core advantages:
1. It supports bytecode analysis, so the logic of third-party packages is not missed either;
2. It excels at analyzing logic across long function call chains;
3. It handles indirect data modification through references, pointers, and the like;
4. High precision: compared with similar tools such as Clang and Infer, it performs better in precision and in identifying real problems;
5. Good performance: a single application currently takes about 5 minutes to analyze on average.
The engine accurately tracks data flow through the code and performs deep, high-precision analysis of function call chains, so it can find deep problems that span multiple layers of functions. Along with each defect it reports how the problem is triggered, fully showing the relevant control flow and data flow. This helps developers understand and fix problems quickly, improving software quality at low cost early in development, significantly reducing production costs, and raising R&D efficiency.
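Cross-function data-flow tracking with a reported trigger path can be illustrated on a toy value-flow graph. The sketch below uses a hand-written, hypothetical graph for an SQL injection scenario: it runs breadth-first reachability from taint sources to sinks, stops at sanitizers, and prints the propagation path as the trigger trace. It only illustrates the idea; a real engine builds the graph from actual programs and, as described above, also encodes path conditions as logical formulas to rule out infeasible paths, which this sketch omits.

```python
from collections import deque

# Hypothetical value-flow graph: an edge a -> b means "the value at a flows into b"
# (assignment, argument passing, or return). Node names are "function:variable".
EDGES = {
    "handler:request.param": ["handler:name"],
    "handler:name": ["buildQuery:arg"],        # buildQuery(name)
    "buildQuery:arg": ["buildQuery:sql"],      # sql = "SELECT ... " + arg
    "buildQuery:sql": ["handler:query"],       # return sql
    "handler:query": ["db.execute:sql"],       # db.execute(query)
    "handler:request.header": ["handler:id"],
    "handler:id": ["escape:arg"],              # escape(id)
    "escape:arg": ["escape:ret"],
    "escape:ret": ["db.execute:sql"],
}
SOURCES = ["handler:request.param", "handler:request.header"]
SINKS = {"db.execute:sql"}
SANITIZERS = {"escape:ret"}  # values leaving escape() are treated as clean


def taint_paths(edges, sources, sinks, sanitizers):
    """BFS from each source; return one source-to-sink path per reached sink."""
    findings = []
    for src in sources:
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node in sinks:
                path = []
                while node is not None:  # walk parents back to the source
                    path.append(node)
                    node = parent[node]
                findings.append(path[::-1])
                continue
            for nxt in edges.get(node, []):
                if nxt not in parent and nxt not in sanitizers:
                    parent[nxt] = node
                    queue.append(nxt)
    return findings


for path in taint_paths(EDGES, SOURCES, SINKS, SANITIZERS):
    print(" -> ".join(path))
```

Only the unsanitized flow through `buildQuery` is reported, and the printed chain crosses two functions, which mirrors the kind of trigger trace described above.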
- Dependency vulnerability detection
We want open-source components to be secure and trustworthy and want to give developers an effective detection and management mechanism, so we built a dependency vulnerability detection service and a dependency security report. In practice, developers commonly reported that fixing dependency vulnerabilities costs more than fixing vulnerabilities in their own code, which makes them unwilling or unable to deal with such problems. There are two reasons. First, most vulnerabilities are not introduced directly; they come from components that a third-party dependency itself depends on. Second, it is unclear which version is clean, usable, and compatible.
To make fixes easier, we further identify and analyze the reference relationships among dependencies, clearly mark direct and indirect dependencies, and locate the specific file that imports each dependency, so developers can quickly find where the key problem lies. We also aggregate vulnerability data to recommend version upgrades that fix vulnerabilities; since one dependency may correspond to several vulnerabilities, developers can evaluate whether to adopt a recommendation. By analyzing API changes between versions and how the code calls those APIs, we estimate the cost of an upgrade and automatically create fix reviews for developers, helping them maintain code security more efficiently.
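A minimal sketch of two ideas above: separating direct from indirect vulnerable dependencies along with the chain that introduces them, and aggregating several advisories into one upgrade suggestion. The graph, package names, advisory IDs, and the "first fixed version" data model are hypothetical; the API-change analysis used to estimate upgrade cost is not modeled.

```python
from collections import deque

# Hypothetical resolved dependency graph: package -> [(dependency, resolved version)].
GRAPH = {
    "my-app": [("web-framework", "2.1.0"), ("json-lib", "1.2.0")],
    "web-framework": [("log-lib", "2.14.0")],
}
# Hypothetical advisories: package -> [(first fixed version, advisory id)].
ADVISORIES = {
    "log-lib": [("2.15.0", "ADV-1"), ("2.16.0", "ADV-2")],
    "json-lib": [("1.1.0", "ADV-3")],
}


def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def audit(graph, advisories, root):
    """Walk the graph breadth-first from root (shortest chains first) and report vulnerable deps."""
    report, parent, queue = [], {root: None}, deque([root])
    while queue:
        pkg = queue.popleft()
        for dep, version in graph.get(pkg, []):
            if dep in parent:
                continue
            parent[dep] = pkg
            queue.append(dep)
            hits = [(fixed, adv) for fixed, adv in advisories.get(dep, [])
                    if parse(version) < parse(fixed)]
            if not hits:
                continue
            chain, node = [], dep
            while node is not None:
                chain.append(node)
                node = parent[node]
            report.append({
                "package": dep,
                "version": version,
                "direct": pkg == root,
                "introduced_via": " -> ".join(reversed(chain)),
                "advisories": [adv for _, adv in hits],
                # Assuming each advisory stays fixed in later versions, the highest
                # "first fixed" version is the smallest upgrade that clears them all.
                "upgrade_to": max((fixed for fixed, _ in hits), key=parse),
            })
    return report


for item in audit(GRAPH, ADVISORIES, "my-app"):
    print(item)
```

Here `json-lib` 1.2.0 is already past its fix, while `log-lib` is flagged as an indirect dependency pulled in by `web-framework`, with a single upgrade target covering both of its advisories.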
Whether for code quality or code security, all five of the automatic code detection tools above can be tried for free on Codeup.
Applying the detection services
1. Code submission
The most direct application of the detection services is code submission. Enterprises can define and configure inspection plans for different projects according to their business scenarios and specifications. When a developer pushes changes to the server, the detection services configured for the repository are triggered automatically and report all problems in the current commit, helping developers find new problems as early as possible and confirm that existing problems have been resolved. With these services in place, checks on code specifications, code quality, and code security shift left, giving developers fast detection and feedback right after they finish coding.
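Reporting new problems and resolved existing problems on each push amounts to comparing the current scan against a baseline. A minimal Python sketch, assuming each issue is fingerprinted by rule, file path, and whitespace-normalized snippet; this fingerprint scheme is our assumption, not a description of how Codeup tracks issues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes issues hashable so they can live in sets
class Issue:
    rule: str
    path: str
    snippet: str  # normalized offending code; line numbers are deliberately left out (assumption)


def issue(rule: str, path: str, snippet: str) -> Issue:
    # Collapse whitespace so that re-indenting alone does not make an old issue look new.
    return Issue(rule, path, " ".join(snippet.split()))


def compare(baseline: set[Issue], current: set[Issue]) -> dict[str, set[Issue]]:
    return {
        "new": current - baseline,        # introduced by this push
        "fixed": baseline - current,      # existing issues that are gone
        "remaining": current & baseline,  # existing issues still present
    }


# Hypothetical scan results before and after a push.
baseline = {issue("long-suffix", "A.java", "long a = 2l;"),
            issue("executors", "B.java", "Executors.newFixedThreadPool(4)")}
current = {issue("executors", "B.java", "    Executors.newFixedThreadPool(4)"),
           issue("hardcoded-secret", "C.java", 'String apiToken = "x9Fq2LmZ7pR4tW8vK3nB";')}

for status, issues in compare(baseline, current).items():
    print(status, sorted(i.rule for i in issues))
```

Leaving line numbers out of the fingerprint keeps an issue stable when unrelated lines above it are added or removed.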
2. Code review
In enterprise project collaboration, developers usually merge feature branches into the trunk through merge requests, which require code review and manual inspection by the project or module lead. Manual review takes a lot of effort, and it struggles to cover potential problems across every dimension of the code. Properly configured detection services can therefore greatly reduce the manual review workload and speed up the review process. Moreover, by enriching, screening, and accumulating rule sets and manual experience, the detection services can be tailored to the enterprise's business scenarios, keeping non-conforming or risky code out of the enterprise codebase.
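One way to picture a configured review check: each project enables its own rule set and a blocking severity, findings outside the rule set are ignored, in-scope findings become review comments, and anything at or above the blocking severity stops the merge. The severity levels, rule names, and `review_gate` function below are hypothetical.

```python
from dataclasses import dataclass

SEVERITY = {"info": 0, "minor": 1, "major": 2, "blocker": 3}  # hypothetical levels


@dataclass
class Finding:
    rule: str
    severity: str
    path: str


def review_gate(findings, enabled_rules, block_at="blocker"):
    """Filter findings to the project's rule set and decide whether the merge may proceed."""
    relevant = [f for f in findings if f.rule in enabled_rules]
    blocking = [f for f in relevant if SEVERITY[f.severity] >= SEVERITY[block_at]]
    return {"mergeable": not blocking, "comments": relevant, "blocking": blocking}


findings = [Finding("long-suffix", "minor", "A.java"),
            Finding("hardcoded-secret", "blocker", "C.java"),
            Finding("experimental-rule", "blocker", "D.java")]
# This project has not enabled "experimental-rule", so it neither comments nor blocks.
result = review_gate(findings, enabled_rules={"long-suffix", "hardcoded-secret"})
print(result["mergeable"], [f.rule for f in result["blocking"]])
```

Keeping the rule set per project is what lets the same scanner be strict on one codebase and lenient on another.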
3. Code metrics
Besides helping developers find and fix problems early during submission and review, the detection services also help managers measure enterprise code quality and visualize risk. Enterprise-level reporting combined with project task management gives a more intuitive measure of the security and quality problems that arise as a project evolves.
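A common way to make such trends comparable across projects of different sizes is to normalize issue counts by code size, for example issues per thousand lines of code (KLOC). A small sketch with made-up numbers; the metric and categories are our illustration, not a description of Codeup's reports.

```python
def density_trend(snapshots):
    """snapshots: [(label, lines_of_code, {category: issue_count})] in time order.

    Returns [(label, {category: issues per KLOC})], rounded to two decimals.
    """
    trend = []
    for label, loc, counts in snapshots:
        kloc = loc / 1000
        trend.append((label, {cat: round(n / kloc, 2) for cat, n in counts.items()}))
    return trend


# Made-up snapshots for illustration.
snapshots = [("sprint-1", 50_000, {"quality": 120, "security": 15}),
             ("sprint-2", 52_000, {"quality": 104, "security": 13})]
for label, density in density_trend(snapshots):
    print(label, density)
```

In this made-up example the quality density falls from 2.4 to 2.0 issues per KLOC while the codebase grows, a trend that raw counts alone would understate.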
References
- Pinpoint: Fast and Precise Sparse Value Flow Analysis for Million Lines of Code (PLDI)
- SMOKE: Scalable Path-Sensitive Memory Leak Detection for Millions of Lines of Code (ICSE)
- Pipelining Bottom-up Data Flow Analysis (ICSE)
- Conquering the Extensional Scalability Problem for Value-Flow Analysis Frameworks (ICSE)