This article was contributed by a 52CV fan. Original blog post: https://blog.csdn.net/liuxiaoheng1992/article/details/120228724 As an algorithm engineer, I want to summarize some of the pitfalls I have run into while tuning models and reproducing algorithms (a single line here may represent a week or more of hard-won lessons). I hope it helps.
Get familiar with your data
A model is a condensed version of the data. Andrew Ng's 80/20 rule says as much: 80% data + 20% model = better AI. For any new task, you need to get familiar with your data. Taking detection as an example: write a visualization script to check whether the annotations are reasonable, inspect the size distribution of the objects to be detected (for example, to set anchor priors), check the class distribution (for example, is it extremely imbalanced), and so on.
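The checks above can be sketched in a few lines. This is a minimal example assuming COCO-style annotations (a list of dicts with a "bbox" field of [x, y, w, h] and a "category_id" field); the field names are assumptions, so adapt them to your own annotation format.

```python
# Quick sanity checks on a detection dataset: class distribution and
# box-size statistics (useful for anchor presets and spotting imbalance).
from collections import Counter

def summarize_annotations(annotations):
    """Return class counts and box-area statistics for a quick look."""
    class_counts = Counter(a["category_id"] for a in annotations)
    areas = sorted(a["bbox"][2] * a["bbox"][3] for a in annotations)
    n = len(areas)
    return {
        "class_counts": dict(class_counts),
        "min_area": areas[0],
        "median_area": areas[n // 2],
        "max_area": areas[-1],
    }

# A tiny synthetic annotation list, just to show the output shape.
anns = [
    {"bbox": [0, 0, 10, 10], "category_id": 1},
    {"bbox": [5, 5, 100, 50], "category_id": 2},
    {"bbox": [2, 2, 8, 4], "category_id": 1},
]
stats = summarize_annotations(anns)
```

On a real dataset you would feed these statistics into a histogram to see, for instance, whether most boxes are small relative to your anchor presets.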
When you receive a task in a new field, survey the algorithms in that field first: get a general picture of how the field has developed and understand the ideas behind a few key algorithms (for example, the SOTA methods of recent years). The survey takes some time, but it lets you run far fewer experiments during algorithm selection, which is a very good trade-off: you get to stand on others' shoulders. Two less desirable habits:
Caring too much about the metrics. When the numbers on their own dataset are not great, some engineers immediately switch to another algorithm, swap the backbone, or change the loss and rerun experiments. Instead, carefully analyze why the results are poor: is it a problem with your own training, is the current data unsuited to the algorithm, is the evaluation metric unreasonable, or is the metric's implementation buggy?
Jumping straight to the SOTA algorithm without any survey. This can backfire, because the SOTA method may not be optimized for your scenario. For example, if your task mostly involves small objects (which you learn by analyzing the data), and the SOTA method has a very high overall mAP but a lower small-object mAP than an earlier algorithm, use it with caution. Likewise, if the SOTA method uses a heavy network but your task demands speed, or both speed and accuracy, use it with caution.
Optimize on top of an existing implementation
Once you have chosen an algorithm for a task, if a good open-source implementation exists, it is best to reproduce the algorithm from that open-source project. The reasons:
It is an easier and deeper way to understand the algorithm's details. For example, the code may add a shift operation on some layer that the paper never mentions; some trick described in the paper may not be implemented in the code; the code may train with extra data the paper does not mention; or the data augmentation in the code may differ from what the paper describes. (This can happen when a third-party implementer does not reproduce the paper one-to-one, and also when the paper's own authors did not implement everything they wrote.)
You quickly learn the algorithm's basic characteristics, such as its approximate running speed (especially when the paper does not report it) and the results it actually achieves.
You avoid useless work. Rewriting and debugging a new model is time-consuming and laborious, and because the paper may leave some details unclear, you may never be able to reproduce the reported results on your own.
Once you have an algorithm reproduced from an open-source project (here "reproduced" does not mean exactly matching the code author's or paper author's numbers, since data augmentation or random seeds may bias the results, but getting results close enough to theirs), you can improve the model along these lines:
Check whether the code implements the tricks the paper claims gain points; if not, try adding them.
Papers usually analyze their experimental results and follow up with the authors' own observations; these may explain why the algorithm performs poorly in certain cases.
Some papers describe their possible future work, which is also a source of improvement ideas.
Visually inspect the experimental results (especially on your own dataset). The failure modes may differ from what the authors show on public datasets; analyze the reasons behind the poor cases.
Reproducing an algorithm from scratch
Reproducing an algorithm from scratch is a big undertaking. The problem is not just the amount of code or work, but that without a baseline version you introduce too many uncontrollable factors and debugging becomes hard: is the data pipeline buggy, is the model built correctly, is the training procedure wrong? The most painful scenario when reproducing or optimizing an algorithm is that training looks perfectly normal, the loss curve is even better than you expected, and then after training for a year (just kidding, maybe longer) the test results turn out to be terrible, and you can hardly believe you wrote the code yourself. A year gone. Some suggestions:
Test every detail you can: the data pipeline, the model, the loss output, and the final evaluation code. Make sure every part is under control.
Test the data interface starting from a single process with batch size 1, which makes it easy to print values and compare them.
Remove randomness, and make sure problems are reproducible. For example, do not add random data augmentation at first, and fix the model's random seed.
Use a small amount of data so experiments run quickly and the model can overfit quickly. If the model can fit the small set, you roughly know what it is capable of learning.
Reproduce the original paper faithfully first; before you have matched it, do not add too many of your own ideas. Follow the paper for training parameters, the model backbone, data augmentation, and so on. If something is unclear, try emailing the authors or discussing it with people in the community.
Make logging complete. For example, to debug a loss that becomes NaN, you need to know whether it originates in the forward pass or in backpropagation.
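The "remove randomness, debug with a single process and batch size 1" advice above can be sketched as follows, assuming PyTorch. The seed value and the tiny stand-in dataset are arbitrary choices for illustration.

```python
# Make a PyTorch training run reproducible while debugging.
import random
import numpy as np
import torch

def fix_seeds(seed=0):
    """Fix every common source of randomness so failures can be replayed."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
    # Trade speed for determinism while debugging.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

fix_seeds(0)

# While debugging the data interface, use a single process (num_workers=0)
# and batch size 1 so printed values are easy to compare step by step.
dataset = torch.utils.data.TensorDataset(torch.arange(8).float().unsqueeze(1))
loader = torch.utils.data.DataLoader(
    dataset, batch_size=1, num_workers=0, shuffle=False
)
```

With seeds fixed and shuffling disabled, two runs of the same script see identical batches, so any divergence points at the code rather than at randomness.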
Some training suggestions that may help
Make sure the data is reliable.
Use a pretrained model whenever one is available.
A learning rate below 1e-5 is usually useless; for cosine or step schedules, ending at a final learning rate of 1e-5 is fine. Special tasks differ, of course.
Remember to enable BN updates during training (especially for TensorFlow users, where this is easy to miss); otherwise the likely symptom is that the training loss drops quickly but the model seems not to converge at test time.
SGD is great, but Adam may converge faster in your experiments.
If you want to squeeze the last bit of performance out of an algorithm, first make sure the current model reaches its expected baseline performance. Blindly swapping modules and frantically tuning hyperparameters may just waste time.
Do not put too much faith in your tuning skills: without a good baseline, hyperparameter tuning brings no qualitative leap (unless a previous parameter caused some kind of bug).
When data is scarce and you use a pretrained model, remember to freeze the parameters of the first few layers, or use a lower learning rate for them.
Loss balancing is sometimes useful.
Repeated training may gain points: after training a model, use the trained model as the pretrained model and continue training with the same set of parameters. It is somewhat like CyclicLR.
Unlike classical machine learning, DL has little closed-form theory behind it; many choices just "make sense" and are verified by experiment. So read as many papers as you can and look at other people's experiments; this reduces unnecessary experiments of your own.
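Two of the tips above, freezing (or down-weighting) the first few layers of a pretrained model and ending a cosine schedule at 1e-5, can be sketched together, again assuming PyTorch. The tiny Sequential model here is a hypothetical stand-in for a real pretrained backbone plus task head.

```python
# Freeze early layers (or give them a lower LR) and use a cosine
# schedule that decays to 1e-5 instead of 0.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),  # pretend these are pretrained early layers
    nn.ReLU(),
    nn.Linear(32, 4),   # task-specific head
)

# Option 1: freeze the early layers entirely.
for p in model[0].parameters():
    p.requires_grad = False

# Option 2 (shown here alongside): give the early layers a lower
# learning rate via parameter groups instead of a hard freeze.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 1e-4},
        {"params": model[2].parameters(), "lr": 1e-2},
    ],
    momentum=0.9,
)

# Cosine annealing that bottoms out at 1e-5 rather than 0.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=1e-5
)
```

In a real run you would call `optimizer.step()` and `scheduler.step()` each epoch; after T_max epochs both groups sit at the 1e-5 floor.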
This article shares some of my personal experience; I hope readers find it useful. If there are any serious errors, please let me know, as I do not want to mislead anyone.