Deep Learning in Practice

I love computer vision 2021-10-14 07:57:50

This article was contributed by a 52CV reader. Original blog address:

As an algorithm engineer, I want to summarize in this article some of the pitfalls I have hit while tuning models and reproducing algorithms (a single line here may be the summary of a week or more of work), in the hope that it helps readers.

Get familiar with the data

The model is a condensed version of the data. As Andrew Ng's 80/20 rule puts it: 80% data + 20% model = better AI.

For a new task, you first need to get familiar with your data. Take a detection task: write a quick visualization script to check whether the annotations are reasonable, inspect the size distribution of the objects to be detected (useful, for example, for presetting anchors), look at the class distribution (is it extremely imbalanced?), and so on.
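As a sketch of that kind of check, the helper below (hypothetical, assuming COCO-style annotations with `categories` and `annotations` lists and `bbox` in `[x, y, w, h]` format) tallies the class distribution and buckets box sizes roughly along COCO's small/medium/large split:

```python
from collections import Counter

def annotation_stats(data):
    """Return (class_counts, size_counts) for a COCO-style annotation dict."""
    id_to_name = {c["id"]: c["name"] for c in data["categories"]}
    class_counts = Counter(id_to_name[a["category_id"]] for a in data["annotations"])

    # Bucket boxes by sqrt(area), roughly COCO's small/medium/large split;
    # these are exactly the numbers you want before presetting anchors.
    size_counts = Counter()
    for a in data["annotations"]:
        w, h = a["bbox"][2], a["bbox"][3]
        side = (w * h) ** 0.5
        size_counts["small" if side < 32 else "medium" if side < 96 else "large"] += 1
    return class_counts, size_counts
```

If the class counts are extremely skewed, or almost everything lands in the `small` bucket, that should inform both the anchor presets and the choice of algorithm.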

Algorithm selection

When you receive a new task in an unfamiliar field, you need to survey the algorithms in that field first: get a general picture of how the field has developed and understand the ideas behind a few key algorithms (for example, the SOTA methods of recent years). The survey takes some time, but it lets you run far fewer experiments during algorithm selection, which is very cost-effective. Stand on their shoulders.
Two ideas to avoid:
  1. Obsessing over metrics. Some algorithm engineers, when the metrics on their own dataset are not great, immediately switch to another algorithm, or immediately swap the backbone, or immediately swap the loss and rerun the experiment. (Instead, carefully analyze why the result is poor: is it a problem with your own training, is the current data simply unsuited to the algorithm, is the evaluation metric unreasonable, or is the implementation of the evaluation metric buggy?)
  2. Jumping straight to the SOTA algorithm without any survey. This can cause problems, because the SOTA method may not be optimized for your scenario's data. For example, if most of your targets are small (found by analyzing the data), the SOTA method's overall mAP may be very high while its small-object mAP is lower than an older algorithm's; use it with caution. Likewise, if the SOTA method uses a heavy network but the task demands speed, or both speed and accuracy, use it with caution.

Optimizing an algorithm on top of an existing implementation

Once you have selected a suitable algorithm for a task, if a good open-source implementation exists, it is best to reproduce the algorithm from the open-source project.
The reasons:
  1. It is a more convenient way to understand the algorithm's concrete details in depth. For example, the code may add a shift operation on some layer that the paper never mentions; some tricks mentioned in the paper may be missing from the code; the code may train with extra data the paper does not mention; or the data augmentation described in the paper may differ from what the code implements. (This can happen when the open-source reimplementer does not reproduce the paper one-to-one, and also when the paper's own authors never implemented what they described.)
  2. You can quickly establish the algorithm's basic behavior, such as its approximate running speed (especially when the paper does not report it) and the results it actually achieves.
  3. You avoid doing useless work yourself. Rewriting and debugging a new model from scratch is time-consuming and laborious, and because the paper may leave some details unclear, you may find the reported results nearly impossible to reproduce.
When using an algorithm that an open-source project has already reproduced (where "reproduced" need not mean results identical to the code author's or paper author's; data augmentation or random seeds may bias the numbers, but the results are in the right ballpark), the following ideas can guide improvements:
  1. Check whether the code implements all of the accuracy-boosting tricks from the paper; if not, try adding them.
  2. Papers usually analyze their experimental results and follow up with the authors' own views, which may explain why some algorithms perform poorly.
  3. Some papers describe their possible future work, which is also a source of improvement ideas.
  4. Visually inspect the experimental results (especially when running on your own dataset). The problems may differ from those the authors show on public datasets; analyze why the results are poor.
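To make that visual inspection systematic, a small triage helper can rank which images to look at first. The sketch below is hypothetical (names are my own; boxes are assumed to be in `(x1, y1, x2, y2)` format): it flags images where some ground-truth box is missed by every prediction at a given IoU threshold.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def flag_bad_images(preds, gts, thr=0.5):
    """Return image ids where some ground-truth box has no prediction
    with IoU >= thr. preds/gts: dict image_id -> list of boxes.
    These are the images to open and look at first."""
    bad = []
    for img_id, gt_boxes in gts.items():
        pred_boxes = preds.get(img_id, [])
        for g in gt_boxes:
            if not any(iou(p, g) >= thr for p in pred_boxes):
                bad.append(img_id)
                break
    return bad
```

Looking at the flagged images directly (rather than only at aggregate mAP) is what reveals whether the failures are small objects, a particular class, or a particular kind of scene.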

Reproducing an algorithm from scratch

Reproducing an algorithm from scratch is a big project, not just because of the volume of code or work, but because without a baseline version you introduce too many uncontrolled factors and debugging becomes difficult: is the data pipeline broken, is the model built correctly, is there a problem with the training procedure?
The most painful case when reproducing or optimizing an algorithm is that training looks completely normal, the loss curve is even better than you expected, and then, after a year of training (just kidding, maybe longer), you evaluate the model and find the results are terrible. Embarrassingly, you wrote the code yourself. A year gone.
Here are some suggestions:
  1. Test every detail you can, from the data pipeline, to the model, to the loss outputs, to the final evaluation code. Make sure every part is under control.
  2. Test the data pipeline starting from a single worker with batch size 1, which makes it easy to print values and compare.
  3. Avoid randomness; make sure problems are reproducible. For example, do not add random data augmentation at first, and fix the model's random seed.
  4. Use a small subset of the data so experiments run fast and the model can overfit quickly. If the model can overfit that subset, you can roughly conclude that it is able to learn.
  5. Reproduce the original paper faithfully first; do not add too many of your own ideas before the reproduction succeeds. Training parameters, the model backbone, data augmentation methods, and so on should all follow the paper at first. If something is unclear, try emailing the authors or discussing it with people in the field.
  6. Make log printing thorough. For example, to debug a loss that goes to NaN, you need to know whether the forward pass or the backward pass caused it.
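As a minimal sketch of suggestion 3 for a Python training script, a seed-fixing helper might look like the one below; NumPy and PyTorch are treated as optional and seeded only if installed:

```python
import os
import random

def fix_seeds(seed=0):
    """Fix every random seed we can reach so a failure can be reproduced.

    Hypothetical helper; adapt it to your own stack. Determinism flags for
    cuDNN are set too, usually at some speed cost.
    """
    random.seed(seed)
    # Only affects subprocesses launched after this point; to cover the
    # current process, set PYTHONHASHSEED in the shell before starting.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Call `fix_seeds()` once at the top of the script, before building the model or the data pipeline, so that a run that hits a bug can be replayed exactly.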

Some training suggestions that may be useful

  1. Make sure the data is reliable.
  2. Use a pre-trained model whenever possible.
  3. Learning rates below 1e-5 are usually useless; for cosine or step schedules, letting the final learning rate end at 1e-5 is fine. Special tasks differ, of course.
  4. Remember to enable batch-norm (BN) statistics updates during training (TensorFlow users especially, this is easy to miss); otherwise the likely symptom is that the training loss drops quickly while the model appears not to converge at test time.
  5. SGD is great, but in experiments Adam may converge faster.
  6. If you want to squeeze out an algorithm's full performance, first make sure the current model reaches its expected baseline before pushing further, instead of blindly swapping modules and tuning hyperparameters like crazy; that may just waste time.
  7. Do not put too much faith in your tuning skills. Without a better baseline, hyperparameter tuning will not produce a qualitative leap (unless a previous parameter caused some kind of bug).
  8. When data is scarce, remember to freeze the parameters of the first few layers of the pre-trained model, or use a lower learning rate for them.
  9. Loss balancing is sometimes useful.
  10. Repeated training may add a few points: after training a model, use the trained weights as pre-training and continue training with the same set of hyperparameters. It is somewhat like CyclicLR.
  11. Unlike classical machine learning, DL has little formula-level support; much of it is "it makes sense, so run an experiment to verify it". So read as many papers as you can and study other people's experiments; this reduces unnecessary experiments of your own.
  12. This article shares some of my personal experience, and I hope readers find it useful. If there is any serious error, please let me know; I do not want to mislead anyone.
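The schedule in suggestion 3 can be written down directly from the cosine-annealing closed form. This is a framework-agnostic sketch; in practice you would use your framework's built-in scheduler (for example, PyTorch's `CosineAnnealingLR` with `eta_min=1e-5`):

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-3, min_lr=1e-5):
    """Cosine decay from base_lr down to min_lr, never going below min_lr
    (per suggestion 3: learning rates under 1e-5 rarely help)."""
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

The floor matters: a schedule that decays all the way to zero spends its last epochs making updates too small to change anything.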



Please include a link to the original article when reprinting. Thanks.