Google's recent firing of Timnit Gebru, a leader of its AI ethics team, has once again drawn attention to the problem of algorithmic bias.
Gebru is a leading researcher on AI model risk and inequality. She was fired by Google over an unpublished paper that asked: Are language models too big? Who benefits from them? Do they amplify prejudice and inequality?
Timnit Gebru was recently fired by Google
Gebru's doubts are not groundless. In 2016, Microsoft's AI chatbot Tay went online. Shortly after it began chatting with netizens, it was "taught bad," turning into an anti-Semitic, sexist, racist "bad girl."
In July of this year, MIT was forced to take down the 80 Million Tiny Images dataset. The dataset had been widely used in machine learning tasks such as image recognition, but it contained racist, misogynistic, and otherwise offensive labels.
MIT's website then released a statement saying the team had been unaware of these offensive labels, and that 80 million images of only 32×32 pixels each were too difficult to clean manually, which is how the discriminatory content slipped through.
MIT's 80 Million Tiny Images dataset was permanently taken offline because of discriminatory labels
The same problem occurred with Duke University's PULSE algorithm. The algorithm is designed to sharpen blurred face images, but when tested on a blurred photo of former US President Barack Obama, it produced a white face.
AI expert Yann LeCun attributed this phenomenon to bias in the dataset. In other words, most of the faces in the training dataset were white, so the model's output skewed toward white faces.
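The skew LeCun describes is easy to see with a simple audit of a dataset's group composition. The sketch below (the group labels and counts are hypothetical, not from any real face dataset) computes each group's share of the training data; a model trained on such a set sees the majority group far more often, so its outputs drift toward it.

```python
from collections import Counter

def group_balance(labels):
    """Return each group's share of the dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Hypothetical per-image group annotations for a face dataset.
annotations = ["white"] * 800 + ["black"] * 100 + ["asian"] * 100

shares = group_balance(annotations)
# shares["white"] is 0.8: a model trained on this set sees white faces
# 80% of the time, so its reconstructions will skew toward white faces.
```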
A blurred image of Obama is reconstructed as a white face
For a long time, there has been a misconception about computer technology: that algorithmic decision-making is fairer, because mathematics deals in equations, not skin color.
The author of "A Brief History of Humankind" called this misconception "Dataism" — the belief that data will become the basis of all future decision-making, and that algorithms can eliminate human bias from the decision process.
But algorithmic discrimination is not a "small problem." When it enters credit scoring, criminal risk assessment, hiring evaluation, and other high-stakes processes, AI decisions affect or even determine loan amounts, sentencing options, and whether someone gets a job. At that point, discrimination is no longer insignificant.
More and more AI companies and research institutions are therefore looking for effective ways to address algorithmic bias.
Synthesized previously launched a set of open-source tools that can quickly identify and mitigate algorithmic bias. According to the company, users only need to upload a structured data file to start analyzing potential bias across attributes such as gender, age, race, religion, and sexual orientation.
A research team at Princeton University's School of Engineering has also developed a tool for flagging potential bias in AI training image sets. The tool, named REVISE, uses statistical methods to check a dataset for underrepresentation along dimensions such as the objects depicted, gender, and geographic location.
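A minimal sketch of this kind of representation check (not the actual REVISE code; the group names, counts, and reference distribution below are invented for illustration) compares each group's observed share in a dataset against a reference distribution and flags groups that fall short by more than a tolerance:

```python
def underrepresented(dataset_counts, reference_shares, tolerance=0.1):
    """Flag groups whose share falls below the reference by more than tolerance."""
    total = sum(dataset_counts.values())
    flagged = []
    for group, expected in reference_shares.items():
        observed = dataset_counts.get(group, 0) / total
        if observed < expected - tolerance:
            flagged.append(group)
    return flagged

# Hypothetical image counts by annotated region of origin.
counts = {"north_america": 700, "europe": 250, "africa": 30, "asia": 20}
# Reference: each region should ideally contribute a quarter of the data.
reference = {"north_america": 0.25, "europe": 0.25, "africa": 0.25, "asia": 0.25}

print(underrepresented(counts, reference))  # ['africa', 'asia']
```

Real audit tools use more careful statistics (and REVISE also analyzes image content, not just metadata), but the underlying idea is the same: measure where the data deviates from the population it is meant to represent.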
Datahall, a leading artificial intelligence data service provider, has always paid close attention to ethical practice. To reduce the risk of algorithmic bias, Datahall has developed a wider range of data sources and produced the "23,349-Person Multi-Race Face Multi-Pose Data" and "26,129-Person Multi-Race 7-Expression Recognition Data" datasets. Collection was balanced across race, skin color, age, gender, and other attributes, and all data was gathered with the subjects' authorization.
23,349-Person Multi-Race Face Multi-Pose Data
Example of the multi-race face multi-pose data; collected with the subjects' authorization
The data includes Asian, Black, white, brown, and Indian subjects. For each person, 29 images are collected: 28 pictures spanning multiple lighting conditions, poses, and scenes, plus 1 ID photo.
By collecting faces that are currently underrepresented in the AI industry, this dataset aims to correct feature skew in algorithms and improve the accuracy of facial feature description in users' models.
26,129-Person Multi-Race 7-Expression Recognition Data
Example of the multi-race 7-expression recognition data; collected with the subjects' authorization
This dataset was recorded by 17,945 Asian, 3,546 white, 3,727 Black, and 911 brown (Mexican) participants, comprising 13,963 men and 12,166 women. The data covers diverse facial poses, expressions, lighting conditions, and scenes. Expression labeling accuracy exceeds 97%, and the accuracy of expression naming is also above 97%.
At the 2018 Guiyang Big Data Expo, Baidu founder Robin Li proposed a set of AI ethics principles: first, the highest principle of AI is to be safe and controllable; second, AI's innovative vision is to promote more equal access to technological capability; third, the value of AI lies in teaching people and helping them grow, not in replacing or surpassing them; finally, AI's ultimate ideal is to bring more freedom and possibility to human beings.
Datahall remains committed to strengthening technical ethics and to the principle of technology for good. Having accumulated rich experience in multi-race face annotation, Datahall can effectively mitigate the algorithmic bias caused by skewed datasets, so users can adopt its data with confidence.