Human and Computer Vision
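Until I started learning about computer vision, I never realized how amazing human vision really is. Because it comes to us naturally as we grow up, we tend not to think much about how we see the world. We don't think of vision as something we do. We walk around, and the world is just “out there.” What's interesting about that?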
How Do Babies See?
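Consider a baby opening its eyes for the first time. The eyes stay closed until around 26 weeks of gestation so the retina can develop, but after that, they begin to blink. When the mother walks into bright sunlight, some light can filter through her body, and the baby can begin to practice seeing. But vision doesn't really start working until birth, and it keeps developing over the first six months.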
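Even so, at birth a baby already prefers face-like shapes over shapes that don't look like faces. By three months, a baby can recognize the face of its primary caregiver about as well as an adult can recognize faces. Amazing! The image above is a representation of what we think babies see during those months. At birth, they have not yet developed color vision, and they can only focus at about 12 inches. By three months, color vision and the ability to focus are more developed, but vision doesn't stabilize until six months.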
We don't see with our eyes; our brains see. Your eyes are just sensors. Vision is a very complex, resource-intensive task. About 30% of the brain is involved in processing vision, compared to 3% for hearing. There is a part of your brain, called the fusiform gyrus, that specifically works to recognize faces.
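Face recognition is critical to our survival. Humans are social, and being able to recognize each other is an important skill. We are always looking around and scanning for faces. We are so good at seeing faces that we sometimes see them when they aren't really there. This is called pareidolia. It used to be considered a sign of mental illness, but we now know it is perfectly normal. It is just your brain, always looking for patterns, finding a face pattern where there isn't really one.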
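Some people are “super recognizers” of faces, and some are just the opposite. Prosopagnosia, or “face blindness,” is a cognitive disorder in which people cannot recognize familiar faces even though they can see other objects just fine. In extreme cases, they cannot even recognize their own face.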
We See in Context
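When we see, our brain draws on a lifetime of context and experience to help. This is one of the reasons we see so much better than computers do. Think about when you are driving. You can only see the back of the car in front of you, but your brain “sees” the whole car and allocates space for it. Even though you can't actually see the whole car, you behave as if you can. Most optical illusions play on the things your brain “sees” that your eyes don't.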
I’ve talked about this example before but I think it’s worth repeating. Look at the image below. It represents something you are probably familiar with. If you don’t know what it is, take a moment and see if you can figure it out.
I'm going to leave some blank space so I don't give it away too quickly.
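Even though the first image is only 15 pixels and 7 colors, most people will figure it out. Amazing! Computers can't do that. If you took a photo of someone you know and tore it in half, you would probably still be able to recognize the person, but a computer would struggle. On the other hand, what we are really good at is recognizing the faces of people we already know. We aren't as good at recognizing strangers as we think we are. In that respect, computers are much better than humans. Imagine you are the person checking IDs at the entrance to a bar. You look at the driver's license and then at the person, over and over, usually somewhere with poor lighting. It is a boring, repetitive task, and humans struggle to maintain the focus it requires. Computers can compare millions of faces very quickly without ever getting bored or tired.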
So How Do Computers See?
In the old days, we talked about eigenfaces. This was an approach that tried to see an image holistically instead of pixel by pixel. The basic idea was to express a particular face as a “sum” of notional faces developed through a machine learning process. This way, a face could be expressed as essentially a list of numbers: how much of each notional face goes into the sum. Faces were compared by their similarity in vector space, not by the visual similarity that humans use.
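To make the “sum of notional faces” idea a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the four-pixel “images,” the mean face, the two basis faces, and the names in the gallery. A real system would learn its eigenfaces from a large set of training photos (typically with principal component analysis) rather than having them typed in by hand, so treat this as a toy version of the projection-and-compare step only.

```python
import math

# Hypothetical 2x2 grayscale "images", flattened to
# [top-left, top-right, bottom-left, bottom-right], values from 0.0 to 1.0.
MEAN_FACE = [0.5, 0.5, 0.5, 0.5]

# Hypothetical basis faces, hand-picked to be orthonormal.
# A real system would learn these from training images.
EIGENFACES = [
    [0.5, 0.5, -0.5, -0.5],  # top half brighter than bottom half
    [0.5, -0.5, 0.5, -0.5],  # left half brighter than right half
]


def project(image):
    """Express an image as one weight per basis face."""
    centered = [p - m for p, m in zip(image, MEAN_FACE)]
    return [sum(c * e for c, e in zip(centered, face)) for face in EIGENFACES]


def reconstruct(weights):
    """Rebuild an approximate image: mean face plus the weighted sum of basis faces."""
    image = list(MEAN_FACE)
    for weight, face in zip(weights, EIGENFACES):
        image = [p + weight * e for p, e in zip(image, face)]
    return image


def distance(weights_a, weights_b):
    """Euclidean distance between two faces in weight (vector) space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(weights_a, weights_b)))


def closest_match(probe, gallery):
    """Return the name of the gallery face whose weights are nearest to the probe's."""
    probe_weights = project(probe)
    return min(gallery, key=lambda name: distance(probe_weights, project(gallery[name])))


# Hypothetical enrolled faces and a slightly noisy new photo of "alice".
GALLERY = {
    "alice": [0.9, 0.9, 0.1, 0.1],
    "bob": [0.9, 0.1, 0.9, 0.1],
}
PROBE = [0.8, 0.9, 0.2, 0.1]

if __name__ == "__main__":
    print(project(GALLERY["alice"]))      # the numbers that stand in for alice's face
    print(closest_match(PROBE, GALLERY))  # alice
```

Notice that the comparison never looks at the pictures themselves. Each face is reduced to two numbers, and “who is this?” becomes “which stored pair of numbers is nearest?”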
Modern face recognition systems use neural networks, which are part of machine learning and artificial intelligence and of what is now being called “deep learning,” which is awesome because I needed more jargon in my life. The network learns to perform a task by analyzing training examples that have been hand-labeled in advance. To teach a computer to recognize a hot dog, for example, you would feed it thousands of labeled images of hot dogs and of other objects that are not hot dogs, and the computer would compare them and eventually learn how to identify a hot dog. This is called training the network. This is a simplistic explanation, but the key thing to know here is that the selection of training examples and how they are labeled is critical. One criticism of modern face recognition algorithms is that the face images used to train them were predominantly of young white males, so the algorithms identify young white males better than anyone else. We have never once had this problem with our system, but it is important for anyone training neural networks to be mindful of their training data.
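If you have never seen what “training on labeled examples” looks like, here is a deliberately tiny sketch in Python. It is not how a real face recognition network is built. It is a single artificial neuron, and instead of images it looks at two made-up features per object (how elongated it is and how red it is), both invented for this example. The point is only to show the loop: guess, compare the guess with the label, nudge the weights, repeat.

```python
import math

# Hypothetical hand-labeled training set.
# Each example is ((elongation, redness), label), with label 1 = hot dog, 0 = not.
TRAINING_DATA = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((1.0, 0.7), 1),
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.3), 0),
]


def predict(weights, bias, features):
    """Return the neuron's estimated probability that the features describe a hot dog."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=2000, learning_rate=0.1):
    """Fit one neuron by repeatedly nudging its weights toward the labels."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            # How far off was the guess for this labeled example?
            error = predict(weights, bias, features) - label
            # Move each weight a little in the direction that shrinks the error.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


WEIGHTS, BIAS = train(TRAINING_DATA)


def is_hot_dog(features):
    """Classify a new, unlabeled object with the trained neuron."""
    return predict(WEIGHTS, BIAS, features) > 0.5


if __name__ == "__main__":
    print(is_hot_dog((0.9, 0.9)))  # long and red
    print(is_hot_dog((0.1, 0.1)))  # short and not red
```

Everything this neuron “knows” comes from those six labeled rows. If the training set had left out a whole category of hot dogs, nothing in the loop would notice, which is exactly why the choice of training data matters so much.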
Neural networks are loosely modeled on the human brain, or at least how we think the brain might work. But fundamentally, our understanding of human brains is very shallow, as is our understanding of why neural networks work. We can measure how effective they are, but they can never explain why they make a specific decision.
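We also can't correct a neural network the way we would correct a human. For example, I can train someone to recognize ripe fruit by looking at it. I can ask them what they see and why, and I can correct them until they are trained to an expert level. Throughout this process, they are continually retraining their own internal neural network. Since a neural network can never explain its decisions, our only options are to throw more data at it or to use a different type of network. Compared to human learning, this is a very inefficient process.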
Of course, most of the time you never think about why you recognize an object or a familiar face. It just happens effortlessly, and it seems so easy that it hardly feels like you are doing anything at all. The next time you recognize a family member in a blurry picture, take a moment and think about how amazing that ability is.
Originally published at blinkidentity.com/forum