
Can my computer tell if I'm surprised or scared?

Updated: Jul 28, 2020

In fifth grade, my class had a totally futuristic discussion about a robot's ability to comprehend humans and our emotional states. And just a few weeks ago, I followed a guided project on Coursera to recognize facial expressions present in an image or video. Crazy what humanity has accomplished in these six years! The emotion-detection algorithm utilized a deep neural network, and I'd like to highlight some of the key concepts I learned throughout this project.


The task of the neural network was to receive an image of a face and label it as one of the seven universal facial emotions: happiness, sadness, fear, disgust, anger, contempt, and surprise.



Look at the computer output above, for example. The woman's facial expression (happiness) was correctly identified by the convolutional neural network (CNN). She looks like one extremely happy gal!

 

The neural network consisted of 4 convolutional blocks and 2 dense layers, followed by a softmax activation function (see the steps below).


Now, that sounds intimidating, but here's how it works (with a couple of code sketches after the list):

  • The input is a pixelized version of an image: each pixel in the image is assigned a number based on its color intensity, and a matrix of these numbers is created.

  • Each convolutional block takes that matrix of numbers, identifies a pattern in it, and passes the modified matrix on to the next block, which identifies a different pattern. In terms of facial expression recognition, for example, the first block could recognize all the horizontal lines in an image and decide whether or not those lines correspond to a person's mouth, eyes, eyebrows, hairline, etc. The second block could then label the curvature of the face's mouth, which is a factor in deciding how happy/mad an individual is.

  • Keep in mind that the human coding this neural network does not know ahead of time what features each convolutional block will identify. This is the whole idea of CNNs: we do not explicitly decide which features indicate happiness, disgust, etc., and tell the computer what to search for in an image. The CNN itself looks for similarities/patterns in the data and automatically associates them with the expressions they indicate.

  • Pooling is optional but normally occurs after each convolutional layer; it condenses the spatial information determined in the previous layer, helping the network turn low-level patterns into higher-level features (horizontal lines --> mouth, eyebrows, etc.).

  • After the blocks, the matrix is flattened and passed through 2 dense layers (non-linear). A dense layer is different from a convolutional block simply because every neuron in the previous layer is connected to every neuron in the dense layer. In convolutional blocks, each neuron is only connected to a small local patch of the previous layer, but in dense layers, every neuron is connected!

  • And finally, the softmax function. This is an activation function that outputs a probability distribution over the seven classes of facial expressions. For example, for an image of a surprised face, the function may output a probability distribution of Happy: 0.1, Sad: 0.02, Fear: 0.15, Disgust: 0.01, Anger: 0.07, Contempt: 0.01, Surprise: 0.74. Since surprise has a much higher probability than the rest, the network will label the image as surprised.
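
To make that last step concrete, here is a tiny sketch of the softmax computation itself. The logits (the raw scores coming out of the network's final layer) are made-up numbers chosen so that surprise wins; they are not real model output.

```python
import numpy as np

labels = ["Happy", "Sad", "Fear", "Disgust", "Anger", "Contempt", "Surprise"]
# Made-up raw scores (logits) for one image of a surprised face.
logits = np.array([0.9, -0.7, 1.3, -1.4, 0.5, -1.4, 2.9])

# Softmax: exponentiate (shifting by the max for numerical stability),
# then normalize so the seven scores become probabilities that sum to 1.
exps = np.exp(logits - logits.max())
probs = exps / exps.sum()

for label, p in zip(labels, probs):
    print(f"{label}: {p:.2f}")
print("Predicted expression:", labels[int(probs.argmax())])  # Surprise
```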
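
And putting all of the steps together, here is a minimal sketch of what such a network could look like in Keras. The filter counts, dense-layer sizes, dropout, and the 48x48 grayscale input shape are my assumptions (typical for facial expression datasets), not the exact configuration from the Coursera project.

```python
from tensorflow.keras import layers, models

def conv_block(filters):
    # One convolutional block: a convolution that scans for patterns,
    # batch normalization and ReLU, then pooling to condense the
    # spatial information, plus dropout for regularization.
    return [
        layers.Conv2D(filters, (3, 3), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
    ]

model = models.Sequential(
    [layers.Input(shape=(48, 48, 1))]          # one intensity number per pixel
    + conv_block(64)                           # block 1: low-level patterns (lines, edges)
    + conv_block(128)                          # block 2: combinations of those patterns
    + conv_block(256)                          # block 3
    + conv_block(512)                          # block 4
    + [
        layers.Flatten(),
        layers.Dense(512, activation="relu"),  # dense layer 1: fully connected
        layers.Dense(256, activation="relu"),  # dense layer 2
        layers.Dense(7, activation="softmax"), # probabilities for the 7 emotions
    ]
)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```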

 

This was an entertaining project, and by the end, I was able to hook the model up to my own computer camera's live-stream feed. I made silly faces and watched the label change from happy to surprised to disgusted over and over again! The FBI agent living in my camera probably thought I was crazy!
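
For anyone curious, here is a rough sketch of what that live-stream hookup can look like with OpenCV, assuming the trained `model` and label order from the sketches above; the Haar cascade face detector and the 48x48 grayscale input size are also assumptions.

```python
import cv2
import numpy as np

labels = ["Happy", "Sad", "Fear", "Disgust", "Anger", "Contempt", "Surprise"]
face_finder = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                              # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_finder.detectMultiScale(gray, 1.3, 5):
        # Crop the face, resize to the network's input size, scale to [0, 1].
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
        probs = model.predict(face.reshape(1, 48, 48, 1), verbose=0)[0]
        label = labels[int(np.argmax(probs))]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imshow("Expression", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):              # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```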


But why did I just remember this project? Turns out that facial expression detection is a lot more applicable than sitting in front of my camera and making silly faces. Recently, I've been training at Affectiva, a revolutionary Emotion AI company that develops solutions to a host of important problems, such as automotive safety. My group and I are developing an algorithm that will indicate a driver's aggressiveness level (0-100) based not on speed, acceleration, or pulse rate, but on facial expressions! Facial cues such as furrowed eyebrows, clenched teeth, and widened eyes may indicate aggression, and our CNN will automatically figure those out for us.


image from Affectiva


Aggression detection, however it is achieved, is a technology that can prevent accidents and save lives on the road! Once an aggressive driver is identified, the car can take a number of actions to calm the driver or slow the vehicle. Without going into the details of this project (which I will share when it is complete), I want to say that I think this type of detection is the future of safe driving!


Whether it's your computer recognizing that you're sad or your car tracking your aggression, expression recognition is an innovative technology that seeks to better understand humans and thus improve our world with its every application. I'm excited to see what it accomplishes in the future, as well as to share with you our aggression detection project!
