I have several videos of people sitting at a computer, recorded from a webcam at the top of the monitor. The room is lit by strong overhead fluorescent lighting, as is common in many buildings. The closest light is located above and just behind the person, causing shadows, including the chin's shadow on the neck.
I need to run these videos through software that deciphers facial expressions, but as they are, the software can't find the correct chin boundary. It ends up including the neck and the very top of the chest in the face itself, which throws off its expression recognition quite badly.
I have tried playing with various settings in Movavi and VideoPad, but haven't been able to get any improvement.
Is this even doable?? How?
It depends on the quality of the source footage. You can usually boost shadows and increase contrast in low-luminance areas selectively to enhance the chin/neck boundary. But on low-quality footage, the first area to suffer from compression losses is usually the dark areas. If you post an unedited original sample, someone can tell you whether anything can be done, and what.
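To make "boost shadows and increase contrast in low-luminance areas" a bit more concrete, here is a minimal sketch of one common way to do it: a gamma curve with gamma below 1, applied to 8-bit luma values. It lifts dark values and spreads them apart while leaving pure black and pure white in place. The gamma of 0.6 and the two sample values standing in for the chin shadow and the neck are made-up numbers for illustration; in an editor you'd look for a gamma, curves, or "shadows" control, and the name varies by program.

```python
def lift_shadows(luma: int, gamma: float = 0.6) -> int:
    """Map an 8-bit luma value (0..255) through a gamma curve.

    gamma < 1 brightens dark values and stretches the differences
    between them, while 0 stays 0 and 255 stays 255.
    """
    if not 0 <= luma <= 255:
        raise ValueError("luma must be in 0..255")
    return round(255 * (luma / 255) ** gamma)


# Hypothetical values: chin shadow vs. neck, both fairly dark.
chin, neck = 30, 45
print("before:", neck - chin, "after:", lift_shadows(neck) - lift_shadows(chin))
```

The trade-off is that the curve flattens toward the top, so highlights get slightly compressed, and if the shadows are already blocky from compression, lifting them makes the blocks more visible too.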
Another, more advanced approach might be to use facial tracking to isolate the face and composite it in layers, e.g. in After Effects. Since some of the data is derived from the relationships between the eyes, nose, mouth, etc., the chin might be "rescued" that way if there aren't large movements or deviations. But again, results depend on the characteristics of the source footage.
The quality of the video. E.g., is it "clear" or not? Is it "clean", and does it look high quality to you? Are there compression artifacts or macroblocking, especially in the shadows? Those sorts of things matter, because not only do they interfere with the tracking accuracy of your program (and others), they also mean there will be less data in the neckline region to enhance.
YouTube re-encodes everything, so quality is lower, but your sample looks like it was poor quality to begin with.
But on a video like that, I would increase the contrast and saturation. That should increase the separation at the neckline, and it might be enough for your program to discriminate the neckline.
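Roughly what "increase contrast and saturation" does to each pixel, as a hedged sketch: plain Python on one RGB pixel at a time. The 1.3 factors, the mid-grey pivot of 128, and the BT.601 luma weights are assumptions, and real editors implement these controls in their own ways.

```python
def clamp(value: float) -> int:
    """Round and clip to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def contrast_saturation(rgb, contrast=1.3, saturation=1.3):
    """Apply a contrast boost, then a saturation boost, to one (R, G, B) pixel."""
    # Contrast: push every channel away from mid-grey (128).
    boosted = [128 + (c - 128) * contrast for c in rgb]
    # Saturation: push every channel away from the pixel's own grey level.
    r, g, b = boosted
    grey = 0.299 * r + 0.587 * g + 0.114 * b  # assumed BT.601 luma weights
    return tuple(clamp(grey + (c - grey) * saturation) for c in boosted)


skin = (180, 140, 120)  # hypothetical skin-tone pixel
print(contrast_saturation(skin))
```

Grey pixels only get the contrast part (a darker grey gets darker), while coloured pixels also get their channels spread apart, which is where the extra chin/neck separation would come from.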
Which application are you using for the facial expression determination?
I don't know what your application is "looking" for. How does it determine what a neckline is? If it's looking only at greyscale values, for example, increasing saturation isn't going to work as well.
On "clean" footage, After Effects can usually do a good job of face tracking; you can track features (like eyes, nose, etc.) or just face outlines. If you have an outline track and mask, you should be able to feed a composite into your program. But noisy footage and compression artifacts interfere with the tracking accuracy, probably just as they're causing problems in your program. In that case, denoising the footage first can sometimes help tracking accuracy.
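One very simple form of denoising before tracking is a temporal median: each pixel is replaced by the median of the same pixel in neighbouring frames, which removes one-frame flicker and speckle. This is only an illustrative stand-in (frames as nested lists of luma values, a radius of 1 chosen arbitrarily). It also smears anything that moves, which real video denoisers have to deal with, so don't expect it to match a proper denoising filter on faces.

```python
from statistics import median


def temporal_median(frames, radius=1):
    """Replace each pixel with the median of that pixel across nearby frames.

    frames: list of frames; each frame is a list of rows of luma values,
    and all frames have the same size.
    """
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    result = []
    for t in range(n):
        # Frames t-radius .. t+radius, cut off at the start and end of the clip.
        window = frames[max(0, t - radius):min(n, t + radius + 1)]
        result.append([[median(f[y][x] for f in window) for x in range(width)]
                       for y in range(height)])
    return result


# Hypothetical 1x2-pixel clip: frame 1 has a one-frame flicker on the left pixel.
clip = [[[40, 80]], [[90, 80]], [[42, 81]]]
print(temporal_median(clip)[1])
```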
Here is an example of a face track / feature track in AE
Notice the results are a bit "jittery". That's partially because of the low-quality YouTube footage, where compression artifacts and macroblocks interfere with the tracking. You can see "flickering" in the frame quality of the original (left side); that's mainly from temporal compression. It might have been in the "original", but YouTube always makes it worse. The cleaner the source footage, the cleaner the tracking results.
If you only needed the neck/jaw outline for your program, you should be able to use the mask from the track, composite it, or even just keep the face only.
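If the tracked outline can be exported as a list of points, "keep the face only" boils down to a point-in-polygon test per pixel. A minimal sketch using the standard ray-casting test; the outline coordinates and the black fill value are hypothetical, and this doesn't reflect how After Effects stores its masks internally.

```python
def inside_polygon(x, y, poly):
    """Ray-casting test: is point (x, y) inside polygon poly [(x0, y0), ...]?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def keep_face_only(frame, outline, fill=0):
    """Keep pixels whose centres fall inside the outline; fill the rest."""
    return [[px if inside_polygon(x + 0.5, y + 0.5, outline) else fill
             for x, px in enumerate(row)]
            for y, row in enumerate(frame)]


frame = [[9] * 4 for _ in range(4)]        # hypothetical 4x4 luma frame
outline = [(1, 1), (3, 1), (3, 3), (1, 3)]  # hypothetical tracked outline
print(keep_face_only(frame, outline))
```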
I'm very new at photos/videos, so there's a lot I don't know. I tried out your suggestion about increasing saturation and contrast. I played around with various values and it does seem to be making a difference in one of the real videos, but not in the test one I had posted on YouTube.
The software I'm using is called FaceReader. It maps 500 points onto facial landmarks (like eyebrows) and uses something called the "Active Appearance Model". It's their own implementation, but the original says that an AAM "contains a statistical model of the shape and grey-level appearance of the object of interest" (Cootes et al. 1998). An "appearance vector" models this information and is then fed through a neural net whose output is the intensity of each of 6/7 basic emotions. Interesting in theory, anyway.
Attached is an image of an example where it misreads the face (due to lighting issues, I'm guessing). It makes this mistake frequently in about half of our 100+ videos. The manual states that changing settings such as contrast after the fact won't have any effect, but it's worth a try since this data was collected with some effort...
I think I found the software in question, but there isn't a usable demo for video (the online demo only takes a still image).
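For anyone curious about the "statistical model" part: in Cootes et al.'s formulation, a shape (the landmark coordinates) is the mean shape plus a weighted sum of modes of variation learned from training faces, and the grey-level appearance is modelled the same way. The sketch below shows just that linear synthesis step, with a made-up three-landmark shape and a single hypothetical "jaw drop" mode; it is not FaceReader's implementation.

```python
def synthesize(mean, modes, params):
    """Linear model: x = mean + sum_i params[i] * modes[i].

    mean: flat list [x1, y1, x2, y2, ...]; modes: list of vectors of the
    same length; params: one weight per mode.
    """
    if len(modes) != len(params):
        raise ValueError("need exactly one parameter per mode")
    out = list(mean)
    for weight, mode in zip(params, modes):
        for k, value in enumerate(mode):
            out[k] += weight * value
    return out


# Hypothetical: two eye points and a chin point; the mode moves only the chin's y.
mean_shape = [0, 0, 10, 0, 5, 10]
jaw_drop = [0, 0, 0, 0, 0, 1]
print(synthesize(mean_shape, [jaw_drop], [3]))
```

Fitting runs this in reverse, searching for the parameters whose synthesized shape and grey levels best match the image, so a dark neck region that resembles the grey-level pattern of a jaw could plausibly pull the chin landmarks too low.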
You want to manipulate the video in a way that emphasizes the differences between the chin and the neck. Increasing the contrast and saturation was just a "simple" way of doing that, which should be available in almost any video editor, including VideoPad and Movavi. But since it's looking at "grey-level appearance", manipulations like saturation (color intensity) will probably have less of an effect for that program.
So the next step is exploring more advanced ways to create separation, that is, masking out the region of interest, i.e. "rotoscoping". Basically YOU determine what to include or exclude. It's tedious work if you do it frame by frame, so people automate parts of the process with motion tracking. I suggested Adobe After Effects because it has a built-in face tracker. It's quite common software, and public libraries, schools, or universities/colleges will often have it if you don't have an Adobe subscription. So I'm suggesting you mask out the face with the face tracker on the videos that give that software problems, then use the result as input for that program.
If there are jumps or poor tracking, you can tweak or manually edit the track points if you need to (each frame has keyframes), but on "clean" footage it should be pretty much automatic. I uploaded the video sample with everything "automatic": no correction for the jitteriness or jumps. So try that out in your program, and if it works, then AE might be worth investigating further for this.
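Why saturation does little for a program that works on grey levels: if saturation is adjusted by scaling each channel's distance from the pixel's own grey value, the grey value itself doesn't change. The sketch below assumes "grey level" means BT.601 luma (0.299 R + 0.587 G + 0.114 B); how FaceReader actually converts to grey isn't known here.

```python
def luma(rgb):
    """Grey level of an (R, G, B) pixel, assuming BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def saturate(rgb, amount):
    """Scale each channel's distance from the pixel's grey level (no clipping)."""
    grey = luma(rgb)
    return tuple(grey + (c - grey) * amount for c in rgb)


pixel = (150, 110, 95)  # hypothetical shadowed skin tone
for amount in (0.5, 1.0, 1.5, 2.0):
    print(amount, round(luma(saturate(pixel, amount)), 6))
```

In practice, clipping at 0 and 255, or an editor using a different saturation formula, can shift the grey level a little, but contrast and brightness changes are the ones that move it directly.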
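The automation part of rotoscoping mostly comes down to interpolation: you (or the tracker) set the mask's vertices on some keyframes, and the frames in between are filled in. After Effects handles this itself; the sketch below only illustrates the idea with plain linear interpolation, and the keyframe data is hypothetical.

```python
def interpolate_mask(keyframes, frame):
    """Linearly interpolate mask vertices between keyframes.

    keyframes: dict mapping frame number -> list of (x, y) vertices;
    every keyframe must have the same number of vertices.
    """
    times = sorted(keyframes)
    # Before the first or after the last keyframe, hold the nearest one.
    if frame <= times[0]:
        return list(keyframes[times[0]])
    if frame >= times[-1]:
        return list(keyframes[times[-1]])
    for t0, t1 in zip(times, times[1:]):
        if t0 <= frame <= t1:
            a = (frame - t0) / (t1 - t0)  # 0 at t0, 1 at t1
            return [(x0 + a * (x1 - x0), y0 + a * (y1 - y0))
                    for (x0, y0), (x1, y1) in zip(keyframes[t0], keyframes[t1])]


keys = {0: [(0, 0), (10, 0)], 10: [(0, 10), (10, 20)]}  # hypothetical keyframes
print(interpolate_mask(keys, 5))
```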
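If you do want to take the jitter out of an automatic track, one simple approach is to average each tracked point's position over a few neighbouring frames. A sketch with a hypothetical track of (x, y) positions, one per frame; the window radius is an arbitrary choice, and AE's own tools may smooth differently.

```python
def smooth_track(points, radius=2):
    """Centered moving average of tracked (x, y) positions, one per frame."""
    n = len(points)
    out = []
    for t in range(n):
        # Neighbouring frames t-radius .. t+radius, cut off at the clip edges.
        window = points[max(0, t - radius):min(n, t + radius + 1)]
        out.append((sum(p[0] for p in window) / len(window),
                    sum(p[1] for p in window) / len(window)))
    return out


# Hypothetical chin point jittering by 2 px left/right from frame to frame.
track = [(10, 10), (12, 10), (10, 10), (12, 10), (10, 10)]
print(smooth_track(track, radius=1))
```

Bigger windows remove more jitter but lag behind real head movements, so it's a trade-off, and outright jumps still need a manual keyframe fix.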
If you think this is "bad", what about an "overweight" subject with "multiple chins"? Basically, there's no way AI or other software will be able to make those boundary determinations accurately and automatically. So in those cases you might have to rotoscope manually.
But in the future, you should pay more attention to the lighting setup, or consult your local A/V department or ask a videographer or photographer about ideal lighting setups
Hi again. I appreciate you having taken the time to help me figure this out.
So I tried the video you sent in FaceReader, and in almost all frames it can't find the face (even though the face is the only thing there). I thought maybe it was because of the poor YouTube video quality, so I went back to the original video and made another face-tracked video in After Effects (I had to do some tutorials first), but got the same results. After trying the face outline tracking, I also tried an elliptical mask that includes a bit more than the face but stops at the chin. With that, it could find the face and stopped elongating it, but I'm not sure whether the analysis is any more valid than before.
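The elliptical mask that stops at the chin is easy to describe exactly: a pixel is kept if ((x - cx)/rx)^2 + ((y - cy)/ry)^2 <= 1. A sketch that builds such a 0/1 mask; the centre and radii are hypothetical values you'd pick per video. Since image y grows downward, placing cy + ry at the chin line means nothing below it is kept.

```python
def ellipse_mask(width, height, cx, cy, rx, ry):
    """Return a height x width list of 0/1 values: 1 where the pixel centre
    satisfies ((x - cx) / rx)**2 + ((y - cy) / ry)**2 <= 1."""
    return [[1 if ((x + 0.5 - cx) / rx) ** 2 + ((y + 0.5 - cy) / ry) ** 2 <= 1 else 0
             for x in range(width)]
            for y in range(height)]


# Hypothetical tiny 5x5 frame with the ellipse centred in the middle.
for row in ellipse_mask(5, 5, 2.5, 2.5, 1.6, 1.6):
    print(row)
```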
And yes I really should have done a better job with the lighting - big lesson learnt here. We ran several people in a pilot, but for some reason it mostly did OK with those people (maybe they weren't totally focusing on the task at hand with their chins down).
I'll go with your original suggestion since it did make some positive difference with the video I tried.
But I think AI/computers are getting better over time at doing human things and so someday they will be able to discern triple chins or whatever. After all, whatever happens in our brain isn't magic and so computers should be able to reproduce it eventually I would think.
Anyways, thanks again!!
If it "can't find" the face, that suggests it's looking for feature points outside of the face region.
So the next manipulation you should try is compositing 2 layers with large contrast. Not completely "black", but still enough difference that the chin line is easily identifiable. I would put the same face-masked layer on the top layer, and below it, place a darkened version of the original. You can even try some enhancements on the face layer, e.g. make it brighter, add saturation, whatever works.
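The two-layer composite, written out per pixel as a sketch: where the mask is 1 you get the (optionally brightened) face, where it is 0 you get the original darkened to 40%, and fractional mask values give a feathered edge. The 0.4 darkening factor and the gain are guesses to tune per video, and the mask could come from the face track or an elliptical mask.

```python
def composite(original, mask, darken=0.4, face_gain=1.0):
    """Face layer over a darkened copy of the original.

    original: rows of luma values 0..255; mask: rows of values 0..1
    (1 = face, 0 = background, in between = feathered edge).
    """
    out = []
    for row_px, row_m in zip(original, mask):
        out.append([
            # Blend: mask * brightened face + (1 - mask) * darkened original.
            min(255, round(m * px * face_gain + (1 - m) * px * darken))
            for px, m in zip(row_px, row_m)
        ])
    return out


# Hypothetical row: face pixel, feathered edge pixel, neck pixel.
print(composite([[100, 100, 100]], [[1, 0.5, 0]]))
```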