
Body Tracking

Well, I fixed all those bugs by 8am.


Well, except for the one that says the video stream is 640x480. It’s not 640x480; I can pipe it all the way up to 4K. It’s just that the logging program always prints 640x480. I spent a lot of time checking the stream itself before thinking to check the actual logging. It’s inside the Chrome WebRTC library on device, so… yay Chrome.


So if you look at what we capture:




It looks pretty good. We get it all; the problem has been noise from frame to frame. Because we analyze video and deduce pose from it, we run into the issue of pixel resolution versus actual space. One pixel to the left, right, up, or down is 2-3 cm (1”, please can we get with metric at some point?) depending on how far away the camera is.
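As a rough back-of-the-envelope, the per-pixel size falls out of simple geometry. The field of view, camera distance, and effective pose resolution below are my own illustrative guesses, not numbers from the demo:

```typescript
// Back-of-the-envelope: how much real-world space one pixel covers at a given distance.
// The FOV, distance, and width values used here are illustrative guesses only.
function cmPerPixel(distanceCm: number, horizontalFovDeg: number, widthPx: number): number {
  const fovRad = (horizontalFovDeg * Math.PI) / 180;
  const frameWidthCm = 2 * distanceCm * Math.tan(fovRad / 2); // scene width covered at that distance
  return frameWidthCm / widthPx;
}

// e.g. ~4 m from a ~60° camera, with the pose model running on a 192-px-wide input:
console.log(cmPerPixel(400, 60, 192).toFixed(1)); // ≈ 2.4 cm per pixel
```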


So, if we’ve got data coming in at 30-60 frames per second it can shake quite a bit. We’ve got to smooth the data.


At first I used a lerp function. This basically takes 75% of the old value and 25% of the new value; rinse and repeat, and eventually you get close to the new value smoothly, with spiky data smoothed away. The problem with lerp is that it’s like living in glue: you are stuck 75% in the past. So when you wave your hand back and forth there’s no way it will track, because by the time you get your hand from A to B and back to A it’s basically just lerping to (A+B)*0.5.
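Here’s a minimal sketch of that lerp smoother (the names and the little loop are my own illustration of the idea, not the demo code):

```typescript
// Lerp smoothing: keep 75% of the old value and blend in 25% of the new sample.
function lerpSmooth(previous: number, sample: number, alpha = 0.25): number {
  return previous * (1 - alpha) + sample * alpha;
}

// Rinse and repeat per frame: the output drifts toward the input, but always lags it.
let smoothed = 0;
for (const sample of [0, 10, 10, 10, 10]) {
  smoothed = lerpSmooth(smoothed, sample);
  console.log(smoothed.toFixed(2)); // 0.00, 2.50, 4.38, 5.78, 6.84 — still well short of 10
}
```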


So. What to do? Well, it turns out that audio has a good solution to this called a low pass filter (used for lots of things in audio production), and in particular a low pass mean filter. Go google "low pass filter motion capture" and you’ll see a bazillion PDF university papers on doing this. Well okay, about ten. What you don’t see is references to GitHub or StackOverflow from people using it. When you’re on Mars it turns out that sometimes you have to read the university papers and then write code.


And basically, a low pass mean filter takes the spikes out of a signal by averaging over a window of recent samples (I use 10). Here’s what it does to the input data.



Is that correct? Who knows? Is it valid? Yes. Is it responsive to hand waves? Yes.
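For reference, here’s a rough sketch of a low pass mean filter over the last 10 samples, reconstructed from the description above rather than taken from the demo code:

```typescript
// Low pass mean filter: the average of the last N samples smooths out single-frame spikes.
class MeanFilter {
  private window: number[] = [];
  constructor(private size = 10) {} // window of 10 samples, matching the post

  push(sample: number): number {
    this.window.push(sample);
    if (this.window.length > this.size) this.window.shift(); // drop the oldest sample
    const sum = this.window.reduce((a, b) => a + b, 0);
    return sum / this.window.length; // the mean of the window is the filtered value
  }
}

// One filter per tracked coordinate, fed every frame:
const wristX = new MeanFilter(10);
// const smoothedX = wristX.push(rawWristX); // call once per incoming pose frame
```

Unlike the lerp, every sample in the window gets equal weight, so the output catches up to a fast hand wave within the window length instead of trailing 75% behind forever.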


My only remaining bug is probably getting over myself and sharing (well, there is still the many-instances code to write so that everyone has their own robot, but that’s a deployment issue more than a demo issue). Tomorrow I’m moving on from the Metropolis mocap demo to the hut pose demo. I’ve got a new hut asset to try out.



It has a lot of great assets we can probably combine with the first hut (which has the better interior) to get what we want for this proof of concept. So the interesting thing to look at here is not so much the scene, but the models and parts of the scene we can use.


Graeme.

