Mobcount: Counting with Video

Stereo Depth

The first three months of 2019 were spent improving the core stereo photogrammetry disparity algorithm. This was a major overhaul, changing from a per-block algorithm to a per-pixel one. It is working in Matlab, and an example of a disparity map is above. The image is false-coloured so that each colour change reflects a 1-pixel disparity change between the stereo pair.

The image shows a distance to a near shed roof of approx 35-40m; the large building in the background is approx 150m away, and I have tested out to 800m. Pixel accuracy is approximately 1/4 pel.
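
As background (this is the standard photogrammetry relation, not something spelled out in the post), disparity maps to distance via triangulation; the focal length and baseline of this rig are not given here, so treat f and B as placeholders:

```latex
% Stereo triangulation for a rectified pair:
%   Z : distance to the point (m)
%   f : focal length (pixels)
%   B : baseline between the cameras (m)
%   d : disparity (pixels)
Z = \frac{f B}{d}
% Small disparities imply large distances, which is why 1/4 pel
% matching accuracy matters so much at ranges of hundreds of metres.
```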

The last two months have been spent moving to a new image platform: a pair of Hikvision 20MP colour cameras using the Sony IMX183 sensor. I've also been porting the Matlab code to CUDA, and early results give a full 20MP depth image within a few seconds.
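
As a rough illustration of the per-pixel approach (a minimal sketch of my own, not the actual code), a CUDA kernel might assign one thread per pixel, compute a SAD cost over a small window for each candidate disparity, and refine the best match with a parabolic fit to reach sub-pixel (roughly 1/4 pel) precision. The kernel name, window radius, disparity range and 8-bit grayscale inputs are all assumptions:

```cuda
// Minimal per-pixel SAD disparity kernel with parabolic sub-pixel
// refinement. Naive and unoptimised (no shared memory; the costs
// array spills to local memory); parameters are illustrative only.
#define MAX_DISP 256  // compile-time bound for the local cost array

__global__ void perPixelDisparity(const unsigned char* left,
                                  const unsigned char* right,
                                  float* disp,
                                  int width, int height,
                                  int windowRadius, int maxDisp)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < windowRadius + maxDisp || x >= width - windowRadius ||
        y < windowRadius || y >= height - windowRadius)
        return;

    float costs[MAX_DISP];
    int bestD = 0;

    // SAD cost for each candidate disparity: the matching point in
    // the right image lies to the left of the reference pixel.
    for (int d = 0; d <= maxDisp; ++d) {   // requires maxDisp < MAX_DISP
        float c = 0.0f;
        for (int wy = -windowRadius; wy <= windowRadius; ++wy)
            for (int wx = -windowRadius; wx <= windowRadius; ++wx) {
                int l = left [(y + wy) * width + (x + wx)];
                int r = right[(y + wy) * width + (x + wx - d)];
                c += fabsf((float)(l - r));
            }
        costs[d] = c;
        if (c < costs[bestD]) bestD = d;
    }

    // Fit a parabola through the three costs around the minimum to
    // get a sub-pixel disparity estimate.
    float sub = 0.0f;
    if (bestD > 0 && bestD < maxDisp) {
        float c0 = costs[bestD - 1], c1 = costs[bestD], c2 = costs[bestD + 1];
        float denom = c0 - 2.0f * c1 + c2;
        if (denom > 1e-6f) sub = 0.5f * (c0 - c2) / denom;
    }
    disp[y * width + x] = (float)bestD + sub;
}
```

A real implementation would aggregate costs in shared memory and work on the raw CFA data as described below; this sketch only shows the per-pixel structure.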

Below is the 3D view of the scene from Google (their photogrammetry is pretty amazing).

Google view

Below is an early version of the CUDA 20MP image. It is still a bit noisy, but note the church spire in red, which is 800m away at 2-3 pixels of disparity.

CUDA image - reduced size

Click here to see the full 20MP image, and note details such as television aerials.

Click here to see the left and right stereo pair. NB: these are demosaiced, but the core algorithm operates on the raw sensor CFA data; it also does not rectify the images.

Next steps are to extend the algorithm to consider previous frames, with a target framerate of 2-6 fps.

Please email me jimd -at- yantantethera.com if you have any questions.

Stereo Depth

I have spent a lot of the year looking at depth from stereo, or photogrammetry. Inspired by Elphel, who are using high-res commodity CMOS sensors for mid/long-range depth from stereo, I've taken a slightly different angle. Instead of using DCT matching to get sub-pixel resolution, I've been experimenting with direct block matching at the Bayer CFA pattern level. It's not yet clear whether this is a fundamentally better or worse approach than working in the frequency domain, but I have some early results.
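
A minimal sketch of what CFA-level matching could look like (my illustration under stated assumptions, not the actual implementation): compare raw Bayer samples directly, stepping the candidate disparity in 2-pixel increments so that left and right samples always share the same colour phase of the RGGB mosaic. The kernel name, 16-bit raw inputs and tile size are assumptions:

```cuda
// Sketch: tiled SAD block matching directly on raw Bayer CFA data.
// A 2-pixel disparity step keeps both samples on the same colour
// phase of the mosaic, so raw values compare like-for-like without
// demosaicing. Tile size and disparity range are placeholders.
__global__ void cfaBlockMatch(const unsigned short* left,   // raw CFA data
                              const unsigned short* right,
                              float* disp,                   // one value per tile
                              int width, int height,
                              int blockSize,                 // e.g. 16-pixel tiles
                              int maxDisp)                   // even, in raw pixels
{
    int bx = (blockIdx.x * blockDim.x + threadIdx.x) * blockSize;
    int by = (blockIdx.y * blockDim.y + threadIdx.y) * blockSize;
    if (bx < maxDisp || bx + blockSize > width || by + blockSize > height)
        return;

    int bestD = 0;
    float bestCost = 3.4e38f;

    for (int d = 0; d <= maxDisp; d += 2) {   // 2-pixel steps: same CFA phase
        float cost = 0.0f;
        for (int y = 0; y < blockSize; ++y)
            for (int x = 0; x < blockSize; ++x) {
                int l = left [(by + y) * width + (bx + x)];
                int r = right[(by + y) * width + (bx + x - d)];
                cost += fabsf((float)(l - r));
            }
        if (cost < bestCost) { bestCost = cost; bestD = d; }
    }

    int tilesPerRow = width / blockSize;
    disp[(by / blockSize) * tilesPerRow + bx / blockSize] = (float)bestD;
}
```

In practice the sub-pixel accuracy would come from interpolating the cost curve between these 2-pixel steps; matching the R, G and B phases separately is another option. Both are beyond this sketch.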

The current set-up takes still images from a Ricoh GR (prime lens, no low-pass filter) and then matches them. The baseline is usually 25cm. I can currently get approximately 1/4 sub-pixel matching off tiled 16-pixel blocks. Z accuracy is roughly 1% at 50m and 2% at 100m.
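
Those accuracy figures line up with standard stereo error propagation (my derivation, not from the post): for a fixed disparity error, the relative depth error grows linearly with range, so doubling the distance from 50m to 100m doubles the error from 1% to 2%.

```latex
% Differentiating Z = fB/d with respect to d:
\delta Z = \left|\frac{\partial Z}{\partial d}\right|\delta d
         = \frac{fB}{d^{2}}\,\delta d
\qquad\Rightarrow\qquad
\frac{\delta Z}{Z} = \frac{Z\,\delta d}{fB}
% With \delta d fixed (about 1/4 pel here), the relative error
% \delta Z / Z is linear in Z: 1% at 50 m becomes 2% at 100 m.
```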

Still early days, but I've started to get some results. An example is below. Next steps include separating out two surfaces when they overlap within one block, then looking at possible hardware platforms, performance, use cases and people to work with.

Sample Stereo depth video - Work in Progress

In the video above, the animated grey mask hides blocks beyond a certain z distance from the camera. The animation shows the z distance decreasing then increasing in 1/4 sub-pixel matching steps. Blocks are 2 pixels square in the x,y direction. The image is 2k x 1k, cropped from a larger 12MP image.

Unmatched pixels show occlusions, which appear to the left of foreground objects, and mismatch errors. Other error sources include specular reflection (the car), low light (under the car) and uniform horizontal features on the house.

This video shows the data from one frame, displayed at 2k x 1k. A key question is whether it is possible to deliver multiple frames per second using a full 5MP/8MP camera or similar. Please email me jimd -at- yantantethera.com if you have any questions.

Elphel

Elphel have a very interesting piece on a 4-camera approach to real-time mid-range RGBD. The core of the approach is to use block matching, to target sub-pixel accuracy and to avoid intermediate representations. To see a demo of their results, try their scene viewer: click on a picture on the left, then a numbered link under the thumbnail, and play around with the RGBD image.

I think this is potentially a great platform for mid-range (5-200m) depth mapping in the optical domain, which could then be used in a variety of ways. A simple one would be to provide labelled data to support occlusions in people tracking, but I think there's a lot more here.

[Image above from Elphel - thanks Andrey!]

Lidar Scan

Q3 and Q4 of 2017 were quiet in people-counting terms. I was contracting as a technical PM on a project to get a pan-tilt-zoom CCTV camera to automatically detect and track targets using lidar and moving thermal sensors. Automating the camera movement to track and zoom in on moving people and cars gives the benefits of a PTZ camera without the overhead of an operator.

It was interesting and technically challenging work. That project has now been delivered, so it's back to counting, and to the optical detection domain!

Counting pedestrians - Work in Progress

A quick update of work in progress.

The scenes are in rough order of complexity in terms of people count. Coloured boxes show a person tracked across multiple tracks. The thinner purple/pink box shows a single track. Each unique person is tracked at least once, but full re-identification still needs to be completed.

Sample Detection Videos - Work in Progress

Counting pedestrians from video

You upload video, we count what's in it. Recent advances in Deep Neural Network and GPU performance mean it is now affordable to automatically count objects in generic video.

Previously, if you were counting, you would need manual solutions or specialist overhead cameras. Now we can use video from mobile phones, CCTV, GoPro (tm) cameras, etc.

Trials will launch in January 2017 for a service which counts objects in uploaded or web video. Customers will get a dashboard with a summary and analysis of the data. There will be a low cost per video-hour for this service.

Currently we are working on counting people, cars, bicycles, horses etc., but if there is something you would like counted in your videos, get in touch - jimd (at) yantantethera.com

  • Accurate counting of objects from video
  • Charged per video hour
  • Fully outsourced service - just upload your video or send URLs
  • People, Bicycles, Cars, Horses etc.
  • Deep Neural Net technology running on powerful GPU systems
  • Real world video, no configuration needed
  • MOBCOUNT available for trial users January 2017

Sample Detection Videos - Work in Progress

Thanks for watching

Panic Monster reads up - thanks to Wait But Why

Research is focussed on object detection with Deep Learning, tracking in video, re-identification and real-time performance. We have worked in Internet video and camera technology for many years, but have been focussed on this area since early 2014, attending CVPR 2014/2015, BMVA 2015 and similar.

Current research includes YOLO, Single Shot Multibox detectors and translation-aware Fully Convolutional Networks, as well as video tubelets and person re-identification. Arxiv-sanity is our friend.

Very interested in discussing approaches, especially CNN person re-identification - jimd (at) yantantethera.com

Vision trade fair

We will be exhibiting at VISION 2016, 8-10 Nov in Stuttgart. We will be in a small :) booth, IJ64, on the north side, under the company name Yan Tan Tethera. Drop an email if you are interested in booking a time - jimd (at) yantantethera.com

What does the future hold?

Technically, we are betting on the following:

  1. Deep Neural Nets will continue to improve at a rapid pace; Google/Facebook will share research, as they have all the data
  2. GPUs will continue to improve; short term, NVIDIA will remain dominant, but mid term new chip architectures may make an impact
  3. Camera technology, driven by mobile phones, will get better and better
  4. Tightly controlled interaction between image sensors and software (computational photography) will become more accessible
  5. The mobile phone will remain the dominant hardware and UI interaction platform
  6. People will produce too much video. Other people will sell good and bad tools to deal with this