Deep Video Analytics provides a platform for indexing and extracting information from videos and images. Deep learning detection and recognition algorithms are used for indexing individual frames / images along with detected objects. The goal of Deep Video Analytics is to become a quickly customizable platform for developing visual & video analytics applications, while benefiting from seamless integration with state-of-the-art models released by the vision research community.
Self-promotion: If you are interested in Healthcare & Machine Learning, please take a look at Computational Healthcare, another of my open-source projects.
- Visual Search using Nearest Neighbors algorithm as a primary interface
- Upload videos, multiple images (zip file with folder names as labels)
- Provide Youtube url to be automatically processed/downloaded by youtube-dl
- Metadata stored in Postgres
- Operations (Querying, Frame extraction & Indexing) performed using celery tasks and RabbitMQ
- Separate queues and workers for selection of machines with different specifications (GPU vs RAM)
- Videos, frames, indexes, numpy vectors stored in media directory, served through nginx
- Explore data and manually run code & tasks without the UI via a Jupyter notebook (explore.ipynb)
- Some documentation on design decisions, architecture and deployment.
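Among the features above, multiple images can be uploaded as a zip file whose folder names serve as labels. A minimal sketch of preparing such an archive follows; the folder and file names are placeholders for illustration, not part of the project, and the exact layout expected by the uploader should be verified against the documentation.

```shell
# Hypothetical layout: top-level folder names inside the zip become labels.
mkdir -p dataset/cats dataset/dogs
# In practice, copy your real images here; empty placeholders for illustration:
touch dataset/cats/cat1.jpg dataset/dogs/dog1.jpg
# Create the archive using Python's stdlib zipfile command-line interface.
python3 -m zipfile -c dataset.zip dataset/
```

The resulting dataset.zip can then be uploaded through the web UI.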
We make a significant effort to ensure that the following models (code + weights included) work without requiring you to write any code.
- Indexing using Google Inception V3 trained on ImageNet
- Single Shot Detector (SSD) MultiBox 300, trained on VOC
- Alexnet using PyTorch (disabled by default; set the ALEX_ENABLE=1 environment variable to enable)
- YOLO 9000 (disabled by default; set the YOLO_ENABLE=1 environment variable to enable)
- Face detection/alignment/recognition using MTCNN and Facenet
- Facebook FAISS for fast approximate similarity search (coming very soon!)
- Text detection models
- Soundnet (requires extracting mp3 audio)
- Inception V3 pretrained on the Open Images dataset
- Mapnet (requires converting models from Marvin)
- Keras.js, which uses a Keras Inception model for client-side indexing
Please take a look at this board for planned future tasks.
Pre-built docker images for both CPU & GPU versions are available on Docker Hub.
Deep Video Analytics is implemented using Docker and works on Mac, Windows and Linux.
git clone https://github.com/AKSHAYUBHAT/DeepVideoAnalytics
cd DeepVideoAnalytics/docker && docker-compose up

You need to have the latest version of Docker installed; for the GPU version, nvidia-docker is also required. The GPU Dockerfile is slightly different from the CPU Dockerfile.
pip install --upgrade nvidia-docker-compose
git clone https://github.com/AKSHAYUBHAT/DeepVideoAnalytics
cd DeepVideoAnalytics/docker_GPU && ./rebuild.sh
nvidia-docker-compose up

It is possible to deploy Deep Video Analytics on multiple machines. Configuring Postgres and RabbitMQ is straightforward; the main issue is correctly mounting the shared media folder (ideally a mounted EFS or NFS share). Please read this regarding trade-offs.
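For a multi-machine deployment, one hedged way to share the media folder is an NFS mount on each worker. The server name, export path, and mount point below are placeholders, not part of the project; substitute the actual media directory used by your deployment.

```
# Hypothetical /etc/fstab entry on each worker machine.
# "fileserver:/export/dva_media" and "/path/to/dva/media" are placeholders.
fileserver:/export/dva_media  /path/to/dva/media  nfs  defaults,_netdev  0  0
```

On AWS, an EFS filesystem mounted the same way (via its NFS endpoint) serves the same purpose.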
We provide an AMI with all dependencies such as docker & nvidia drivers pre-installed. To use it, start a P2.xlarge instance with ami-b3cc1fa5 (N. Virginia) and ports 8000, 6006, 8888 open (preferably to only your IP). Run the following commands after logging into the machine via SSH.
cd deepvideoanalytics && git pull
cd docker_GPU && ./rebuild.sh && nvidia-docker-compose up

You can optionally append "-d" to detach it, but for the very first run it is useful to watch how each container is started. After a few minutes the user interface will appear on port 8000 of the instance IP. The process used for AMI creation is described here.
Security warning: The current GPU container uses an nginx <-> uwsgi <-> django setup to ensure smooth playback of videos. However, it runs nginx as root (though within the container). Since you can modify AWS security rules on the fly, I highly recommend allowing inbound traffic only from your own IP address.
The following options can be specified in docker-compose.yml, or in your environment, to selectively enable/disable algorithms.
- ALEX_ENABLE=1 (to enable Alexnet with PyTorch; otherwise disabled by default)
- YOLO_ENABLE=1 (to enable YOLO 9000; otherwise disabled by default)
- SCENEDETECT_DISABLE=1 (to disable scene detection; otherwise enabled by default)
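As a sketch, these options could be set directly in docker-compose.yml like the fragment below. The service name shown is an assumption for illustration; match it to the service actually defined in your compose file.

```yaml
# Hypothetical docker-compose.yml fragment; "webserver" is a placeholder
# service name -- use the service defined in your own compose file.
services:
  webserver:
    environment:
      - ALEX_ENABLE=1
      - YOLO_ENABLE=1
      - SCENEDETECT_DISABLE=1
```

Alternatively, exporting the same variables in the shell before launching docker-compose works if the compose file forwards them to the containers.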
- Pytorch License
- Darknet License
- AdminLTE2 License
- FabricJS License
- Modified PySceneDetect License
- Modified SSD-Tensorflow (individual files are marked as Apache)
- FAISS License (Non Commercial)
- Facenet License
- MTCNN (TensorFlow port, used for face detection/alignment)
- Docker
- Nvidia-docker
- OpenCV
- Numpy
- FFMPEG
- Tensorflow
- Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Zhang, Kaipeng, et al. "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks." IEEE Signal Processing Letters 23.10 (2016): 1499-1503.
- Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016.
- Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
- Johnson, Jeff, Matthijs Douze, and Hervé Jégou. "Billion-scale similarity search with GPUs." arXiv preprint arXiv:1702.08734 (2017).
Citation for Deep Video Analytics coming soon.
Copyright 2016-2017, Akshay Bhat, Cornell University, All rights reserved.






