Deep Video Analytics provides a platform for indexing and extracting information from videos and images. Deep learning detection and recognition algorithms are used to index individual frames/images along with detected objects. The goal of Deep Video Analytics is to become a quickly customizable platform for developing visual & video analytics applications, while benefiting from seamless integration with state-of-the-art models & datasets released by the vision research community.
- Visual search using a nearest-neighbors algorithm as the primary interface
- Upload videos, multiple images (zip file with folder names as labels)
- Provide a YouTube URL to be automatically downloaded and processed using youtube-dl
- Leverage pre-trained object recognition/detection and face recognition models for analysis and visual search.
- Query against pre-indexed external datasets containing millions of images.
- Metadata stored in Postgres; operations performed asynchronously using Celery tasks.
- Separate queues and workers for selection of machines with different specifications (GPU vs RAM).
- Videos, frames, indexes, numpy vectors stored in media directory, served through nginx
- Explore data and manually run code & tasks without the UI via a Jupyter notebook (explore.ipynb)
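
To illustrate the visual search feature listed above, here is a minimal sketch of exact nearest-neighbor lookup over stored feature vectors. It is not the project's implementation: the function names, the `frame_index` keys, and the three-dimensional vectors are hypothetical. It uses plain Python lists and brute-force L2 distance, whereas a real deployment would more likely search numpy vectors with an approximate index.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, index, k=3):
    """Return the k (distance, key) pairs in `index` closest to `query`.

    `index` maps an identifier (e.g. a frame path) to its feature vector.
    This is an exhaustive scan, so cost grows linearly with index size.
    """
    scored = sorted((l2_distance(query, vec), key) for key, vec in index.items())
    return scored[:k]


# Hypothetical index: keys and vectors are made up for illustration.
frame_index = {
    "video1/frame_0": [0.0, 0.0, 1.0],
    "video1/frame_30": [0.1, 0.0, 0.9],
    "video2/frame_0": [1.0, 1.0, 0.0],
}

results = nearest_neighbors([0.0, 0.0, 1.0], frame_index, k=2)
for distance, key in results:
    print(f"{key}: {distance:.3f}")
```

An exhaustive scan like this is only practical for small collections; querying datasets with millions of images generally calls for an approximate or quantized index instead.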
We make a significant effort to ensure that the following models (code + weights included) work without your having to write any code.
- Indexing using Google Inception V3 trained on ImageNet
- Single Shot Detector (SSD) Multibox 300 trained on VOC
- AlexNet using PyTorch (disabled by default; set the environment variable ALEX_ENABLE=1 to use)
- YOLO 9000 (disabled by default; set the environment variable YOLO_ENABLE=1 to use)
- Face detection/alignment/recognition using MTCNN and Facenet
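
The two opt-in models above are switched on through environment variables. The sketch below shows one way such flags could be read in Python. Only the variable names `ALEX_ENABLE` and `YOLO_ENABLE` come from this README; the helper functions, the model labels, and the rule that only the value `1` enables a model are assumptions made for illustration, not the project's actual code.

```python
import os


def flag_enabled(name, environ=None):
    """Return True when the variable `name` is set to the string "1".

    `environ` defaults to the process environment; passing a dict makes
    the function easy to exercise without touching real settings.
    """
    environ = os.environ if environ is None else environ
    return environ.get(name, "0") == "1"


def enabled_optional_models(environ=None):
    """List the optional models whose enable flag is set.

    The model labels are hypothetical; only the variable names are taken
    from the README.
    """
    flags = {"alexnet": "ALEX_ENABLE", "yolo9000": "YOLO_ENABLE"}
    return [model for model, var in flags.items() if flag_enabled(var, environ)]


# With no flags set, both optional models stay disabled.
print(enabled_optional_models({}))
# Setting YOLO_ENABLE=1 enables only YOLO 9000.
print(enabled_optional_models({"YOLO_ENABLE": "1"}))
```

Keeping the lookup in one small function means the default (disabled) behavior is defined in a single place.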
Self-promotion: if you are interested in healthcare & machine learning, please take a look at my other open source project, Computational Healthcare.
- Pytorch License
- Darknet License
- AdminLTE2 License
- FabricJS License
- Modified PySceneDetect License
- Modified SSD-Tensorflow (individual files are marked as Apache)
- FAISS License (Non Commercial)
- Facenet License
- MTCNN (TensorFlow port of MTCNN for face detection/alignment)
- Locally Optimized Product Quantization License
- Docker
- Nvidia-docker
- OpenCV
- Numpy
- FFMPEG
- Tensorflow
- Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Zhang, Kaipeng, et al. "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks." IEEE Signal Processing Letters 23.10 (2016): 1499-1503.
- Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016.
- Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
- Johnson, Jeff, Matthijs Douze, and Hervé Jégou. "Billion-scale similarity search with GPUs." arXiv preprint arXiv:1702.08734 (2017).
Citation for Deep Video Analytics coming soon.
Copyright 2016-2017, Akshay Bhat, Cornell University, All rights reserved.
Please contact me for more information. I plan on relaxing the license once a beta version is reached, to the extent allowed by the included code/models (e.g. FAISS disallows commercial use).

