# Deep Video Analytics

Deep Video Analytics provides a platform for indexing and extracting information from videos and images. Deep learning detection and recognition algorithms are used to index individual frames/images along with detected objects. The goal of Deep Video Analytics is to become a quickly customizable platform for developing visual & video analytics applications, while benefiting from seamless integration with state of the art models & datasets released by the vision research community.

For a quick overview of the vision behind this project, please go through this presentation.

Self-promotion: If you are interested in healthcare & machine learning, please take a look at my other open source project, Computational Healthcare.

Features

  • Visual search using a nearest neighbors algorithm as the primary interface (see the sketch after this list)
  • Upload videos or multiple images (zip file with folder names as labels)
  • Provide a YouTube URL to be automatically downloaded and processed via youtube-dl
  • Leverage pre-trained object recognition/detection and face recognition models for analysis and visual search
  • Query against pre-indexed external datasets containing millions of images
  • Metadata stored in Postgres; operations performed asynchronously using Celery tasks
  • Separate queues and workers to allow selecting machines with different specifications (GPU vs. RAM)
  • Videos, frames, indexes and numpy vectors stored in the media directory, served through nginx
  • Explore data and manually run code & tasks without the UI via the Jupyter notebook explore.ipynb
  • Some documentation on design decisions, architecture and deployment
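
The visual search interface boils down to comparing a query image's feature vector against the feature vectors stored as numpy arrays in the media directory. The following is a minimal illustrative sketch of that idea, not the project's actual code; the file path, array shapes and distance metric are assumptions:

    import numpy as np

    def load_frame_index(path="media/1/indexes/frames.npy"):
        # Hypothetical path: stored feature vectors, one row per indexed frame.
        return np.load(path)  # shape: (num_frames, feature_dim)

    def nearest_neighbors(query_vector, index, k=5):
        # Rank stored vectors by Euclidean distance to the query and return the top-k.
        distances = np.linalg.norm(index - query_vector, axis=1)
        order = np.argsort(distances)[:k]
        return [(int(i), float(distances[i])) for i in order]

    # Usage: results = nearest_neighbors(query_vector, load_frame_index(), k=5)

In practice the same idea is applied per extractor (e.g. object or face embeddings), and approximate nearest neighbor indexes can replace the brute-force distance computation for large collections.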

Models included out of the box

We take significant effort to ensure that the following models (code + weights included) work without having to write any code.

External datasets indexed for use

Planned models and datasets

Approximate Nearest Neighbors indexing algorithms

To Do

Please take a look at this board for planned future tasks.

Installation

Pre-built docker images for both CPU & GPU versions are available on Docker Hub.

Machines without an Nvidia GPU

Deep Video Analytics is implemented using Docker and works on Mac, Windows and Linux.

git clone https://github.com/AKSHAYUBHAT/DeepVideoAnalytics 
cd DeepVideoAnalytics/docker && docker-compose up 

Machines with Nvidia GPU

You need to have the latest versions of Docker and nvidia-docker installed. The GPU Dockerfile is slightly different from the CPU version Dockerfile.

pip install --upgrade nvidia-docker-compose
git clone https://github.com/AKSHAYUBHAT/DeepVideoAnalytics 
cd DeepVideoAnalytics/docker_GPU && ./rebuild.sh 
nvidia-docker-compose up 

Multiple machines

It's possible to deploy Deep Video Analytics on multiple machines. Configuring Postgres and RabbitMQ is straightforward. The main issue is correctly mounting the shared media folder (ideally a mounted EFS or NFS volume). Please read this regarding trade-offs.
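
Concretely, a multi-machine setup mostly amounts to pointing Django at a shared Postgres instance and Celery at a shared RabbitMQ broker, while every machine mounts the same media folder at the same path. Below is a minimal sketch of such settings; the host names and environment variable names are illustrative assumptions, not the project's actual configuration:

    import os

    # Shared Postgres instance reachable from every machine (illustrative settings).
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql_psycopg2",
            "NAME": os.environ.get("DB_NAME", "dva"),
            "USER": os.environ.get("DB_USER", "dva"),
            "PASSWORD": os.environ.get("DB_PASSWORD", ""),
            "HOST": os.environ.get("DB_HOST", "10.0.0.10"),
            "PORT": os.environ.get("DB_PORT", "5432"),
        }
    }

    # Shared RabbitMQ broker used by all Celery workers (illustrative settings).
    BROKER_URL = "amqp://{user}:{password}@{host}:5672//".format(
        user=os.environ.get("RABBIT_USER", "dva"),
        password=os.environ.get("RABBIT_PASSWORD", ""),
        host=os.environ.get("RABBIT_HOST", "10.0.0.11"),
    )

    # Must resolve to the same shared EFS/NFS mount on every machine.
    MEDIA_ROOT = os.environ.get("MEDIA_ROOT", "/mnt/efs/media")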

Amazon P2 instance

We provide an AMI with all dependencies, such as Docker & nvidia drivers, pre-installed. To use it, start a P2.xlarge instance with ami-b3cc1fa5 (N. Virginia) and ports 8000, 6006 and 8888 open (preferably only to your IP). Run the following commands after logging into the machine via SSH.

cd deepvideoanalytics && git pull 
cd docker_GPU && ./rebuild.sh && nvidia-docker-compose up 

You can optionally specify "-d" at the end to detach, but for the very first run it's useful to read how each container is started. After a few minutes the user interface will appear on port 8000 of the instance IP. The process used for AMI creation is described here.

Security warning: The current GPU container uses an nginx <-> uwsgi <-> django setup to ensure smooth playback of videos. However, it runs nginx as root (though within the container). Considering that you can modify AWS security rules on-the-fly, I highly recommend allowing inbound traffic only from your own IP address.

Options

The following options can be specified in docker-compose.yml, or in your environment, to selectively enable/disable algorithms (a sketch of how a worker might check these flags follows the list).

  • ALEX_ENABLE=1 (to use AlexNet with PyTorch; otherwise disabled by default)
  • YOLO_ENABLE=1 (to use YOLO 9000; otherwise disabled by default)
  • SCENEDETECT_DISABLE=1 (to disable scene detection; otherwise enabled by default)
  • RESCALE_DISABLE=1 (to disable rescaling of frames extracted from videos; otherwise enabled by default)
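
For illustration, a worker process would typically consult these variables before loading an optional model or running an optional step. Only the flag names below come from the list above; the surrounding logic is a sketch, not the project's actual code:

    import os

    # Optional models are loaded only when explicitly enabled.
    alexnet_enabled = os.environ.get("ALEX_ENABLE", "0") == "1"
    yolo_enabled = os.environ.get("YOLO_ENABLE", "0") == "1"

    # Scene detection and frame rescaling are on by default and turned off via *_DISABLE flags.
    scene_detection_enabled = os.environ.get("SCENEDETECT_DISABLE", "0") != "1"
    rescaling_enabled = os.environ.get("RESCALE_DISABLE", "0") != "1"

    if yolo_enabled:
        # A worker would import and initialize the YOLO 9000 detector here.
        pass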

Architecture

(architecture diagram)

User Interface

Search across frames

(screenshot)

And specific detected objects, such as faces

(screenshot)

Past queries

(screenshots)

Video list / detail

(screenshots)

Frame detail

(screenshot)

View status of running tasks/queries, retry/rerun failed tasks

(screenshot)

Libraries & Code used

References

  1. Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  2. Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  3. Zhang, Kaipeng, et al. "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks." IEEE Signal Processing Letters 23.10 (2016): 1499-1503.
  4. Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer International Publishing, 2016.
  5. Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
  6. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
  7. Johnson, Jeff, Matthijs Douze, and Hervé Jégou. "Billion-scale similarity search with GPUs." arXiv preprint arXiv:1702.08734 (2017).

Citation

Citation for Deep Video Analytics coming soon.

Copyright

Copyright 2016-2017, Akshay Bhat, Cornell University, All rights reserved.

Please contact me for more information. I plan on relaxing the license once a beta version is reached, to the extent allowed by the code/models included (e.g., FAISS disallows commercial use).
