A real-time hand gesture recognition system that uses MediaPipe for hand detection and custom-trained neural networks to classify both static hand gestures and dynamic finger movements. The system achieves high accuracy (~95%) while maintaining real-time performance suitable for interactive applications.
- Real-time Hand Detection: Leverages MediaPipe's robust hand tracking to detect up to 2 hands simultaneously
- Dual Classification System:
  - Static gesture recognition (5 gestures)
  - Dynamic movement classification (4 motion patterns)
- Interactive Data Collection: Built-in modes for collecting training data
- Lightweight Models: TensorFlow Lite models optimized for edge deployment
- Visual Feedback: Real-time visualization of hand landmarks and gesture classifications
The system uses MediaPipe's hand detection model to extract 21 3D landmarks from each detected hand (a short extraction sketch follows the list below). These landmarks represent key points on the hand:
- Wrist (landmark 0)
- Thumb joints (landmarks 1-4)
- Index finger joints (landmarks 5-8)
- Middle finger joints (landmarks 9-12)
- Ring finger joints (landmarks 13-16)
- Pinky joints (landmarks 17-20)
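A minimal sketch of reading these landmarks with MediaPipe's Python API; the camera index and confidence values mirror the defaults documented below, and variable names are illustrative:

```python
import cv2
import mediapipe as mp

# Open the default camera and read the 21 landmarks of each detected hand.
cap = cv2.VideoCapture(0)
with mp.solutions.hands.Hands(max_num_hands=2,
                              min_detection_confidence=0.7,
                              min_tracking_confidence=0.5) as hands:
    ok, frame = cap.read()
    if ok:
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        for hand in results.multi_hand_landmarks or []:
            # Exactly 21 landmarks per hand, indexed 0 (wrist) to 20 (pinky tip).
            for idx, lm in enumerate(hand.landmark):
                print(idx, lm.x, lm.y, lm.z)  # x, y normalized to [0, 1]; z is relative depth
cap.release()
```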
The pointer functionality is triggered when the system detects a "Pointer" gesture (class 2). When activated (see the code sketch after this list):
- The system tracks the index fingertip position (landmark 8)
- A history of the last 16 positions is maintained in a deque
- The movement trajectory is analyzed by the point history classifier
- Visual feedback shows the pointer trail with increasing circle sizes
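A sketch of that trail logic; `gesture_id`, `landmarks_px`, and the placeholder convention are illustrative, not necessarily the exact names used in `app.py`:

```python
from collections import deque

import cv2

HISTORY_LENGTH = 16
point_history = deque(maxlen=HISTORY_LENGTH)  # holds the last 16 fingertip positions

def update_pointer(gesture_id, landmarks_px):
    """Append the index fingertip (landmark 8) while the Pointer gesture is held."""
    if gesture_id == 2:                        # class 2 = "Pointer"
        point_history.append(landmarks_px[8])  # (x, y) pixel coordinates of the fingertip
    else:
        point_history.append((0, 0))           # placeholder keeps the window length fixed

def draw_trail(frame):
    """Draw the pointer trail; newer points get larger circles."""
    for i, (x, y) in enumerate(point_history):
        if (x, y) != (0, 0):
            cv2.circle(frame, (int(x), int(y)), 1 + i // 2, (152, 251, 152), 2)
```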
- Input: 42 normalized landmark coordinates (21 points in 2D)
- Architecture (see the Keras sketch after this list):
  Input(42) → Dropout(0.2) → Dense(20, ReLU) → Dropout(0.4) → Dense(10, ReLU) → Dense(5, Softmax)
- Output Classes:
  - 0: Open (open palm)
  - 1: Close (closed fist)
  - 2: Pointer (index finger extended)
  - 3: OK (thumb and index forming circle)
  - 4: Peace Sign (index and middle fingers extended)
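One way to express this architecture in Keras. The layer list matches the description above; the optimizer and loss are assumptions:

```python
import tensorflow as tf

# Sketch of the static gesture classifier described above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(42,)),              # 21 landmarks x 2 coordinates
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(20, activation='relu'),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(5, activation='softmax'),  # 5 gesture classes
])
model.compile(optimizer='adam',                      # optimizer and loss are assumptions
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```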
- Input: 32 values representing 16 consecutive movement vectors
- Architecture (sketched after this list):
  Input(32) → Dropout(0.2) → Dense(24, ReLU) → Dropout(0.5) → Dense(10, ReLU) → Dense(4, Softmax)
- Output Classes:
  - 0: Stop (no significant movement)
  - 1: Clockwise (circular clockwise motion)
  - 2: Counter Clockwise (circular counter-clockwise motion)
  - 3: Move (linear directional movement)
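The same pattern for the movement classifier; again only the layer list is taken from the description above:

```python
import tensorflow as tf

# Sketch of the movement classifier: 16 movement vectors x 2 components = 32 inputs.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(4, activation='softmax'),  # Stop / Clockwise / Counter Clockwise / Move
])
```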
- Landmark Normalization (sketched below):
  - Coordinates are converted to relative positions from the wrist (landmark 0)
  - Values are normalized by the maximum absolute value to ensure scale invariance
  - This makes the system robust to different hand sizes and distances from the camera
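A minimal NumPy sketch of this normalization; the function name and input layout are illustrative:

```python
import numpy as np

def normalize_landmarks(landmarks_px):
    """Turn 21 (x, y) pixel coordinates into the 42-value model input.

    Coordinates become relative to the wrist (landmark 0), then the whole
    vector is scaled by its maximum absolute value for scale invariance.
    """
    pts = np.asarray(landmarks_px, dtype=np.float32)  # shape (21, 2)
    rel = pts - pts[0]                                # wrist-relative coordinates
    flat = rel.flatten()                              # 21 x 2 = 42 values
    max_abs = np.max(np.abs(flat))
    if max_abs > 0:                                   # guard against an all-zero input
        flat /= max_abs
    return flat
```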
- Movement History Processing (sketched below):
  - Raw pixel coordinates are converted to relative movements
  - Normalized by image dimensions for resolution independence
  - Maintains temporal context through a sliding window approach
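A companion sketch for the movement history; note that taking movements relative to the oldest point (rather than frame-to-frame deltas) is an assumption here:

```python
import numpy as np

def preprocess_point_history(point_history, image_width, image_height):
    """Turn 16 fingertip positions into the 32-value classifier input."""
    pts = np.asarray(list(point_history), dtype=np.float32)  # shape (16, 2), pixels
    rel = pts - pts[0]                 # movement relative to the oldest point (assumption)
    rel[:, 0] /= image_width           # normalize by image dimensions for
    rel[:, 1] /= image_height          # resolution independence
    return rel.flatten()               # 16 x 2 = 32 values
```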
- Clone the repository:

      git clone https://github.com/NavalShah/Dextra.git
      cd Dextra

- Install dependencies:

      pip install opencv-python mediapipe tensorflow numpy matplotlib

- Run the application:

      python app.py
If you don't want to install Python dependencies manually:
- Download `HandGestureRecognition_Portable.zip` from the Releases page
- Extract the zip file
- Run `setup.bat` to install dependencies automatically
- Run `run.bat` to start the application
Command-line options (a minimal parsing sketch follows this list):

    python app.py [--device N] [--width W] [--height H] [--min_detection_confidence C] [--min_tracking_confidence C]

- `--device`: Camera device index (default: 0)
- `--width`: Capture width in pixels (default: 960)
- `--height`: Capture height in pixels (default: 540)
- `--min_detection_confidence`: Minimum confidence for hand detection (default: 0.7)
- `--min_tracking_confidence`: Minimum confidence for hand tracking (default: 0.5)
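A sketch of how these options could be parsed with `argparse`; the actual `app.py` may structure this differently:

```python
import argparse

def get_args():
    # Parse the CLI options listed above (defaults mirror the documentation).
    parser = argparse.ArgumentParser()
    parser.add_argument('--device', type=int, default=0,
                        help='Camera device index')
    parser.add_argument('--width', type=int, default=960,
                        help='Capture width in pixels')
    parser.add_argument('--height', type=int, default=540,
                        help='Capture height in pixels')
    parser.add_argument('--min_detection_confidence', type=float, default=0.7,
                        help='Minimum confidence for hand detection')
    parser.add_argument('--min_tracking_confidence', type=float, default=0.5,
                        help='Minimum confidence for hand tracking')
    return parser.parse_args()
```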
- ESC: Exit the application
- n: Normal mode (inference only)
- k: Keypoint logging mode (collect static gesture data)
- h: History logging mode (collect movement data)
- 0-9: Class label for data collection modes (key handling is sketched below)
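A sketch of how such key handling is typically wired up with OpenCV; the exact logic in `app.py` may differ:

```python
import cv2

mode, label = 0, -1              # 0: normal, 1: keypoint logging, 2: history logging
key = cv2.waitKey(10)            # returns -1 if no key was pressed
if key == 27:                    # ESC exits
    raise SystemExit
elif key == ord('n'):
    mode = 0
elif key == ord('k'):
    mode = 1
elif key == ord('h'):
    mode = 2
elif ord('0') <= key <= ord('9'):
    label = key - ord('0')       # class label for the current logging mode
```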
- Collecting Static Gestures (Mode 1):
  - Press 'k' to enter keypoint logging mode
  - Make the desired hand gesture
  - Press the corresponding number (0-4) to save the sample
  - Data is appended to `model/keypoint_classifier/keypoint.csv`
- Collecting Movement Patterns (Mode 2):
  - Press 'h' to enter history logging mode
  - Make the pointer gesture to start tracking
  - Perform the desired movement pattern
  - Press the corresponding number (0-3) to save the sample
  - Data is appended to `model/point_history_classifier/point_history.csv` (a logging sketch follows this list)
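A minimal logging sketch, assuming each sample is stored as one CSV row with the class label first; the function name and row layout are illustrative:

```python
import csv

def log_sample(csv_path, label, features):
    """Append one labeled sample to the given CSV file (label first, then features)."""
    with open(csv_path, 'a', newline='') as f:
        csv.writer(f).writerow([label, *features])

# e.g. log_sample('model/keypoint_classifier/keypoint.csv', 2, normalized_landmarks)
```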
Install the training dependencies:

    pip install tensorflow numpy pandas scikit-learn

- Static Gesture Model:

      jupyter notebook keypoint_classification.ipynb

  - Modify gesture labels in `keypoint_classifier_label.csv`
  - Collect training data using mode 1
  - Run the notebook to train and export the model
- Movement Pattern Model:

      jupyter notebook point_history_classification.ipynb

  - Modify movement labels in `point_history_classifier_label.csv`
  - Collect training data using mode 2
  - Run the notebook to train and export the model
- TensorFlow Lite: Models are converted to TFLite format for efficient inference
- Preprocessing Cache: Landmark normalization is vectorized using NumPy
- Deque Structure: Fixed-size queue for O(1) append operations
- Single-threaded Inference: Reduces overhead for lightweight models (see the inference sketch below)
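A sketch of single-threaded TFLite inference consistent with the points above; the model path is an assumption:

```python
import numpy as np
import tensorflow as tf

# Load the static gesture model with a single-threaded TFLite interpreter.
interpreter = tf.lite.Interpreter(
    model_path='model/keypoint_classifier/keypoint_classifier.tflite',  # assumed path
    num_threads=1)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]['index']
output_index = interpreter.get_output_details()[0]['index']

features = np.zeros((1, 42), dtype=np.float32)  # one normalized landmark vector
interpreter.set_tensor(input_index, features)
interpreter.invoke()
probs = interpreter.get_tensor(output_index)    # softmax over the 5 gesture classes
gesture_id = int(np.argmax(probs))
```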
- MediaPipe provides normalized coordinates (0-1 range)
- Converted to pixel coordinates for visualization (conversion sketched below)
- Pointer tracking uses pixel coordinates for precise control
- Movement classification uses normalized deltas for scale invariance
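A small sketch of the normalized-to-pixel conversion mentioned above:

```python
def to_pixels(landmark, image_width, image_height):
    """Convert a MediaPipe landmark (x, y in the 0-1 range) to pixel coordinates."""
    x = min(int(landmark.x * image_width), image_width - 1)
    y = min(int(landmark.y * image_height), image_height - 1)
    return x, y
```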
- Hand detection: 0.7 (adjustable via CLI)
- Hand tracking: 0.5 (adjustable via CLI)
- Movement classification: 0.5 (hardcoded in PointHistoryClassifier)
- Gesture-Based Control Systems: Control smart home devices, presentations, or media players
- Sign Language Recognition: Extend the gesture vocabulary for basic sign language
- Virtual Mouse/Pointer: Use hand movements for touchless computer interaction
- Gaming Interface: Create gesture-based controls for games
- Accessibility Tools: Provide alternative input methods for users with mobility limitations
- AR/VR Interaction: Natural hand gesture input for immersive environments
- Support for more complex gestures and combinations
- Two-hand gesture recognition
- Gesture sequence recognition for commands
- Integration with voice commands
- Export to mobile platforms (Android/iOS)
- Real-time gesture customization without retraining
- Python 3.7+
- OpenCV
- MediaPipe
- TensorFlow/TensorFlow Lite
- NumPy
- MediaPipe team for the excellent hand tracking solution
- TensorFlow team for the machine learning framework
- OpenCV community for computer vision tools