
Portfolio
An integrated accessibility pipeline that captures live video, extracts pose/hand/face keypoints, models gesture sequences, outputs text, and synthesizes speech for real-time interaction.

Project details
Real-time capture → keypoint extraction → temporal modeling → text decoding → speech output; designed for sub-second responsiveness with an offline-capable mode.
Compared with storing or processing full frames, keypoints are lighter and faster to process, give a more stable input for temporal modeling, and reduce privacy risk.
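To give a rough sense of the "lighter" claim, the sketch below compares the per-frame size of a raw RGB frame with a flat keypoint vector. The 640x480 resolution, the float32 storage, and the choice of three coordinates per landmark are illustrative assumptions, not figures from the project; the landmark counts are those of MediaPipe Holistic (33 pose, 21 per hand, 468 face).

```python
# Back-of-the-envelope size comparison: raw frame vs. keypoint vector.
# Assumed for illustration only: 640x480 RGB uint8 frames, and
# float32 (x, y, z) coordinates per landmark.

FRAME_WIDTH, FRAME_HEIGHT, CHANNELS = 640, 480, 3

# MediaPipe Holistic landmark counts per frame.
LANDMARKS = {"pose": 33, "left_hand": 21, "right_hand": 21, "face": 468}
COORDS_PER_LANDMARK = 3   # x, y, z (assumption: visibility scores dropped)
BYTES_PER_FLOAT32 = 4


def frame_bytes() -> int:
    """Bytes needed for one uncompressed RGB frame (1 byte per channel)."""
    return FRAME_WIDTH * FRAME_HEIGHT * CHANNELS


def keypoint_bytes() -> int:
    """Bytes needed for one frame's landmarks stored as float32 values."""
    total_landmarks = sum(LANDMARKS.values())
    return total_landmarks * COORDS_PER_LANDMARK * BYTES_PER_FLOAT32


if __name__ == "__main__":
    print(f"raw frame:  {frame_bytes()} bytes")
    print(f"keypoints:  {keypoint_bytes()} bytes")
    print(f"ratio:      {frame_bytes() / keypoint_bytes():.0f}x")
```

Under these assumptions the keypoint vector is about two orders of magnitude smaller than the raw frame, and it contains no pixel data.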
Webcam/mobile input → buffer frames → MediaPipe Holistic / OpenPose → temporal windowing → LSTM/Transformer → decode → TTS → audio output (+ optional transcript).
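The temporal windowing step in this flow could be sketched as a sliding window over per-frame keypoint vectors. This is a minimal illustration, not the project's implementation: the function name `sliding_windows`, the window size, and the stride are hypothetical, and each frame is assumed to have already been reduced to a flat list of floats by the keypoint extractor.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

Keypoints = List[float]  # one flattened keypoint vector per frame (assumed)


def sliding_windows(
    frames: Iterable[Keypoints], window_size: int, stride: int
) -> Iterator[List[Keypoints]]:
    """Yield overlapping windows of consecutive keypoint vectors.

    The first window is emitted once `window_size` frames have arrived;
    later windows are emitted every `stride` frames after that.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")

    # A bounded deque drops the oldest frame automatically when full.
    buffer: Deque[Keypoints] = deque(maxlen=window_size)
    frames_seen = 0

    for keypoints in frames:
        buffer.append(keypoints)
        frames_seen += 1
        full = frames_seen >= window_size
        if full and (frames_seen - window_size) % stride == 0:
            # Copy the buffer so later appends do not mutate the window.
            yield list(buffer)


if __name__ == "__main__":
    # Ten dummy one-value "frames" stand in for real keypoint vectors.
    dummy_frames = [[float(i)] for i in range(10)]
    for window in sliding_windows(dummy_frames, window_size=4, stride=2):
        print([kp[0] for kp in window])
```

Each yielded window would then be passed to the sequence model (LSTM or Transformer). A smaller stride gives more frequent predictions at the cost of more inference calls, so it is one knob for trading latency against compute.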
Healthcare front desks and triage
Customer service and government counters
Education support
Workplace HR and onboarding