Real‑Time Sign Language to Speech Translation System: Low‑Latency Video‑to‑Voice Pipeline

An integrated accessibility pipeline that captures live video, extracts pose/hand/face keypoints, models gesture sequences, outputs text, and synthesizes speech for real-time interaction.

MuFaw AI Research Lab · Accessibility AI · Pose Estimation · Sequence Models · TTS · Edge AI

Overview

Real-time capture → keypoint extraction → temporal modeling → text decoding → speech output; designed for sub-second responsiveness with an offline-capable mode.

Why Keypoints (Not Raw Video)

Keypoints are lighter to store and transmit, faster to process, and more stable for temporal modeling than raw frames, and they reduce privacy risk because full video never needs to be retained.
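As a concrete illustration, here is a minimal per-frame extraction sketch assuming MediaPipe Holistic; the extract_keypoints helper, its flattening order, and the zero-fill for missed detections are our assumptions, not a prescribed implementation.

```python
# Sketch: one compact keypoint vector per frame instead of raw pixels.
import cv2
import mediapipe as mp
import numpy as np

mp_holistic = mp.solutions.holistic

def extract_keypoints(results) -> np.ndarray:
    """Flatten Holistic landmarks into one fixed-length vector (1662 floats)."""
    pose = (np.array([[p.x, p.y, p.z, p.visibility]
                      for p in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[p.x, p.y, p.z]
                      for p in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    left = (np.array([[p.x, p.y, p.z]
                      for p in results.left_hand_landmarks.landmark]).flatten()
            if results.left_hand_landmarks else np.zeros(21 * 3))
    right = (np.array([[p.x, p.y, p.z]
                       for p in results.right_hand_landmarks.landmark]).flatten()
             if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, left, right])  # shape: (1662,)

cap = cv2.VideoCapture(0)  # webcam input
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    ok, frame = cap.read()
    if ok:
        # MediaPipe expects RGB; OpenCV delivers BGR.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        keypoints = extract_keypoints(results)
cap.release()
```

At 720p this shrinks each frame from roughly 2.8 million pixel values (1280 × 720 × 3) to 1,662 floats, which is what makes sub-second temporal modeling and edge deployment realistic.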

Pipeline

Webcam/mobile input → frame buffering → MediaPipe Holistic / OpenPose keypoint extraction → temporal windowing → LSTM/Transformer sequence model → text decoding → TTS → audio output (+ optional transcript).
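A minimal sketch of the downstream stages, under stated assumptions: a 30-frame sliding window, a small PyTorch LSTM classifier, and pyttsx3 as an offline-capable TTS engine. SignLSTM, GLOSSES, on_new_frame, and the 0.8 confidence gate are illustrative placeholders, not the deployed model or vocabulary.

```python
# Sketch: sliding window over keypoint vectors -> LSTM -> gloss -> speech.
from collections import deque

import numpy as np
import pyttsx3
import torch
import torch.nn as nn

WINDOW = 30                    # frames per prediction (~1 s at 30 fps; assumed)
NUM_FEATURES = 1662            # flattened Holistic keypoints per frame
GLOSSES = ["hello", "thanks", "help"]   # placeholder vocabulary

class SignLSTM(nn.Module):
    """Small LSTM classifier over a window of keypoint vectors (illustrative)."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.lstm = nn.LSTM(NUM_FEATURES, 128, num_layers=2, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)          # x: (batch, WINDOW, NUM_FEATURES)
        return self.head(out[:, -1])   # logits from the last time step

model = SignLSTM(len(GLOSSES)).eval()
tts = pyttsx3.init()                   # offline-capable TTS engine
window_buf = deque(maxlen=WINDOW)      # temporal window of keypoint vectors
last_gloss = None

def on_new_frame(keypoints: np.ndarray) -> None:
    """Feed one flattened keypoint vector per frame; speak confident glosses."""
    global last_gloss
    window_buf.append(keypoints)
    if len(window_buf) < WINDOW:
        return                          # wait until the window is full
    x = torch.from_numpy(np.stack(window_buf)).float().unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=-1)[0]
    conf, idx = probs.max(dim=0)
    gloss = GLOSSES[idx.item()]
    # Gate on confidence and suppress repeats so the voice output is not jittery.
    if conf.item() > 0.8 and gloss != last_gloss:
        last_gloss = gloss
        print(gloss)                    # optional transcript
        tts.say(gloss)
        tts.runAndWait()                # blocking; a real pipeline would queue audio
```

Note that runAndWait() blocks the caller, so a production loop would hand decoded text to a background audio queue to preserve the sub-second responsiveness described above.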

Use Cases

Healthcare front desks and triage

Customer service and government counters

Education support

Workplace HR and onboarding