Shravan is an AI-powered tool that processes YouTube videos: it converts speech to text, translates transcripts, summarizes content, and supports contextual Q&A. The project builds on speech recognition, advanced NLP, Hugging Face models, and Seq2Seq LLMs.
- 🎤 Video-to-Audio Conversion: Extracts audio from locally downloaded YouTube videos.
- 📝 Speech-to-Text Transcription: Splits audio into chunks with PyDub and transcribes them with Wav2Vec2.
- 🌍 Translation Support: Translates transcripts into a user-specified language.
- 📄 Summarization: Provides concise summaries of the original transcript.
- 🤖 Contextual Q&A: Uses FAISS + FLAN-T5 to allow users to ask questions about the video content.
- ⚡ Fast and Efficient: Optimized pipeline for processing large video files quickly.
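The chunking step behind transcription can be sketched without any audio library. The helper below is a minimal illustration, not the project's actual code: `chunk_bounds`, the 30-second window, and the 1-second overlap are all assumptions chosen for the example. It only computes millisecond windows; the overlap is there so that words falling on a boundary appear in both neighbouring chunks.

```python
from typing import List, Tuple


def chunk_bounds(
    total_ms: int, chunk_ms: int = 30_000, overlap_ms: int = 1_000
) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond windows that cover [0, total_ms).

    Hypothetical helper: the window and overlap sizes are illustrative
    defaults, not values taken from this project.
    """
    if chunk_ms <= 0 or not 0 <= overlap_ms < chunk_ms:
        raise ValueError("need chunk_ms > 0 and 0 <= overlap_ms < chunk_ms")

    bounds: List[Tuple[int, int]] = []
    step = chunk_ms - overlap_ms  # how far each new window advances
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        bounds.append((start, end))
        if end == total_ms:  # last window reached the end of the audio
            break
        start += step
    return bounds


if __name__ == "__main__":
    # A 70-second clip split into overlapping 30-second windows.
    for start, end in chunk_bounds(70_000):
        print(start, end)
```

With PyDub, an `AudioSegment` can be sliced by milliseconds, so each window would map to `audio[start:end]` before the chunk is passed to the speech model.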
git clone https://github.com/sagarvk24/EchoTranscribe-AI-YouTube-Video-Processing-with-Speech-to-Text-Q-A-and-Translation.git
cd EchoTranscribe-AI-YouTube-Video-Processing-with-Speech-to-Text-Q-A-and-Translation
- 🤗 Hugging Face Transformers (For Q&A & Summarization)
- 🎙️ PyDub, HuggingSound and Wav2Vec2 (ASR and Speech-to-Text)
- 📦 FAISS (Vector Database for efficient Q&A)
- 🌍 Google Translate API (Text Translation)
- 🏗 PyTorch (Model Inference)
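To show how the Q&A pieces fit together, here is a dependency-free sketch of the retrieve-then-prompt pattern. It is an illustration under stated assumptions, not the project's implementation: the bag-of-words `embed` function stands in for a real sentence encoder, the brute-force ranking in `top_k` stands in for a FAISS index lookup, and the prompt format in `build_prompt` is a hypothetical template for a Seq2Seq model such as FLAN-T5.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a real encoder would go here)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, passages: List[str], k: int = 2) -> List[str]:
    """Rank transcript passages by similarity to the question.

    Brute-force search; a vector index such as FAISS would replace this
    step for large transcripts.
    """
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Assemble a prompt for the answering model (hypothetical template)."""
    joined = "\n".join(context)
    return (
        "Answer the question using the context.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    transcript = [
        "The speaker introduces FAISS as a vector index.",
        "Lunch was served at noon.",
        "FLAN-T5 generates the final answer.",
    ]
    question = "What is FAISS?"
    print(build_prompt(question, top_k(question, transcript, k=1)))
```

The resulting prompt string is what would be tokenized and passed to the answering model. Keeping retrieval separate from generation means only the most relevant passages, rather than the whole transcript, need to fit into the model's input.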
Contributions are welcome! Feel free to fork the repo, create a branch, and submit a pull request.
git checkout -b feature-branch
This project is licensed under the MIT License. See LICENSE for details.
For any queries, reach out via LinkedIn or open an issue on GitHub.
🚀 Happy Transcribing & Exploring!