VISIONFLOW is an end-to-end pipeline that turns meeting recordings into structured, readable documentation using Whisper transcription, LLM-based semantic analysis, and Retrieval-Augmented Generation (RAG).
It takes raw audio through to detailed summaries and visualized logic flows, helping teams collaborate and extract insights faster.
- 🎙️ Automatic Transcription using Whisper
- 🧠 Semantic Understanding of conversations using LLMs
- 📄 Document Generation in DOCX format
- 🔍 RAG Pipeline with ChromaDB + nomic embeddings
- 📈 Auto-generated Visualizations of meeting logic
- 🛠️ Modular, production-ready architecture
- ⚡ Streamlit interface for easy interaction
| Layer | Tools |
|---|---|
| Transcription | faster-whisper |
| Semantic Parsing | OpenAI / LLM (custom prompts) |
| Vector DB | ChromaDB |
| Embeddings | nomic-embed-text |
| Visualization | Matplotlib / Graphviz |
| UI | Streamlit |
| Audio | Pydub |
| Document Output | python-docx |
visionflow/
├── transcription/ # Audio extraction, whisper transcription
├── semantic_analysis/ # LLM-based content parsing & summarization
├── rag_engine/ # ChromaDB setup, retrieval, and generation
├── doc_generation/ # Auto DOCX report creation
├── visualizer/ # Logic flow visualization
├── ui/ # Streamlit interface
├── utils/ # Error handling, logging, helpers
├── main.py # Entrypoint script
└── requirements.txt
git clone https://github.com/king04aman/visionflow.git
cd visionflow
python -m venv venv
source venv/bin/activate # or .\venv\Scripts\activate on Windows
pip install -r requirements.txt
FFmpeg is required for audio processing (Pydub):
- macOS: `brew install ffmpeg`
- Ubuntu: `sudo apt install ffmpeg`
- Windows: Install Guide
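Pydub relies on an FFmpeg binary being reachable on `PATH`, so a quick check before launching the app can save a confusing error later. The sketch below uses only the standard library; the helper name `ffmpeg_available` is illustrative and not part of VISIONFLOW.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if an executable named `binary` is found on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found on PATH")
    else:
        print("ffmpeg not found; install it before processing audio")
```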
streamlit run ui/app.py
Sample output coming soon: example DOCX and logic flow diagram.
- Audio is extracted from the video file using FFmpeg + Pydub
- Whisper transcribes the audio with timestamps
- An LLM parses the transcript and identifies decisions, action items, and topics
- ChromaDB stores and retrieves embeddings for context-aware generation
- DOCX reports and logic flow diagrams are created for end users
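The steps above chain naturally into a sequence of stages, each consuming the previous stage's output. What follows is a minimal sketch of that shape only: stub functions stand in for the real FFmpeg, Whisper, and LLM calls, and every name (`MeetingContext`, `run_pipeline`, the stage functions) is hypothetical rather than taken from the VISIONFLOW codebase.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List


@dataclass
class MeetingContext:
    """Data carried from one stage to the next (hypothetical structure)."""
    video_path: str
    audio_path: str = ""
    transcript: List[dict] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)


Stage = Callable[[MeetingContext], MeetingContext]


def extract_audio(ctx: MeetingContext) -> MeetingContext:
    # Stub: a real stage would call FFmpeg/Pydub; here we only derive the path.
    ctx.audio_path = str(Path(ctx.video_path).with_suffix(".wav"))
    return ctx


def transcribe(ctx: MeetingContext) -> MeetingContext:
    # Stub: a real stage would run Whisper and return timestamped segments.
    ctx.transcript = [
        {"start": 0.0, "end": 2.5, "text": "We agreed to ship on Friday."},
        {"start": 2.5, "end": 4.0, "text": "Any other topics?"},
    ]
    return ctx


def analyze(ctx: MeetingContext) -> MeetingContext:
    # Stub: a real stage would prompt an LLM; a keyword match stands in for it.
    ctx.decisions = [
        seg["text"] for seg in ctx.transcript if "agreed" in seg["text"].lower()
    ]
    return ctx


def run_pipeline(ctx: MeetingContext, stages: List[Stage]) -> MeetingContext:
    """Run each stage in order and record which ones completed."""
    for stage in stages:
        ctx = stage(ctx)
        ctx.completed.append(stage.__name__)
    return ctx


result = run_pipeline(
    MeetingContext("meeting.mp4"), [extract_audio, transcribe, analyze]
)
print(result.audio_path, result.decisions, result.completed)
```

Retrieval and document generation would slot in as further stages with the same signature, which keeps each module testable on its own.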
You can also run components via CLI:
python main.py --input path/to/video.mp4 --output_dir results/
Options:
- `--fast`: Use a lightweight model for quick testing
- `--visualize`: Generate logic diagrams
- `--debug`: Enable verbose logs
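The flags above map directly onto an `argparse` parser. The sketch below is one plausible way `main.py` could declare them, not the project's actual code; the `build_parser` name, the `results/` default for `--output_dir`, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented CLI flags (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Turn a meeting recording into documentation.",
    )
    parser.add_argument("--input", required=True,
                        help="Path to the input video file")
    # Assumption: default to results/ when no output directory is given.
    parser.add_argument("--output_dir", default="results/",
                        help="Directory for generated reports and diagrams")
    parser.add_argument("--fast", action="store_true",
                        help="Use a lightweight model for quick testing")
    parser.add_argument("--visualize", action="store_true",
                        help="Generate logic diagrams")
    parser.add_argument("--debug", action="store_true",
                        help="Enable verbose logs")
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["--input", "path/to/video.mp4", "--fast"])
print(args.input, args.output_dir, args.fast, args.visualize, args.debug)
```

Using `store_true` keeps all three switches off by default, so a bare `--input` run behaves like the basic command shown earlier.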
- Add multi-language support
- Export to PDF / Markdown
- Real-time meeting integration (Zoom, Teams)
- Web dashboard & history tracking
Pull requests are welcome! For major changes, please open an issue first to discuss what you’d like to change.
Feel free to open an Issue or connect with me on LinkedIn.
This project is licensed under the MIT License - see the LICENSE file for details.