
VISIONFLOW is an AI-powered tool that converts meeting recordings into structured documentation and visual logic flows. It combines Whisper for transcription, LLMs for semantic analysis, and ChromaDB for RAG-based content generation, all wrapped in a modular, Streamlit-driven interface.

king04aman/Visionflow


🎥 VISIONFLOW – Intelligent Video-to-Docs AI System

VISIONFLOW is an end-to-end pipeline that transforms meeting recordings into structured, readable documentation using Whisper transcription, LLM-based semantic analysis, and Retrieval-Augmented Generation (RAG).

By turning raw audio into detailed summaries and visualized logic flows, the system helps teams collaborate faster and extract insights from their meetings.

✨ Features

  • 🎙️ Automatic Transcription using Whisper
  • 🧠 Semantic Understanding of conversations using LLMs
  • 📄 Document Generation in DOCX format
  • 🔍 RAG Pipeline with ChromaDB + nomic embeddings
  • 📈 Auto-generated Visualizations of meeting logic
  • 🛠️ Modular, production-ready architecture
  • ⚡ Streamlit interface for easy interaction

🧰 Tech Stack

| Layer | Tools |
|---|---|
| Transcription | faster-whisper |
| Semantic Parsing | OpenAI / LLM (custom prompts) |
| Vector DB | ChromaDB |
| Embeddings | nomic-embed-text |
| Visualization | Matplotlib / Graphviz |
| UI | Streamlit |
| Audio | Pydub |
| Document Output | python-docx |

📂 Project Structure

visionflow/
├── transcription/       # Audio extraction, whisper transcription
├── semantic_analysis/   # LLM-based content parsing & summarization
├── rag_engine/          # ChromaDB setup, retrieval, and generation
├── doc_generation/      # Auto DOCX report creation
├── visualizer/          # Logic flow visualization
├── ui/                  # Streamlit interface
├── utils/               # Error handling, logging, helpers
├── main.py              # Entrypoint script
└── requirements.txt

🚀 Getting Started

1. Clone the Repo

git clone https://github.com/king04aman/visionflow.git
cd visionflow

2. Setup Environment

python -m venv venv
source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip install -r requirements.txt

3. Install FFmpeg

Required for audio processing (Pydub):

  • macOS: brew install ffmpeg
  • Ubuntu: sudo apt install ffmpeg
  • Windows: Install Guide
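Pydub shells out to FFmpeg for non-WAV formats, so `ffmpeg` has to be on your PATH. The minimal sketch below checks for it and builds an equivalent FFmpeg command for pulling a mono 16 kHz WAV out of a video. The helper names are hypothetical, and the 16 kHz mono settings are an assumption based on the sample rate Whisper models work with, not necessarily what VISIONFLOW itself uses.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def extract_audio_cmd(video_path: str, wav_path: str) -> list:
    """Build an ffmpeg argv that extracts mono 16 kHz WAV audio (assumed settings)."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input video
        "-vn",             # drop the video stream
        "-ac", "1",        # downmix to a single channel
        "-ar", "16000",    # resample to 16 kHz
        wav_path,
    ]


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found:", " ".join(extract_audio_cmd("meeting.mp4", "meeting.wav")))
    else:
        print("ffmpeg not found on PATH; install it before running VISIONFLOW.")
```

The returned list can be passed straight to `subprocess.run(...)`, which avoids shell-quoting problems with paths that contain spaces.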

4. Run the App

streamlit run ui/app.py

🧪 Sample Output

Coming soon: an example DOCX report and logic flow diagram.

🔍 How It Works

  1. Audio is extracted from the video file using FFmpeg and Pydub.
  2. Whisper (via faster-whisper) transcribes the audio with timestamps.
  3. An LLM parses the transcript and identifies decisions, action items, and topics.
  4. ChromaDB stores transcript embeddings and retrieves relevant context for generation.
  5. DOCX reports and logic flow diagrams are generated for end users.
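Steps 2 and 3 hand timestamped segments from Whisper to an LLM. faster-whisper yields segments with `start`, `end` and `text` fields; the stand-in dataclass below mirrors that shape so the sketch runs without the library. The prompt wording and function names are hypothetical, since the project's actual prompts are custom and not shown here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped piece of transcript; times are in seconds."""
    start: float
    end: float
    text: str


def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def build_analysis_prompt(segments: list) -> str:
    """Render segments as a timestamped transcript under an LLM instruction."""
    lines = [f"[{fmt_ts(seg.start)}-{fmt_ts(seg.end)}] {seg.text.strip()}" for seg in segments]
    instructions = (
        "From the meeting transcript below, list the decisions, action items, "
        "and topics discussed. Cite the timestamp for each item.\n\n"
    )
    return instructions + "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 4.2, "Let's ship the beta on Friday."),
        Segment(4.2, 9.8, " Priya will update the release notes."),
    ]
    print(build_analysis_prompt(demo))
```

Keeping the timestamps in the prompt lets the LLM cite where each decision was made, which makes the generated document easier to verify against the recording.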
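Before step 4 can store anything in ChromaDB, a long transcript usually has to be split into chunks small enough to embed. The sketch below uses overlapping word windows so that a sentence cut at a boundary still appears whole in one chunk. The function name and the 200/40 word defaults are assumptions for illustration, not values taken from VISIONFLOW.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once this window reaches the end, so no trailing chunk is a pure duplicate.
        if start + size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    transcript = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(transcript, size=4, overlap=1):
        print(chunk)
```

With ChromaDB, the chunks would typically be stored via `collection.add(documents=chunks, ids=[...])` and retrieved with `collection.query(query_texts=[question], n_results=k)`. Using nomic-embed-text instead of ChromaDB's default embeddings requires supplying a custom embedding function, which is not shown here.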
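For step 5, Graphviz diagrams can be produced from plain DOT text. The sketch below turns an ordered list of extracted steps into a top-to-bottom flow. It assumes a simple linear sequence; real meeting logic may branch, and the function name is hypothetical rather than part of the project's `visualizer/` module.

```python
def to_dot(steps: list, title: str = "Meeting logic") -> str:
    """Render an ordered list of steps as a Graphviz DOT digraph, one box per step."""

    def quote(s: str) -> str:
        # Escape backslashes and double quotes so labels stay valid DOT strings.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["digraph G {", f"  label={quote(title)};", "  node [shape=box];"]
    for i, step in enumerate(steps):
        lines.append(f"  n{i} [label={quote(step)}];")
    # Connect each step to the next one in order.
    for i in range(len(steps) - 1):
        lines.append(f"  n{i} -> n{i + 1};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(["Agree on Friday beta", "Update release notes", "Announce to customers"]))
```

Saving the output as `flow.dot` and running `dot -Tpng flow.dot -o flow.png` renders it as an image.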

⚙️ Configuration & CLI

You can also run components via CLI:

python main.py --input path/to/video.mp4 --output_dir results/

Options:

  • --fast: Use lightweight model for quick testing
  • --visualize: Generate logic diagrams
  • --debug: Enable verbose logs
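The options above map naturally onto Python's `argparse`. The sketch below is one way such a CLI could be declared; it is not taken from `main.py`, and the `results` default for `--output_dir` is an assumption.

```python
import argparse
import logging
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI options listed in this README (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Convert a meeting recording into documentation.")
    parser.add_argument("--input", required=True, help="Path to the input video file")
    parser.add_argument("--output_dir", default="results", help="Directory for generated files")
    parser.add_argument("--fast", action="store_true", help="Use a lightweight model for quick testing")
    parser.add_argument("--visualize", action="store_true", help="Generate logic diagrams")
    parser.add_argument("--debug", action="store_true", help="Enable verbose logs")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments (from sys.argv when argv is None) and configure logging."""
    args = build_parser().parse_args(argv)
    # --debug switches the root logger to DEBUG; otherwise INFO is used.
    logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO)
    return args
```

For example, `parse_cli(["--input", "path/to/video.mp4", "--visualize"])` returns a namespace with `visualize=True` and `fast=False`. `store_true` flags default to `False`, so they only need to be passed when wanted.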

📌 To-Do / Roadmap

  • Add multi-language support
  • Export to PDF / Markdown
  • Real-time meeting integration (Zoom, Teams)
  • Web dashboard & history tracking

🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you’d like to change.

💬 Questions?

Feel free to open an Issue or connect with me on LinkedIn.

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.
