This is a user-friendly application built with Streamlit that leverages the power of LangChain and the Groq API to extract and summarize content from YouTube videos, websites, and PDF files. Whether you're conducting research or just want a quick overview of long content, this tool simplifies the summarization process into a few clicks.

ashkunwar/Youtube-URL-PDF-summarizer

LangChain: Summarize Text From YouTube, Website, or PDF

This project uses LangChain, the Groq API, and supporting libraries to provide a simple interface for summarizing text content from YouTube videos, websites, or PDF files.

Features

  • YouTube Transcript Summarization: Extracts and summarizes the transcript of a YouTube video.
  • Website Content Summarization: Fetches and summarizes text content from a given URL.
  • PDF Summarization: Summarizes the content of uploaded PDF documents.

How It Works

  1. Input Options:

    • Provide a YouTube video URL.
    • Enter a generic website URL.
    • Upload a PDF file.
  2. Processing:

    • For YouTube videos, it extracts the transcript using the YouTubeTranscriptApi.
    • For websites, it fetches text content using UnstructuredURLLoader.
    • For PDFs, it processes the uploaded file using PyPDFLoader.
  3. Summarization:

    • A pre-defined prompt is used to generate a concise summary of the content using the ChatGroq LLM.
  4. Output:

    • The summarized text is displayed on the Streamlit app interface.
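The input and summarization steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in app.py: the helper names (`youtube_video_id`, `pick_source`, `build_prompt`), the list of YouTube hosts, and the prompt wording are all assumptions made for the example. The actual transcript fetching, page loading, PDF parsing, and LLM call are left to the libraries named above.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical host list; the real app may recognize YouTube URLs differently.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}

# Hypothetical prompt wording; the app's pre-defined prompt is not shown in this README.
PROMPT_TEMPLATE = "Provide a concise summary of the following content:\n\n{text}"


def youtube_video_id(url):
    """Return the video ID for common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None
    if parsed.path == "/watch":
        # Standard links carry the ID in the "v" query parameter.
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None
    return None


def pick_source(url="", pdf_uploaded=False):
    """Decide which loader the input should be routed to."""
    if pdf_uploaded:
        return "pdf"
    if youtube_video_id(url):
        return "youtube"
    if urlparse(url).scheme in ("http", "https"):
        return "website"
    return "unknown"


def build_prompt(text):
    """Fill the summary prompt with the extracted text."""
    return PROMPT_TEMPLATE.format(text=text)
```

In this sketch, the value returned by `pick_source` would select between the transcript API, the URL loader, and the PDF loader, and the string from `build_prompt` would be sent to the LLM.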

Technologies Used

  • Streamlit: For building the web interface.
  • LangChain: For chaining and managing prompts.
  • ChatGroq: LangChain's chat model integration for the Groq API, used to generate the summaries.
  • YouTubeTranscriptApi: For fetching YouTube video transcripts.
  • PyPDFLoader: For reading and processing PDF files.
  • UnstructuredURLLoader: For extracting content from websites.

Installation

  1. Clone the repository:

    git clone https://github.com/ashkunwar/Youtube-URL-PDF-summarizer.git
    cd Youtube-URL-PDF-summarizer
  2. Install the required dependencies:

    pip install -r requirements.txt

Usage

  1. Run the Streamlit app:

    streamlit run app.py
  2. Open the app in your browser and provide your Groq API key in the sidebar.

  3. Input a URL (YouTube/website) or upload a PDF file and click "Summarize the Content from YT, Website, or PDF."
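Before the app can summarize anything, it needs a Groq API key and either a URL or an uploaded PDF. The check below is a small sketch of how those inputs might be validated before the summarize button does any work. The function name `validate_inputs` and the error messages are hypothetical and are not taken from app.py.

```python
from urllib.parse import urlparse


def validate_inputs(api_key, url="", pdf_uploaded=False):
    """Return a list of error messages; an empty list means the inputs look usable."""
    errors = []
    if not api_key.strip():
        errors.append("Please provide your Groq API key.")
    if not pdf_uploaded:
        # Without a PDF, a well-formed http(s) URL is required.
        parsed = urlparse(url.strip())
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("Please enter a valid URL or upload a PDF file.")
    return errors
```

In a Streamlit app, each returned message could be shown to the user before any loader or LLM call is made. The check only confirms that a key was entered, not that Groq will accept it.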

Deployed App

The application is deployed on Hugging Face: LangChain Summarizer App

Project Files

  • app.py: Main application file.
  • requirements.txt: List of required Python packages.

Limitations

  • Only works with YouTube videos that have transcripts enabled.
  • Summarization quality depends on the provided content and LLM capabilities.
  • Requires a valid Groq API key to function.

License

This project is licensed under the MIT License.
