This project is a Deepfake Audio Detection system that identifies whether an audio recording is real or synthetically generated. It uses deep learning models based on CNN and Vision Transformer (ViT) architectures to classify audio, and it can be run as a web service with a user-friendly interface or as a command-line tool.
- Deepfake Audio Detection: Classifies audio as "real" or "fake".
- Multiple Models: Employs four different models for robust detection:
  - CNN Small
  - CNN Large
  - ViT Small
  - ViT Large
- Web Interface: Provides an easy-to-use web UI for uploading audio files and viewing detection results from all models.
- CLI Tool: Offers a command-line interface for users who prefer terminal-based operations.
- Docker Support: Includes a Dockerfile for easy containerization and deployment.
```
.
├── app/                        # Main application source code
│   ├── audio_processing/       # Audio segmentation and spectrogram utilities
│   ├── models/                 # Pre-trained model files (.pth)
│   ├── routers/                # FastAPI prediction endpoints
│   ├── static/                 # CSS and JavaScript for the web interface
│   ├── templates/              # HTML templates
│   ├── config.py               # Application configuration
│   ├── main.py                 # FastAPI application entry point
│   └── model_definitions.py    # PyTorch model class definitions
├── cli.py                      # Command-line interface script
├── Dockerfile                  # For building a Docker container
├── GUIDE.md                    # Detailed user guide
├── workflow.md                 # Description of the application workflow
├── requirements.txt            # Python dependencies for pip
├── environment.yml             # Conda environment definition (alternative)
└── README.md                   # This file
```
- Python 3.8+
- Git
- Clone the repository:
  ```bash
  git clone <repository_url>
  cd <repository_directory>
  ```
- Set up a Python virtual environment (recommended):
  - Using `venv`:
    ```bash
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    ```
  - Or using Conda:
    ```bash
    conda create -n deepfake_audio_env python=3.9
    conda activate deepfake_audio_env
    ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
  Or, if you use Conda, create the environment directly from the provided file:
  ```bash
  conda env create -f environment.yml
  ```
- Configure Environment Variables: Create a `.env` file in the project root (copy and rename `.env.example` if provided, or create one from scratch). Define the necessary variables, especially the model paths if they are not in the default `app/models/` directory. Example `.env` (a sketch of how these values might be loaded follows this list):
  ```
  MODEL_DIR="app/models/"
  CNN_SMALL_MODEL_NAME="best_model_CNN_Small_cnn_3s_dataset_102208.pth"
  CNN_LARGE_MODEL_NAME="best_model_CNN_Large_cnn_3s_dataset_114040.pth"
  VIT_SMALL_MODEL_NAME="best_model_ViT_Small_vit_3s_dataset_040441.pth"
  VIT_LARGE_MODEL_NAME="best_model_ViT_Large_vit_3s_dataset_044740.pth"
  HOST="0.0.0.0"
  PORT="8000"
  DEBUG="False"
  ```
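This README does not show how `app/config.py` consumes these values, so the following is only a minimal sketch, assuming the variables are read with `python-dotenv` and `os.environ`. The variable names come from the example `.env` above; everything else is a hypothetical illustration and the real `config.py` may differ.

```python
# Hypothetical sketch of reading the .env values above (not the project's actual config.py).
# Requires python-dotenv: pip install python-dotenv
import os
from pathlib import Path

from dotenv import load_dotenv

load_dotenv()  # load .env from the project root into the process environment

MODEL_DIR = Path(os.getenv("MODEL_DIR", "app/models/"))
CNN_SMALL_MODEL_PATH = MODEL_DIR / os.getenv(
    "CNN_SMALL_MODEL_NAME", "best_model_CNN_Small_cnn_3s_dataset_102208.pth"
)
HOST = os.getenv("HOST", "0.0.0.0")
PORT = int(os.getenv("PORT", "8000"))
DEBUG = os.getenv("DEBUG", "False").lower() == "true"
```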
The web application provides a user interface to upload an audio file and see the prediction results from all four models.
Start the FastAPI server:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

(Adjust the host and port as needed; `--reload` is for development only.)

Access the application by navigating to `http://localhost:8000` (or your configured host/port) in your web browser.
- API Documentation: Interactive API docs (Swagger UI) are available at `http://localhost:8000/docs`.
- Prediction Endpoint: `POST /predict_audio` (accepts an `audio_file` upload and an optional `model_name` query parameter; see the client sketch below).
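To call the endpoint programmatically, a minimal client sketch using the `requests` library might look like the following. The URL, the `audio_file` field, and the `model_name` parameter are taken from this README; the structure of the JSON response is an assumption.

```python
# Hypothetical client for POST /predict_audio; the response structure is an assumption.
import requests

url = "http://localhost:8000/predict_audio"

with open("path/to/your/audio.wav", "rb") as f:
    response = requests.post(
        url,
        params={"model_name": "cnn_small"},                  # optional query parameter
        files={"audio_file": ("audio.wav", f, "audio/wav")},
    )

response.raise_for_status()
print(response.json())  # per-chunk labels and confidences returned by the server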
The CLI allows for quick predictions directly from the terminal.
Syntax:

```bash
python cli.py <audio_file_path> [--model_name <model_key>]
```

- `<audio_file_path>`: Path to the audio file.
- `[--model_name <model_key>]`: Optional. Specifies the model (`cnn_small`, `cnn_large`, `vit_small`, `vit_large`). Defaults to `cnn_small`.

Example:

```bash
python cli.py "path/to/your/audio.wav" --model_name vit_large
```
The application processes audio as follows:
- Upload: User uploads an audio file via the web UI.
- Requests: The frontend sends separate prediction requests to the backend for each of the four models.
- Processing (Backend):
  - The audio is received and converted to mono.
  - It is segmented into 3-second chunks.
  - Each chunk is transformed into a Mel spectrogram, normalized, and resized.
- Inference (Backend): The selected model performs inference on each processed chunk.
- Results: Predictions (label and confidence) for each chunk are returned. The web UI aggregates and displays these results for each model.
For a more detailed workflow, see `workflow.md`. A simplified code sketch of these steps follows.
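The snippet below is a minimal sketch of the per-chunk pipeline described above (mono conversion, 3-second segmentation, Mel spectrogram, normalization, resizing, inference), assuming `librosa` and PyTorch. The real implementation lives in `app/audio_processing/` and the prediction router; the spectrogram parameters, the 224x224 resize, the model's input/output shapes, and the class-index mapping are all assumptions here.

```python
# Simplified, hypothetical version of the per-chunk pipeline described above.
# Spectrogram parameters, resize target, and the model's output layout are assumptions.
import librosa
import numpy as np
import torch
import torch.nn.functional as F


def predict_file(path, model, sr=16000, chunk_seconds=3):
    audio, _ = librosa.load(path, sr=sr, mono=True)        # load and convert to mono
    chunk_len = sr * chunk_seconds
    results = []
    for start in range(0, len(audio) - chunk_len + 1, chunk_len):
        chunk = audio[start:start + chunk_len]

        # Mel spectrogram in dB, min-max normalized to [0, 1]
        mel = librosa.feature.melspectrogram(y=chunk, sr=sr)
        mel_db = librosa.power_to_db(mel, ref=np.max)
        mel_norm = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)

        # Resize to the (assumed) model input size and run inference on the chunk
        x = torch.tensor(mel_norm, dtype=torch.float32)[None, None]    # (1, 1, H, W)
        x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
        with torch.no_grad():
            probs = torch.softmax(model(x), dim=1)[0]       # assumes two logits: [real, fake]
        label = "fake" if probs[1] > probs[0] else "real"
        results.append({"label": label, "confidence": float(probs.max())})
    return results
```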
A `Dockerfile` is provided for containerizing the application.
- Build the Docker image:
  ```bash
  docker build -t deepfake_audio_detector .
  ```
- Run the Docker container:
  ```bash
  docker run -d -p 8000:8000 --env-file .env deepfake_audio_detector
  ```
- Ensure your `.env` file is configured, especially if `MODEL_DIR` needs to be different or if models are mounted via volumes.
- To use models from your host machine instead of those copied into the image:
  ```bash
  docker run -d -p 8000:8000 --env-file .env -v /path/to/your/models_on_host:/app/models deepfake_audio_detector
  ```
  (Ensure `MODEL_DIR` in your `.env` points to `/app/models` for this to work correctly inside the container.)
Contributions are welcome! Please refer to `GUIDE.md` for more detailed information on the project structure and how to set up for development.
- Fork the repository.
- Create a new branch (`git checkout -b feature/your-feature-name`).
- Make your changes.
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature/your-feature-name`).
- Open a Pull Request.