DATADIG 🇮🇹

Table of Contents 📌

Introduction
Tech Stack
Key Features
Working With DataDig
Extraction Object Model
Contact
DEVS

Introduction

Kotlin application that allows for customizable extraction from documents such as images (jpg, jpeg, png), and pdfs using AI (openAI API and Azure document intelligence API). Keys get added by the user and the templates used to extract are fully customizable by the user. App also comes with amples useful templates and provides files of the extractions in various formats( txt, csv, xml, json)

Key Features 💡

Template Management: Create, edit, and delete templates to define the structure and rules for extracting specific data points from documents.
Camera Integration: Capture images directly within the app or select existing images from the device gallery for extraction.
Background Extraction: A foreground service performs the extraction process in the background, allowing users to continue using the app.
Extraction History: View and manage the results of previous extractions.
Secure API Key Storage: Sensitive API keys are stored securely using encryption.
Multilingual Support: The app is localized to support 5 languages.

Tech Stack 📜

Language: Kotlin
UI Framework: Jetpack Compose
Architecture: Model-View-ViewModel (MVVM)
Navigation: NavHost and NavController
State Management: ViewModel with StateFlow
Dependency Injection: Hilt
Local Database: Realm
Background Processing: Foreground Service
Service-UI Communication: Repository pattern
Security: EncryptedSharedPreferences and MasterKey for API key storage
Localization: Support for 5 languages
OCR: TextRecognition library

Working With DataDig 💕

If you are using DataDig for your company you can handle the flow of data from the extraction in 2 ways:

Implicit Intent (recommended)

Make yourself an android app that can receive the implicit intent: Intent(Intent.ACTION_VIEW).apply { setDataAndType(contentUri, mimeType) flags = Intent.FLAG_GRANT_READ_URI_PERMISSION REMOVED that contains the Uri to the extraction file
1. Register your app to handle the ACTION_VIEW intent in your AndroidManifest.xml.
2. When the user clicks on the extracted file (json or xml recommended) within DataDig, your app will appear as an option to open it.
3. Retrieve the file path from the provided Uri.
4. Parse the file contents.
5. Update your app's database or perform further processing on the extracted information.
This approach offers a seamless way to directly integrate DataDig's output into your app's workflow.
Sharing
1. DataDig provides the option to share extracted data in JSON or XML format.
2. Use this shared data to import it into other systems, perform additional analysis, or any other desired actions.
Please ensure secure handling of the shared data if it contains sensitive information.

Extraction Object Model 📔

This section outlines the classes used to represent the results of a data extraction process in DataDig. (useful for parsing)

`Extraction`

The primary class holding the extracted data.

title (String): Title or name associated with the extraction.
extractedFields (List<ExtractionField>): List of extracted fields and their values.
extractedTables (List<ExtractionTable>): List of extracted tables.
extractionCosts (List<ExtractionCosts>): Cost breakdown of the extraction (model usage, etc.).
exceptionsOccurred (List<ExceptionOccurred>): Any exceptions that occurred during extraction.
language (String, optional): Language detected in the document (if applicable).
model (String, optional): Model used for the extraction.

`ExtractionTable`

Represents a single extracted table.

title (String, optional): Title of the extracted table.
template_table_title (String): The title of the template table used for extraction.
dataframe (String): Internal representation of the table data (not exposed to the user, this is useful to undestand what the LLM saw when extracting).
fields (List<ExtractionTableRow>): Rows within the extracted table.

`ExtractionTableRow`

Represents a row within an extracted table.

rowName (String): Name or identifier for the row.
fields (List<ExtractionField>): Fields and their values within this row.

`ExceptionOccurred`

Details about any exceptions encountered during extraction.

error (String): Error message.
errorType (String): Type or category of the error.
errorDescription (String): Additional description of the error.

`ExtractionField`

Represents a single extracted field.

template_field_title (String): Title of the template field used for extraction
value (String): Extracted value for this field.

`ExtractionCosts`

Breakdown of the costs associated with the extraction process.

name (String): Name of the cost component (e.g., "Model Usage").
cost (Float): Cost in the specified currency.
currency (String): Currency code (e.g., "USD").
tokens (Int): Number of tokens used (if applicable to the cost model).

Contact 📬

Good luck! 🖖 Feel free to reach out to me via:

Email: elia.fri3erg@gmail.com
LinkedIn: elia.friberg
Whatsapp: @elia.friberg

DEVS

Elia Friberg

Name		Name	Last commit message	Last commit date
Latest commit History 89 Commits
.idea		.idea
.vscode		.vscode
app		app
gradle		gradle
thesis		thesis
.gitignore		.gitignore
README.md		README.md
build.gradle.kts		build.gradle.kts
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle.kts		settings.gradle.kts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DATADIG 🇮🇹

Table of Contents 📌

Introduction

Key Features 💡

Tech Stack 📜

Working With DataDig 💕

Extraction Object Model 📔

`Extraction`

`ExtractionTable`

`ExtractionTableRow`

`ExceptionOccurred`

`ExtractionField`

`ExtractionCosts`

Contact 📬

DEVS

About

Uh oh!

Releases

Packages

Uh oh!

Languages

fri3erg/DataDig-AIExtractor

Folders and files

Latest commit

History

Repository files navigation

DATADIG 🇮🇹

Table of Contents 📌

Introduction

Key Features 💡

Tech Stack 📜

Working With DataDig 💕

Extraction Object Model 📔

Extraction

ExtractionTable

ExtractionTableRow

ExceptionOccurred

ExtractionField

ExtractionCosts

Contact 📬

DEVS

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

`Extraction`

`ExtractionTable`

`ExtractionTableRow`

`ExceptionOccurred`

`ExtractionField`

`ExtractionCosts`

Packages