DataDig is a Kotlin Android application for customizable data extraction from documents such as images (JPG, JPEG, PNG) and PDFs using AI (the OpenAI API and the Azure Document Intelligence API). Users add their own API keys, and the templates used for extraction are fully customizable. The app also ships with plenty of useful templates and exports extraction results in several formats (TXT, CSV, XML, JSON).
- Template Management: Create, edit, and delete templates to define the structure and rules for extracting specific data points from documents.
- Camera Integration: Capture images directly within the app or select existing images from the device gallery for extraction.
- Background Extraction: A foreground service performs the extraction process in the background, allowing users to continue using the app.
- Extraction History: View and manage the results of previous extractions.
- Secure API Key Storage: Sensitive API keys are stored securely using encryption.
- Multilingual Support: The app is localized to support 5 languages.
- Language: Kotlin
- UI Framework: Jetpack Compose
- Architecture: Model-View-ViewModel (MVVM)
- Navigation: NavHost and NavController
- State Management: ViewModel with StateFlow
- Dependency Injection: Hilt
- Local Database: Realm
- Background Processing: Foreground Service
- Service-UI Communication: Repository pattern
- Security: EncryptedSharedPreferences and MasterKey for API key storage
- Localization: Support for 5 languages
- OCR: TextRecognition library
If you are using DataDig for your company, you can handle the flow of data from an extraction in two ways:
- Implicit Intent (recommended): build an Android app that can receive the implicit intent DataDig sends with the `Uri` of the extraction file: `Intent(Intent.ACTION_VIEW).apply { setDataAndType(contentUri, mimeType); flags = Intent.FLAG_GRANT_READ_URI_PERMISSION }`
  - Register your app to handle the `ACTION_VIEW` intent in your `AndroidManifest.xml`.
  - When the user taps an extracted file (JSON or XML recommended) within DataDig, your app will appear as an option to open it.
  - Open the file through the provided `Uri` (for example with `contentResolver.openInputStream(uri)`); a content `Uri` does not expose a direct file path.
  - Parse the file contents.
  - Update your app's database or perform further processing on the extracted information.
This approach offers a seamless way to directly integrate DataDig's output into your app's workflow.
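As a rough sketch of the receiving side, the Kotlin below reads the top-level fields from an XML export using only the JDK's `javax.xml.parsers`, which is also available on Android. The element names (`extractedFields`, `template_field_title`, `value`) are assumptions based on the property names documented further down; DataDig's actual XML layout may differ, so check an exported file first. The manifest entry and the Activity glue are shown as comments; the MIME types and the `ImportActivity` name in them are hypothetical.

```kotlin
import java.io.InputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// AndroidManifest.xml (sketch; the MIME types DataDig uses are an assumption):
// <activity android:name=".ImportActivity" android:exported="true">
//     <intent-filter>
//         <action android:name="android.intent.action.VIEW" />
//         <category android:name="android.intent.category.DEFAULT" />
//         <data android:mimeType="application/xml" />
//         <data android:mimeType="text/xml" />
//     </intent-filter>
// </activity>
//
// In the hypothetical ImportActivity, e.g. in onCreate():
//   val fields = intent.data?.let { uri ->
//       contentResolver.openInputStream(uri)?.use { readTopLevelFields(it) }
//   }

/**
 * Reads `template_field_title` -> `value` pairs from the first `extractedFields`
 * element of an XML document. Assumed (hypothetical) layout:
 * <extractedFields><field><template_field_title>...</template_field_title><value>...</value></field></extractedFields>
 */
fun readTopLevelFields(input: InputStream): Map<String, String> {
    val document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input)
    val container = document.getElementsByTagName("extractedFields").item(0) as? Element
        ?: return emptyMap()
    val result = linkedMapOf<String, String>() // keeps document order
    val children = container.childNodes
    for (i in 0 until children.length) {
        // Skip whitespace text nodes and anything else that is not an element.
        val field = children.item(i) as? Element ?: continue
        val title = field.getElementsByTagName("template_field_title").item(0)
            ?.textContent?.trim() ?: continue
        val value = field.getElementsByTagName("value").item(0)?.textContent?.trim().orEmpty()
        result[title] = value
    }
    return result
}
```

The function takes an `InputStream` rather than a `Uri` so the parsing logic stays independent of Android and can be tested on a plain JVM.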
- Sharing
  - DataDig lets you share the extracted data in JSON or XML format.
  - Import the shared data into other systems, use it for further analysis, or handle it however your workflow requires.
Please ensure secure handling of the shared data if it contains sensitive information.
This section outlines the classes DataDig uses to represent the results of an extraction. They are useful when parsing the exported files.
The primary class holding the extracted data:
- `title` (String): Title or name associated with the extraction.
- `extractedFields` (`List<ExtractionField>`): List of extracted fields and their values.
- `extractedTables` (`List<ExtractionTable>`): List of extracted tables.
- `extractionCosts` (`List<ExtractionCosts>`): Cost breakdown of the extraction (model usage, etc.).
- `exceptionsOccurred` (`List<ExceptionOccurred>`): Any exceptions that occurred during extraction.
- `language` (String, optional): Language detected in the document (if applicable).
- `model` (String, optional): Model used for the extraction.
`ExtractionTable`: represents a single extracted table.
- `title` (String, optional): Title of the extracted table.
- `template_table_title` (String): Title of the template table used for extraction.
- `dataframe` (String): Internal representation of the table data. It is not shown to the user, but it is useful for understanding what the LLM saw when extracting.
- `fields` (`List<ExtractionTableRow>`): Rows within the extracted table.
`ExtractionTableRow`: represents a row within an extracted table.
- `rowName` (String): Name or identifier of the row.
- `fields` (`List<ExtractionField>`): Fields and their values within this row.
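Because every row carries its own list of fields, a table can be flattened into a grid with one column per distinct `template_field_title`. The sketch below does that to produce CSV, with `rowName` as the first column. The minimal data classes mirror the properties described above; the CSV layout and the helper names `csvCell` and `rowsToCsv` are hypothetical and not necessarily what DataDig's own CSV export uses.

```kotlin
// Minimal mirrors of the documented classes (property names as documented).
data class ExtractionField(val template_field_title: String, val value: String)
data class ExtractionTableRow(val rowName: String, val fields: List<ExtractionField>)

// Quotes a cell when it contains a comma, quote or line break (RFC 4180 style).
fun csvCell(s: String): String =
    if (s.any { it == ',' || it == '"' || it == '\n' || it == '\r' }) {
        "\"" + s.replace("\"", "\"\"") + "\""
    } else {
        s
    }

// Flattens rows into CSV: first column is rowName, then one column per distinct
// template_field_title in first-seen order. Missing cells are left empty.
fun rowsToCsv(rows: List<ExtractionTableRow>): String {
    val columns = rows.flatMap { row -> row.fields.map { it.template_field_title } }.distinct()
    val header = (listOf("rowName") + columns).joinToString(",") { csvCell(it) }
    val lines = rows.map { row ->
        val byTitle = row.fields.associate { it.template_field_title to it.value }
        (listOf(row.rowName) + columns.map { byTitle[it] ?: "" }).joinToString(",") { csvCell(it) }
    }
    return (listOf(header) + lines).joinToString("\n")
}
```

Collecting the column set across all rows first means a row with a missing field still lines up with the header instead of shifting its values left.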
`ExceptionOccurred`: details about any exceptions encountered during extraction.
- `error` (String): Error message.
- `errorType` (String): Type or category of the error.
- `errorDescription` (String): Additional description of the error.
`ExtractionField`: represents a single extracted field.
- `template_field_title` (String): Title of the template field used for extraction.
- `value` (String): Extracted value for this field.
`ExtractionCosts`: breakdown of the costs associated with the extraction process.
- `name` (String): Name of the cost component (e.g., "Model Usage").
- `cost` (Float): Cost amount, in the currency given by `currency`.
- `currency` (String): Currency code (e.g., "USD").
- `tokens` (Int): Number of tokens used (if applicable to the cost model).
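If you deserialize DataDig's JSON export, a set of Kotlin classes mirroring the documentation above can serve as the target. The sketch below is an assumption, not DataDig's source: the name `Extraction` for the primary class and the helper `totalCostByCurrency` are hypothetical, the snake_case property names are kept so they line up with the documented field names, and optional values are modelled as nullable. Costs are grouped by currency so that amounts in different currencies are never added together.

```kotlin
// Hypothetical Kotlin mirror of the documented extraction classes.
data class ExtractionField(val template_field_title: String, val value: String)

data class ExtractionTableRow(val rowName: String, val fields: List<ExtractionField>)

data class ExtractionTable(
    val title: String?,                // optional
    val template_table_title: String,
    val dataframe: String,             // what the LLM saw; not shown to the user
    val fields: List<ExtractionTableRow>,
)

data class ExceptionOccurred(val error: String, val errorType: String, val errorDescription: String)

data class ExtractionCosts(val name: String, val cost: Float, val currency: String, val tokens: Int)

// "Extraction" is an assumed name for the primary class.
data class Extraction(
    val title: String,
    val extractedFields: List<ExtractionField>,
    val extractedTables: List<ExtractionTable>,
    val extractionCosts: List<ExtractionCosts>,
    val exceptionsOccurred: List<ExceptionOccurred>,
    val language: String? = null,      // optional
    val model: String? = null,         // optional
)

// Sums the cost components per currency code.
fun Extraction.totalCostByCurrency(): Map<String, Double> =
    extractionCosts.groupBy { it.currency }
        .mapValues { (_, costs) -> costs.sumOf { it.cost.toDouble() } }
```

With a JSON library such as kotlinx.serialization or Gson, you would map the exported file onto classes like these after confirming the field names against a real export.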
Good luck! 🖖 Feel free to reach out to me via:
- Email: elia.fri3erg@gmail.com
- LinkedIn: elia.friberg
- Whatsapp: @elia.friberg
Elia Friberg