Commit 7db6c54

Clean up
1 parent 096859a

2 files changed, +0 / -114 lines

training/README.md

Lines changed: 0 additions & 74 deletions
This file was deleted.

training/train.py

Lines changed: 0 additions & 40 deletions
@@ -1,43 +1,3 @@
-"""Deep Learning Model Training with LSTM
-
-This Python script trains a deep learning model built on
-Long Short-Term Memory (LSTM) networks.
-
-The script starts by importing the libraries it needs: `sys` for reading
-command-line arguments and exiting, `pandas` for data manipulation, `tensorflow`
-for building and training the model, `sklearn` for splitting the dataset and
-calculating metrics, and `numpy` for numerical operations.
-
-The script expects two command-line arguments: the input file and the output directory.
-If these are not provided, the script will exit with a usage message.
-
-The input file is expected to be a CSV file, which is loaded into a pandas DataFrame.
-The script assumes that this DataFrame has a column named "Query" containing the text
-data to be processed, and a column named "Label" containing the target labels.
-
-The text data is then tokenized using the `Tokenizer` class from
-`tensorflow.keras.preprocessing.text`. The tokenizer is fit on the text data and then
-used to convert each text into a sequence of integer word indices. The sequences are
-then padded to a maximum length of 100 using the `pad_sequences` function.
-
-The data is split into a training set and a test set using the `train_test_split` function
-from `sklearn.model_selection`. The split is stratified, meaning that the proportion of
-each label is (approximately) the same in the training and test sets.
-
-A Sequential model is created using the `Sequential` class from `tensorflow.keras.models`.
-The model consists of an Embedding layer, an LSTM layer, and a Dense layer. The model is
-compiled with the Adam optimizer and binary cross-entropy loss function, and it is trained
-on the training data.
-
-After training, the model is used to predict the labels of the test set. The predictions
-are then compared with the true labels to calculate various performance metrics, including
-accuracy, recall, precision, F1 score, specificity, and ROC. These metrics are printed to
-the console.
-
-Finally, the trained model is saved in the SavedModel format to the output directory
-specified by the second command-line argument.
-"""
-
 import sys
 import pandas as pd
 from tensorflow.keras.preprocessing.text import Tokenizer
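The deleted docstring says the script takes an input file and an output directory on the command line and exits with a usage message when they are missing. A minimal sketch of that check using `sys.argv`, assuming plain positional arguments; the helper name `parse_args` and the usage wording are hypothetical, since the real argument handling is not part of the shown diff:

```python
import sys


def parse_args(argv):
    """Return (input_file, output_dir) from an argv-style list.

    Hypothetical helper: exits with a usage message when either argument is missing.
    """
    if len(argv) < 3:
        prog = argv[0] if argv else "train.py"
        # An uncaught SystemExit carrying a string is printed to stderr,
        # and the process exits with status 1.
        sys.exit(f"usage: {prog} <input.csv> <output_dir>")
    return argv[1], argv[2]
```

Inside the script this would be called as `input_file, output_dir = parse_args(sys.argv)`.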

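The docstring assumes a CSV with a `Query` text column and a `Label` target column. The deleted script reads it with pandas; the sketch below uses the standard-library `csv` module instead so it runs without extra dependencies, and adds an early check for the two required columns. The helper name `load_queries`, the integer 0/1 labels, and the sample rows are assumptions for illustration:

```python
import csv
import io

REQUIRED_COLUMNS = {"Query", "Label"}  # column names stated in the docstring


def load_queries(fileobj):
    """Read (texts, labels) from a CSV file object with Query/Label columns."""
    reader = csv.DictReader(fileobj)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    texts, labels = [], []
    for row in reader:
        texts.append(row["Query"])
        labels.append(int(row["Label"]))  # assumes binary 0/1 labels
    return texts, labels


# Hypothetical sample input standing in for the real CSV file.
sample = io.StringIO("Query,Label\nhello world,0\ngood morning,1\n")
texts, labels = load_queries(sample)
```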
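To make the tokenization step concrete: Keras' `Tokenizer` assigns each word an integer index in order of descending frequency, starting at 1 (index 0 is never assigned to a word, so it is free for padding), and `pad_sequences` pads and truncates at the start of each sequence by default. The pure-Python sketch below imitates those defaults to show the shapes involved; it is an approximation for illustration, not the Keras implementation, and all helper names are hypothetical. The deleted script would pad to `maxlen=100`.

```python
from collections import Counter

# Keras Tokenizer's default set of characters stripped from text.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_TABLE = str.maketrans({c: " " for c in FILTERS})


def split_words(text):
    """Lowercase, replace filtered characters with spaces, split on whitespace."""
    return text.lower().translate(_TABLE).split()


def fit_word_index(texts):
    """Map words to integers, most frequent first, starting at 1 (0 = padding)."""
    counts = Counter(w for t in texts for w in split_words(t))
    # sorted() is stable even with reverse=True, so ties keep first-seen order.
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {w: i for i, w in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Replace each known word with its index; unknown words are dropped."""
    return [[word_index[w] for w in split_words(t) if w in word_index] for t in texts]


def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep only the last `maxlen` tokens of long ones."""
    out = []
    for seq in sequences:
        if len(seq) > maxlen:
            seq = seq[-maxlen:]                          # truncating="pre"
        out.append([value] * (maxlen - len(seq)) + seq)  # padding="pre"
    return out


corpus = ["The cat sat", "the cat ran away!"]
word_index = fit_word_index(corpus)
padded = pad_pre(texts_to_sequences(corpus, word_index), maxlen=4)
```

Here `word_index` is `{"the": 1, "cat": 2, "sat": 3, "ran": 4, "away": 5}` and `padded` is `[[0, 1, 2, 3], [1, 2, 4, 5]]`.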

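With scikit-learn, a stratified split is requested by passing the labels as `train_test_split(..., stratify=y)`. The idea behind it can be sketched without the library: shuffle the row indices of each label separately and send the same fraction of every label to the test set. This is a hypothetical illustration of the property the docstring describes, not scikit-learn's internal algorithm; `stratified_split`, `test_size=0.2`, and the seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with each label's share preserved in both sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_label.values():
        rng.shuffle(idx)                       # shuffle within one label only
        n_test = round(len(idx) * test_size)   # same fraction from every label
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 80 + [1] * 20                   # 20% positives overall
train_idx, test_idx = stratified_split(labels, test_size=0.25)
```

Both resulting sets keep the 20% positive rate: 15 of 75 training rows and 5 of 25 test rows are positive.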
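The docstring names the three layers but not their sizes. As a worked example of what each layer contributes, the sketch below counts trainable parameters with the standard formulas for Keras `Embedding`, `LSTM`, and `Dense` layers. The vocabulary size, embedding dimension, LSTM width, and the single sigmoid output unit (a common pairing with binary cross-entropy) are all hypothetical values, not read from the deleted script.

```python
def embedding_params(input_dim, output_dim):
    # One trainable vector of length output_dim per token index.
    return input_dim * output_dim


def lstm_params(input_dim, units):
    # Four gates (input, forget, cell candidate, output), each with an input
    # kernel, a recurrent kernel, and one bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units


vocab_size = 10_000  # hypothetical number of distinct words from the tokenizer
embed_dim = 64       # hypothetical Embedding output size
lstm_units = 64      # hypothetical LSTM width

total = (
    embedding_params(vocab_size + 1, embed_dim)  # +1 because index 0 is reserved for padding
    + lstm_params(embed_dim, lstm_units)
    + dense_params(lstm_units, 1)                # one sigmoid output unit
)
```

With these assumed sizes the embedding table dominates (640,064 of 673,153 parameters), which is typical when the vocabulary is large relative to the LSTM width.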
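The docstring lists six metrics. Accuracy, precision, recall, F1, and specificity all follow from the four confusion-matrix counts; `sklearn.metrics` has no dedicated specificity function, so it is usually derived from `confusion_matrix`. "ROC" is presumably reported as the area under the ROC curve, which needs predicted probabilities rather than hard labels. The dependency-free sketch below shows the arithmetic; the 0.5 threshold, helper names, and sample values are assumptions, not taken from the deleted script.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0  # report 0 when a denominator is empty

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # true positive rate (sensitivity)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


def roc_auc(y_true, y_score):
    """Chance a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold
metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)
```

For this sample, accuracy is 0.75, precision, recall, and F1 are each 2/3, specificity is 0.8, and the ROC AUC is 13/15.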