
Prabhdyals/MachineLearning-Oil-Analysis

 
 


Oil and Machine Learning

Description

In this project, we aim to use machine learning models to help predict the price and price direction of oil.



Goals

Our goal is to compare two or more machine-learning models for predicting the price and price direction of oil. As inputs, we use natural language processing to extract sentiment from news articles covering the past 22 years, together with oil closing prices and returns, gold prices, the S&P 500, and an indicator for periods of unrest (the Iraq War, 2003-2011). Machine learning typically requires extensive data preparation before a model can be trained, so we use Jupyter notebooks to prepare the training and testing datasets and to train and compare the models.
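
As a rough illustration of the data preparation described above, the sketch below derives a return, an up/down direction label, and a war-period flag from a list of closing prices. It is not the project's notebook code: `build_features`, the sample prices, and the exact day boundaries of the war window are assumptions made for illustration, and plain Python is used instead of pandas so the example runs without extra packages.

```python
from datetime import date

# Assumption: the war window is approximated by whole calendar years (2003-2011).
WAR_START = date(2003, 1, 1)
WAR_END = date(2011, 12, 31)


def build_features(rows):
    """Turn (date, close) pairs, sorted by date, into one feature dict per day.

    The first row only serves as the base for the first return, so the
    result has one entry fewer than the input.
    """
    features = []
    for (_, prev_close), (day, close) in zip(rows, rows[1:]):
        daily_return = close / prev_close - 1.0
        features.append({
            "date": day,
            "close": close,
            "return": daily_return,
            # 1 when the price rose versus the previous close, else 0
            "direction": 1 if daily_return > 0 else 0,
            # 1 when the day falls inside the assumed war window
            "war": 1 if WAR_START <= day <= WAR_END else 0,
        })
    return features


# Made-up prices, for illustration only.
sample = [
    (date(2002, 12, 30), 30.0),
    (date(2002, 12, 31), 33.0),
    (date(2003, 1, 2), 29.7),
]
features = build_features(sample)
for row in features:
    print(row)
```

The direction label is what a classifier would try to predict, while the return and war flag would sit alongside the other inputs (sentiment, gold, S&P 500) in the feature table.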


Technologies

This project uses the following technologies:

  • pandas
  • numpy
  • datetime
  • pathlib
  • nltk
  • matplotlib
  • analyzer
  • dotenv
  • New York Times API
  • yfinance API
  • warnings
  • tensorflow



Instructions

  1. To get the project started on your local machine, clone the GitHub repository.
  2. Run the crude_news_data notebook first. It retrieves article data from the New York Times API for a set number of years and may take around 45 minutes to run.
  3. When it finishes, the notebook exports a combined_csv file to a headlines folder, alongside the article files for each month.
  4. Next, run the crude_sentiment notebook. It reads the news data from combined_csv, runs a sentiment analysis, and exports an oil_sentiments csv file.
  5. Once the sentiment data is available, run the oil_series_analysis notebook, which loads historical oil data and applies time series analysis and modeling to determine whether there is any predictable behavior.
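
The combining step in the instructions above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `combine_monthly_csvs`, the monthly file names, the `date` and `headline` columns, and the output name `combined.csv` are all hypothetical, and every monthly file is assumed to share the same columns.

```python
import csv
import tempfile
from pathlib import Path


def combine_monthly_csvs(folder, output_name="combined.csv"):
    """Concatenate every CSV in `folder` into one file and return (path, row count)."""
    folder = Path(folder)
    output_path = folder / output_name
    # Sorting by name gives chronological order for YYYY-MM style file names.
    monthly_files = sorted(p for p in folder.glob("*.csv") if p.name != output_name)
    if not monthly_files:
        raise FileNotFoundError(f"no CSV files found in {folder}")

    rows = []
    fieldnames = None
    for path in monthly_files:
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            if fieldnames is None:
                fieldnames = reader.fieldnames  # header taken from the first file
            rows.extend(reader)

    with output_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return output_path, len(rows)


# Demonstration with two made-up monthly files in a temporary headlines folder.
with tempfile.TemporaryDirectory() as tmp:
    headlines = Path(tmp) / "headlines"
    headlines.mkdir()
    (headlines / "2003-01.csv").write_text(
        "date,headline\n2003-01-02,Oil climbs\n", encoding="utf-8")
    (headlines / "2003-02.csv").write_text(
        "date,headline\n2003-02-03,Oil slips\n2003-02-04,Oil steady\n", encoding="utf-8")
    combined_path, row_count = combine_monthly_csvs(headlines)
    combined_text = combined_path.read_text(encoding="utf-8")

print(row_count)
print(combined_text)
```

Keeping the per-month files next to the combined file means a failed or interrupted download only requires re-fetching the missing months before combining again.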

Conclusion

For price prediction, the LSTM model performed better than the linear regression and Bayesian ridge models. The linear regression used a single feature to predict the price, whereas the Bayesian ridge model used all five features considered and treated the prediction probabilistically, assuming a normal distribution. For price direction, the random forest classifier performed slightly better than logistic regression. The war feature had minimal importance relative to the other features considered, which may be partly because only one war period was included (owing to limited data availability).
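
Comparisons like these are usually made with one error metric for the price models and one accuracy metric for the direction classifiers. The README does not say which metrics the notebooks use, so the sketch below assumes root mean squared error (RMSE) and plain accuracy. All numbers are made up for illustration and are not project results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of prices."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(actual, predicted):
    """Fraction of direction labels (1 = up, 0 = down) predicted correctly."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


# Illustrative values only: two hypothetical price models and one classifier.
actual_prices = [50.0, 52.0, 51.0, 53.0]
model_a_prices = [49.0, 53.0, 50.0, 54.0]  # off by 1 on every day
model_b_prices = [48.0, 54.0, 49.0, 55.0]  # off by 2 on every day

actual_direction = [1, 0, 1, 1]
predicted_direction = [1, 0, 0, 1]

print("model A RMSE:", rmse(actual_prices, model_a_prices))
print("model B RMSE:", rmse(actual_prices, model_b_prices))
print("direction accuracy:", accuracy(actual_direction, predicted_direction))
```

A lower RMSE indicates a closer price fit, and a higher accuracy indicates better direction calls. Both should be computed on the held-out test set, not the training data.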

Questions

1. How have oil prices behaved over the past 22 years?
Oil Price Graph
Oil Return Graph 2019

2. What is the news sentiment toward oil over the period, as measured by NLP on news articles?
Sentiment Analysis

3. Identify other features for oil price movements (based on availability of data).
Features

4. Compare model performances with each other when predicting oil prices.
-Linear Regression
Linear Regression
-LSTM
LSTM
-Bayesian Ridge
Bayesian Ridge

5. Compare model performances with each other when predicting oil returns direction.
-Logistic Regression
Classification Report - Logistic Regression
-Random Forest
Classification Report - Random Forest

6. Compare feature importance in the movement of oil prices.
Feature Importance Comparison


Contributors

Our team:


References and Resources

CNN Iraq War News
Yahoo Finance
How to Collect Data From The New York Times Over Any Period of Time
New York Times API
Introduction to Bayesian Linear Regression
Bayesian Ridge Regression

License

License: MIT

Copyright © 2022
