MP as input features of a deep learning model for time-series forecasting #209

fcggamou · 2020-06-17T18:35:03Z

Hi, I'm new to all of these concepts so please bear with me.
I'm trying to use the matrix profile as the input features of a neural network that generates forecasts over a time series.

Let's say my time series has a daily granularity, and my NN uses a 30-day window as the input for each prediction. So far so good.

So let's say I generate an MP for my time series using m >= 30. The resulting MP now has 'm' fewer data points than my original series. And I am using the latest 30 days as the input for my network, so I actually do not have any MP info to include as an input feature in my network if I want to predict the future.

Am I misunderstanding some fundamental aspect of this? Should I use an 'm' lower than my input window so I can include some of the MP values into it?

Thanks!

seanlaw · 2020-06-17T19:09:25Z

Maintainer

Welcome to the STUMPY community @fcggamou!

> So let's say I generate an MP for my time series using m >= 30. The resulting MP now has 'm' fewer data points than my original series.

Yes, this fundamental point is correct. Technically, there are n - m + 1 values (that is, m - 1 fewer than the series), where n is the length of your time series and m is the window size.

> And I am using the latest 30 days as the input for my network, so I actually do not have any MP info to include as an input feature in my network if I want to predict the future.

I'm no NN expert but maybe you can provide a more concrete example. Let's say you have a time series from the last year (365 days) and you compute the matrix profile for it with m = 30, which produces 365 - 30 + 1 = 336 matrix profile values. Each value maps to one of the 336 30-day sliding windows, indexed by the first day of that window. Then what?

If I understand correctly, depending on your NN, you have features that represent each sliding window. A feature could be as simple as the average value in each window. And still, you would only have 336 averages. Maybe I'm missing something here and you can provide more information?
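
To make the counting concrete, here is a minimal sketch in plain Python. The helper name `sliding_window_means` and the synthetic series are made up for illustration and are not part of STUMPY; the point is only that any per-window feature, anchored at the window start, has n - m + 1 values.

```python
# Hypothetical helper (not a STUMPY function): one feature value per sliding window.
def sliding_window_means(series, m):
    """Return the mean of every length-m window, indexed by the window's first element."""
    n = len(series)
    return [sum(series[i:i + m]) / m for i in range(n - m + 1)]

# Synthetic "year" of daily data, used only as a stand-in.
series = [float(day % 7) for day in range(365)]
means = sliding_window_means(series, m=30)

print(len(means))  # n - m + 1 = 365 - 30 + 1 = 336
```

A matrix profile computed with the same m would have the same length and the same start-of-window indexing.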

fcggamou · 2020-06-18T13:39:30Z

Author

Thanks a lot for your answer.

The difference (I think, but maybe I'm not grasping MP correctly), is that e.g. when calculating a moving average you "lose" data at the start of the series but not at the end.

Let's say we have a year of daily data of a series, so 365 data points of a variable called x, where x_t is the value of x at a given day t.
Let's say we want to predict the value of x_{t+1}, using as input a sliding window of the latest 30 days, so we would use as input (x_{t-29}, ..., x_t) to predict x_{t+1}. If we repeat this process on the 365-day dataset in order to generate a training set, you can see that we would have 365 - 30 = 335 training examples only, but that's no problem.

Similarly, we could also use the latest 30 values of the moving average of the latest 45 days as input, let's call it MA45. Our input would look like (MA45_{t-29}, ..., MA45_t). In this case we need an extra 45 data points as input, so we would have 365 - 30 - 45 training examples. But this is OK also, if I have a big enough dataset, we can still train and make predictions for the future.
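
A rough sketch of that setup, in plain Python, may help pin down the indexing. The function names here are hypothetical, and the sketch assumes a trailing moving average (each value summarizes the latest 45 days, so it is anchored at the end of its window) and a one-step-ahead target.

```python
def trailing_moving_average(series, k):
    """Mean of the latest k values; result[j] belongs to time index j + k - 1."""
    return [sum(series[j:j + k]) / k for j in range(len(series) - k + 1)]

def make_ma_training_pairs(series, k, window):
    """Pair `window` consecutive MA values with the series value on the following day."""
    ma = trailing_moving_average(series, k)
    pairs = []
    for j in range(len(ma) - window + 1):
        t = j + window - 1 + k - 1   # time index of the last MA value in this input
        if t + 1 >= len(series):     # no next-day value left to predict
            break
        pairs.append((ma[j:j + window], series[t + 1]))
    return pairs

series = [float(day) for day in range(365)]  # synthetic stand-in data
pairs = make_ma_training_pairs(series, k=45, window=30)
print(len(pairs))
```

Under these assumptions the number of pairs comes out close to the 365 - 30 - 45 estimate above, and every input only uses data up to day t, which is why the end of the series stays usable.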

The difference I see with the MP, is that we lose data points in the "front" of the dataset, so if at a given point t, we want to use the latest 30 points of MP with m=45, we just can't. In other words, at a given point t, we don't have MP_{t-1}. It is available only after reaching the point t+45.

Maybe I'm missing something, e.g. maybe I shouldn't treat the MP as a sliding window input, or maybe it would be good enough information to use as input to just feed "old values" of MP e.g. MP_{t-45}, ..., MP_{t-60}.


seanlaw · 2020-06-18T15:53:21Z

Maintainer

> The difference (I think, but maybe I'm not grasping MP correctly), is that e.g. when calculating a moving average you "lose" data at the start of the series but not at the end.

Hmmm, I get what you mean but I feel like this is mostly a convention. That is, once you compute the average for a given window, that average represents the window as a whole. Should the average of the window be "anchored" to the beginning of the window, the middle of the window, or the end? For a moving average, all of these are equally "correct" in my opinion. Matrix profile happens to anchor the value relative to the start of the window. There's nothing stopping you from shifting everything forward by m - 1. And then you just need to switch your mental model so that the ith matrix profile value now refers to T[i-m+1 : i+1] (where i corresponds to the last element in the subsequence rather than the first element). Maybe I'm misunderstanding?
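
As a minimal sketch of that re-anchoring, the snippet below computes a brute-force z-normalized matrix profile in plain Python and then re-indexes each value by the last element of its window (a shift of m - 1). The function names, the synthetic series, and the exclusion zone of ceil(m / 4) are assumptions made for illustration; this is not STUMPY's implementation.

```python
import math

def znorm(window):
    """Z-normalize one window (assumes the window is not constant)."""
    mu = sum(window) / len(window)
    sd = math.sqrt(sum((v - mu) ** 2 for v in window) / len(window))
    return [(v - mu) / sd for v in window]

def naive_matrix_profile(series, m):
    """Brute-force matrix profile: one nearest-neighbor distance per window start."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    excl = math.ceil(m / 4)  # assumed exclusion zone to skip trivial self-matches
    profile = []
    for i, a in enumerate(windows):
        best = math.inf
        for j, b in enumerate(windows):
            if abs(i - j) <= excl:
                continue
            best = min(best, math.dist(a, b))
        profile.append(best)
    return profile

n, m = 120, 30
series = [math.sin(0.3 * t) + 0.01 * t for t in range(n)]  # synthetic data
mp = naive_matrix_profile(series, m)

# Re-anchor: the value for window series[i : i + m] is stored at its last index, i + m - 1.
mp_by_end = {i + m - 1: value for i, value in enumerate(mp)}
print(len(mp), min(mp_by_end), max(mp_by_end))  # 91 29 119
```

With this indexing the most recent time step, n - 1, has a matrix profile value, and the values that are missing sit at the start of the series, as with a trailing moving average. Note that the re-indexing only changes the labels: each value in this sketch is still a comparison against every other window in the series.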

> Let's say we have a year of daily data of a series, so 365 data points of a variable called x, where x_t is the value of x at a given day t.
> Let's say we want to predict the value of x_{t+1}, using as input a sliding window of the latest 30 days, so we would use as input (x_{t-29}, ..., x_t) to predict x_{t+1}. If we repeat this process on the 365-day dataset in order to generate a training set, you can see that we would have 365 - 30 = 335 training examples only, but that's no problem.

I think it should be 365-30 + 1 = 336 but, otherwise, I am on the same page here.
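
The two numbers may simply be counting different things. A small sketch (the helper `make_training_pairs` is hypothetical, and the data is a stand-in) shows that a 365-point series has 336 sliding windows of length 30, but only 335 of them have a following value to use as a one-step-ahead target:

```python
def make_training_pairs(series, window):
    """Pair each length-`window` input with the value that immediately follows it."""
    pairs = []
    for end in range(window, len(series)):  # `end` is the index of the target value
        pairs.append((series[end - window:end], series[end]))
    return pairs

series = list(range(365))  # synthetic stand-in for a year of daily data
window = 30

n_windows = len(series) - window + 1
pairs = make_training_pairs(series, window)
print(n_windows, len(pairs))  # 336 windows, 335 (input, target) pairs
```

The last window, covering days 335 through 364 in zero-based indexing, is the one you would feed to the network to forecast the first unseen day.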

> Similarly, we could also use the latest 30 values of the moving average of the latest 45 days as input, let's call it MA45. Our input would look like (MA45_{t-29}, ..., MA45_t). In this case we need an extra 45 data points as input, so we would have 365 - 30 - 45 training examples. But this is OK also, if I have a big enough dataset, we can still train and make predictions for the future.

Sorry, I'm getting lost here and am not able to follow. I can't seem to parse the first sentence.

> The difference I see with the MP, is that we lose data points in the "front" of the dataset, so if at a given point t, we want to use the latest 30 points of MP with m=45, we just can't. In other words, at a given point t, we don't have MP_{t-1}. It is available only after reaching the point t+45.
>
> Maybe I'm missing something, e.g. maybe I shouldn't treat the MP as a sliding window input, or maybe it would be good enough information to use as input to just feed "old values" of MP e.g. MP_{t-45}, ..., MP_{t-60}.

Unfortunately, I am unable to get your point 😞 I don't understand the relevance/purpose regarding the 45. In matrix profile and moving average, the sliding window is m=30.

I don't understand what is meant by "we want to use the latest 30 points of MP with m=45"


seanlaw · 2020-07-02T14:37:44Z

Maintainer

@fcggamou Just following up on this to see if you have any further comments or questions?


seanlaw · 2020-07-20T15:13:29Z

Maintainer

@fcggamou I'm going to close this for now. Please feel free to re-open or start a new issue if you have any further questions. Thank you for using STUMPY!
