Regression forecasting and predicting – Practical Machine Learning Tutorial with Python p.5



In this video, make sure you define the X’s like so. I flipped the last two lines by mistake:

X = np.array(df.drop(['label'], axis=1))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]

To forecast out, we need some data. We decided to forecast out 10% of the data, so we want to (or at least *can*) generate a forecast for each row in the final 10% of the dataset. When should we set that data aside? We could do it right away, but then the rows we are forecasting from would not be scaled like the training data was. So do we just run preprocessing.scale() against the last 10% on its own? No: the scale method scales relative to whatever data is fed into it, so scaling that slice by itself would put it on a different scale from the training data. Ideally, you scale the training, testing, AND forecast/predicting data all together. Is this always possible or reasonable? No. If you can do it, however, you should. In our case, right now, we can. Our data is small enough and the processing time is low enough, so we’ll preprocess and scale the data all at once.
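As a rough illustration of "scale everything together, then split", here is a minimal sketch. The helper scale_columns is a hypothetical stand-in that standardizes each column to zero mean and unit variance, which is what preprocessing.scale() does by default; the random feature matrix and the value of forecast_out are made up for the example.

```python
import numpy as np

def scale_columns(X):
    """Standardize each column to zero mean and unit variance (hypothetical helper)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (X - mean) / std

# Made-up feature matrix: 20 rows, 2 features
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=100.0, scale=15.0, size=(20, 2))
forecast_out = 2  # 10% of the 20 rows

# Scale ALL rows together first, so every slice shares the same scale
X_all = scale_columns(X_raw)

# Then split: the last rows are what we forecast from, the rest is for training/testing
X_lately = X_all[-forecast_out:]
X = X_all[:-forecast_out]

print(X.shape, X_lately.shape)
```

Because the split happens after scaling, X and X_lately were standardized with the same column means and standard deviations.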

In many cases, you won’t be able to do this. Imagine you were using gigabytes of data to train a classifier. It may take days to train, and you wouldn’t want to redo all of that every single time you wanted to make a prediction. Thus, you may need to either NOT scale anything, or scale the data separately. As usual, you will want to test both options and see which works best in your specific case.
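One possible reading of "scale the data separately" is to compute the scaling statistics once on the training data and reuse them on new rows at prediction time, so nothing has to be rescaled or retrained. The sketch below assumes that reading; fit_scaler and apply_scaler are hypothetical helper names and the numbers are made up. scikit-learn's StandardScaler follows the same fit-then-transform idea.

```python
import numpy as np

def fit_scaler(X_train):
    """Compute per-column mean and standard deviation from the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return mean, std

def apply_scaler(X, mean, std):
    """Scale any rows using statistics that were computed earlier."""
    return (X - mean) / std

# Made-up training data
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])
mean, std = fit_scaler(X_train)
X_train_scaled = apply_scaler(X_train, mean, std)

# Later, a new row arrives; reuse the stored statistics instead of rescaling everything
X_new = np.array([[4.0, 40.0]])
X_new_scaled = apply_scaler(X_new, mean, std)
print(X_new_scaled)
```

Whether this beats not scaling at all is still something to test on your own data, as noted above.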

With that in mind, let’s handle all of the rows from the definition of X onward.


Yash Sethia says:

If all we are doing in linear regression is trying to find a straight line that minimises the mean squared error, how is the predicted value non-linear?

yolanda Zhang says:

I don't know the reason why when I run the same codes to predict the gold futures price, I got all the same number. It's so weird, does anyone know what happened?

Binary Bugs says:

Error: unsupported operand type(s) for -: 'str' and 'int'

I am new to this , help

Satyam Kumar says:

Sometimes it seems like he teaches like eminem. 😉 But still very helpful stuff

Steven Clive says:

can someone explain what the .name function is doing?

Grant Hawkins says:

All of your videos are great, I can't believe this is free!

Sam Harrison says:

How does the plot understand that the date is the x-axis?

David Scully says:

For anyone doing this now, the style import isn't needed from matplotlib; style is reachable through the pyplot module we're already importing: plt.style.use('ggplot')

M. R. K. says:

okay, guess what, I have to wait sometime because I used quandl many times in a short time 😀

Abhijit Mondal says:

When you do X = X[:-forecast_out] and on the very next line do X_lately = X[-forecast_out:], I don't think it does what you intended: X has already been trimmed, so X_lately is taken from the trimmed X.

I think you should have done this:

# part of X for which a label is given
X_previous = X[:-forecast_out]

# part of X for which the label is not available (still to be predicted)
X_lately = X[-forecast_out:]


can someone explain to me the purpose of X_lately = X[-forecast_out:]

X = X[:-forecast_out:]
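Both the ordering concern and the purpose of X_lately can be seen on a toy array. This is only a sketch: X_full and the value of forecast_out are made up, and the integers stand in for feature rows.

```python
import numpy as np

X_full = np.arange(10)  # stand-in for 10 feature rows: 0..9
forecast_out = 3

# Wrong order: trim X first, then slice the "latest" rows from the trimmed array
X = X_full[:-forecast_out]           # rows 0..6
X_lately_wrong = X[-forecast_out:]   # rows 4..6, which are still training rows

# Right order: take the latest rows first, then trim X
X = X_full
X_lately = X[-forecast_out:]         # rows 7..9, the rows with no label yet
X = X[:-forecast_out]                # rows 0..6, the rows that do have labels

print(X_lately_wrong, X_lately, X)
```

X_lately holds the most recent rows, whose labels lie in the future and are therefore unknown; those are the rows the trained model predicts from.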

trismono candra krisna says:

I modify the loop as follows,

for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    if next_date.weekday() == 5:
        # Saturday: move this forecast to Monday and resume from Tuesday
        next_date = next_date + datetime.timedelta(days=2)
        next_unix += one_day*3
    else:
        next_unix += one_day
    df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)] + [i]

because Saturday and Sunday the stock is closed and will be open again on Monday
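The weekend-skipping idea can also be sketched with plain dates instead of Unix timestamps. The function next_trading_days is a hypothetical helper, not part of the tutorial's code, and it only skips Saturdays and Sundays; market holidays are not handled.

```python
import datetime

def next_trading_days(last_date, count):
    """Return `count` dates after last_date, skipping Saturdays and Sundays."""
    dates = []
    current = last_date
    while len(dates) < count:
        current += datetime.timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            dates.append(current)
    return dates

last_date = datetime.date(2024, 1, 4)  # a Thursday
dates = next_trading_days(last_date, 3)
print(dates)  # Friday 5 Jan, Monday 8 Jan, Tuesday 9 Jan
```

Each returned date could then be used as the index for one value of forecast_set when appending rows to the DataFrame.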

Olamigoke Philip says:

For anyone confused about date, time, timestamp, epoch, etc., here is a cleaner way to go about it.

# Adding new predicted data to existing record_end

for index, data in enumerate(forecast_set):
    next_date = last_date + datetime.timedelta(days=index + 1)
    df.loc[next_date] = [np.nan for _ in range(len(df.columns) - 1)] + [data]

Saqib Perwaiz says:

Even using the code written in the description of the video, the lengths of X and y are not coming out the same.
Can anyone suggest a way to get it working?

Martin De Beer says:

Why is my output showing 35 and yours is showing 30?

karakol86 says:

Nothing plotted for me. The for loop works and I created an empty list to see then all the values and I only get one value. I don’t think it is looping all the way through the dataset. New to ML. Any help would be appreciated
