Prophet, a Facebook Research project, has earned its place among the tools that ML and data science practitioners use for time-series forecasting. Open-sourced on February 23, 2017 (see the announcement on the Facebook Research blog), it uses an additive model to forecast time-series data. This article gives an overview of this widely used tool along with a demonstration in Python.
Key features of Prophet
- It performs time-series forecasting “at scale”. This means memory usage and computational complexity are not major concerns when Prophet makes a forecast.
- It can fit time series with non-linear trends as well as holiday effects.
- It works well with data that has daily, weekly, monthly and/or yearly seasonality, especially when several seasons of historical data are available for making forecasts.
- It has R and Python APIs for time-series forecasting.
- It can be downloaded as a CRAN or PyPI package.
- It is robust to missing data, outliers and abrupt changes in the time-series data.
- It uses the Stan platform to fit models quickly, and the model parameters are easy to interpret.
NOTE: ‘Trend’ in a time series refers to the overall change in the data over time. ‘Seasonality’ refers to the way the data changes in a repeating pattern over a specific period, e.g. a week, month or year.
How Prophet works
Image source: Facebook blog
Prophet employs an additive regression model having four constituents at its core:
- A trend curve (piecewise linear or logistic growth) that captures changes in the trend of the forecast variable by automatically selecting changepoints from the time-series data.
- A yearly seasonal component, modeled with a Fourier series
- A weekly seasonal component
- A customizable list representing holiday effects in the data
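The components above are combined additively. Prophet's authors write the model as y(t) = g(t) + s(t) + h(t) + ε_t, where g is the trend, s the seasonality, h the holiday effects and ε_t the error term.

The sketch below is a simplified illustration under stated assumptions, not Prophet's implementation:

- It builds the Fourier-series features for a yearly seasonal component (period 365.25 days, order 10, which is Prophet's default order for yearly seasonality).
- It uses a hypothetical weekly toy series.
- It replaces Prophet's changepoint-based trend with a plain linear trend.
- It recovers the coefficients with ordinary least squares instead of Prophet's Stan-based fitting.

```python
import numpy as np

def fourier_features(t_days, period, order):
    """Return the terms sin(2*pi*n*t/P) and cos(2*pi*n*t/P) for n = 1..order."""
    n = np.arange(1, order + 1)                          # harmonics 1..order
    angles = 2 * np.pi * np.outer(t_days, n) / period    # shape (len(t), order)
    return np.hstack([np.sin(angles), np.cos(angles)])   # shape (len(t), 2*order)

# Hypothetical toy series: one observation per week for three years (t in days)
t = np.arange(0, 3 * 365, 7, dtype=float)
X_year = fourier_features(t, period=365.25, order=10)  # yearly seasonality features

rng = np.random.default_rng(0)
beta = rng.normal(scale=0.05, size=X_year.shape[1])   # made-up seasonal coefficients
trend = 1.2 + 0.0004 * t                              # g(t): simple linear trend (assumption)
seasonal = X_year @ beta                              # s(t): yearly component
y = trend + seasonal                                  # additive model without holidays or noise

# Regression step: recover intercept, slope and seasonal coefficients with least squares
design = np.column_stack([np.ones_like(t), t, X_year])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef[:2])  # intercept and slope, 1.2 and 0.0004 up to rounding
```

Because the toy data contains no noise, least squares recovers the made-up coefficients almost exactly. With real data, the fitted seasonal coefficients play the role of the yearly curve shown later by `plot_components`.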
Practical implementation
Here is a demonstration of using the Python API to forecast avocado prices with Prophet. The dataset used is available on Kaggle. The code was run on Google Colab with the fbprophet 0.7.1 library. The step-wise implementation is as follows:
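- Install the fbprophet Python library. From version 1.0 onward, the package is published as prophet (pip install prophet, then from prophet import Prophet). The code below uses the older fbprophet 0.7.1 name.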
!pip install fbprophet
- Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from fbprophet import Prophet
- Load the avocado dataset.
df = pd.read_csv('avocado.csv')
- Display the initial records of the dataset.
df.head()
Output:
- Get information about columns, number of entries, data types etc. of the dataset.
df.info()
Output:
- Sort the DataFrame by date in ascending order and store the sorted records in a new DataFrame.
df1 = df.sort_values("Date")
Display some initial records of the sorted data.
df1.head()
Output:
- Plot the recorded prices and observe the trend.
First, get the minimum and maximum dates in the historical data.
df1['Date'].min()
Output: 2015-01-04
df1['Date'].max()
Output: 2018-03-25
These outputs show that we have records from January 2015 to March 2018. Plot the prices for that period.
plt.figure(figsize=(25,10))
plt.plot(df1['Date'],df1['AveragePrice'])
Output:
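The min/max outputs give the date range but not the spacing between records. One way to check the spacing is to de-duplicate the dates (each date appears once per region) and ask pandas to infer their frequency with `pd.infer_freq`.

The sketch below runs on hypothetical toy dates that span the same range as the article's output. It assumes the real `df1['Date']` column holds date strings as read from the CSV. Running the same steps on `df1['Date']` should reveal the sampling interval of the real data.

```python
import pandas as pd

# Hypothetical stand-in for df1['Date']: every weekly date appears several times
# (once per region), stored as strings just as they come out of read_csv
weekly = pd.date_range("2015-01-04", "2018-03-25", freq="W")
toy_dates = pd.Series(weekly.repeat(3).strftime("%Y-%m-%d"))

# Parse, de-duplicate and sort the dates before checking their spacing
unique_dates = pd.DatetimeIndex(pd.to_datetime(toy_dates)).unique().sort_values()
print(unique_dates.min(), unique_dates.max())  # first and last recorded dates
print(pd.infer_freq(unique_dates))              # 'W-SUN': one record per week (Sundays)
```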
- We can also observe region-wise distribution of the data.
plt.figure(figsize=(25,12))
sns.countplot(x='region',data=df1)
plt.xticks(rotation=45)
Output:
(Count plot of the number of records for each of the 54 regions)
The plot shows that the data is balanced, i.e. the records are distributed roughly equally across regions.
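The same balance check can be done numerically with `value_counts`, which is easier to read than a plot with 54 bars.

This minimal sketch uses a small hypothetical DataFrame with the same `region` column name. On the real data, the call would be `df1['region'].value_counts()`.

```python
import pandas as pd

# Hypothetical toy rows standing in for df1 (the real file has many more regions)
toy = pd.DataFrame({
    "region": ["Albany", "Albany", "West", "West", "Boston", "Boston"],
    "AveragePrice": [1.33, 1.35, 0.98, 1.01, 1.12, 1.10],
})

counts = toy["region"].value_counts()  # number of rows per region
print(counts)

# The data is perfectly balanced if every region has the same number of rows
is_balanced = counts.nunique() == 1
print("balanced:", is_balanced)
```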
- Check the year-wise count of records in the data.
sns.countplot(x='year',data=df1)
Output:
- Prophet expects as input a DataFrame with two columns named ‘ds’ and ‘y’. ‘ds’ is the datestamp column, while ‘y’ is the numeric variable to be forecast. So we keep only the ‘Date’ and ‘AveragePrice’ columns of the df1 DataFrame and rename them to ‘ds’ and ‘y’ respectively.
Extract the two required columns
df1 = df1[['Date','AveragePrice']]
df1
Output:
Rename the columns
df1.columns = ['ds','y']
# Display the first rows to check that the columns have been renamed
df1.head()
Output:
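The select-and-rename steps can also be written as a single pandas chain. This version uses an explicit `rename` mapping instead of assigning to `.columns`, so it does not depend on the column order. Prophet parses the `ds` column itself, so the `to_datetime` conversion here is optional; it only makes the datestamp type explicit.

The sketch uses a few hypothetical rows with the avocado CSV's column names.

```python
import pandas as pd

# Hypothetical rows with the same column names as the avocado CSV
raw = pd.DataFrame({
    "Date": ["2015-01-11", "2015-01-04", "2015-01-18"],
    "AveragePrice": [1.24, 1.22, 0.93],
    "region": ["Albany", "Albany", "Albany"],
})

prophet_df = (
    raw[["Date", "AveragePrice"]]
    .rename(columns={"Date": "ds", "AveragePrice": "y"})  # column names Prophet expects
    .assign(ds=lambda d: pd.to_datetime(d["ds"]))         # make the datestamp type explicit
    .sort_values("ds")
    .reset_index(drop=True)
)
print(prophet_df)
```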
- Forecast the future prices using Prophet.
Create a Prophet instance
m = Prophet()
Fit the historical data
m.fit(df1)
Create a DataFrame with future dates for the forecast.
future = m.make_future_dataframe(periods=365)  # periods=365 adds 365 daily dates, i.e. a one-year forecast horizon
df1 has dates up to 25/3/2018, so ‘future’ extends to 25/3/2019 (by default it also contains the historical dates). Predict the prices for all dates in ‘future’:
forecast = m.predict(future)
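To make the 25/3/2019 end date concrete, here is a rough pandas-only sketch of what `make_future_dataframe(periods=365)` produces with its default daily frequency: the unique historical dates followed by 365 new dates after the last one.

`future_dates` is a hypothetical helper written for illustration; it is not Prophet's code. The toy history below is also hypothetical and only shares df1's last date.

```python
import pandas as pd

def future_dates(history_ds, periods, freq="D"):
    """Hypothetical helper roughly mirroring make_future_dataframe(periods=...):
    unique historical dates followed by `periods` new dates after the last one."""
    history = pd.Series(pd.to_datetime(history_ds)).drop_duplicates().sort_values()
    last = history.iloc[-1]
    new = pd.date_range(start=last, periods=periods + 1, freq=freq)[1:]  # drop `last` itself
    return pd.DataFrame({"ds": list(history) + list(new)})

# Toy history: four weekly dates ending on 25/3/2018, like df1
hist = pd.date_range(end="2018-03-25", periods=4, freq="W")
future_sketch = future_dates(hist, periods=365)
print(future_sketch["ds"].iloc[-1])  # 2019-03-25 00:00:00
print(len(future_sketch))            # 4 historical rows + 365 new rows
```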
Get information on the ‘forecast’ DataFrame created by Prophet.
forecast.info()
Display a few initial records of ‘forecast’.
forecast.head()
Condensed output:
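Among the many columns of ‘forecast’, the most useful ones are:

- `yhat`: the point forecast.
- `yhat_lower` and `yhat_upper`: the bounds of the uncertainty interval (80% by default).

Because ‘future’ also contains the historical dates, a common step is to keep only the rows after the last observed date. On the real data this would be `forecast[forecast['ds'] > df1['ds'].max()]`. The sketch below runs on a hypothetical stand-in with made-up values, so it executes without Prophet installed.

```python
import pandas as pd

# Hypothetical stand-in for Prophet's `forecast`: only the columns used here, made-up values
forecast_toy = pd.DataFrame({
    "ds": pd.to_datetime(["2018-03-18", "2018-03-25", "2018-03-26", "2018-03-27"]),
    "yhat": [1.35, 1.34, 1.34, 1.33],
    "yhat_lower": [1.00, 0.99, 0.98, 0.98],
    "yhat_upper": [1.70, 1.69, 1.70, 1.69],
})
last_observed = pd.Timestamp("2018-03-25")  # last date in df1

# Keep only the genuinely new dates and the columns of interest
future_only = forecast_toy.loc[forecast_toy["ds"] > last_observed,
                               ["ds", "yhat", "yhat_lower", "yhat_upper"]]
print(future_only)
```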
- Plot the data with recorded as well as forecasted prices.
figure = m.plot(forecast,xlabel='Date',ylabel='Price')
Output:
Our original data has weekly records up to March 2018. The part of the above plot beyond that date shows the prices predicted for the next one-year span, i.e. up to March 2019.
Actual recorded prices are marked with black dots in the above plot. The dark-blue line shows the predicted prices, and the light-blue band around it shows the uncertainty interval.
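The black dots form vertical stacks because df1 holds several prices for each date: one per region and, since the Kaggle file also has a `type` column (conventional/organic), one per avocado type. Even the single-region subset used later keeps both types.

One alternative, which is not what this article does, is to aggregate to a single value per date before fitting. The sketch below averages the prices per date on hypothetical toy rows. Filtering on one `type` first would be another option.

```python
import pandas as pd

# Hypothetical rows shaped like df1 after renaming: several prices for each date
df_toy = pd.DataFrame({
    "ds": pd.to_datetime(["2015-01-04"] * 3 + ["2015-01-11"] * 3),
    "y": [1.00, 1.20, 1.40, 1.10, 1.30, 1.50],
})

# Average all prices recorded on the same date -> one row per date
per_date = df_toy.groupby("ds", as_index=False)["y"].mean()
print(per_date)
```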
- Plot the components of the forecast.
figure = m.plot_components(forecast)
Output:
- The above forecast is made for all regions together. We can make a forecast for a specific region as follows:
Extract data of the required region from the original data.
df2 = df[df['region']=='West']
Display initial records.
df2.head()
Output:
- Sort the regional data in ascending order of dates.
df2 = df2.sort_values('Date')
Plot the recorded prices for that specific region.
plt.figure(figsize=(15,10))
plt.plot(df2['Date'],df2['AveragePrice'])
Output:
- Extract the ‘Date’ and ‘AveragePrice’ columns and rename them to ‘ds’ and ‘y’ respectively.
df2 = df2[['Date','AveragePrice']]
df2.columns = ['ds','y']
- Create a Prophet instance and fit the data.
m = Prophet()
m.fit(df2)
Forecast prices for the next one year for that specific region.
future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)
- Plot the recorded and forecasted prices for the region.
figure = m.plot(forecast,xlabel='Date',ylabel='Price')
Output:
(Black dots: actual price values, Blue curve: predicted prices)
Plot the components of the regional forecast.
figure = m.plot_components(forecast)
Output:
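The region-specific steps above (filter, sort, keep ‘Date’ and ‘AveragePrice’, rename to ‘ds’ and ‘y’) can be wrapped in a helper that prepares a frame for every region at once.

`regional_frames` is a hypothetical helper name, not part of Prophet or pandas, and the toy rows are made up. Each returned frame could then go through the same steps as before, e.g. `Prophet().fit(frames['West'])`.

```python
import pandas as pd

# Hypothetical toy rows with the avocado CSV's column names
df_toy = pd.DataFrame({
    "Date": ["2015-01-11", "2015-01-04", "2015-01-04", "2015-01-11"],
    "AveragePrice": [1.01, 0.98, 1.33, 1.35],
    "region": ["West", "West", "Albany", "Albany"],
})

def regional_frames(df):
    """Build one date-sorted ds/y DataFrame per region (hypothetical helper)."""
    frames = {}
    for region, group in df.groupby("region"):
        frames[region] = (
            group.sort_values("Date")[["Date", "AveragePrice"]]
            .rename(columns={"Date": "ds", "AveragePrice": "y"})
            .reset_index(drop=True)
        )
    return frames

frames = regional_frames(df_toy)
print(frames["West"])
```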
- The complete code is available in the accompanying Google Colab notebook.