Business Forecasting

Many industries collect and maintain data.  You might think of the shipping industry, where large SQL databases are kept to log product inventory and track the number of items shipped monthly or perhaps weekly. 

Despite its bankruptcy, Circuit City comes to mind with its thousands of product lines being shelved every week. 

Other warehouses of products, such as Walmart, Best Buy, and the multitude of large malls, also contain relational databases of their product transactions.  Perhaps you have a favorite restaurant down the street that tracks weekly customer volume. 

How might these businesses use this data to improve their overall system?

Well, a restaurant can monitor customer behavior in order to associate volume patterns with times of the year.  Be it daily, weekly, monthly, quarterly, or yearly behavioral patterns, the restaurant can model this and project volume for next week, month, quarter, or year. 

In this way, management can use this insight for staffing purposes.  In essence, they can save money by avoiding over-staffing and boost customer satisfaction by avoiding under-staffing. 

Here are a few more industries that might benefit from this process:

  • Hotels – booking data
  • Restaurants – customer volume & staffing data
  • Warehouses – sales data for many product lines (e.g. Circuit City, Walmart, etc.)
  • Malls – sales for different products in shoe & clothing stores, etc.
  • Shipping Industry – monthly cargo
  • Airlines – passenger data
  • App companies – such as Airbnb & Uber

Data Frames & Time Series

Now, these businesses and large companies have large databases of multiple related tables.  But how might the data look after it has been cleaned and prepared for analysis? 

Well, typically, we see 2 types of formats: a data frame or a time series.

A data frame is really just a cross-sectional slice of data (i.e. one point in time), where each row is an observation and each column is a different metric recorded on that observation. Here, we can predict the value of one metric based on the values of other metrics.

On the other hand, a time series is a collection of equally-spaced (daily, weekly, monthly, etc.) values on the same metric. So, it’s many points in time.
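To make the distinction concrete, here is a minimal Python sketch of the two layouts. The values, the field names, and the helper `is_monthly` are all made up for illustration; none of them come from the fpp package.

```python
from datetime import date

# Cross-sectional "data frame": one row per observation (a customer),
# one field per metric. All values are invented for illustration.
credit_frame = [
    {"customer_id": 1, "score": 640, "savings": 1200, "income": 41000},
    {"customer_id": 2, "score": 710, "savings": 5300, "income": 58000},
    {"customer_id": 3, "score": 580, "savings": 300, "income": 29000},
]

# Time series: a single metric recorded at equally spaced (monthly) dates.
monthly_sales = [
    (date(2023, 1, 1), 112.0),
    (date(2023, 2, 1), 118.0),
    (date(2023, 3, 1), 132.0),
    (date(2023, 4, 1), 129.0),
]


def is_monthly(series):
    """Return True if consecutive dates are exactly one calendar month apart."""
    for (d0, _), (d1, _) in zip(series, series[1:]):
        months_apart = (d1.year - d0.year) * 12 + (d1.month - d0.month)
        if months_apart != 1 or d1.day != d0.day:
            return False
    return True
```

The frame has no natural ordering, so its rows could be shuffled without losing information. The series cannot be shuffled, because the order of the dates is part of the data.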

Let’s take a look at a few examples of the 2. 

Take a look below at 4 different data sets (taken from the fpp package developed by Rob J Hyndman using R software): anti-diabetic drug sales in Australia, beer production in Australia, international visitors to Australia, and air passengers on Ansett Airlines. 

Right away, as discerned from the first column containing “Date”, we see that all 4 of these data sets are arranged chronologically in a time sequence and, therefore, these are time series data. 

At a closer glance, notice the difference between all 4.  Each contains different time increments in the “Date” column, and we refer to this as periodicity (e.g. periodicity can be daily, weekday, weekly, monthly, quarterly, yearly). 

Starting from the left, we see monthly drug sales, then quarterly beer production, yearly international visitors and, finally, weekly air passengers. 

Also notice that the first 3 contain only 1 measure, while the 4th time series contains 3 measures: First Class, Business Class, and Economy Class.  Keep this in mind, as our forecasting solution can take one or multiple measures (which we call instruments) as input.

So, here below is another time series set (also from fpp) containing multiple instruments (this time 8; i.e. visitor nights for 8 regions in Australia) measured quarterly.  Our forecasting app will easily take this data with all 8 instruments as input for a forecast analysis.

…and the Data Frame

Ok, so we have seen enough examples of the composition of a time series data set. 

What about the other type, a data frame? 

With this format, data is stored for many different metrics (e.g. price, number of rooms, average neighborhood income, and number of stories for housing data) with less emphasis placed on time. 

So, the data does not necessarily have to be in a chronological sequence. 

With this type of data, we would not be looking for a behavioral pattern based on the time of year. 

Rather, we are looking for how the value of one metric (e.g. housing price) varies with the value of another metric (e.g. number of rooms). 

So, we might try to compare the price of houses with few rooms to the price of those with several rooms.  Take a look at the credit data below (also taken from fpp package).

We see what might be a customer ID in the far left corner, but no column for date. 

Hence, this is not time-ordered data, but a frame of different observations (customers) with several measures/variables per observation: score, savings, income, full time employment, single status, time at current address, and time with current employer. 

Since we are not looking at patterns over time, we might build a model to predict credit score based on the values of the other variables.  So, we use a machine learning technique such as regression. 
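As a rough illustration of that idea, the sketch below fits a straight line by ordinary least squares using one predictor. The income and score figures are invented (they are not the fpp credit data), and `fit_line` is a hypothetical helper; a real model would use several of the variables listed above.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented example: income (in thousands) and credit score per customer.
income = [20, 35, 50, 65, 80]
score = [520, 580, 640, 700, 760]

intercept, slope = fit_line(income, score)

# Predicted score for a new customer with an income of 60 (thousand).
predicted = intercept + slope * 60
```

Note that nothing here depends on the order of the customers, which is exactly what separates this kind of analysis from forecasting.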

Analytics

You can see the depiction below of different analytical methods used for different data formats (namely data frames and time series here).

Remember, forecasting is the method seen on the far right; it is built for demand, sales, inventory, vacancies, and any other numerical measure.

In other words, it is not concerned with the effect of one measure on another. 

Rather, it projects future values of one measure based solely on its own historical behavioral pattern. 

This pattern usually includes a mixture of the following components: trend, seasonality, cyclicality, and autocorrelation (or the effect of previous values on the current value).
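Autocorrelation is the easiest of these components to show in a few lines. The sketch below computes the sample autocorrelation at a given lag in plain Python; the function name and the toy series are assumptions made for illustration only.

```python
def autocorrelation(values, lag=1):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return num / denom


# Toy quarterly series that repeats every 4 periods (pure seasonality).
seasonal = [10, 20, 30, 20] * 6

same_quarter = autocorrelation(seasonal, lag=4)      # strongly positive
opposite_quarter = autocorrelation(seasonal, lag=2)  # negative
```

A large positive value at the seasonal lag (here 4) is one sign that last year's quarter tells you something about this year's quarter.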

The Problem

Now, many companies today are already using forecasting.  So, you ask, what is the actual problem?  Well, most companies face 1 of 2 problems:

  1. They pay out thousands to millions in licensing costs for commercial software such as SAS, SPSS, or NCSS.
  2. They don’t have the analytical ability to leverage this type of information to improve their business processes (i.e. they don’t have the software or people available to use these methods).

ForecastGuru Solution

Our solution solves these issues in several ways:

  1. Our service is offered at a fraction of the cost of purchasing commercial software
  2. We already have the system developed
  3. The solution is statistically sound

Therefore, the company has no need to hire an analyst to create the system and no need to purchase expensive software, thus saving time and money. 

Ok, so you like the sound of what we have, but you are thinking you can build your own model. 

Well, therein lies the problem: there are MANY models that fit data patterns with various behaviors. 

Depending on the behavioral nature of the data in question, one model will forecast points into the future more accurately than other models. 

Getting an accurate model heavily depends on selecting one that correctly illuminates components inherent in the data.  Taking these components into account, forecasts may be based on:

  1. Average of all values
  2. Simply the previous value
    1. With option to stray up or down from previous value
  3. Weighted average of all previous values & levels (no trend or seasonality; only level)
    1. With more emphasis placed on recent time periods (days, weeks, months, etc.)
  4. Trend, Cycle, & Seasonality
    1. Can be determined by smoothing techniques (moving averages, locally weighted regression)
    2. Trends may be constant or may follow a growth/decay rate
    3. Seasonality may follow constant fluctuations, changing fluctuations, or increase/decrease in magnitude with each new turn of season
  5. Differencing a series that is non-stationary
    1. Subtracting the previous observation from each observation
  6. Weighted moving average of error components from recent observations
  7. Autoregressive behavior – the dependency of future values on most recent observations
  8. Some combination of all of the above
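A few of the simpler ideas in this list can be sketched in plain Python. The function names below are hypothetical, the "drift" function is only one reading of "stray up or down from previous value", and none of this is the code behind our tool; it is a minimal sketch of the textbook versions of these methods.

```python
def mean_forecast(y):
    """Forecast the next value as the average of all past values."""
    return sum(y) / len(y)


def naive_forecast(y):
    """Forecast the next value as simply the previous value."""
    return y[-1]


def drift_forecast(y, h=1):
    """Previous value plus h steps of the average historical change."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + h * slope


def ses_forecast(y, alpha=0.5):
    """Simple exponential smoothing: a weighted average of past values
    whose weights shrink for older periods (level only, no trend/season)."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level


def difference(y):
    """First difference: each observation minus the one before it."""
    return [curr - prev for prev, curr in zip(y, y[1:])]


history = [10, 12, 14, 16]  # made-up series with a steady upward trend
forecasts = {
    "mean": mean_forecast(history),
    "naive": naive_forecast(history),
    "drift": drift_forecast(history),
    "ses": ses_forecast(history, alpha=0.5),
}
```

On a trending series like this one, the four methods disagree noticeably, which is the point: the right choice depends on which components are actually present in the data.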

It is very hard to determine the components inherent in a time series without graphical depictions, and these depictions take time to produce. 

Added to this, building an analysis for every model type on the same data and then comparing accuracies is extremely time-consuming and unproductive. 

That is why we have automated the model-building process, which quickly fits any given numerical time series to 17 of the most commonly used forecasting models in analytics. 

The tool then assesses the forecasting accuracy of every model (based on mean absolute percentage error, or MAPE), and chooses the model with the least error. 

Naturally, this will be the model that best exemplifies the components inherent in the data (i.e. trend, seasonality, cyclicality, autocorrelation). 
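The selection step can be sketched as follows. This is not our tool's actual code: the three candidate models are a tiny stand-in for the 17 mentioned above, the holdout split is an assumed way of measuring accuracy, and all function names and data are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.
    Undefined when any actual value is zero."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def mean_model(train, h):
    return [sum(train) / len(train)] * h


def naive_model(train, h):
    return [train[-1]] * h


def drift_model(train, h):
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (i + 1) for i in range(h)]


def select_model(series, candidates, holdout):
    """Fit each candidate on the early data, score it on the last
    `holdout` points, and return the name with the lowest MAPE."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {
        name: mape(test, model(train, holdout))
        for name, model in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


candidates = {"mean": mean_model, "naive": naive_model, "drift": drift_model}
series = [100, 110, 120, 130, 140, 150]  # made-up, steadily growing series

best, scores = select_model(series, candidates, holdout=2)
```

Because the made-up series grows by the same amount each period, the drift model wins here; on flat or seasonal data a different candidate would come out on top.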

Ok, so you understand, are tired of the discussion, and are ready to see our solution in action!  Let’s walk through an example.