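Forecast Error Measures for Continuous Variables

This article summarizes and discusses forecast error measures for continuous, univariate data.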

Let $Y_t$ denote the observation at time $t$ and $F_t$ denote the forecast of $Y_t$. Then define the forecast error $e_t=Y_t-F_t$.

Scale-Dependent Measures

These are useful when comparing different methods applied to the same set of data, but should not be used to compare across data sets that have different scales.

Measure	Acronym	Definition	Feature
Mean Square Error	MSE	$mean(e_t^2)$
Root Mean Square Error	RMSE	$\sqrt{MSE}$	Often, the RMSE is preferred to the MSE as it is on the same scale as the data
Mean Absolute Error	MAE	$mean(abs(e_t))$
Median Absolute Error	MdAE	$median(abs(e_t))$
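
The four measures in the table can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `scale_dependent_measures` and the sample numbers are made up for the example.

```python
import math
import statistics

def scale_dependent_measures(actual, forecast):
    """Return MSE, RMSE, MAE and MdAE for two equal-length sequences."""
    errors = [y - f for y, f in zip(actual, forecast)]  # e_t = Y_t - F_t
    mse = statistics.fmean(e * e for e in errors)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),  # same scale as the data
        "MAE": statistics.fmean(abs(e) for e in errors),
        "MdAE": statistics.median(abs(e) for e in errors),
    }

# Illustrative data only (not taken from any real series)
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 36.0]
print(scale_dependent_measures(actual, forecast))
```

All four values are expressed in the units of the data (squared units for the MSE), which is why they cannot be compared across series with different scales.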

Measures Based on Percentage Errors

The percentage error is given by $p_t=100e_t/Y_t$. Percentage errors have the advantage of being scale-independent, and so are frequently used to compare forecast performance across different data sets.
These measures have the disadvantage of being infinite or undefined if $Y_t=0$ for any $t$ in the period of interest, and having an extremely skewed distribution when any value of $Y_t$ is close to zero.

Measure	Acronym	Definition	Feature
Mean Absolute Percentage Error	MAPE	$mean(abs(p_t))$	MAPE is often substantially larger than the MdAPE due to the skewed distribution when $Y_t$ is close to zero
Median Absolute Percentage Error	MdAPE	$median(abs(p_t))$
Root Mean Square Percentage Error	RMSPE	$\sqrt{mean(p_t^2)}$
Root Median Square Percentage Error	RMdSPE	$\sqrt{median(p_t^2)}$
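
As a rough sketch, the percentage-based measures can be computed as below. The function name `percentage_error_measures` and the data are illustrative assumptions; the code raises an error when any $Y_t$ is zero, since $p_t$ is undefined in that case.

```python
import math
import statistics

def percentage_error_measures(actual, forecast):
    """Return MAPE, MdAPE, RMSPE and RMdSPE, all expressed in percent."""
    if any(y == 0 for y in actual):
        # p_t = 100 * e_t / Y_t is undefined when Y_t is zero
        raise ValueError("percentage errors are undefined when an actual value is zero")
    p = [100.0 * (y - f) / y for y, f in zip(actual, forecast)]
    abs_p = [abs(v) for v in p]
    sq_p = [v * v for v in p]
    return {
        "MAPE": statistics.fmean(abs_p),
        "MdAPE": statistics.median(abs_p),
        "RMSPE": math.sqrt(statistics.fmean(sq_p)),
        "RMdSPE": math.sqrt(statistics.median(sq_p)),
    }

# Illustrative data only
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 36.0]
print(percentage_error_measures(actual, forecast))
```

In this example the largest percentage error occurs at the smallest $Y_t$, so the mean-based measures come out larger than the median-based ones.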

The MAPE and MdAPE also have the disadvantage that they put a heavier penalty on positive errors than on negative errors. This observation led to the use of the so-called symmetric measures.
The problems arising from small values of $Y_t$ may be less severe for sMAPE and sMdAPE. However, even there, if $Y_t$ is close to zero, $F_t$ is also likely to be close to zero, so the measure still involves division by a number close to zero.
Measures based on percentage errors are often highly skewed, and therefore transformations (such as logarithms) can make them more stable.

Measure	Acronym	Definition
Symmetric Mean Absolute Percentage Error	sMAPE	$mean(200*abs(Y_t-F_t)/(Y_t+F_t))$
Symmetric Median Absolute Percentage Error	sMdAPE	$median(200*abs(Y_t-F_t)/(Y_t+F_t))$
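
A minimal sketch of the two symmetric measures, following the definitions in the table. The function name `symmetric_measures` is an assumption for this example; the guard against $Y_t + F_t = 0$ is added here because the ratio is undefined in that case.

```python
import statistics

def symmetric_measures(actual, forecast):
    """Return sMAPE and sMdAPE (in percent) as defined in the table above."""
    terms = []
    for y, f in zip(actual, forecast):
        if y + f == 0:
            raise ValueError("symmetric percentage error is undefined when Y_t + F_t is zero")
        terms.append(200.0 * abs(y - f) / (y + f))
    return {
        "sMAPE": statistics.fmean(terms),
        "sMdAPE": statistics.median(terms),
    }

# Swapping actual and forecast leaves each term unchanged
print(symmetric_measures([100.0], [50.0]))
print(symmetric_measures([50.0], [100.0]))
```

Both calls give the same value, because each term depends only on the absolute difference and the sum of $Y_t$ and $F_t$.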

Measures Based on Relative Errors

An alternative way of scaling is to divide each error by the error obtained using another standard method of forecasting.
Let $r_t = e_t / e_t^*$ denote the relative error, where $e_t^*$ is the forecast error obtained from the benchmark method.

Measure	Acronym	Definition
Mean Relative Absolute Error	MRAE	$mean(abs(r_t))$
Median Relative Absolute Error	MdRAE	$median(abs(r_t))$
Geometric Mean Relative Absolute Error	GMRAE	$gmean(abs(r_t))$
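
The relative-error measures can be sketched as follows. The function name `relative_error_measures`, the benchmark forecasts and the data are assumptions for illustration. The sketch rejects zero benchmark errors, where $r_t$ is undefined, and the geometric mean additionally requires every $|r_t|$ to be positive.

```python
import statistics

def relative_error_measures(actual, forecast, benchmark_forecast):
    """Return MRAE, MdRAE and GMRAE from errors relative to a benchmark."""
    abs_r = []
    for y, f, fb in zip(actual, forecast, benchmark_forecast):
        e_star = y - fb  # benchmark error e_t^*
        if e_star == 0:
            raise ValueError("relative error is undefined when the benchmark error is zero")
        abs_r.append(abs((y - f) / e_star))
    return {
        "MRAE": statistics.fmean(abs_r),
        "MdRAE": statistics.median(abs_r),
        # geometric_mean needs strictly positive inputs
        "GMRAE": statistics.geometric_mean(abs_r),
    }

# Illustrative data only
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 36.0]
benchmark = [14.0, 21.0, 36.0, 32.0]
print(relative_error_measures(actual, forecast, benchmark))
```

Here one small benchmark error produces a relative error of 2 while the others are 0.5, which pulls the MRAE well above the MdRAE and GMRAE.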

Relative Measures

Rather than use relative errors, one can use relative measures.
For example, let $MAE_b$ denote the MAE from the benchmark method.
Then, a relative $MAE$ is given by $RelMAE = MAE/MAE_b$. Similar measures can be defined using RMSEs, MdAEs, MAPEs, etc.
When $RelMAE < 1$, the proposed method is better than the benchmark method, and when $RelMAE > 1$, the proposed method is worse than the benchmark method.
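
A short sketch of the relative MAE under the definition above. The helper names `mae` and `rel_mae` and the sample forecasts are illustrative assumptions.

```python
import statistics

def mae(actual, forecast):
    """Mean absolute error of a forecast."""
    return statistics.fmean(abs(y - f) for y, f in zip(actual, forecast))

def rel_mae(actual, forecast, benchmark_forecast):
    """RelMAE = MAE / MAE_b, where MAE_b comes from the benchmark method."""
    mae_b = mae(actual, benchmark_forecast)
    if mae_b == 0:
        raise ValueError("RelMAE is undefined when the benchmark MAE is zero")
    return mae(actual, forecast) / mae_b

# Illustrative data only
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 36.0]
benchmark = [14.0, 21.0, 36.0, 32.0]
ratio = rel_mae(actual, forecast, benchmark)
print(ratio, "better than benchmark" if ratio < 1 else "not better than benchmark")
```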

Percent Better

A related approach is to use the percentage of forecasts for which a given method is more accurate than the benchmark method. This is often known as Percent Better and can be expressed as $PB(MAE)=100\,mean(I(MAE<MAE_b))$, where $I(\cdot)$ is the indicator function (1 when the condition holds and 0 otherwise).
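
One way to read the formula is as an average of indicator values over a collection of comparisons, for example one MAE per series. The sketch below takes that reading as an assumption; the function name `percent_better` and the scores are hypothetical, and ties are counted as "not better".

```python
import statistics

def percent_better(method_scores, benchmark_scores):
    """Percentage of paired comparisons where the method's error is strictly lower."""
    indicators = [
        1.0 if m < b else 0.0  # I(MAE < MAE_b)
        for m, b in zip(method_scores, benchmark_scores)
    ]
    return 100.0 * statistics.fmean(indicators)

# Hypothetical per-series MAE values for a method and a benchmark
method_mae = [2.75, 1.0, 3.0, 5.0]
benchmark_mae = [4.75, 1.5, 2.0, 5.0]
print(percent_better(method_mae, benchmark_mae))
```

Note that this measure records only which method wins each comparison, not by how much.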

Weighted Measures

It is reasonable to assume that not every prediction should be treated equally.

For instance, weights can be assigned so that more recent data receive higher weights and therefore greater importance.
The weighted Mean Absolute Error (wMAE) for a recommender system can be computed as follows:
$$wMAE = \frac{\sum_{i=1}^{U}\sum_{j=1}^{N_i} w_{i,j}\,|p_{i,j}-r_{i,j}|}{\sum_{i=1}^{U}\sum_{j=1}^{N_i} w_{i,j}}$$
where
- $U$ represents the number of users;
- $N_i$ , the number of items predicted for the $i^{th}$ user;
- $r_{i,j}$, the rating given by the $i^{th}$ user to the item $I_j$;
- $p_{i,j}$, the rating predicted by the model;
- $w_{i,j}$ represents the weight associated to this prediction.
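
A minimal sketch of a weighted MAE using the quantities listed above. It assumes the normalized form: the weighted sum of absolute differences between predicted and actual ratings, divided by the sum of the weights. The nested-list layout (one inner list per user) and the function name `weighted_mae` are illustrative choices.

```python
def weighted_mae(ratings_by_user):
    """Weighted MAE over users and items.

    ratings_by_user: one list per user, each holding
    (actual_rating, predicted_rating, weight) tuples.
    """
    weighted_abs_sum = 0.0
    weight_sum = 0.0
    for user_items in ratings_by_user:        # i = 1..U
        for r, p, w in user_items:            # j = 1..N_i
            weighted_abs_sum += w * abs(p - r)
            weight_sum += w
    if weight_sum == 0:
        raise ValueError("the weights must not sum to zero")
    return weighted_abs_sum / weight_sum

# Hypothetical ratings: two users, three predictions in total
data = [
    [(4.0, 3.5, 1.0), (2.0, 3.0, 2.0)],
    [(5.0, 4.0, 1.0)],
]
print(weighted_mae(data))
```

With all weights equal, this reduces to the ordinary MAE over all predictions.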

There is also another error metric, the weighted absolute percentage error (WAPE, also written WMAPE):
$$WAPE = 100 \times \frac{sum(abs(Y_t-F_t))}{sum(Y_t)}$$
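
The WAPE formula translates directly into code. The sketch below uses an illustrative function name, `wape`, and made-up data. Because the denominator is the sum of the actual values, an individual $Y_t$ equal to zero does not break the computation as long as the total is nonzero.

```python
def wape(actual, forecast):
    """WAPE = 100 * sum(|Y_t - F_t|) / sum(Y_t)."""
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when the actual values sum to zero")
    total_abs_error = sum(abs(y - f) for y, f in zip(actual, forecast))
    return 100.0 * total_abs_error / total_actual

# Illustrative data only; the second call includes a zero actual value
print(wape([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 36.0]))
print(wape([0.0, 10.0], [1.0, 9.0]))
```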

Scaled Errors

An alternative is to scale the error by the in-sample MAE from the naive (random walk) forecast method. A scaled error is thus defined as
$$q_t = \frac{e_t}{\frac{1}{n-1}\sum_{i=2}^{n}|Y_i - Y_{i-1}|}$$
which is clearly independent of the scale of the data.

A scaled error is less than one if it arises from a better forecast than the average one-step naive forecast computed in-sample. Conversely, it is greater than one if the forecast is worse than the average one-step naive forecast computed in-sample.
The Mean Absolute Scaled Error is simply
$$MASE=mean(|q_t|)$$
Related measures such as Root Mean Squared Scaled Error (RMSSE) and Median Absolute Scaled Error (MdASE) can be defined analogously.
Of these measures, we prefer MASE as it is less sensitive to outliers and more easily interpreted than RMSSE, and less variable on small samples than MdASE.
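
A minimal sketch of MASE, assuming each forecast error is divided by the mean absolute one-step naive (random walk) error computed on the in-sample data. The function name `mase`, the split into `train` and evaluation data, and the numbers are illustrative assumptions.

```python
import statistics

def mase(train, actual, forecast):
    """Mean Absolute Scaled Error.

    train: in-sample observations used to compute the naive-forecast scale.
    actual, forecast: observations and forecasts being evaluated.
    """
    if len(train) < 2:
        raise ValueError("at least two in-sample observations are required")
    # In-sample MAE of the one-step naive forecast: mean |Y_i - Y_{i-1}|
    scale = statistics.fmean(abs(b - a) for a, b in zip(train, train[1:]))
    if scale == 0:
        raise ValueError("scaled errors are undefined for a constant in-sample series")
    scaled_errors = [(y - f) / scale for y, f in zip(actual, forecast)]  # q_t
    return statistics.fmean(abs(q) for q in scaled_errors)

# Illustrative data only
train = [10.0, 12.0, 11.0, 14.0, 12.0]
actual = [13.0, 15.0]
forecast = [12.0, 16.0]
print(mase(train, actual, forecast))
```

A result below one indicates that, on average, the forecasts have smaller absolute errors than the in-sample one-step naive forecast.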

Appendix

MAPE: Mean Absolute Percentage Error, where $A_t$ is the actual value (written $Y_t$ above) and $F_t$ is the forecast value.
$$MAPE = \frac{100}{n}\sum_{t=1}^n \left | \frac{A_t-F_t}{A_t} \right |$$
RMSPE: Root Mean Square Percentage Error
$$RMSPE = 100\sqrt {\frac{1}{n}\sum_{t=1}^n \left( \frac{A_t-F_t}{A_t} \right)^2}$$

References

Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679-688.

Cleger-Tamayo, S., Fernández-Luna, J. M., & Huete, J. F. (2012, September). On the Use of Weighted Mean Absolute Error in Recommender Systems. In RUE@ RecSys (pp. 24-26).

What’s the gaps for the forecast error metrics: MAPE and WMAPE? (2017). Stack Overflow. Retrieved 3 April 2017, from http://stackoverflow.com/questions/12994929/whats-the-gaps-for-the-forecast-error-metrics-mape-and-wmape