Root mean square deviation
Statistical measure
The root mean square deviation (RMSD) or root mean square error (RMSE) is a frequently used measure of the differences between actual observed values and an estimation of them (e.g. true versus predicted values in regression tasks in machine learning).
The deviation is typically simply a difference of scalars; it can also be generalized to the vector lengths of a displacement, as in the bioinformatics concept of root mean square deviation of atomic positions.
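The generalization from scalar differences to vector lengths can be illustrated with a minimal sketch. The function name rmsd_vectors and the sample coordinates are hypothetical, and the sketch assumes the two point sets are already paired and aligned (it performs no superposition step):

```python
import math

def rmsd_vectors(points_a, points_b):
    """RMSD where each deviation is the Euclidean length of a displacement vector."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two non-empty point lists of equal length")
    total = 0.0
    for p, q in zip(points_a, points_b):
        # Squared length of the displacement between paired points.
        total += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total / len(points_a))

# Illustrative coordinates only: each point is displaced by exactly one unit.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
print(rmsd_vectors(a, b))  # 1.0
```

With scalar data the same function reduces to the ordinary RMSD if each value is wrapped in a one-element tuple.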
RMSD of a sample
The RMSD of a sample is the quadratic mean of the differences between the observed values and predicted ones. These deviations are called residuals when the calculations are performed over the data sample that was used for estimation (and are therefore always in reference to an estimate) and are called errors (or prediction errors) when computed out-of-sample (that is, on the full set, referencing a true value rather than an estimate). The RMSD serves to aggregate the magnitudes of the errors in predictions for various data points into a single measure of predictive power. RMSD is a measure of accuracy, used to compare forecasting errors of different models for a particular dataset and not between datasets, as it is scale-dependent.
RMSD is always non-negative, and a value of 0 (almost never achieved in practice) would indicate a perfect fit to the data. In general, a lower RMSD is better than a higher one. However, comparisons across different types of data would be invalid because the measure is dependent on the scale of the numbers used.
RMSD is the square root of the average of squared errors. The effect of each error on RMSD is proportional to the size of the squared error; thus larger errors have a disproportionately large effect on RMSD. Consequently, RMSD is sensitive to outliers.
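The sensitivity to outliers can be seen in a small sketch that compares the RMSD with the mean absolute error on two made-up error lists. Both lists have the same mean absolute error, but one concentrates all of its error in a single point. The helper names and the numbers are illustrative assumptions:

```python
import math

def rmsd(errors):
    # Square root of the mean of the squared errors.
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mean_abs(errors):
    # Mean absolute error, shown only as a point of comparison.
    return sum(abs(e) for e in errors) / len(errors)

even = [2.0, 2.0, 2.0, 2.0]    # error spread evenly
spiky = [0.0, 0.0, 0.0, 8.0]   # same total absolute error, one outlier

print(mean_abs(even), rmsd(even))    # 2.0 2.0
print(mean_abs(spiky), rmsd(spiky))  # 2.0 4.0
```

The single large error doubles the RMSD while leaving the mean absolute error unchanged.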
Formulas
Estimator
The RMSD of an estimator {\displaystyle {\hat {\theta }}} with respect to an estimated parameter {\displaystyle \theta } is defined as the square root of the mean squared error:
{\displaystyle \operatorname {RMSD} ({\hat {\theta }})={\sqrt {\operatorname {MSE} ({\hat {\theta }})}}={\sqrt {\operatorname {E} {\big (}({\hat {\theta }}-\theta )^{2}{\big )}}}.}
For an unbiased estimator, the RMSD is the square root of the variance, known as the standard deviation.
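The statement about unbiased estimators follows from the standard decomposition of the mean squared error into variance and squared bias. The decomposition is not spelled out in the text above; it is added here as a worked step, with Var and Bias as the usual notation:
{\displaystyle \operatorname {MSE} ({\hat {\theta }})=\operatorname {Var} ({\hat {\theta }})+{\big (}\operatorname {E} ({\hat {\theta }})-\theta {\big )}^{2}.}
When the bias {\displaystyle \operatorname {E} ({\hat {\theta }})-\theta } is zero, the second term vanishes and {\displaystyle \operatorname {RMSD} ({\hat {\theta }})={\sqrt {\operatorname {Var} ({\hat {\theta }})}}.}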
Samples
If {\displaystyle X_{1},\ldots ,X_{n}} is a sample of a population with true mean value {\displaystyle x_{0}}, then the RMSD of the sample is
{\displaystyle \operatorname {RMSD} ={\sqrt {{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-x_{0})^{2}}}}
The RMSD of predicted values {\displaystyle {\hat {y}}_{t}} for times {\displaystyle t} of a regression's dependent variable {\displaystyle y_{t},} with variables observed over {\displaystyle T} times, is computed for {\displaystyle T} different predictions as the square root of the mean of the squares of the deviations:
{\displaystyle \operatorname {RMSD} ={\sqrt {\frac {\sum _{t=1}^{T}(y_{t}-{\hat {y}}_{t})^{2}}{T}}}.}
(For regressions on cross-sectional data, the subscript {\displaystyle t} is replaced by {\displaystyle i} and {\displaystyle T} is replaced by {\displaystyle n}.)
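The formula for predicted values translates directly into code. The following is a minimal sketch in plain Python; the function name rmsd and the example numbers are illustrative assumptions, not taken from any particular library or dataset:

```python
import math

def rmsd(observed, predicted):
    """Square root of the mean squared deviation between paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up observations y_t and predictions y_hat_t for T = 4 times.
y = [3.0, 5.0, 2.0, 7.0]
y_hat = [1.0, 5.0, 4.0, 9.0]

# Deviations are 2, 0, -2, -2, so the result is sqrt(12 / 4), about 1.732.
print(rmsd(y, y_hat))
```

Because the deviations are squared, the function is symmetric in its two arguments, so the same code applies when neither series is treated as the reference.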
In some disciplines, the RMSD is used to compare differences between two things that may vary, neither of which is accepted as the "standard". For example, when measuring the average difference between two time series {\displaystyle x_{1,t}} and {\displaystyle x_{2,t}}, the formula becomes
{\displaystyle \operatorname {RMSD} ={\sqrt {\frac {\sum _{t=1}^{T}(x_{1,t}-x_{2,t})^{2}}{T}}}.}
Normalization
Normalizing the RMSD facilitates the comparison between datasets or models with different scales. Though there is no consistent means of normalization in the literature, common choices are the mean or the range (defined as the maximum value minus the minimum value) of the measured data:
{\displaystyle \mathrm {NRMSD} ={\frac {\mathrm {RMSD} }{y_{\max }-y_{\min }}}}
or
{\displaystyle \mathrm {NRMSD} ={\frac {\mathrm {RMSD} }{\bar {y}}}}
This value is commonly referred to as the normalized root mean square deviation or error (NRMSD or NRMSE), and is often expressed as a percentage, where lower values indicate less residual variance. This is also called the coefficient of variation or percent RMS. In many cases, especially for smaller samples, the sample range is likely to be affected by the size of the sample, which would hamper comparisons.
Another possible method to make the RMSD a more useful comparison measure is to divide the RMSD by the interquartile range (IQR). When the RMSD is divided by the IQR, the normalized value becomes less sensitive to extreme values in the target variable.
{\displaystyle \mathrm {RMSDIQR} ={\frac {\mathrm {RMSD} }{IQR}}}
where {\displaystyle IQR=Q_{3}-Q_{1}} with {\displaystyle Q_{1}={\text{CDF}}^{-1}(0.25)} and {\displaystyle Q_{3}={\text{CDF}}^{-1}(0.75),} where {\displaystyle {\text{CDF}}^{-1}} is the quantile function.
When normalizing by the mean value of the measurements, the term coefficient of variation of the RMSD, CV(RMSD), may be used to avoid ambiguity. This is analogous to the coefficient of variation with the RMSD taking the place of the standard deviation.
{\displaystyle \mathrm {CV(RMSD)} ={\frac {\mathrm {RMSD} }{\bar {y}}}.}
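The three normalizations described in this section (by range, by mean, and by interquartile range) can be compared side by side. This is a sketch under stated assumptions: the function names and data are hypothetical, and the quartiles are estimated with the "inclusive" method of Python's statistics.quantiles, which is only one of several sample-quantile conventions and is not prescribed by the text above.

```python
import math
import statistics

def rmsd(observed, predicted):
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def normalized_rmsd(observed, predicted, by="range"):
    """Divide the RMSD by a scale computed from the observed values."""
    if by == "range":
        scale = max(observed) - min(observed)
    elif by == "mean":  # CV(RMSD)
        scale = statistics.fmean(observed)
    elif by == "iqr":   # RMSDIQR; the quartile convention is an assumption
        q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
        scale = q3 - q1
    else:
        raise ValueError("by must be 'range', 'mean', or 'iqr'")
    return rmsd(observed, predicted) / scale

# Made-up data: every prediction is off by exactly 1, so RMSD = 1.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 3.0, 4.0, 5.0, 6.0]

print(normalized_rmsd(observed, predicted, "range"))  # 0.25
print(normalized_rmsd(observed, predicted, "iqr"))    # 0.5
print(normalized_rmsd(observed, predicted, "mean"))   # 1/3, about 0.333
```

The same RMSD yields three different normalized values, which is why a reported NRMSD should state which normalization was used.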
Applications
In meteorology, the RMSD is used to see how effectively a mathematical model predicts the behavior of the atmosphere.
In bioinformatics, the root mean square deviation of atomic positions is the measure of the average distance between the atoms of superimposed proteins.
In structure based drug design, the RMSD is a measure of the difference between a crystal conformation of the ligand and a docking prediction.
In economics, the RMSD is used to determine whether an economic model fits economic indicators.
In experimental psychology, the RMSD is used to assess how well mathematical or computational models of behavior explain the empirically observed behavior.
In GIS, the RMSD is one measure used to assess the accuracy of spatial analysis and remote sensing.
In hydrogeology, RMSD and NRMSD are used to evaluate the calibration of a groundwater model.
In imaging science, the RMSD is part of the peak signal-to-noise ratio, a measure used to assess how well a method to reconstruct an image performs relative to the original image.
In computational neuroscience, the RMSD is used to assess how well a system learns a given model.
In protein nuclear magnetic resonance spectroscopy, the RMSD is used as a measure to estimate the quality of the obtained bundle of structures.
Submissions for the Netflix Prize were judged using the RMSD from the test dataset's undisclosed "true" values.
In the simulation of energy consumption of buildings, the RMSE and CV(RMSE) are used to calibrate models to measured building performance.
In X-ray crystallography, RMSD (and RMSZ) is used to measure the deviation of the molecular internal coordinates from the restraints library values.
In control theory, the RMSE is used as a quality measure to evaluate the performance of a state observer.
In fluid dynamics, normalized root mean square deviation (NRMSD), coefficient of variation (CV), and percent RMS are used to quantify the uniformity of flow behavior such as velocity profile, temperature distribution, or gas species concentration. The value is compared to industry standards to optimize the design of flow and thermal equipment and processes.
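As an illustration of the imaging-science entry above, the peak signal-to-noise ratio is commonly written as 20·log10(MAX / RMSD), where MAX is the largest possible pixel value. The sketch below assumes 8-bit pixels (MAX = 255) and flat lists of pixel values; the function name psnr and the sample data are hypothetical:

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels, computed from the RMSD."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("need two non-empty pixel lists of equal length")
    squared = [(a - b) ** 2 for a, b in zip(original, reconstructed)]
    rmsd = math.sqrt(sum(squared) / len(squared))
    if rmsd == 0.0:
        return math.inf  # identical images: no noise at all
    return 20.0 * math.log10(max_value / rmsd)

# Made-up pixel values; every pixel differs by 2, so RMSD = 2.
original = [10, 20, 30, 40]
reconstructed = [12, 18, 32, 38]
print(psnr(original, reconstructed))  # 20 * log10(255 / 2), about 42.1 dB
```

A smaller RMSD gives a larger PSNR, so the two measures rank reconstructions of the same image in opposite directions.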
See also
Root mean square
Mean absolute error
Average absolute deviation
Mean signed deviation
Mean squared deviation
Squared deviations
Errors and residuals in statistics
Coefficient of Variation
Normalized estimation error squared