8.7 Vector Distance
- Euclidean distance: In Euclidean space, the L2-norm is often called the Euclidean norm; it gives the length of a vector measured from the origin. More generally, the Euclidean distance between two vectors \(\mathbf{x}\) and \(\mathbf{y}\) is the L2-norm of their difference, \[d(\mathbf{x},\mathbf{y}) = ||\mathbf{x} - \mathbf{y}||_2 = \sqrt{\sum_i (x_i - y_i)^2} = \sqrt{(\mathbf{x}-\mathbf{y})^T(\mathbf{x}-\mathbf{y})} \]
This is the distance "as the crow flies."
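As a quick illustration, here is a minimal NumPy sketch of this computation (the vectors `x` and `y` are hypothetical):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

# Direct computation from the definition: sqrt of the sum of squared differences
d = np.sqrt(np.sum((x - y) ** 2))

# Equivalent: the 2-norm of the difference (np.linalg.norm defaults to p = 2)
d_norm = np.linalg.norm(x - y)

print(d, d_norm)  # both 5.0 for these vectors
```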
- Minkowski distance: A more general distance between two vectors follows from the general definition of the vector \(p\)-norm and is called the Minkowski distance, \[d(\mathbf{x},\mathbf{y}) = ||\mathbf{x} - \mathbf{y}||_p = \left(\sum_i |x_i - y_i|^p \right)^{1/p} \]
If \(p=2\), then you get the Euclidean distance (as the crow flies), \[d(\mathbf{x},\mathbf{y}) = \sqrt{\sum_i (x_i - y_i)^2 }\]
If \(p=1\), then you get the Manhattan distance (city-block distance), \[d(\mathbf{x},\mathbf{y}) = \sum_i |x_i - y_i| \]
If \(p\rightarrow\infty\), then you get Chebyshev distance, \[d(\mathbf{x},\mathbf{y}) =||\mathbf{x}-\mathbf{y}||_{\infty} = \max_i |x_i-y_i|\]
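The three special cases above can be checked with a short sketch (again with hypothetical vectors; NumPy's `np.linalg.norm` also supports these norms directly via its `ord` argument):

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance ||x - y||_p for finite p >= 1."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

print(minkowski(x, y, 2))       # Euclidean: 5.0
print(minkowski(x, y, 1))       # Manhattan: 7.0
print(np.max(np.abs(x - y)))    # Chebyshev (limit p -> infinity): 4.0

# The same values via the built-in vector norms:
# np.linalg.norm(x - y, ord=2), ord=1, and ord=np.inf respectively
```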
- Mahalanobis distance: In the context of random variables and data analysis, imagine trying to find the distance between two individuals based on the values of \(k\) different variables. We could take the differences between the two individuals' values on each variable, but the scales and units of the variables might differ. We could standardize each difference by dividing by that variable's standard deviation across the entire sample, so that the differences are comparable. To also incorporate the dependence (covariance) between the variables, the Mahalanobis distance modifies the Euclidean distance and is calculated as \[d(\mathbf{x},\mathbf{y})= \sqrt{(\mathbf{x}-\mathbf{y})^T\mathbf{S}^{-1}(\mathbf{x}-\mathbf{y})} \] where \(\mathbf{S}\) is the \(k\times k\) sample covariance matrix of the variables based on the sample.
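A sketch of this computation, assuming a hypothetical data matrix whose rows are observations and whose columns are the \(k\) variables:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))   # hypothetical sample: 100 observations, k = 3 variables

x, y = data[0], data[1]            # the two individuals to compare

S = np.cov(data, rowvar=False)     # k x k sample covariance matrix
diff = x - y

# Solve S z = (x - y) rather than forming S^{-1} explicitly (numerically more stable)
d = np.sqrt(diff @ np.linalg.solve(S, diff))
print(d)
```

SciPy's `scipy.spatial.distance.mahalanobis` computes the same quantity, though note that it expects the *inverse* covariance matrix as its third argument.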