Z-score

Juan M. Gutierrez

Z-score: what it is and how to compute it

2023-03-08

What is the Z-score?

A z-score is the number of standard deviations by which a data point lies above or below the mean. It is computed from a raw score and is also called a standard score; once standardized, a value can be placed on a normal distribution curve. For normally distributed data, about 99.7% of z-scores fall between -3 standard deviations (the far left of the normal distribution curve) and +3 standard deviations (the far right), although more extreme values are possible. To compute a z-score, one must know the population mean $\mu$ and the population standard deviation $\sigma$.

z-scores are a way to compare a result from a sample or test with a reference population. In field studies, tests, and other research, raw results can take many possible values that mean little on their own. A z-score shows where an observed value sits relative to the mean of the distribution.

$$Z = \frac{x - \mu}{\sigma}$$

  • $x$ is a raw score to be standardized
  • $\mu$ is the mean of the population
  • $\sigma$ is the standard deviation of the population
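
As a quick sanity check of the formula, the following sketch standardizes a single raw score in R. The numbers (a score of 85 from a population with mean 70 and standard deviation 10) and the variable names are hypothetical, chosen only for illustration.

```r
# Hypothetical values, not taken from any real dataset
x     <- 85   # raw score to standardize
mu    <- 70   # assumed population mean
sigma <- 10   # assumed population standard deviation

# Apply Z = (x - mu) / sigma
z <- (x - mu) / sigma
z
#> [1] 1.5
```

The score sits 15 points above the mean, and 15 points is 1.5 standard deviations, so the z-score is 1.5.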

Interpretation

A z-score of 1 is one standard deviation above the mean, and a z-score of 2 is two standard deviations above the mean.

A z-score of -1.8 is 1.8 standard deviations below the mean. In this way, a z-score shows where a value lies on a normal distribution curve.

A z-score of zero means the value is exactly equal to the mean, while a z-score of +3 means the value is far above the mean, three standard deviations higher.
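
If the data can reasonably be assumed to follow a normal distribution, a z-score can be translated into a percentile with base R's pnorm(), which returns the cumulative probability of the standard normal distribution. This is a sketch under that normality assumption; the names z_values and percentiles are hypothetical.

```r
# z-scores to interpret (the values discussed in this section)
z_values <- c(-1.8, 0, 1, 2, 3)

# Share of a standard normal distribution lying below each z-score
percentiles <- pnorm(z_values)
round(percentiles, 4)
#> [1] 0.0359 0.5000 0.8413 0.9772 0.9987
```

Read this way, a z-score of 1 sits above roughly 84% of a normal population, while a z-score of -1.8 sits above only about 3.6% of it.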

Example

We built a simple series to illustrate the z-score calculation using R.

# Create sample data
data <- c(8, 7, 7, 10, 13, 14, 15, 16, 18)

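# Calculate the mean and standard deviation
# Note: sd() returns the sample standard deviation (n - 1 denominator),
# used here as an estimate of the population sigma
mean_val <- mean(data)
sd_val <- sd(data)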

cat("Mean:", mean_val, "\n")
cat("Standard Deviation:", sd_val, "\n")
#> Mean: 12
#> Standard Deviation: 4.123106
# Calculate z-scores
z_scores <- (data - mean_val) / sd_val
z_scores
#> [1] -0.9701425 -1.2126781 -1.2126781 -0.4850713  0.2425356
#> [6]  0.4850713  0.7276069  0.9701425  1.4552138
# Create a data frame for visualization
df <- data.frame(
  value = data,
  z_score = z_scores
)
print(df)
#>   value    z_score
#> 1     8 -0.9701425
#> 2     7 -1.2126781
#> 3     7 -1.2126781
#> 4    10 -0.4850713
#> 5    13  0.2425356
#> 6    14  0.4850713
#> 7    15  0.7276069
#> 8    16  0.9701425
#> 9    18  1.4552138
# Plot the z-scores
plot(z_scores, type = "b", main = "Z-Scores", xlab = "Index", ylab = "Z-Score",
     col = "blue", pch = 19)
abline(h = 0, col = "red", lty = 2)
# ggplot2 visualization
library(ggplot2)
ggplot(df, aes(x = seq_along(z_score), y = z_score)) +
  geom_point(color = "blue", size = 3) +
  geom_line(color = "blue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Z-Scores Distribution",
       x = "Index",
       y = "Z-Score") +
  theme_minimal()
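
The manual calculation in this example can be cross-checked against base R's scale(), which centers by the mean and divides by the sample standard deviation by default. The sketch below also computes a population-style standard deviation (dividing by n rather than n - 1) for comparison with the formula's $\sigma$. The names z_manual, z_builtin, sd_pop, and z_pop are introduced here for illustration only.

```r
data <- c(8, 7, 7, 10, 13, 14, 15, 16, 18)

# Manual z-scores using the sample standard deviation, as sd() computes it
z_manual <- (data - mean(data)) / sd(data)

# scale() returns a one-column matrix; as.numeric() drops the attributes
z_builtin <- as.numeric(scale(data))
all.equal(z_manual, z_builtin)
#> [1] TRUE

# Population standard deviation: divide by n instead of n - 1
sd_pop <- sqrt(mean((data - mean(data))^2))
z_pop  <- (data - mean(data)) / sd_pop
```

With only nine observations the two standard deviations differ noticeably, so z_pop is slightly larger in magnitude than z_manual. Which one is appropriate depends on whether the data are treated as the whole population or as a sample from it.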

Summary

z-scores measure the deviation of data from the mean, expressed in units of standard deviation. They are essential for:

  • Hypothesis testing: Determining whether observed values are statistically significant.
  • Data standardization: Comparing data points from different distributions.
  • Outlier detection: Identifying values that fall far from the mean (typically $|z| > 3$).

A positive z-score indicates a value above the mean, while a negative z-score indicates a value below the mean. The further the z-score is from zero, the more unusual the data point.
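
The outlier rule mentioned in the list can be sketched in a few lines of R. The data below are made up for illustration (19 values between 10 and 13 plus one extreme value of 50), and the cutoff of 3 is the conventional threshold rather than a fixed rule.

```r
# Hypothetical data: 19 typical values and one extreme value
values <- c(10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
            12, 10, 11, 13, 12, 11, 12, 10, 13, 50)

# Standardize using the sample mean and sample standard deviation
z <- (values - mean(values)) / sd(values)

# Flag values whose z-score exceeds the threshold in absolute value
outliers <- values[abs(z) > 3]
outliers
#> [1] 50
```

One caveat: when the sample standard deviation is used, no z-score in a sample of size n can exceed (n - 1) / sqrt(n) in absolute value, so the threshold of 3 can only ever be crossed when n is at least 11. The extreme value also inflates the mean and standard deviation it is judged against, which is why robust alternatives are sometimes preferred.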
