Z-Score: what it is and how to calculate it

Juan M. Gutierrez

2023-03-08

What is the Z-score?

A $z$-score is the number of standard deviations a data point lies from the mean. It is a standardized measure of how far below or above the population mean a raw score falls. A $z$-score is also called a standard score and can be placed on a normal distribution curve. For normally distributed data, about 99.7% of $z$-scores fall between -3 standard deviations (the far left of the normal distribution curve) and +3 standard deviations (the far right), although more extreme values are possible. To compute a $z$-score, one must know the population mean $\mu$ and the population standard deviation $\sigma$.

$z$-scores are a way to compare a result from a sample with a normal population. Field studies, sample tests, and other studies can report outcomes on many different scales, which are hard to interpret on their own. A $z$-score shows where an observed value lies relative to the mean of the distribution.

$z=\frac{x-\mu}{\sigma}$

$x$ is a raw score to be standardized
$\mu$ is the mean of the population
$\sigma$ is the standard deviation of the population
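
As a minimal sketch of the formula above, the helper below standardizes a raw score given a known population mean and standard deviation. The function name `z_score` and the two sets of exam figures are hypothetical, chosen only to show how scores on different scales become comparable.

```r
# Hypothetical helper: standardize a raw score x given mu and sigma
z_score <- function(x, mu, sigma) {
  stopifnot(is.numeric(x), sigma > 0)  # sigma must be positive
  (x - mu) / sigma
}

# Made-up figures: two exams graded on different scales
z_exam_a <- z_score(85, mu = 70, sigma = 10)
z_exam_b <- z_score(620, mu = 500, sigma = 100)

cat("Exam A z-score:", z_exam_a, "\n")
cat("Exam B z-score:", z_exam_b, "\n")
```

Although 620 is the larger raw number, the score of 85 sits further above its own mean (1.5 versus 1.2 standard deviations).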

Interpretation

A $z$-score of 1 is one standard deviation above the mean. A score of 2 is two standard deviations above the mean.

A score of -1.8 is 1.8 standard deviations below the mean. A $z$-score shows where a value lies on the normal distribution curve.

A $z$-score of zero tells you that the value is exactly equal to the mean, while a score of +3 tells you that the value is three standard deviations above the mean, far higher than is typical.
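
If the data can be assumed to follow a normal distribution, a z-score can be converted to the proportion of values expected to fall below it. The sketch below uses base R's `pnorm()`; the variable names `z_values`, `below` and `within` are only illustrative.

```r
# z-scores mentioned in the text above
z_values <- c(-1.8, 0, 1, 2, 3)

# pnorm() gives the standard normal cumulative probability P(Z <= z)
below <- pnorm(z_values)
print(data.frame(z = z_values, proportion_below = round(below, 4)))

# Share of a normal distribution within 1, 2 and 3 standard deviations of the mean
within <- pnorm(1:3) - pnorm(-(1:3))
print(round(within, 4))
```

Under this assumption a z-score of 0 sits at the 50th percentile, and roughly 68%, 95% and 99.7% of values lie within 1, 2 and 3 standard deviations of the mean.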

Example

We built a simple series to illustrate the z-score calculation using R.

# Create sample data
data <- c(8, 7, 7, 10, 13, 14, 15, 16, 18)

# Calculate mean and standard deviation
# Note: sd() returns the sample standard deviation (n - 1 denominator)
mean_val <- mean(data)
sd_val <- sd(data)

cat("Mean:", mean_val, "\n")
cat("Standard Deviation:", sd_val, "\n")

Mean: 12
Standard Deviation: 4.123106

# Calculate z-scores
z_scores <- (data - mean_val) / sd_val
z_scores

[1] -0.9701425 -1.2126781 -1.2126781 -0.4850713  0.2425356
[6]  0.4850713  0.7276069  0.9701425  1.4552138

# Create a data frame for visualization
df <- data.frame(
  value = data,
  z_score = z_scores
)
print(df)

  value    z_score
1     8 -0.9701425
2     7 -1.2126781
3     7 -1.2126781
4    10 -0.4850713
5    13  0.2425356
6    14  0.4850713
7    15  0.7276069
8    16  0.9701425
9    18  1.4552138

# Plot the z-scores
plot(z_scores, type = "b", main = "Z-Scores", xlab = "Index", ylab = "Z-Score",
     col = "blue", pch = 19)
abline(h = 0, col = "red", lty = 2)

# ggplot2 visualization
library(ggplot2)
ggplot(df, aes(x = seq_along(z_score), y = z_score)) +
  geom_point(color = "blue", size = 3) +
  geom_line(color = "blue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Z-Scores Distribution",
       x = "Index",
       y = "Z-Score") +
  theme_minimal()
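
Two details of the example are worth checking in code. R's `sd()` uses the sample formula (dividing by n - 1), whereas the definition earlier refers to the population standard deviation, and base R's `scale()` performs the same centering and scaling in one call. The sketch below re-creates the data so that it runs on its own; `z_sample`, `z_scaled`, `sd_pop` and `z_pop` are illustrative names.

```r
# Same sample data as in the example
data <- c(8, 7, 7, 10, 13, 14, 15, 16, 18)

# Manual z-scores using the sample standard deviation (n - 1 denominator)
z_sample <- (data - mean(data)) / sd(data)

# scale() centers by the mean and divides by sd(); it returns a one-column matrix
z_scaled <- as.numeric(scale(data))
print(all.equal(z_sample, z_scaled))

# Population standard deviation (n denominator), computed by hand
sd_pop <- sqrt(mean((data - mean(data))^2))
z_pop <- (data - mean(data)) / sd_pop

cat("Sample SD:", sd(data), " Population SD:", sd_pop, "\n")
```

With only nine observations the two versions differ noticeably; the gap shrinks as the sample grows.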

Summary

$z$-scores measure the deviation of data from the mean, expressed in units of standard deviation. They are essential for:

Hypothesis testing: Determining whether observed values are statistically significant.
Data standardization: Comparing data points from different distributions.
Outlier detection: Identifying values that fall far from the mean (typically $|z| > 3$).
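
The outlier rule in the list above can be sketched as follows. The readings are made-up values with one deliberately extreme entry, and the cutoff of 3 is the conventional threshold mentioned above rather than a fixed rule.

```r
# Made-up readings; the last value is deliberately extreme
readings <- c(50, 52, 49, 51, 48, 50, 53, 47, 51, 49,
              50, 52, 48, 51, 50, 49, 52, 50, 51, 95)

# Standardize using the sample mean and sample standard deviation
z <- (readings - mean(readings)) / sd(readings)

# Flag observations more than 3 standard deviations from the mean
outliers <- readings[abs(z) > 3]
print(outliers)
```

When the mean and standard deviation are computed from the same data, a sample of about ten or fewer observations cannot produce $|z| > 3$ at all, so the rule is only useful with enough data.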

A positive $z$-score indicates a value above the mean, while a negative $z$-score indicates a value below the mean. The further the $z$-score from zero, the more unusual the data point.
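
To put a number on "unusual", one option is the two-sided tail probability under a standard normal distribution, which is also the basis of a z-test. This is a sketch that assumes normality; the function name `two_sided_p` is hypothetical.

```r
# Hypothetical helper: probability of a z-score at least this far from zero,
# assuming a standard normal distribution
two_sided_p <- function(z) {
  2 * pnorm(-abs(z))
}

z_values <- c(0.5, 1.96, 3)
print(data.frame(z = z_values, p = signif(two_sided_p(z_values), 3)))
```

Under this assumption, $|z| = 1.96$ corresponds to a probability of about 0.05, the conventional significance threshold.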
