
Z-score
Z-score: what it is and how to calculate it
Juan M. Gutierrez
2023-03-08
What is the Z-score?
A z-score is the number of standard deviations by which a data point lies above or below the mean. It is a statistical measure obtained by standardizing a raw score, which is why it is also called a standard score, and it can be placed on a normal distribution curve. For normally distributed data, z-scores typically range from -3 standard deviations (the far left of the normal distribution curve) to +3 standard deviations (the far right of the curve). To compute a z-score, one must know the population mean $\mu$ and the population standard deviation $\sigma$.
Z-scores provide a way to compare results from a sample with a normal population. In field studies, sample tests, or any other study, the reported outcomes may seem meaningless on their own raw scales. A z-score reveals where an observed value lies relative to the mean of the normal distribution.
The z-score is calculated as:
$$z = \frac{x - \mu}{\sigma}$$
where:
- $x$ is the raw score to be standardized
- $\mu$ is the mean of the population
- $\sigma$ is the standard deviation of the population
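As a minimal sketch of the formula, the function below standardizes a raw score given a known population mean and standard deviation. The function name `to_z` and the example values (a raw score of 130 on a scale with mean 100 and standard deviation 15) are hypothetical and chosen only for illustration.

```r
# Standardize a raw score x given the population mean (mu) and the
# population standard deviation (sigma). The name to_z is illustrative.
to_z <- function(x, mu, sigma) {
  stopifnot(sigma > 0)  # a zero or negative standard deviation is not meaningful
  (x - mu) / sigma
}

# Hypothetical example: raw score 130, population mean 100, sd 15
to_z(130, mu = 100, sigma = 15)
# [1] 2
```

Because the arithmetic is vectorized, the same function also accepts a vector of raw scores.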
Interpretation
A z-score of 1 is one standard deviation above the mean. A score of 2 is two standard deviations above the mean.
A score of -1.8 is 1.8 standard deviations below the mean. A z-score shows where a value lies on the normal distribution curve.
A z-score of zero means the value is exactly equal to the mean, while a score of +3 means the value is far above the mean.
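If the data can be assumed to follow a normal distribution, a z-score can also be read as a percentile. The sketch below uses base R's `pnorm()`, which returns the cumulative probability of the standard normal distribution. The z values are the ones discussed above; the normality assumption is ours and is not something a z-score guarantees by itself.

```r
# z-scores from the interpretation above
z <- c(-1.8, 0, 1, 2, 3)

# Share of a standard normal population that falls below each z-score
percentile <- round(pnorm(z), 4)

data.frame(z = z, percentile = percentile)
```

Under that assumption, about 84% of values lie below a z-score of 1, and roughly 3.6% lie below a z-score of -1.8.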
Example
We built a simple series to illustrate the z-score calculation using R.
# Create sample data
data <- c(8, 7, 7, 10, 13, 14, 15, 16, 18)
# Calculate mean and standard deviation
# (note: sd() uses the sample formula, dividing by n - 1)
mean_val <- mean(data)
sd_val <- sd(data)
cat("Mean:", mean_val, "\n")
cat("Standard Deviation:", sd_val, "\n")
Mean: 12
Standard Deviation: 4.123106
# Calculate z-scores
z_scores <- (data - mean_val) / sd_val
z_scores
[1] -0.9701425 -1.2126781 -1.2126781 -0.4850713  0.2425356
[6]  0.4850713  0.7276069  0.9701425  1.4552138
# Create a data frame for visualization
df <- data.frame(
  value = data,
  z_score = z_scores
)
print(df)
  value    z_score
1     8 -0.9701425
2     7 -1.2126781
3     7 -1.2126781
4    10 -0.4850713
5    13  0.2425356
6    14  0.4850713
7    15  0.7276069
8    16  0.9701425
9    18  1.4552138
# Plot the z-scores
plot(z_scores, type = "b", main = "Z-Scores", xlab = "Index", ylab = "Z-Score",
     col = "blue", pch = 19)
abline(h = 0, col = "red", lty = 2)
# ggplot2 visualization
library(ggplot2)
ggplot(df, aes(x = seq_along(z_score), y = z_score)) +
  geom_point(color = "blue", size = 3) +
  geom_line(color = "blue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Z-Scores Distribution",
       x = "Index",
       y = "Z-Score") +
  theme_minimal()
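The manual calculation can be cross-checked against base R's `scale()`, which by default centers the data and divides by the sample standard deviation. Since `sd()` computes the sample version (dividing by n - 1), the sketch also shows one way to apply the population formula (dividing by n) for readers who need it. The names `z_manual`, `z_builtin`, `pop_sd`, and `z_population` are illustrative.

```r
data <- c(8, 7, 7, 10, 13, 14, 15, 16, 18)

# Manual z-scores using the sample standard deviation, as in the example
z_manual <- (data - mean(data)) / sd(data)

# scale() returns a one-column matrix; as.numeric() drops its attributes
z_builtin <- as.numeric(scale(data))
all.equal(z_manual, z_builtin)
# [1] TRUE

# Population standard deviation: divide by n instead of n - 1
pop_sd <- sqrt(mean((data - mean(data))^2))
z_population <- (data - mean(data)) / pop_sd
round(pop_sd, 4)
# [1] 3.8873
```

With only nine observations the two standard deviations differ noticeably (about 4.12 versus 3.89), so the choice between them changes the resulting z-scores.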
Summary
Z-scores measure how far a data point deviates from the mean, expressed in units of standard deviation. They are essential for:
- Hypothesis testing: Determining whether observed values are statistically significant.
- Data standardization: Comparing data points from different distributions.
- Outlier detection: Identifying values that fall far from the mean (a common rule of thumb is $|z| > 3$).
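To illustrate the standardization use case, the sketch below compares two results measured on different scales. The exam scores, means, and standard deviations are hypothetical values invented for this example.

```r
# Hypothetical scores from two exams graded on different scales
exam_a <- list(score = 82,  mean = 70,  sd = 8)
exam_b <- list(score = 650, mean = 500, sd = 120)

z_a <- (exam_a$score - exam_a$mean) / exam_a$sd   # 1.5
z_b <- (exam_b$score - exam_b$mean) / exam_b$sd   # 1.25

# The higher z-score marks the result that is further above its own mean
z_a > z_b
# [1] TRUE
```

Although 650 is the larger raw number, the score of 82 is the stronger result relative to its own distribution.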
A positive z-score indicates a value above the mean, while a negative z-score indicates a value below the mean. The further the z-score is from zero, the more unusual the data point.
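One way to quantify how unusual a value is, assuming the data are approximately normal, is the two-sided tail probability of its z-score. The sketch below applies this to the example data and flags values beyond a cutoff. The cutoff of 3 is a common convention rather than a fixed rule, and the names `tail_prob`, `cutoff`, and `is_outlier` are illustrative.

```r
data <- c(8, 7, 7, 10, 13, 14, 15, 16, 18)
z <- (data - mean(data)) / sd(data)

# Two-sided tail probability: chance of a value at least this far from
# the mean, assuming a standard normal distribution
tail_prob <- 2 * pnorm(-abs(z))

# Flag values whose absolute z-score exceeds the chosen cutoff
cutoff <- 3
is_outlier <- abs(z) > cutoff

any(is_outlier)
# [1] FALSE
```

None of the nine values is flagged, since the largest absolute z-score in the example is about 1.46.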