
What is statistics?

From data to information.

The two main types of statistics:

  1. Descriptive Statistics is about how we describe data. This branch focuses on summarizing the main features of a dataset with measures such as the mean, median, standard deviation, and correlation. Descriptive statistics also helps us visualize and interpret data through graphical methods like histograms, scatter plots, box plots, and maps.

  2. Inferential Statistics is about what we can learn from the data. This branch involves drawing conclusions about a population based on a sample by using probability theory. Inferential statistics allows us to make predictions, test hypotheses, and evaluate the reliability of conclusions. Some common methods in inferential statistics include regression analysis, hypothesis testing, and confidence intervals.

Descriptive Statistics

The three types of measures in descriptive statistics

Measures of Central Tendency

Measures of central tendency help us understand the central or ‘average’ values in a dataset.

Figure 1: The means of the three Iris species.
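Here is a minimal sketch of how the three common measures of central tendency can be computed with Python's standard-library `statistics` module. The petal lengths are hypothetical illustrative values, not rows from the Iris dataset, and the helper name `central_tendency` is our own.

```python
import statistics

# Hypothetical petal lengths (cm) for one species; illustrative values only,
# not taken from the Iris dataset.
petal_length = [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.5, 1.6]

def central_tendency(values):
    """Return the three common measures of central tendency."""
    return {
        "mean": statistics.mean(values),      # arithmetic average
        "median": statistics.median(values),  # middle value (average of the two middle values if n is even)
        "mode": statistics.mode(values),      # most frequent value
    }

print(central_tendency(petal_length))
```

For these values the mean is about 1.475 cm, the median is 1.45 cm, and the mode is 1.4 cm. The median and mode resist outliers better than the mean does.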

Measures of Spread

Measures of spread (variability) describe how the data are spread or dispersed around the central values.

Figure 2: The spread of the three Iris species.
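The sketch below computes common measures of spread for a small sample using only the standard library. The helper name `spread` and the toy numbers are illustrative assumptions. The quartiles come from `statistics.quantiles`, which defaults to its "exclusive" method. Other software may use a different quartile definition and report a slightly different IQR.

```python
import statistics

def spread(values):
    """Common measures of spread (variability) for a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (default "exclusive" method)
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance, n - 1 denominator
        "stdev": statistics.stdev(values),        # sample standard deviation
        "iqr": q3 - q1,                           # interquartile range
    }

# Hypothetical toy sample, not Iris data
print(spread([2, 4, 4, 4, 5, 5, 7, 9]))
```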

Measures of Shape

Measures of shape describe the shape of the distribution, which can be steep or flat (kurtosis), skewed (skewness), or otherwise non-normal.

Figure 3: The skewness and kurtosis of the three Iris species.
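Skewness and kurtosis can be estimated from the central moments of the data. The sketch below uses the simple moment-based estimators: g1 is the third standardized moment, and g2 is the fourth standardized moment minus 3, which is "excess" kurtosis relative to a normal distribution. The function name is our own. Statistical packages often apply small-sample bias corrections, so their values may differ slightly.

```python
import statistics

def skewness_kurtosis(values):
    """Moment-based skewness g1 and excess kurtosis g2 (no bias correction)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # 4th central moment
    g1 = m3 / m2 ** 1.5  # > 0: long right tail; < 0: long left tail
    g2 = m4 / m2 ** 2 - 3  # > 0: heavier-tailed/steeper than normal; < 0: flatter
    return g1, g2

print(skewness_kurtosis([1, 2, 3, 4, 5]))   # symmetric, flat: g1 = 0, g2 about -1.3
print(skewness_kurtosis([1, 1, 1, 2, 10]))  # one large value: g1 > 0 (right-skewed)
```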

Inferential Statistics

Analysis of Relationship

Assessing the strength and direction of associations between variables. Examples include correlation (Pearson’s r for parametric data and Spearman’s rho for non-parametric data) and regression analysis (linear, multiple, and logistic regression). These outputs can then be used for further analysis and prediction.

Figure 4: The relationships between petal length and petal width for the three Iris species.
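The following sketch implements Pearson's r, Spearman's rho (computed here as Pearson's r on average ranks), and a simple least-squares line in plain Python, so that each formula is visible. All function names and the toy data are our own illustrative choices. Python 3.10+ also ships `statistics.correlation` and `statistics.linear_regression` for the Pearson and linear cases.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # hypothetical: monotonic but not linear
print(pearson_r(x, y), spearman_rho(x, y))
```

Here Spearman's rho is exactly 1 because the relationship is perfectly monotonic. Pearson's r is about 0.98 because the relationship is not a straight line.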

Analysis of Differences

Evaluating whether observed differences between groups or conditions are statistically significant. Hypothesis testing forms the foundation for these analyses. Parametric tests, such as t-tests and ANOVA, are used when the data meet certain assumptions (e.g., normality). Non-parametric tests, such as the Mann-Whitney U test, the Wilcoxon signed-rank test, and the Kruskal-Wallis test, are used when those assumptions are violated, for example with non-normal data.

Figure 5: The differences in petal length between the three species.
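As one concrete example of a parametric test, the sketch below computes the test statistic for Welch's t-test. This is the two-sample t-test variant that does not assume equal variances. It also computes the Welch-Satterthwaite approximation to the degrees of freedom. Turning t into a p-value requires the t distribution's CDF, which the standard library lacks; in SciPy, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the full test. The function name and sample values are hypothetical.

```python
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical petal lengths for two groups (illustrative values only)
print(welch_t([6, 7, 8, 9, 10], [1, 2, 3, 4, 5]))  # t = 5.0, df = 8.0
```

A large |t| relative to the t distribution with `df` degrees of freedom suggests that the difference in means is unlikely to be due to sampling variability alone.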

Analysis of Confidence

Providing insight into the reliability and generalizability of statistical findings. Estimation involves calculating point estimates and confidence intervals to infer population parameters. Sampling distributions and standard errors help quantify the variability of sample statistics. Statistical power and effect size measures help ensure that studies have sample sizes large enough to detect meaningful effects.

Figure 6: The actual population mean could fall within the 95% confidence interval (CI).
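The sketch below shows one way to compute a confidence interval for a population mean: the sample mean plus or minus a critical value times the standard error. It assumes a large-sample normal approximation and uses the z critical value from `statistics.NormalDist`. For small samples, a t critical value with n - 1 degrees of freedom gives a wider and more appropriate interval. The function name and data are hypothetical.

```python
import statistics

def mean_ci(values, level=0.95):
    """Approximate CI for a population mean: mean +/- z * standard error.

    Assumes a large-sample normal approximation (z instead of t).
    """
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / n ** 0.5              # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se

# Hypothetical toy sample, not Iris data
print(mean_ci([2, 4, 4, 4, 5, 5, 7, 9]))
```

Raising `level` (e.g. to 0.99) widens the interval. Collecting more data shrinks the standard error and narrows it.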

Summary

Thinking Statistically while doing Geospatial Visualization

These are some of the key concerns when reading or mapping a spatial pattern.