
The 2 Distance-based Approaches

Everyone looks for their nearest neighbor vs. everyone draws a series of search buffers (radii).


In this section, we focus on the first approach.

4 examples of point distribution: Clustered or Random?

4 types of points in North Singapore (Woodlands Regional Centre, Woodlands West, and Woodgrove), covering an area of $2\,\text{km} \times 2\,\text{km}$

Nearest Neighbor Analysis Calculation Process: In a Nutshell

Searching for the nearest neighbors.
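The search step can be sketched in a few lines of plain Python. This is a minimal brute-force illustration, not the method used to produce the figures: the function name `mean_nearest_neighbor_distance` and the sample coordinates are hypothetical, and distances are assumed to be planar (Euclidean) in metres.

```python
import math

def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to every other point; keep the smallest.
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / len(points)

# Hypothetical coordinates (metres), for illustration only.
points = [(0, 0), (3, 4), (10, 0), (10, 1)]
print(mean_nearest_neighbor_distance(points))  # 3.0
```

The brute-force loop compares every pair of points, which is fine for small patterns; a spatial index would be the usual choice for large ones.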


How short is short enough to be considered as “clustered”?

Let’s use the Monte Carlo simulation approach to test the difference between the observed point pattern and a large number of random patterns generated under the complete spatial randomness (CSR) assumption.

Mean nearest neighbor distances for the four examples.


Option 1: Monte Carlo Simulation

Monte Carlo Simulation for point pattern.
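A rough sketch of the simulation loop is shown below. It assumes that CSR can be approximated by drawing coordinates uniformly inside a rectangular study area; the function names, the 50-point pattern size, and the 200 runs are hypothetical choices to keep the example fast (the section itself uses 10k simulations).

```python
import math
import random

def mean_nn_distance(points):
    """Mean distance from each point to its nearest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)

def simulate_csr_means(n_points, width, height, n_sim, seed=0):
    """Mean nearest neighbor distance for n_sim random (CSR-like) patterns."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_sim):
        # One random pattern: uniform x and y inside the rectangle.
        pattern = [
            (rng.uniform(0, width), rng.uniform(0, height))
            for _ in range(n_points)
        ]
        means.append(mean_nn_distance(pattern))
    return means

# Assumed setup: 50 points in a 2 km x 2 km area (units in metres).
simulated = simulate_csr_means(n_points=50, width=2000.0, height=2000.0, n_sim=200)
print(min(simulated), max(simulated))
```

Each value in `simulated` is one "random world" statistic; together they form the reference distribution that the observed mean is compared against.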


Statistical Testing

Frequency distribution of the NNA statistic for the simulated patterns, which follows an approximately normal shape.


(1) If the observed value is far from the random range, we have clear evidence that the observed pattern is different from random.

If the observed value is far from the normal bell shape.


(2) If the observed value falls within the random range, we have no evidence that the observed pattern is different from random.

If the observed value falls within the normal bell shape.


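(3) If the observed value falls at the edge of the normal distribution, the p-value helps us decide.

p-value: the probability, if the null hypothesis is true, of seeing a value at least as extreme as the observed one. A small p-value means a small risk of being wrong when we reject the null hypothesis.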

See [ESRI “What is a z-score? What is a p-value?”](https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/what-is-a-z-score-what-is-a-p-value.htm) for more details.

If the observed value falls within or near the edge of the normal bell shape.


Let’s say we set the confidence level to 99%, meaning we are willing to accept a 1% risk of being wrong (incorrectly rejecting the null hypothesis, H0). If the obtained p-value is 3%, which is higher than our acceptable risk, then we cannot reject H0.
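The decision rule above can be sketched as follows. The section does not state how the p-value is derived from the simulations, so this example assumes the common Monte Carlo pseudo p-value, (r + 1) / (m + 1), where r is the number of simulated values at least as small as the observed one and m is the number of simulations. The function name and the toy numbers are hypothetical.

```python
def monte_carlo_p_value(observed, simulated):
    """One-sided pseudo p-value for clustering (observed smaller than random)."""
    # Count simulated statistics that are as small as, or smaller than, the observed one.
    as_extreme = sum(1 for s in simulated if s <= observed)
    # The +1 terms count the observed pattern itself as one possible outcome.
    return (as_extreme + 1) / (len(simulated) + 1)

# Toy reference distribution: 99 made-up simulated means (metres).
simulated = [100 + i for i in range(99)]
observed = 101

alpha = 0.01  # 99% confidence level -> 1% acceptable risk
p_value = monte_carlo_p_value(observed, simulated)
reject_h0 = p_value < alpha

print(p_value)    # 0.03
print(reject_h0)  # False: 3% is above the 1% threshold, so H0 is not rejected
```

With a 95% confidence level (`alpha = 0.05`), the same p-value of 3% would lead to rejecting the null hypothesis, which is why the threshold should be fixed before looking at the result.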

How does the observed mean compare with the mean results of the 10k CSR simulations?


The grey area indicates the mean nearest neighbor distances of the 10k random patterns.

Option 2: Z-test approach

The Z-test is another option for testing how significantly the observed mean nearest neighbor distance ($\bar{D}_O$) differs from the ‘expected’ mean nearest neighbor distance ($\bar{D}_E$) under CSR.

See ESRI for detail.

The nearest neighbor distance ratio:

$$
\text{ANN} = \frac{\bar{D}_O}{\bar{D}_E}
$$

If the index ($\text{ANN}$) is less than 1, the pattern exhibits clustering. If the index is greater than 1, the trend is toward dispersion.

The expected mean distance can be calculated from the number of points ($n$) and the study area ($A$):

$$
\bar{D}_E = \frac{0.5}{\sqrt{n/A}}
$$
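As a quick worked example, assume a hypothetical pattern of $n = 100$ points in the $2\,\text{km} \times 2\,\text{km}$ study area, so $A = 4{,}000{,}000\,\text{m}^2$ (the point count is an assumption for illustration, not a value taken from the four examples):

$$
\bar{D}_E = \frac{0.5}{\sqrt{100 / 4{,}000{,}000}} = \frac{0.5}{0.005} = 100\,\text{m}
$$

An observed mean nearest neighbor distance well below 100 m would then point toward clustering, and one well above 100 m toward dispersion.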

The z-score can then be calculated using $\bar{D}_O$ and $\bar{D}_E$, where $SE$ is the standard error of the mean nearest neighbor distance under CSR:

$$
z = \frac{\bar{D}_O - \bar{D}_E}{SE}
$$

The z-score and p-value for this statistic are sensitive to changes in the study area or changes to the Area parameter. For this reason, only compare z-score and p-value results from this statistic when the study area is fixed.
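The sketch below puts the ratio, the expected distance, and the z-score together, and shows how the result shifts when only the area changes. Two things are assumptions not stated in this section: the standard error is taken as $SE = 0.26136 / \sqrt{n^2 / A}$, the form given in ESRI's Average Nearest Neighbor documentation, and the p-value is computed as a two-sided normal tail probability. The function name `ann_z_test` and the input numbers are hypothetical.

```python
import math

def ann_z_test(observed_mean, n, area):
    """Return (ANN ratio, z-score, two-sided p-value) for a point pattern."""
    expected_mean = 0.5 / math.sqrt(n / area)
    # Assumed standard error, following ESRI's Average Nearest Neighbor formula.
    se = 0.26136 / math.sqrt(n ** 2 / area)
    ratio = observed_mean / expected_mean
    z = (observed_mean - expected_mean) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ratio, z, p_value

# Hypothetical pattern: 100 points, observed mean distance of 80 m.
ratio, z, p_value = ann_z_test(observed_mean=80.0, n=100, area=2000.0 * 2000.0)
print(ratio, z, p_value)

# Same points, but the study area is declared as 3 km x 3 km instead.
ratio_big, z_big, p_big = ann_z_test(observed_mean=80.0, n=100, area=3000.0 * 3000.0)
print(ratio_big, z_big, p_big)
```

Enlarging the declared area raises the expected distance, so the same points look more clustered (a smaller ratio and a more negative z-score). This is the sensitivity the paragraph above warns about.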

Z-test and z-score. See ESRI “What is a z-score? What is a p-value?” for more details.


Summary: Nearest Neighbor Analysis (NNA)