Basic Geographically Weighted Regression (GWR)

Mathematical Foundation

For a dataset of \(n\) samples and \(p\) independent variables, the basic GWR model at sample \(i\) is defined as:

\[y_i = \beta_{i0} + \sum_{k=1}^{p} \beta_{ik} x_{ik} + \varepsilon_i\]

where:

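  • \(y_i\) is the value of the dependent variable at sample \(i\),

  • \(x_{ik}\) is the value of the \(k\)-th independent variable at sample \(i\),

  • \(\beta_{ik}\) is the local coefficient of the \(k\)-th independent variable at sample \(i\),

  • \(\beta_{i0}\) is the local intercept at sample \(i\),

  • \(\varepsilon_i \sim \mathcal{N}(0, \sigma^2)\) is the random error.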

The locally weighted least-squares estimator of the coefficients is:

\[\hat{\boldsymbol{\beta}}_i = \left( \mathbf{X}^\top \mathbf{W}_i \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{W}_i \mathbf{y}\]

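where \(\mathbf{X}\) is the \(n \times (p+1)\) design matrix (its first column is all ones when an intercept is included), \(\mathbf{y}\) is the vector of observed responses, and \(\mathbf{W}_i = \operatorname{diag}(w_{i1}, w_{i2}, \dots, w_{in})\) is the spatial weighting matrix for sample \(i\). Each \(w_{ij}\) is computed by a kernel function \(K(d_{ij}; b)\), where \(d_{ij}\) is the distance from sample \(i\) to sample \(j\) and \(b\) is the bandwidth.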
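The estimator above can be sketched in a few lines of NumPy. This is only an illustration of the formula, not the pygwmodel implementation: the Gaussian kernel, the fixed (non-adaptive) bandwidth, the synthetic data, and the function names gaussian_kernel and gwr_local_coefficients are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(d, b):
    """Assumed kernel: weights decay smoothly with distance d for bandwidth b."""
    return np.exp(-0.5 * (d / b) ** 2)

def gwr_local_coefficients(coords, X, y, b):
    """Return an (n, p + 1) array whose row i is the local estimate beta_i."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])            # prepend the intercept column
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)  # d_ij for every j
        w = gaussian_kernel(d, b)                       # diagonal of W_i
        XtW = Xd.T * w                                  # X^T W_i without building W_i
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)   # (X^T W_i X)^-1 X^T W_i y
    return betas

# Synthetic data: the slope of the first variable drifts from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))
X = rng.normal(size=(50, 2))
y = (1.0 + (0.5 + 0.1 * coords[:, 0]) * X[:, 0] - 2.0 * X[:, 1]
     + rng.normal(scale=0.1, size=50))

betas = gwr_local_coefficients(coords, X, y, b=2.0)
print(betas[:3])  # one row of [intercept, beta_1, beta_2] per sample
```

As the bandwidth grows, every weight approaches 1 and each local estimate approaches the ordinary least-squares solution, which is a convenient sanity check for a sketch like this.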

Diagnostic Information

After fitting, the algorithm computes the following diagnostics:

| Metric | Meaning | Key |
|---|---|---|
| RSS | Residual sum of squares \(\sum (y_i - \hat{y}_i)^2\) | diagnostic['RSS'] |
| AICc | Corrected Akaike information criterion (smaller is better) | diagnostic['AICc'] |
| ENP | Effective number of parameters | diagnostic['ENP'] |
| EDF | Effective degrees of freedom | diagnostic['EDF'] |
| R² | Coefficient of determination | diagnostic['RSquare'] |
| Adjusted R² | Adjusted R-squared | diagnostic['RSquareAdjust'] |
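To make these quantities concrete, the sketch below derives them from the hat matrix \(\mathbf{S}\), whose row \(i\) maps \(\mathbf{y}\) to the fitted value \(\hat{y}_i\). The formulas are assumptions chosen for illustration: ENP is taken as \(\operatorname{tr}(\mathbf{S})\) and EDF as \(n - \operatorname{tr}(\mathbf{S})\), whereas some implementations use \(2\operatorname{tr}(\mathbf{S}) - \operatorname{tr}(\mathbf{S}^\top \mathbf{S})\) instead, so the values need not match what the library reports. The Gaussian kernel and the helper names gwr_hat_matrix and gwr_diagnostics are likewise hypothetical.

```python
import numpy as np

def gwr_hat_matrix(coords, X, b):
    """Row i of S is x_i^T (X^T W_i X)^-1 X^T W_i (Gaussian kernel assumed)."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])
    S = np.empty((n, n))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / b) ** 2)
        XtW = Xd.T * w
        S[i] = Xd[i] @ np.linalg.solve(XtW @ Xd, XtW)
    return S

def gwr_diagnostics(y, S):
    """Diagnostics under the simple convention ENP = tr(S), EDF = n - tr(S)."""
    n = y.size
    resid = y - S @ y
    rss = float(resid @ resid)
    enp = float(np.trace(S))
    edf = n - enp
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (edf - 1)
    # AICc = n ln(RSS / n) + n ln(2 pi) + n (n + tr S) / (n - 2 - tr S)
    aicc = (n * np.log(rss / n) + n * np.log(2 * np.pi)
            + n * (n + enp) / (n - 2 - enp))
    return {"RSS": rss, "AICc": float(aicc), "ENP": enp, "EDF": edf,
            "RSquare": r2, "RSquareAdjust": r2_adj}

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(50, 2))
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.3, size=50)

diagnostic = gwr_diagnostics(y, gwr_hat_matrix(coords, X, b=3.0))
for key, value in diagnostic.items():
    print(f"{key}: {value:.3f}")
```

A smaller bandwidth makes the fit more local, which raises ENP and lowers RSS; AICc trades these two effects against each other.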

Key Parameters

| Parameter | Description | Default |
|---|---|---|
| weight | A single bandwidth weight shared by all variables | Required |
| distance | The distance metric to use | CRSDistance() |
| has_intercept | Whether to include an intercept term | True |
| fit(optimize_bw=...) | Criterion for automatic bandwidth selection: CV or AIC | None |
| fit(optimize_var=...) | AIC change threshold for automatic variable selection by forward selection | None |

Code Examples

Basic Usage

from pygwmodel import GWRBasic, BandwidthWeight, CRSDistance

algorithm = GWRBasic(
    data,
    depen_var="PURCHASE",
    indep_vars=["FLOORSZ", "UNEMPLOY", "PROF"],
    weight=BandwidthWeight(36.0, adaptive=True),
    distance=CRSDistance()
).fit()

# View diagnostic information
print(algorithm.diagnostic['RSquare'])      # 0.708
print(algorithm.diagnostic['AICc'])          # 2448.27

# Get the result layer (GeoDataFrame)
result = algorithm.result_layer
print(result.columns)  # Intercept, FLOORSZ, ..., Intercept_SE, ..., fitted

Bandwidth Selection

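algorithm = GWRBasic(
    data,
    depen_var="PURCHASE",
    indep_vars=["FLOORSZ", "UNEMPLOY", "PROF"],
    weight=BandwidthWeight(adaptive=True),  # bandwidth left unset; chosen during fit
    distance=CRSDistance()
).fit(optimize_bw=GWRBasic.BandwidthSelectionCriterionType.CV)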
print(algorithm.weight.bandwidth)  # Optimised bandwidth: 67
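For intuition about what the CV criterion measures, here is a minimal NumPy sketch of leave-one-out cross-validation over candidate adaptive bandwidths. It rests on several assumptions that are not taken from the library: a Gaussian kernel, an adaptive bandwidth defined as the distance to the k-th nearest other sample, and a plain search over a candidate list (the library may use a different search strategy). The names loo_cv_score and select_bandwidth are hypothetical.

```python
import numpy as np

def loo_cv_score(coords, X, y, k):
    """Sum of squared leave-one-out prediction errors for k nearest neighbours."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])
    score = 0.0
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        b_i = np.sort(d)[k]                 # distance to the k-th nearest other sample
        w = np.exp(-0.5 * (d / b_i) ** 2)
        w[i] = 0.0                          # leave sample i out of its own fit
        XtW = Xd.T * w
        beta = np.linalg.solve(XtW @ Xd, XtW @ y)
        score += (y[i] - Xd[i] @ beta) ** 2
    return float(score)

def select_bandwidth(coords, X, y, candidates):
    """Return the candidate with the lowest CV score, plus all scores."""
    scores = {k: loo_cv_score(coords, X, y, k) for k in candidates}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(60, 2))
X = rng.normal(size=(60, 2))
y = (1.0 + (0.5 + 0.2 * coords[:, 1]) * X[:, 0] - X[:, 1]
     + rng.normal(scale=0.2, size=60))

candidates = list(range(10, 60, 10))
best_k, scores = select_bandwidth(coords, X, y, candidates)
for k in candidates:
    print(f"k = {k:2d}  CV = {scores[k]:.3f}")
print("selected:", best_k)
```

Setting the weight of sample i to zero is what keeps the criterion from always preferring the smallest bandwidth, since a sample can no longer predict itself.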

Independent Variable Selection

algorithm = GWRBasic(
    data,
    depen_var="PURCHASE",
    indep_vars=["FLOORSZ", "UNEMPLOY", "PROF"],
    weight=BandwidthWeight(36.0, adaptive=True),
    distance=CRSDistance()
).fit(optimize_var=3.0)  # AIC change threshold

# Variable combinations and their AICc values
for var_names, aicc in algorithm.indep_var_select_criterions:
    print(f"{var_names}: {aicc:.2f}")

print(algorithm.indep_vars)  # ['FLOORSZ', 'PROF'] — optimal subset
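The idea behind threshold-based forward selection can be sketched as follows. This is one plausible reading rather than the library's procedure: at each step the variable that lowers AICc the most is added, and the search stops once the improvement falls below the threshold. The Gaussian kernel, the fixed bandwidth, the AICc formula, and the names gwr_aicc and forward_select are assumptions for illustration.

```python
import numpy as np

def gwr_aicc(coords, X, y, b):
    """AICc of a GWR fit (Gaussian kernel, fixed bandwidth b, ENP = tr(S))."""
    n = y.size
    Xd = np.column_stack([np.ones(n), X])
    S = np.empty((n, n))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / b) ** 2)
        XtW = Xd.T * w
        S[i] = Xd[i] @ np.linalg.solve(XtW @ Xd, XtW)
    rss = float(np.sum((y - S @ y) ** 2))
    tr_s = float(np.trace(S))
    return float(n * np.log(rss / n) + n * np.log(2 * np.pi)
                 + n * (n + tr_s) / (n - 2 - tr_s))

def forward_select(coords, X, y, names, b, threshold):
    """Greedily add variables while AICc drops by at least `threshold`."""
    selected, remaining = [], list(range(X.shape[1]))
    best_aicc, history = np.inf, []
    while remaining:
        trials = [(gwr_aicc(coords, X[:, selected + [j]], y, b), j)
                  for j in remaining]
        aicc, j = min(trials)                       # best candidate this step
        history.append(([names[c] for c in selected + [j]], aicc))
        if best_aicc - aicc < threshold:            # improvement too small: stop
            break
        selected.append(j)
        remaining.remove(j)
        best_aicc = aicc
    return [names[c] for c in selected], history

rng = np.random.default_rng(3)
coords = rng.uniform(0, 10, size=(80, 2))
X = rng.normal(size=(80, 3))
# Only x0 and x2 influence the response; x1 is pure noise.
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=80)

chosen, history = forward_select(coords, X, y, ["x0", "x1", "x2"],
                                 b=3.0, threshold=3.0)
for subset, aicc in history:
    print(f"{subset}: {aicc:.2f}")
print("selected:", chosen)
```

A larger threshold demands a bigger AICc improvement before a variable is admitted, so it tends to produce smaller models.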

Prediction

prediction = algorithm.predict(new_data)
print(prediction.columns)  # Intercept, FLOORSZ, ..., y_hat, residual
