Spatial heterogeneity-aware graph convolutional networks (Annals of the AAG, 2025)

Nithouson/RegionGCN

RegionGCN

This repository contains the implementation and data for the paper:

RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks

Hao Guo, Han Wang, Di Zhu, Lun Wu, A. Stewart Fotheringham, Yu Liu

Abstract: Modeling spatial heterogeneity in the data generation process is essential for understanding and predicting geographical phenomena. Despite their prevalence in geospatial tasks, neural network models usually assume spatial stationarity, which could limit their performance in the presence of spatial process heterogeneity. By allowing model parameters to vary over space, several approaches have been proposed to incorporate spatial heterogeneity into neural networks. Current geographical weighting approaches, however, are ineffective on graph neural networks, yielding no significant improvement in prediction accuracy. We assume the crux lies in the overfitting risk brought by a large number of local parameters. Accordingly, we propose to model spatial process heterogeneity at the regional level rather than at the individual level, which largely reduces the number of spatially varying parameters. We further develop a heuristic optimization procedure to learn the region partition adaptively in the process of model training. Our proposed spatial-heterogeneity-aware graph convolutional network, named RegionGCN, is applied to the modeling of county-level vote share in the 2016 U.S. presidential election based on socioeconomic attributes. Results show that RegionGCN achieves significant improvement over the basic and geographically weighted GCNs. We also offer an exploratory analysis tool for the spatial variation of nonlinear relationships through ensemble learning of regional partitions from RegionGCN. Our work contributes to the practice of geospatial artificial intelligence in tackling spatial heterogeneity.

[Full-text at the publisher] [arXiv]

File description

We provide our implementation of RegionGCN, as well as the code to reproduce our analysis of the 2016 US presidential election.

Core code

  • reggcn.py: the RegionGCN model and baseline neural networks
  • election.py: county-level vote share prediction with RegionGCN and baseline neural networks

Auxiliary code

  • election_bench.py: county-level vote share prediction with the linear model, SLX, XGBoost, and GWR
  • election_deepwalk.py: DeepWalk embedding for the adjacency graph of US counties
  • election_enclave.py: find units with single neighbors for RegionGCN-C
  • election_metis_data.py: generate METIS input graph file for region ensemble
  • graph_partition.cpp: call METIS to partition the similarity graph

Data

  • county_attr.csv: county-level vote share and covariate data
  • uselec_emb.pkl: file generated by election_deepwalk.py
  • uselec_enc.pkl: file generated by election_enclave.py

Workflow

1. Preparation

  • Make sure the Python dependencies are installed: pytorch, pytorch-geometric, scikit-learn, libpysal, and pandas are required.
  • You also need the C package METIS if a region ensemble is intended.
  • Download the shapefile for US counties in 2016 from Census.gov.
  • Subset the 3,108 counties within the Contiguous US from the shapefile. This can be done in any GIS software.
  • Collect the vote share data and socioeconomic covariates. You may also use our processed data county_attr.csv.
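
If you prefer to script the subsetting step rather than use GIS software, the following is a minimal sketch. It assumes each county carries a 5-digit FIPS code whose first two digits identify the state (the Census county files expose this as GEOID; treat that column name as an assumption for your copy of the data). The helper names are hypothetical and not part of this repository.

```python
# State FIPS prefixes outside the contiguous US: Alaska, Hawaii, and the
# territories (American Samoa, Guam, Northern Mariana Islands, Puerto Rico,
# US Virgin Islands).
NON_CONTIGUOUS_STATE_FIPS = {"02", "15", "60", "66", "69", "72", "78"}


def is_contiguous(geoid) -> bool:
    """Return True if a 5-digit county FIPS code lies in the contiguous US."""
    geoid = str(geoid).zfill(5)  # restore leading zeros lost in CSV round trips
    return geoid[:2] not in NON_CONTIGUOUS_STATE_FIPS


def subset_contiguous(geoids):
    """Keep only the county codes that belong to the contiguous US."""
    return [g for g in geoids if is_contiguous(g)]


if __name__ == "__main__":
    # Toy codes: Alabama, Alaska, DC, Hawaii, Puerto Rico, Wyoming.
    sample = ["01001", "02013", "11001", "15003", "72001", "56045"]
    print(subset_contiguous(sample))
```

Applied to the full 2016 county file, a filter of this kind should leave the 3,108 counties mentioned above; checking that count is a quick sanity test.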

2. Vote share prediction

Run election.py for spatial prediction of vote share with the following models: ANN (model_type = 'ann'), GCN (model_type = 'srgcn'), GWGCN (model_type = 'gwgcn'), RegionGCN (model_type = 'reggcn').

  • You may change the random seeds and hyperparameters such as the learning rate (lr), weight decay (wd), and number of regions (regions).
  • To run RegionGCN, you MUST run GCN first with the same random seed. Trained GCN parameters are used to initialize RegionGCN.
  • The script will output three files:
    • an Excel table with prefix 'log_': for each county, the predicted target value (pctdem), the data split indicator (splitflag; train: 0, val: 1, test: 2), and the region index (region; RegionGCN only)
    • a text file with prefix 'res_': all the parameters and evaluation metrics
    • a pickle file with prefix 'param_': trained model parameters
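
As a quick check on a 'log_' table, you can tally counties per data split and per region. The sketch below works on plain sequences of values and relies only on the column meanings described above; the function names are hypothetical, and the toy inputs stand in for the splitflag and region columns.

```python
from collections import Counter

# Split codes as documented for the 'log_' table.
SPLIT_NAMES = {0: "train", 1: "val", 2: "test"}


def split_sizes(splitflags):
    """Count how many counties fall into each data split."""
    counts = Counter(SPLIT_NAMES[int(flag)] for flag in splitflags)
    return {name: counts.get(name, 0) for name in SPLIT_NAMES.values()}


def region_sizes(regions):
    """Count counties per region index (RegionGCN logs only)."""
    return dict(sorted(Counter(int(r) for r in regions).items()))


if __name__ == "__main__":
    # Toy values standing in for the splitflag and region columns.
    print(split_sizes([0, 0, 1, 2, 0, 2]))
    print(region_sizes([3, 1, 1, 3, 3, 0]))
```

With pandas, the columns of a table loaded through pd.read_excel can be passed in directly, for example split_sizes(df["splitflag"]).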

Run election_bench.py for spatial prediction of vote share with the following models: linear (model_type = 'LR'), SLX (model_type = 'SLX'), XGBoost (model_type = 'XGB'), GWR (model_type = 'GWR').

  • You may change random seeds and the hyperparameters for XGBoost (number of base models n_estimators, and learning_rate).
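
For readers unfamiliar with SLX: it extends the linear model with spatially lagged covariates, so each unit is described by its own attributes X and by a neighborhood summary WX. The sketch below builds that [X, WX] design with row-standardized weights (the mean over neighbors). The weighting scheme and all names here are assumptions for illustration; election_bench.py may construct its weights differently.

```python
def spatial_lag(values, neighbors):
    """Row-standardized spatial lag: the mean of each unit's neighbors.

    values    -- dict mapping unit id -> list of covariate values
    neighbors -- dict mapping unit id -> list of adjacent unit ids
    """
    lagged = {}
    for unit, row in values.items():
        adjacent = neighbors.get(unit, [])
        if not adjacent:
            # Isolated unit: no neighbors to average, so the lag is zero.
            lagged[unit] = [0.0] * len(row)
            continue
        lagged[unit] = [
            sum(values[n][k] for n in adjacent) / len(adjacent)
            for k in range(len(row))
        ]
    return lagged


def slx_design(values, neighbors):
    """Concatenate X and WX so a linear model can be fit on [X, WX]."""
    lagged = spatial_lag(values, neighbors)
    return {unit: list(values[unit]) + lagged[unit] for unit in values}


if __name__ == "__main__":
    X = {"a": [1.0, 10.0], "b": [3.0, 30.0], "c": [5.0, 50.0]}
    W = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(slx_design(X, W)["b"])
```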

3. Investigate RegionGCN variants

Run election.py for spatial prediction of vote share with the following models: RegionGCN-F (fixed random zones, model_type = 'reggcn-f'), RegionGCN-P (fixed zones from attribute clusters, model_type = 'reggcn-p'), RegionGCN-C (enforce geographically connected zones, model_type = 'reggcn-c'). See our paper for further definitions.

  • RegionGCN-C handles the special case when a unit has only one neighbor (see Note 7 in our paper). These special cases are recorded in uselec_enc.pkl, which is generated using election_enclave.py.
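
The idea of locating single-neighbor units can be sketched as follows, assuming the adjacency is available as a plain dict from unit id to a list of neighbor ids. The function name is hypothetical, and the format actually stored in uselec_enc.pkl is not described here, so this is an illustration rather than a replacement for election_enclave.py.

```python
def single_neighbor_units(neighbors):
    """Return {unit: its only neighbor} for units with exactly one neighbor."""
    return {
        unit: adjacent[0]
        for unit, adjacent in neighbors.items()
        if len(set(adjacent)) == 1
    }


if __name__ == "__main__":
    # Toy adjacency: unit "a" touches only "b".
    toy = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "d"], "d": ["b", "c"]}
    print(single_neighbor_units(toy))
```

A libpysal weights object exposes such a dict through its neighbors attribute, so it can be passed in as single_neighbor_units(w.neighbors).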

4. Incorporate DeepWalk embeddings

Run election_deepwalk.py to learn node embeddings in the US county adjacency network. You may also use our learned embeddings uselec_emb.pkl.
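
For context, DeepWalk learns embeddings in two stages: it samples truncated random walks over the graph, then trains a skip-gram model (such as word2vec) on those walks as if they were sentences. The sketch below covers only the first stage on a plain adjacency dict. The function name and the default values of num_walks and walk_length are illustrative assumptions, not settings taken from election_deepwalk.py.

```python
import random


def random_walks(neighbors, num_walks=10, walk_length=40, seed=0):
    """Generate truncated random walks, the first stage of DeepWalk."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                adjacent = neighbors[walk[-1]]
                if not adjacent:
                    break  # dead end: the current node has no neighbors
                walk.append(rng.choice(adjacent))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3.
    graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    for walk in random_walks(graph, num_walks=1, walk_length=5):
        print(walk)
```
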
Run election.py for spatial prediction of vote share with the following models: ANN-DeepWalk (model_type = 'ann-dw'), GCN-DeepWalk (model_type = 'srgcn-dw'), RegionGCN-DeepWalk (model_type = 'reggcn-dw').

  • Similarly, to run RegionGCN-DeepWalk, you MUST run GCN-DeepWalk first with the same random seed.
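
When batching experiments over several seeds, it helps to order the runs so that each GCN run precedes the RegionGCN run it initializes. The helper below is a hypothetical planning sketch that encodes only the two prerequisites stated in this README; it does not call election.py, and it makes no claim about the RegionGCN variants.

```python
# Prerequisites stated in this README: each RegionGCN run is initialized
# from a GCN run trained with the same random seed.
PREREQUISITE = {"reggcn": "srgcn", "reggcn-dw": "srgcn-dw"}


def plan_runs(model_types, seeds):
    """Order (model_type, seed) runs so prerequisites come first."""
    plan = []
    for seed in seeds:
        for model in model_types:
            required = PREREQUISITE.get(model)
            if required and (required, seed) not in plan:
                plan.append((required, seed))
            if (model, seed) not in plan:
                plan.append((model, seed))
    return plan


if __name__ == "__main__":
    for model, seed in plan_runs(["reggcn", "reggcn-dw"], seeds=[1, 2]):
        print(f"model_type = '{model}', seed = {seed}")
```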

5. Region ensemble

  • Assemble zoning results from multiple RegionGCN runs into a single Excel table (this can be done by combining the region columns in the log file from election.py).
  • Run election_metis_data.py to generate the input file following the METIS format.
  • Run graph_partition.cpp to get region ensemble results. You may change the u factor (METIS_OPTION_UFACTOR) and region count (parts).
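
To make the ensemble steps concrete, here is a sketch of one way to build a similarity graph from several zoning results and write it in the METIS graph format. It assumes the similarity between two adjacent units is the number of runs that place them in the same region; that definition and both function names are assumptions, and election_metis_data.py may compute similarity differently. The file layout follows the METIS manual: a header with the vertex count, edge count, and format code 001 (edge weights), then one line per vertex listing 1-based neighbor ids, each followed by its weight.

```python
def coassociation_weights(edges, labelings):
    """Count, for each adjacent pair, the runs that put both in one region.

    edges     -- iterable of (i, j) pairs of 0-based unit indices
    labelings -- list of runs, each a list of region labels per unit
    """
    weights = {}
    for i, j in edges:
        shared = sum(1 for labels in labelings if labels[i] == labels[j])
        if shared > 0:  # METIS edge weights must be positive integers
            weights[(min(i, j), max(i, j))] = shared
    return weights


def metis_graph_text(num_units, weights):
    """Serialize a weighted undirected graph in the METIS graph format."""
    adjacency = [[] for _ in range(num_units)]
    for (i, j), w in sorted(weights.items()):
        adjacency[i].append((j, w))
        adjacency[j].append((i, w))
    # Header: vertex count, edge count, and fmt code 001 (edge weights only).
    lines = [f"{num_units} {len(weights)} 001"]
    for adjacent in adjacency:
        # METIS vertex ids are 1-based; each neighbor is followed by its weight.
        lines.append(" ".join(f"{j + 1} {w}" for j, w in sorted(adjacent)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Three units in a chain, zoned by three toy RegionGCN runs.
    runs = [[0, 0, 1], [0, 0, 0], [0, 0, 1]]
    w = coassociation_weights([(0, 1), (1, 2)], runs)
    print(metis_graph_text(3, w), end="")
```

Pairs that never share a region are dropped in this sketch because METIS does not accept zero edge weights; note that dropping them can disconnect the graph.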

Note

Due to the computational cost of the dynamic zoning module, the training time of RegionGCN is about an order of magnitude longer than that of GCN. Currently, RegionGCN does not scale to large spatial data sets with more than $10^4$ spatial units.

Citation

If you use this code in your research, please cite:

@article{Guo29092025,
author = {Hao Guo and Han Wang and Di Zhu and Lun Wu and A. Stewart Fotheringham and Yu Liu},
title = {RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks},
journal = {Annals of the American Association of Geographers},
volume = {0},
number = {0},
pages = {1--17},
year = {2025},
publisher = {Taylor \& Francis},
doi = {10.1080/24694452.2025.2558661}
}

Contact

If you have any questions, feel free to contact me through email sinesloop@pku.edu.cn.
