This repository contains the implementation and data for the paper:
RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks
Hao Guo, Han Wang, Di Zhu, Lun Wu, A. Stewart Fotheringham, Yu Liu
Abstract: Modeling spatial heterogeneity in the data generation process is essential for understanding and predicting geographical phenomena. Despite their prevalence in geospatial tasks, neural network models usually assume spatial stationarity, which could limit their performance in the presence of spatial process heterogeneity. By allowing model parameters to vary over space, several approaches have been proposed to incorporate spatial heterogeneity into neural networks. Current geographical weighting approaches, however, are ineffective on graph neural networks, yielding no significant improvement in prediction accuracy. We assume the crux lies in the overfitting risk brought by a large number of local parameters. Accordingly, we propose to model spatial process heterogeneity at the regional level rather than at the individual level, which largely reduces the number of spatially varying parameters. We further develop a heuristic optimization procedure to learn the region partition adaptively in the process of model training. Our proposed spatial-heterogeneity-aware graph convolutional network, named RegionGCN, is applied to the modeling of county-level vote share in the 2016 U.S. presidential election based on socioeconomic attributes. Results show that RegionGCN achieves significant improvement over the basic and geographically weighted GCNs. We also offer an exploratory analysis tool for the spatial variation of nonlinear relationships through ensemble learning of regional partitions from RegionGCN. Our work contributes to the practice of geospatial artificial intelligence in tackling spatial heterogeneity.
[Full-text at the publisher] [arXiv]
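The abstract's central idea is to let parameters vary by region rather than by individual unit. The minimal sketch below illustrates it using plain Python lists as weight matrices. The helper names (`init_regional_weights`, `regional_transform`) are hypothetical, and this is not the code in reggcn.py. Each region receives its own copy of a trained global weight matrix, which mirrors the note below that trained GCN parameters initialize RegionGCN. Each node's features are then transformed by the matrix of the region it belongs to. Neighborhood aggregation and the dynamic zoning step are deliberately left out.

```python
import copy
from typing import List, Sequence

Matrix = List[List[float]]  # in_dim x out_dim weight matrix

def init_regional_weights(global_weight: Matrix, n_regions: int) -> List[Matrix]:
    """Give every region an independent copy of the trained global weight matrix."""
    return [copy.deepcopy(global_weight) for _ in range(n_regions)]

def regional_transform(features: Sequence[Sequence[float]],
                       region_of: Sequence[int],
                       weights: List[Matrix]) -> Matrix:
    """Multiply each node's feature vector by the weight matrix of its own region.

    Only the feature-transform part of a graph convolution is shown here;
    neighborhood aggregation is omitted for brevity.
    """
    out: Matrix = []
    for x, r in zip(features, region_of):
        w = weights[r]
        out.append([sum(x[i] * w[i][j] for i in range(len(x)))
                    for j in range(len(w[0]))])
    return out
```

The parameter savings follow directly. If one weight matrix holds p values, unit-level local weights for the 3,108 counties need 3,108 × p values, whereas R regions need only R × p.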
We provide our implementation of RegionGCN, as well as code to reproduce our analysis of the 2016 U.S. presidential election.
- reggcn.py: the RegionGCN model and baseline neural networks
- election.py: county-level vote share prediction with RegionGCN and baseline neural networks
- election_bench.py: county-level vote share prediction with the linear model, XGBoost, and GWR
- election_deepwalk.py: DeepWalk embedding for the adjacency graph of US counties
- election_enclave.py: find units with single neighbors for RegionGCN-C
- election_metis_data.py: generate the METIS input graph file for the region ensemble
- graph_partition.cpp: call METIS to partition the similarity graph
- county_attr.csv: county-level vote share and covariate data
- uselec_emb.pkl: file generated by election_deepwalk.py
- uselec_enc.pkl: file generated by election_enclave.py
- Make sure that you have installed the Python dependencies: pytorch, pytorch-geometric, scikit-learn, libpysal, and pandas are required.
- You also need the C library METIS if you intend to build a region ensemble.
- Download the shapefile for US counties in 2016 from Census.gov.
- Subset the 3,108 counties within the Contiguous US from the shapefile. This can be done in any GIS software.
- Collect the vote share data and socioeconomic covariates. You may also use our processed data, county_attr.csv.
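You can also script the subsetting step. The minimal sketch below assumes the attribute table exposes the two-digit state FIPS code in a `STATEFP` field, as Census TIGER/Line county shapefiles do. The helper names are hypothetical. The filter drops Alaska, Hawaii, and the territories, and keeps the 48 contiguous states plus the District of Columbia.

```python
from typing import Iterable, List

# State FIPS codes outside the contiguous US (assumed STATEFP encoding).
NON_CONTIGUOUS_STATEFP = {
    "02",  # Alaska
    "15",  # Hawaii
    "60",  # American Samoa
    "66",  # Guam
    "69",  # Northern Mariana Islands
    "72",  # Puerto Rico
    "74",  # U.S. Minor Outlying Islands
    "78",  # U.S. Virgin Islands
}

def is_contiguous_us(statefp: str) -> bool:
    """True for the 48 contiguous states plus DC (FIPS 11)."""
    return str(statefp).zfill(2) not in NON_CONTIGUOUS_STATEFP

def subset_contiguous(records: Iterable[dict]) -> List[dict]:
    """Keep only county attribute records located in the contiguous US."""
    return [r for r in records if is_contiguous_us(r["STATEFP"])]
```

If you use geopandas, you can apply the same filter as `counties[counties["STATEFP"].map(is_contiguous_us)]` after `geopandas.read_file`. Check that the result contains the expected 3,108 counties.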
Run election.py for spatial prediction of vote share with the following models: ANN (model_type = 'ann'), GCN (model_type = 'srgcn'), GWGCN (model_type = 'gwgcn'), RegionGCN (model_type = 'reggcn').
- You may change the random seeds and hyperparameters such as the learning rate lr, weight decay wd, and number of regions regions.
- To run RegionGCN, you MUST run GCN first with the same random seed. The trained GCN parameters are used to initialize RegionGCN.
- The script will output three files:
  - an Excel table with prefix 'log_': for each county, the predicted target value pctdem, an indicator for the data split splitflag (train: 0, val: 1, test: 2), and the region index region (for RegionGCN only)
  - a text file with prefix 'res_': all the parameters and evaluation metrics
  - a pickle file with prefix 'param_': trained model parameters
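The minimal sketch below inspects a 'log_' file by selecting the counties in one data split and counting counties per region. It assumes the rows have already been loaded as dictionaries, for example with pandas `pd.read_excel(path).to_dict("records")`. The helper names are hypothetical. The split codes follow the splitflag convention stated above.

```python
from collections import Counter
from typing import Dict, Iterable, List

# splitflag codes as documented for the 'log_' table.
SPLIT_CODES = {"train": 0, "val": 1, "test": 2}

def rows_in_split(rows: Iterable[dict], split: str) -> List[dict]:
    """Return the county rows whose splitflag matches the named split."""
    code = SPLIT_CODES[split]
    return [r for r in rows if int(r["splitflag"]) == code]

def region_sizes(rows: Iterable[dict]) -> Dict[int, int]:
    """Count counties per region index (the region column exists for RegionGCN only)."""
    return dict(Counter(int(r["region"]) for r in rows))
```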
Run election_bench.py for spatial prediction of vote share with the following models: linear (model_type = 'LR'), SLX (model_type = 'SLX'), XGBoost (model_type = 'XGB'), GWR (model_type = 'GWR').
- You may change the random seeds and the XGBoost hyperparameters (number of base models n_estimators, and learning_rate).
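The SLX model extends the linear model by adding spatially lagged covariates WX to the regressors. The minimal sketch below builds that design matrix [X, WX] with row-standardized contiguity weights, so each county's lag is the mean of its neighbors' covariates. The neighbor dictionary and helper names are hypothetical. election_bench.py may construct the weights differently; for example, libpysal, already a dependency, offers contiguity weights and a `lag_spatial` helper.

```python
from typing import Dict, List, Sequence

def spatial_lag(x: Sequence[Sequence[float]],
                neighbors: Dict[int, List[int]]) -> List[List[float]]:
    """Row-standardized spatial lag WX: mean covariates of each unit's neighbors."""
    lagged: List[List[float]] = []
    for i, row in enumerate(x):
        nbrs = neighbors.get(i, [])
        if not nbrs:  # an isolated unit gets a zero lag under row standardization
            lagged.append([0.0] * len(row))
            continue
        lagged.append([sum(x[j][k] for j in nbrs) / len(nbrs) for k in range(len(row))])
    return lagged

def slx_design(x: Sequence[Sequence[float]],
               neighbors: Dict[int, List[int]]) -> List[List[float]]:
    """Concatenate X and WX column-wise into the SLX design matrix."""
    return [list(row) + lag for row, lag in zip(x, spatial_lag(x, neighbors))]
```

You can then fit the design matrix with any ordinary least-squares routine against the vote share target.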
Run election.py for spatial prediction of vote share with the following models: RegionGCN-F (fixed random zones, model_type = 'reggcn-f'), RegionGCN-P (fixed zones from attribute clusters, model_type = 'reggcn-p'), RegionGCN-C (enforce geographically connected zones, model_type = 'reggcn-c'). See our paper for further definitions.
- RegionGCN-C handles the special case in which a unit has only one neighbor (see Note 7 in our paper). These special cases are recorded in uselec_enc.pkl, which is generated using election_enclave.py.
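A unit with a single neighbor can stay connected to the rest of its zone only through that neighbor. This is presumably why such cases need special handling; Note 7 in the paper gives the actual treatment. The minimal sketch below shows two related checks on an adjacency dictionary. The first finds single-neighbor units, which is what election_enclave.py is described as doing. The second tests whether a zone is geographically connected using breadth-first search. Both helpers are hypothetical and are not taken from the repository.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def single_neighbor_units(adj: Dict[int, List[int]]) -> Dict[int, int]:
    """Map each unit that has exactly one neighbor to that neighbor."""
    return {u: nbrs[0] for u, nbrs in adj.items() if len(nbrs) == 1}

def is_connected_region(members: Iterable[int], adj: Dict[int, List[int]]) -> bool:
    """Breadth-first search restricted to one zone's members."""
    member_set = set(members)
    if not member_set:
        return True
    start = next(iter(member_set))
    seen: Set[int] = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v in member_set and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == member_set
```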
Run election_deepwalk.py to learn node embeddings in the US county adjacency network. You may also use our learned embeddings uselec_emb.pkl.
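DeepWalk learns node embeddings by treating truncated random walks on the graph as sentences and training a skip-gram model on them. The minimal sketch below covers only the walk-generation step on the county adjacency dictionary. The function name and parameters are hypothetical, and the settings used by election_deepwalk.py are not stated here.

```python
import random
from typing import Dict, List

def random_walks(adj: Dict[int, List[int]], walks_per_node: int,
                 walk_length: int, seed: int = 0) -> List[List[int]]:
    """Generate truncated uniform random walks, the corpus DeepWalk feeds to skip-gram."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks: List[List[int]] = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:  # stop early at a node with no neighbors
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

The walks, converted to lists of strings, can then be passed to a skip-gram implementation such as gensim's `Word2Vec` with `sg=1` to obtain one vector per county.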
Run election.py for spatial prediction of vote share with the following models: ANN-DeepWalk (model_type = 'ann-dw'), GCN-DeepWalk (model_type = 'srgcn-dw'), RegionGCN-DeepWalk (model_type = 'reggcn-dw').
- Similarly, to run RegionGCN-DeepWalk, you MUST run GCN-DeepWalk first with the same random seed.
- Assemble the zoning results from multiple RegionGCN runs into a single Excel table (this can be done by combining the region columns in the log files from election.py).
- Run election_metis_data.py to generate the input file in the METIS format.
- Compile and run graph_partition.cpp to get the region ensemble results. You may change the u factor (METIS_OPTION_UFACTOR) and the region count (parts).
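The ensemble step needs a similarity graph between counties. One plausible construction is a co-association graph: two adjacent counties are linked with a weight equal to the number of RegionGCN runs that placed them in the same region. The sketch below follows that construction under stated assumptions. The restriction to adjacent pairs, the weight definition, and the helper names are all hypothetical, and election_metis_data.py may define similarity differently. The serializer follows the METIS graph file format. It writes a header line with the vertex count, the edge count, and `001` (edge weights present), then one line per vertex listing 1-based neighbor ids, each followed by its edge weight.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]

def coassociation_weights(zonings: Sequence[Sequence[int]],
                          adj: Dict[int, List[int]]) -> Dict[Edge, int]:
    """For each adjacent pair (i, j), count the runs that put i and j in one region."""
    weights: Dict[Edge, int] = {}
    for i, nbrs in adj.items():
        for j in nbrs:
            if i < j:  # count each undirected edge once
                w = sum(1 for z in zonings if z[i] == z[j])
                if w > 0:  # METIS requires positive edge weights
                    weights[(i, j)] = w
    return weights

def to_metis_graph(n: int, weights: Dict[Edge, int]) -> str:
    """Serialize an undirected weighted graph in the METIS graph file format."""
    rows: List[List[str]] = [[] for _ in range(n)]
    for (i, j), w in sorted(weights.items()):
        rows[i] += [str(j + 1), str(w)]  # METIS vertex ids are 1-based
        rows[j] += [str(i + 1), str(w)]
    lines = [f"{n} {len(weights)} 001"] + [" ".join(r) for r in rows]
    return "\n".join(lines) + "\n"
```

Here each zoning is one region column from the assembled table, indexed by county position. The returned string can be written to disk as the METIS input file.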
Due to the computational cost of the dynamic zoning module, training RegionGCN takes about an order of magnitude longer than training GCN. Currently, RegionGCN is not scalable to large spatial data sets with over
If you use this code in your research, please cite:
@article{Guo29092025,
author = {Hao Guo and Han Wang and Di Zhu and Lun Wu and A. Stewart Fotheringham and Yu Liu},
title = {RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks},
journal = {Annals of the American Association of Geographers},
volume = {0},
number = {0},
pages = {1--17},
year = {2025},
publisher = {Taylor \& Francis},
doi = {10.1080/24694452.2025.2558661}
}
If you have any questions, feel free to contact me by email at sinesloop@pku.edu.cn.