High-Performance Computing
Supported Parallel Algorithms
| Algorithm | Serial | OpenMP | CUDA | MPI |
|---|---|---|---|---|
| GWRBasic | ✅ | ✅ | ✅ | ✅ |
| GWRMultiscale | ✅ | ✅ | ✅ | Serial × \(n_{var}\) |
| GWSS (Average) | ✅ | ✅ | — | — |
| GWSS (Correlation) | ✅ | ✅ | — | — |
Notes:

- Serial (`SerialOnly`): single-threaded; suitable for small datasets or debugging.
- OpenMP: shared-memory multi-threading; suitable for single-node multi-core machines. Requires `ENABLE_OPENMP` at build time.
- CUDA: NVIDIA GPU acceleration; suitable for large datasets. Requires `ENABLE_CUDA` at build time.
- MPI: distributed computing; suitable for multi-node clusters. Requires `ENABLE_MPI` at build time.
Multi-Threading (OpenMP)
Enables shared-memory parallel computation via OpenMP. The core per-sample model fitting loop is parallelised, with each thread independently computing coefficient estimates for a subset of samples.
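To make the per-sample parallel loop concrete, here is a minimal NumPy sketch. It is not pygwmodel's implementation: it assumes a Gaussian kernel with a fixed bandwidth, and it uses Python threads (`concurrent.futures.ThreadPoolExecutor`) as a stand-in for OpenMP. Each worker fits the local model \((X^T W_i X)^{-1} X^T W_i y\) for its own contiguous subset of samples, so the workers never write to the same row of the result.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def local_coefficients(i, coords, X, y, bandwidth):
    """Local weighted least-squares fit centred on sample i."""
    d = np.linalg.norm(coords - coords[i], axis=1)  # distance from sample i to every sample
    w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights (assumed kernel)
    XtW = X.T * w                                    # X^T W without building diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)         # (X^T W X)^{-1} X^T W y


def fit_parallel(coords, X, y, bandwidth, threads=4):
    """Split the samples into `threads` subsets and fit each subset in its own thread."""
    n, k = X.shape
    betas = np.empty((n, k))
    chunks = np.array_split(np.arange(n), threads)

    def work(indices):
        for i in indices:  # each thread writes only its own rows, so no locking is needed
            betas[i] = local_coefficients(i, coords, X, y, bandwidth)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(work, chunks))  # consume the iterator so worker exceptions surface
    return betas


# Synthetic example: constant coefficients (1, 2) plus a little noise.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=n)
betas = fit_parallel(coords, X, y, bandwidth=2.0, threads=4)
```

Because each sample's fit is independent, the parallel result matches a plain serial loop over `local_coefficients`; only the wall-clock time changes.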
Setting the number of threads:
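```python
from pygwmodel import GWRBasic, ParallelType, BandwidthWeight, CRSDistance

# data, y and x: the dataset, dependent variable and independent variables
algorithm = GWRBasic(data, y, x,
                     weight=BandwidthWeight(36.0, adaptive=True),  # adaptive: 36 nearest neighbours
                     distance=CRSDistance()).enable_parallel(
    ParallelType.OpenMP, threads=4  # run the fitting loop on 4 threads
).fit()
```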
GWRMultiscale also supports OpenMP:
```python
from pygwmodel import GWRMultiscale, ParallelType

algorithm = GWRMultiscale(data, y, x, weights).enable_parallel(
    ParallelType.OpenMP, threads=8  # run on 8 threads
).fit()
```
Recommended thread count: set to the number of physical CPU cores.
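On machines with simultaneous multithreading, `os.cpu_count()` reports logical CPUs rather than physical cores. A small helper, assuming the optional third-party `psutil` package may or may not be installed, falls back to the logical count when the physical count is unavailable:

```python
import os


def physical_core_count():
    """Physical core count if psutil is available, else the logical CPU count."""
    try:
        import psutil  # optional third-party dependency

        cores = psutil.cpu_count(logical=False)  # may return None on some platforms
    except ImportError:
        cores = None
    # os.cpu_count() counts logical CPUs (hyper-threads included) and may also return None
    return cores or os.cpu_count() or 1


threads = physical_core_count()
```

The returned value can then be passed as `threads=` to `enable_parallel`.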
GPU Acceleration (CUDA)
Offloads the local weighted regression matrix operations to an NVIDIA GPU; suitable for larger datasets.
Group Size (group_size)
group_size controls how many samples’ coefficient estimates are computed
together on the GPU in one batch. Larger groups make better use of GPU
parallelism but are constrained by GPU memory. The internal constraint is:
where \(k\) is the number of independent variables, \(n\) is the number of samples, and \(g\) is the group size.
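What "computing a group of samples together" means can be illustrated on the CPU with NumPy batched matrix products. This is a hedged sketch, not the CUDA code path: it assumes a Gaussian kernel with a fixed bandwidth and solves the \(g\) local systems of each batch in one call. Note that the intermediate `XtW` has shape \((g, k, n)\), which shows one way memory use grows with the product of \(g\), \(n\) and \(k\).

```python
import numpy as np


def gaussian_weights(targets, coords, bandwidth):
    """(g, n) Gaussian kernel weights from each target sample to every sample."""
    diff = coords[targets][:, None, :] - coords[None, :, :]  # (g, n, 2) coordinate offsets
    d = np.sqrt((diff ** 2).sum(axis=-1))                     # (g, n) distances
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def fit_batched(coords, X, y, bandwidth, group_size=64):
    """Fit all local regressions, group_size samples per batch."""
    n, k = X.shape
    betas = np.empty((n, k))
    for start in range(0, n, group_size):
        targets = np.arange(start, min(start + group_size, n))  # one batch of g samples
        w = gaussian_weights(targets, coords, bandwidth)       # (g, n)
        XtW = X.T[None, :, :] * w[:, None, :]                  # (g, k, n): the g*n*k term
        XtWX = XtW @ X                                         # (g, k, k)
        XtWy = XtW @ y                                         # (g, k)
        # Solve all g systems at once; the trailing axis keeps b a stack of column vectors.
        betas[targets] = np.linalg.solve(XtWX, XtWy[..., None])[..., 0]
    return betas


# Synthetic example: constant coefficients (1, 2) plus a little noise.
rng = np.random.default_rng(42)
n = 150
coords = rng.uniform(0, 10, size=(n, 2))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=n)
betas = fit_batched(coords, X, y, bandwidth=2.0, group_size=64)
```

The group size changes only how the work is batched, not the estimates: any `group_size` yields the same coefficients, while larger batches need proportionally larger temporary arrays.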
Usage:
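```python
from pygwmodel import GWRBasic, ParallelType, BandwidthWeight, CRSDistance

algorithm = GWRBasic(data, y, x,
                     weight=BandwidthWeight(36.0, adaptive=True),
                     distance=CRSDistance()).enable_parallel(
    ParallelType.CUDA, gpu_id=0, group_size=64  # GPU 0, 64 samples per batch
).fit()
```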
For GWRMultiscale:
```python
from pygwmodel import GWRMultiscale, ParallelType

algorithm = GWRMultiscale(data, y, x, weights).enable_parallel(
    ParallelType.CUDA, gpu_id=0, group_size=128  # GPU 0, 128 samples per batch
).fit()
```
Performance Tips
| Scenario | Recommendation |
|---|---|
| Small datasets (\(n < 1000\)) | Serial is sufficient; overhead is negligible |
| Medium datasets (\(1000 \le n < 10^4\)) | Use OpenMP with the thread count equal to the number of physical CPU cores |
| Large datasets (\(n \ge 10^4\)) | Use CUDA if available; otherwise OpenMP |
| Speeding up GWRMultiscale | Set |
| GWRMultiscale convergence | Increase |
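The first three rows of the table can be encoded as a small selection helper. This is a hypothetical sketch: the returned names mirror the `ParallelType` members mentioned on this page (`SerialOnly`, `OpenMP`, `CUDA`), and mapping them to the enum (for example with `getattr(ParallelType, name)`) assumes those member names.

```python
def choose_backend(n, cuda_available=False, physical_cores=1):
    """Return (backend name, thread count) following the size thresholds in the table.

    The thread count is only meaningful for OpenMP; it is None otherwise.
    """
    if n < 1_000:
        return "SerialOnly", None
    if n >= 10_000 and cuda_available:
        return "CUDA", None
    return "OpenMP", max(1, physical_cores)


print(choose_backend(500))                          # ('SerialOnly', None)
print(choose_backend(5_000, physical_cores=8))      # ('OpenMP', 8)
print(choose_backend(50_000, cuda_available=True))  # ('CUDA', None)
```

Large datasets without a GPU fall through to OpenMP, matching the "otherwise OpenMP" recommendation.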