Array names in NetCDF-4 climate data are filtered out by Pan3D logic #223

@griffin28

The issue seems to originate in the Pan3D code (src/pan3d/xarray/algorithm.py, in the available_arrays property), whose filtering logic drops the array names. I ran this code against our NetCDF-4 climate dataset in Jupyter, and the check if dims.issubset(coords) causes all of the data variables to be filtered out.

The xarray dataset ds in the notebook is equivalent to self._input in the Pan3D code.

The Pan3D code in question:

@property
def available_arrays(self):
    """List all available data fields for the `arrays` option"""
    if self._input is None:
        return []

    filtered_arrays = []
    max_dim = 0
    coords = set(self.available_coords)
    for name in set(self._input.data_vars.keys()) - set(self._input.coords.keys()):
        if name.endswith(("_bnds", "_bounds")):
            continue

        dims = set(self._input[name].dims)
        max_dim = max(max_dim, len(dims))
        if dims.issubset(coords):
            filtered_arrays.append(name)

    return [n for n in filtered_arrays if len(self._input[n].shape) == max_dim]

Output from our dataset:

<xarray.Dataset> Size: 504MB
Dimensions:   (time: 96, ensemble: 20, y: 128, x: 128)
Coordinates:
  * time      (time) object 768B 2025-01-08 00:15:00 ... 2025-01-09 00:00:00
  * ensemble  (ensemble) int64 160B 0 1 2 3 4 5 6 7 ... 12 13 14 15 16 17 18 19
    lat       (y, x) float64 131kB ...
    lon       (y, x) float64 131kB ...
Dimensions without coordinates: y, x
Data variables:
    gust_ens  (time, ensemble, y, x) float64 252MB ...
    dirc_ens  (time, ensemble, y, x) float64 252MB ...
Lat range: 31.930879493963413 to 36.01532280093412
Lon range: -120.50104129941263 to -115.57342475137386
Data Variables: ['gust_ens', 'dirc_ens']
coords_items: ItemsView(Coordinates:
  * time      (time) object 768B 2025-01-08 00:15:00 ... 2025-01-09 00:00:00
  * ensemble  (ensemble) int64 160B 0 1 2 3 4 5 6 7 ... 12 13 14 15 16 17 18 19
    lat       (y, x) float64 131kB 31.93 31.94 31.94 31.95 ... 36.0 36.01 36.02
    lon       (y, x) float64 131kB -119.5 -119.5 -119.4 ... -116.5 -116.4 -116.4)
available coords: ['ensemble', 'time']
coords: {'ensemble', 'time'}
name: gust_ens
ds[name].dims: ('time', 'ensemble', 'y', 'x')
dims: {'x', 'ensemble', 'y', 'time'}
name: dirc_ens
ds[name].dims: ('time', 'ensemble', 'y', 'x')
dims: {'x', 'ensemble', 'y', 'time'}
filtered_arrays: []

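Here you can see that dims is not a subset of coords: y and x are dimensions without coordinates, so they do not appear in available coords (['ensemble', 'time']). As a result, neither gust_ens nor dirc_ens is ever added to filtered_arrays.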
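
For anyone who wants to reproduce the check without the dataset or xarray, here is a minimal sketch that replays the same filtering logic on plain Python data. The function name filter_arrays and its parameters are hypothetical stand-ins and not part of Pan3D. The dimension and coordinate names are copied from the debug output above.

```python
def filter_arrays(data_var_dims, coord_names, available_coords):
    """Replay the quoted filtering logic on plain Python data.

    data_var_dims: dict of variable name -> tuple of dimension names
                   (stands in for ds.data_vars and ds[name].dims)
    coord_names: stands in for ds.coords.keys()
    available_coords: stands in for self.available_coords
    All three parameters are hypothetical stand-ins for illustration.
    """
    filtered = []
    max_dim = 0
    coords = set(available_coords)
    for name in sorted(set(data_var_dims) - set(coord_names)):
        if name.endswith(("_bnds", "_bounds")):
            continue
        dims = set(data_var_dims[name])
        max_dim = max(max_dim, len(dims))
        # This is the check that rejects every variable in our dataset.
        if dims.issubset(coords):
            filtered.append(name)
    return [n for n in filtered if len(data_var_dims[n]) == max_dim]


# Values copied from the debug output in this issue.
data_var_dims = {
    "gust_ens": ("time", "ensemble", "y", "x"),
    "dirc_ens": ("time", "ensemble", "y", "x"),
}
coord_names = ["time", "ensemble", "lat", "lon"]
available_coords = ["ensemble", "time"]

result = filter_arrays(data_var_dims, coord_names, available_coords)

# Dimensions of each variable that are absent from available_coords.
missing = {
    name: sorted(set(dims) - set(available_coords))
    for name, dims in data_var_dims.items()
}

print(result)   # []
print(missing)  # {'gust_ens': ['x', 'y'], 'dirc_ens': ['x', 'y']}
```

The missing dict shows that x and y are the only dimensions blocking both variables, which matches the "Dimensions without coordinates: y, x" line in the dataset repr.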
