
Overview

Output from a single run of CESM is the main dataset we'll be working with in this cookbook, so let's learn how to read it in. Note that this is just one form CESM output can take. This run has been post-processed, so the data are stored as "time-series" files, where each file holds one variable across the full timespan of the run. Before this processing, CESM actually writes "history" files, where each file contains all variables over a shorter time slice. We won't dive into the specifics of CESM data processing here, but this Jupyter book from the CESM tutorial has more information.
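
To make the distinction concrete, here is a minimal sketch of what reading each kind of file might look like. The filenames below are hypothetical placeholders (the real paths for this run are listed in the next section), and the exact naming convention varies between CESM components.

import xarray as xr

# Time-series layout: one variable for the whole run, so a single
# open_dataset call returns the full record of that variable.
temp_full_run = xr.open_dataset("TEMP.000101-006112.nc")  # hypothetical filename

# History layout: every variable, but only a short time slice per file,
# so many files must be combined to recover a full time series.
all_vars = xr.open_mfdataset("history.00??-??.nc", combine="by_coords")  # hypothetical pattern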

Prerequisites

Concepts        | Importance | Notes
Intro to Xarray | Necessary  |
  • Time to learn: 5 min

Imports

import xarray as xr
import glob
import s3fs
import netCDF4

Loading our data into xarray

Our data are stored in the cloud on Jetstream2. We first list all the file paths in the bucket, open each one remotely, then use xarray's open_mfdataset() function to combine them into a single xarray Dataset, dropping a few variables whose coordinates don't align with the rest of the dataset.

jetstream_url = 'https://js2.jetstream-cloud.org:8001/'

s3 = s3fs.S3FileSystem(anon=True, client_kwargs=dict(endpoint_url=jetstream_url))

# Generate a list of all files in CESM folder
s3path = 's3://pythia/ocean-bgc/cesm/g.e22.GOMIPECOIAF_JRA-1p4-2018.TL319_g17.4p2z.002branch/ocn/proc/tseries/month_1/*'
remote_files = s3.glob(s3path)

# Open all files from folder
fileset = [s3.open(file) for file in remote_files]

# Open with xarray
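# data_vars="minimal" and coords="minimal" only concatenate variables that
# actually contain the concatenation dimension; compat="override" takes any
# remaining duplicated variables from the first file rather than comparing them
# across every file; parallel=True opens the files in parallel with dask.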
ds = xr.open_mfdataset(fileset, data_vars="minimal", coords='minimal', compat="override", parallel=True,
                       drop_variables=["transport_components", "transport_regions", 'moc_components'], decode_times=True)
ds

Looks good!
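
If you'd like a quick programmatic check in addition to the dataset display above, a few one-liners can confirm what was loaded. This is just a sketch, and it assumes the combined dataset has a coordinate named time, as this monthly output should.

# Size of each dimension in the combined dataset
print(ds.sizes)

# A sample of the variable names that were loaded (first ten)
print(list(ds.data_vars)[:10])

# First and last timestamps in the run (assumes a coordinate named "time")
print(ds["time"].isel(time=[0, -1]).values)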


Summary

You've learned how to read in CESM output, which we'll be using in all of the following notebooks in this cookbook.