Note

This page was generated from a Jupyter notebook that can be accessed from GitHub.

MESMER-M workflow for multiple scenarios#

Training and emulation of monthly local temperature from yearly local temperature for multiple scenarios and ensemble members. We use an example data set on a coarse grid. This roughly follows the approach outlined in Nath et al. (2022).

MESMER-M trains the local monthly temperature using the local annual temperature (i.e. the temperature from the same grid point) as forcing. This differs from MESMER, which uses global mean values as predictors for local annual mean temperatures. Training MESMER-M consists of four steps:

harmonic model: fit the seasonal cycle with a harmonic model
power transformer: make the resulting residuals more normal by using a Yeo-Johnson transformation
cyclo-stationary AR(1) process: the monthly residuals are assumed to follow a cyclo-stationary AR(1) process, where one month's value depends on the previous month's value
local variability: estimate parameters needed to generate local variability
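
The third step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the MESMER-M implementation: the function name `simulate_cyclostationary_ar1`, the parameter values, and the use of independent Gaussian noise at a single grid point are all invented for this example.

```python
import random


def simulate_cyclostationary_ar1(slopes, intercepts, sigma, n_years, seed=0):
    """Simulate monthly residuals whose AR(1) parameters depend on the calendar month.

    slopes, intercepts: 12 values each, one per calendar month (hypothetical values).
    sigma: standard deviation of the white-noise innovations.
    """
    rng = random.Random(seed)
    series = []
    previous = 0.0
    for step in range(n_years * 12):
        month = step % 12  # 0 = January, ..., 11 = December
        innovation = rng.gauss(0.0, sigma)
        # the current month depends on the previous month via month-specific parameters
        current = intercepts[month] + slopes[month] * previous + innovation
        series.append(current)
        previous = current
    return series


# illustrative parameters: stronger persistence in some months than in others
slopes = [0.6, 0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6]
intercepts = [0.0] * 12
residuals = simulate_cyclostationary_ar1(slopes, intercepts, sigma=0.5, n_years=10)
print(len(residuals))
```

"Cyclo-stationary" here means that the slope and intercept repeat every 12 months instead of being constant over the whole series.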

This example can be extended to more scenarios, ensemble members and higher resolution data. See also the MESMER-M calibration and emulation tests in tests/integration/.

import filefisher
import matplotlib.pyplot as plt
import pandas as pd
import scipy as sp
import xarray as xr

import mesmer

Calibration#

Configuration#

LOCALISATION_RADII = list(range(7_500, 12_501, 500))
THRESHOLD_LAND = 1 / 3
REFERENCE_PERIOD = slice("1850", "1900")

# define model and scenarios
model = "IPSL-CM6A-LR"
scenarios = ["ssp126", "ssp585"]

# path of the example data
cmip6_data_path = mesmer.example_data.cmip6_ng_path(relative=True)
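
As a quick check of the configuration, the radii setting can be expanded in plain Python. The snippet only inspects the list; how the candidate radii are used is not shown here.

```python
# same expression as in the configuration above
LOCALISATION_RADII = list(range(7_500, 12_501, 500))

# number of candidates, smallest and largest value
print(len(LOCALISATION_RADII), LOCALISATION_RADII[0], LOCALISATION_RADII[-1])
```

The upper bound `12_501` is chosen so that `12_500` itself is included, since `range` excludes its stop value.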

Load Data for training the emulator#

We load monthly and annual mean temperatures.

CMIP_FILEFINDER = filefisher.FileFinder(
    path_pattern=cmip6_data_path / "{variable}/{time_res}/{resolution}",
    file_pattern="{variable}_{time_res}_{model}_{scenario}_{member}_{resolution}.nc",
)
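
To see how the placeholders combine into a file path, the following sketch fills the two patterns with `str.format`. It does not use filefisher and is only meant to illustrate the naming scheme; the root directory and the key values are taken from the relative paths in the file listings on this page.

```python
from pathlib import PurePosixPath

# patterns copied from the FileFinder definition; the root directory is a stand-in
root = PurePosixPath("../data/cmip6-ng")
path_pattern = "{variable}/{time_res}/{resolution}"
file_pattern = "{variable}_{time_res}_{model}_{scenario}_{member}_{resolution}.nc"

# one combination of keys (illustrative)
keys = {
    "variable": "tas",
    "time_res": "ann",
    "resolution": "g025",
    "model": "IPSL-CM6A-LR",
    "scenario": "ssp126",
    "member": "r1i1p1f1",
}

full_path = root / path_pattern.format(**keys) / file_pattern.format(**keys)
print(full_path)
```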

Find annual data:

fc_scens_y = CMIP_FILEFINDER.find_files(
    variable="tas", scenario=scenarios, model=model, resolution="g025", time_res="ann"
)

# get the historical members that are also in the future scenarios, but only once
unique_scen_members_y = fc_scens_y.df.member.unique()

fc_hist_y = CMIP_FILEFINDER.find_files(
    variable="tas",
    scenario="historical",
    model=model,
    resolution="g025",
    time_res="ann",
    member=unique_scen_members_y,
)

fc_all_y = fc_hist_y.concat(fc_scens_y)
fc_all_y.df

	variable	time_res	resolution	model	scenario	member
path
../data/cmip6-ng/tas/ann/g025/tas_ann_IPSL-CM6A-LR_historical_r1i1p1f1_g025.nc	tas	ann	g025	IPSL-CM6A-LR	historical	r1i1p1f1
../data/cmip6-ng/tas/ann/g025/tas_ann_IPSL-CM6A-LR_historical_r2i1p1f1_g025.nc	tas	ann	g025	IPSL-CM6A-LR	historical	r2i1p1f1
../data/cmip6-ng/tas/ann/g025/tas_ann_IPSL-CM6A-LR_ssp126_r1i1p1f1_g025.nc	tas	ann	g025	IPSL-CM6A-LR	ssp126	r1i1p1f1
../data/cmip6-ng/tas/ann/g025/tas_ann_IPSL-CM6A-LR_ssp585_r1i1p1f1_g025.nc	tas	ann	g025	IPSL-CM6A-LR	ssp585	r1i1p1f1
../data/cmip6-ng/tas/ann/g025/tas_ann_IPSL-CM6A-LR_ssp585_r2i1p1f1_g025.nc	tas	ann	g025	IPSL-CM6A-LR	ssp585	r2i1p1f1

Find monthly data:

fc_scens_m = CMIP_FILEFINDER.find_files(
    variable="tas", scenario=scenarios, model=model, resolution="g025", time_res="mon"
)

# get the historical members that are also in the future scenarios, but only once
unique_scen_members_m = fc_scens_m.df.member.unique()

fc_hist_m = CMIP_FILEFINDER.find_files(
    variable="tas",
    scenario="historical",
    model=model,
    resolution="g025",
    time_res="mon",
    member=unique_scen_members_m,
)

fc_all_m = fc_hist_m.concat(fc_scens_m)
fc_all_m.df

	variable	time_res	resolution	model	scenario	member
path
../data/cmip6-ng/tas/mon/g025/tas_mon_IPSL-CM6A-LR_historical_r1i1p1f1_g025.nc	tas	mon	g025	IPSL-CM6A-LR	historical	r1i1p1f1
../data/cmip6-ng/tas/mon/g025/tas_mon_IPSL-CM6A-LR_historical_r2i1p1f1_g025.nc	tas	mon	g025	IPSL-CM6A-LR	historical	r2i1p1f1
../data/cmip6-ng/tas/mon/g025/tas_mon_IPSL-CM6A-LR_ssp126_r1i1p1f1_g025.nc	tas	mon	g025	IPSL-CM6A-LR	ssp126	r1i1p1f1
../data/cmip6-ng/tas/mon/g025/tas_mon_IPSL-CM6A-LR_ssp585_r1i1p1f1_g025.nc	tas	mon	g025	IPSL-CM6A-LR	ssp585	r1i1p1f1
../data/cmip6-ng/tas/mon/g025/tas_mon_IPSL-CM6A-LR_ssp585_r2i1p1f1_g025.nc	tas	mon	g025	IPSL-CM6A-LR	ssp585	r2i1p1f1

This found one ensemble member for SSP1-2.6 and two for SSP5-8.5, plus the corresponding members in the historical scenario.
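
The member selection can be reproduced with plain Python on the (scenario, member) pairs from the listings above. This is only a sketch of the bookkeeping and does not use filefisher or pandas.

```python
from collections import defaultdict

# (scenario, member) pairs of the future scenarios, as listed in the tables above
scenario_files = [
    ("ssp126", "r1i1p1f1"),
    ("ssp585", "r1i1p1f1"),
    ("ssp585", "r2i1p1f1"),
]

# group the members by scenario
members_per_scenario = defaultdict(list)
for scenario, member in scenario_files:
    members_per_scenario[scenario].append(member)

# unique members in first-seen order: these are the historical members to look for
unique_members = list(dict.fromkeys(member for _, member in scenario_files))

print(dict(members_per_scenario))
print(unique_members)
```

Member `r1i1p1f1` appears in both scenarios but is requested only once for the historical period.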

To load the data we write a small helper function that loads the data into a DataTree (where each node is a scenario):

def load_data(filecontainer):

    out = xr.DataTree()

    scenarios = filecontainer.df.scenario.unique().tolist()

    # load data for each scenario
    for scen in scenarios:
        files = filecontainer.search(scenario=scen)

        # load all members for a scenario
        members = []
        for fN, meta in files.items():
            time_coder = xr.coders.CFDatetimeCoder(use_cftime=True)
            ds = xr.open_dataset(fN, decode_times=time_coder)
            # drop unnecessary variables
            ds = ds.drop_vars(["height", "time_bnds", "file_qf"], errors="ignore")
            # assign member-ID as coordinate
            ds = ds.assign_coords({"member": meta["member"]})
            members.append(ds)

        # create a Dataset that holds each member along the member dimension
        scen_data = xr.concat(members, dim="member")
        # put the scenario dataset into the DataTree
        out[scen] = xr.DataTree(scen_data)

    return out

Load annual and monthly data:

tas_y_orig = load_data(fc_all_y)
tas_m_orig = load_data(fc_all_m)

This results in two DataTree objects, each with three nodes, one per scenario (historical, ssp126 and ssp585).

Train the power transformer#

The residuals are not necessarily symmetric, so we make them more normal using a Yeo-Johnson transformation. For performance reasons we use a constant \(\lambda\) here. Originally, the parameter \(\lambda\) is modelled with a logistic regression using local annual mean temperature as covariate (Nath et al., 2022). Currently the "constant" and "logistic" options for \(\lambda\) are implemented; further options could be implemented and tested.
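
For readers unfamiliar with the transformation, here is a minimal plain-Python sketch of the Yeo-Johnson formula with a constant \(\lambda\). It is independent of mesmer: the helper names `yeo_johnson` and `skewness` and the sample values are assumptions made for this illustration only.

```python
import math


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transformation of a single value for a given lambda."""
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -(((-x + 1.0) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)


def skewness(values):
    """Biased sample skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2**1.5


# a right-skewed toy sample (made-up numbers)
sample = [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]
transformed = [yeo_johnson(v, 0.0) for v in sample]

print(skewness(sample), skewness(transformed))
```

With \(\lambda = 1\) the transformation is the identity; values below 1 compress the right tail of positive data, which reduces the skewness of this toy sample.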

# yj_transformer = mesmer.stats.YeoJohnsonTransformer("logistic")

yj_transformer = mesmer.stats.YeoJohnsonTransformer("constant")

pt_coefficients = yj_transformer.fit(tas_pooled_y.tas, harmonic_model_fit.residuals)

transformed_resids = yj_transformer.transform(
    tas_pooled_y.tas,
    harmonic_model_fit.residuals,
    pt_coefficients,
)

To illustrate this we plot the skewness of the original and the transformed residuals:

f, ax = plt.subplots()

ax.plot(
    sp.stats.skew(harmonic_model_fit.residuals, axis=0),
    label="original residuals",
)
ax.plot(
    sp.stats.skew(transformed_resids.transformed.T, axis=0),
    label="transformed residuals",
)

ax.axhline(0, lw=0.5, color="0.1")
ax.legend()
ax.set_title("Skewness of residuals")

[Figure: Skewness of residuals, showing the original and the transformed residuals]

Saving#

time coordinate#

We need the original time coordinate to be able to validate our results later on. If the final emulations do not need to be aligned with the original data, this step can be omitted; the time coordinate can then be generated later, for example with:

monthly_time = xr.date_range("1850-01-01", "2100-12-31", freq="MS", calendar="gregorian", use_cftime=True)
monthly_time = xr.DataArray(monthly_time, dims="time", coords={"time": monthly_time})

# extract and save time coordinate
hist_time = tas_m.historical.time
scen_time = tas_m.ssp585.time
m_time = xr.concat([hist_time, scen_time], dim="time")

# TODO
# save the parameters to a file
# harmonic_model_fit
# pt_coefficients
# ar1_fit
# localized_ecov
# m_time
