Facilitating support for tensor weather data #131

Draft
fnattino wants to merge 2 commits into main from 60-weather-data

Conversation

@fnattino fnattino commented Jun 12, 2026

Collaborator

Currently, PCSE uses custom objects to provide weather data (WeatherDataProvider and WeatherDataContainer). To support tensor data, I was initially thinking of implementing a new class, as sketched e.g. in #113. However, this approach has the disadvantage that everyone would need to write a custom subclass for each new data structure holding the actual data (e.g. a pandas DataFrame, an xarray Dataset, etc.). It might also be challenging to design a generic class that suits different use cases (small and large datasets).

But what if we simply asked for a generic iterator as input to the engine? We would only expect the iterator to yield a dictionary of tensors. As a practical example, given a DataFrame where each column represents a weather variable and each row a day, one could build the data provider as:

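import torch

def iterate(df):
    # Convert each column (one weather variable) to a tensor once, up front.
    cols = {
        col: torch.tensor(df[col].to_numpy())
        for col in df.columns
    }
    # Yield one dictionary of scalar tensors per row (one day).
    for n in range(len(df)):
        yield {k: v[n] for k, v in cols.items()}

engine.setup(..., weatherdataprovider=iterate(df), ...)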

I am sketching some changes here to enable this; they are actually not too big. The biggest difference is that one would need to make sure the weatherdataprovider returns weather data in the same order as expected by the simulation.
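One way to guard against ordering problems on the data-preparation side could look like the sketch below. It assumes the DataFrame has a DatetimeIndex with one row per day; the helper name `iterate_in_order` is hypothetical and not part of PCSE.

```python
import pandas as pd
import torch


def iterate_in_order(df):
    """Yield one dict of tensors per day, in chronological order.

    Hypothetical helper: assumes `df` has a DatetimeIndex with one row per day.
    """
    # Sort by date so the iterator follows the simulation's time stepping.
    df = df.sort_index()
    # Fail early if days are missing or duplicated.
    steps = df.index.to_series().diff().dropna()
    if not (steps == pd.Timedelta(days=1)).all():
        raise ValueError("weather data must contain consecutive daily records")
    cols = {col: torch.tensor(df[col].to_numpy()) for col in df.columns}
    for n in range(len(df)):
        yield {k: v[n] for k, v in cols.items()}
```

Because this is a generator, the check only runs when the first item is requested, so an ordering error would surface at the first simulation step rather than when the provider is created.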

What do you think, @SarahAlidoost?

@SarahAlidoost

Collaborator

> Currently, PCSE uses custom objects to provide weather data (WeatherDataProvider and WeatherDataContainer). To support tensor data, I was initially thinking of implementing a new class, as sketched e.g. in #113. However, this approach has the disadvantage that everyone would need to write a custom subclass for each new data structure holding the actual data (e.g. a pandas DataFrame, an xarray Dataset, etc.). It might also be challenging to design a generic class that suits different use cases (small and large datasets).

@fnattino thanks! This looks like a good idea 👍

> But what if we simply asked for a generic iterator as input to the engine? We would only expect the iterator to yield a dictionary of tensors.

That's the right track. The Engine and Crop objects accept weather data as a dictionary of tensors, similar to parameters. How to obtain that dictionary is a data-processing concern and happens outside the engine.
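To illustrate how the same contract could cover batched data, here is a sketch in which each variable is a tensor with time as the leading dimension, e.g. shape `(n_days, n_sites)`. The function name `iterate_days` and the variable names are illustrative assumptions, not existing PCSE API.

```python
import torch


def iterate_days(weather):
    """Yield per-day slices from a dict of tensors with time as the first axis.

    Hypothetical helper: `weather` maps variable names to tensors that all
    share the same leading (time) dimension.
    """
    n_days = {values.shape[0] for values in weather.values()}
    if len(n_days) != 1:
        raise ValueError("all variables must cover the same number of days")
    for n in range(n_days.pop()):
        yield {name: values[n] for name, values in weather.items()}


# Example: 3 days of data for 2 sites (variable names are illustrative).
weather = {
    "TMIN": torch.zeros(3, 2),
    "TMAX": torch.ones(3, 2),
}
first_day = next(iterate_days(weather))
```

Each yielded dictionary then holds one tensor per variable with the time axis removed, so the engine would see the same structure whether the source data describes one site or many.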

> I am sketching some changes here to enable this; they are actually not too big. The biggest difference is that one would need to make sure the weatherdataprovider returns weather data in the same order as expected by the simulation.

That's right! We can also implement some checks in the init of Engine to make sure the weather data is as expected, e.g. checking format and shapes.
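Such checks could look roughly like the sketch below. The names `check_weather` and `checked`, and the idea of passing a list of required variables, are assumptions for illustration; nothing here is existing PCSE API.

```python
import torch


def check_weather(sample, required, shape=None):
    """Validate one day's weather dict (hypothetical helper)."""
    missing = set(required) - set(sample)
    if missing:
        raise KeyError(f"missing weather variables: {sorted(missing)}")
    for name in required:
        value = sample[name]
        if not isinstance(value, torch.Tensor):
            raise TypeError(
                f"{name} must be a torch.Tensor, got {type(value).__name__}"
            )
        if shape is not None and tuple(value.shape) != tuple(shape):
            raise ValueError(
                f"{name} has shape {tuple(value.shape)}, expected {tuple(shape)}"
            )


def checked(provider, required, shape=None):
    """Wrap an iterator so that every yielded dict is validated lazily."""
    for sample in provider:
        check_weather(sample, required, shape)
        yield sample
```

One design consideration: a generic iterator can only be consumed once, so validating inside a wrapper like `checked` avoids having to pull the first item during init just to inspect it.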

> What do you think, @SarahAlidoost?

2 participants