UC3 Urban Heat Island: Tessera integration#70

Open
BachirNILU wants to merge 10 commits into develop from feature/uhi-tessera

Conversation

@BachirNILU

Collaborator

UC3 Urban Heat Island: Tessera integration

What this PR does

Implements running tessera-based models for Use Case 3 (Land Surface Temperature
prediction for Guatemala City) and fixes the corresponding bugs.

Also adds the best configuration files for all 7 encoder variants, found after
196 hyperparameter sweep runs:

| Variant | Encoder |
|---|---|
| Tessera avg | Average pooling of tessera embeddings |
| Tessera CNN | CNN encoder (frozen ResNet34) on tessera patches |
| Full fusion avg | GeoClip + tabular + tessera (average) |
| Full fusion CNN | GeoClip + tabular + tessera (CNN) |
| Tabular only | DL tabular encoder |
| Fusion coords+tabular | GeoClip + tabular |
| Coords only | GeoClip coordinate encoder |

Bug fixes

src/data/base_dataset.py
Fixed rec.lon / rec.lat / rec.name_loc → rec["lon"] ..., because
setup_tessera() iterates over dicts, not objects.
Also passes the gt (GeoTessera) connection object to
get_tessera_embeds() to match the updated function.
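
A small sketch of why key access is needed here, assuming the records are plain dicts with the three keys named above; the sample values and the helper name tile_requests are hypothetical:

```python
# Hypothetical records; only the keys lon, lat and name_loc come from the PR text.
records = [
    {"lon": -90.5, "lat": 14.6, "name_loc": "block_0001"},
]


def tile_requests(records):
    """Collect (name, lon, lat) tuples using key access, which works for dicts."""
    return [(rec["name_loc"], rec["lon"], rec["lat"]) for rec in records]


requests = tile_requests(records)

# Attribute access on the same records fails, which is the bug described above.
try:
    records[0].lon
    attribute_access_ok = True
except AttributeError:
    attribute_access_ok = False
```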

src/data/heat_guatemala_dataset.py
Added tessera modality loading in __getitem__. Loads the .npy embedding
file, transposes vectors for CNN compatibility,
and adds it to the batch under batch["tessera"]. Falls back to zeros for
the 19 edge blocks with missing tiles.
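
A minimal sketch of the loading step described above, assuming the embedding is stored as a (height, width, channels) array and the CNN expects channels first. The function name load_tessera_patch, the file names, and the shapes are illustrative and not taken from the PR:

```python
import tempfile
from pathlib import Path

import numpy as np


def load_tessera_patch(path, fallback_shape):
    """Return a channels-first float32 patch, or zeros if the tile file is missing."""
    path = Path(path)
    if not path.exists():
        # Missing tile: fall back to an all-zero patch of the expected shape.
        return np.zeros(fallback_shape, dtype=np.float32)
    patch = np.load(path)  # assumed on-disk layout: (H, W, C)
    # Move channels to the front: (H, W, C) -> (C, H, W).
    return np.ascontiguousarray(patch.transpose(2, 0, 1), dtype=np.float32)


# Demo with a temporary file standing in for a real embedding tile.
with tempfile.TemporaryDirectory() as tmp:
    tile = Path(tmp) / "block_0001.npy"
    np.save(tile, np.ones((4, 4, 8), dtype=np.float32))
    present = load_tessera_patch(tile, fallback_shape=(8, 4, 4))
    missing = load_tessera_patch(Path(tmp) / "absent.npy", fallback_shape=(8, 4, 4))
```

Returning zeros keeps batch shapes uniform, but the model cannot tell a missing tile from a genuinely all-zero embedding; a separate validity mask would be one way to keep that distinction.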

src/models/components/geo_encoders/cnn_encoder.py
Fixed forward() to extract the tensor directly from the batch dict using
batch[self.geo_data_name] instead of the old batch.get("eo", {}) pattern
from the multimodal encoder.

Comment thread src/models/components/geo_encoders/cnn_encoder.py Outdated

@robknapen robknapen left a comment

Collaborator

Shouldn't this be sync'd with the develop branch, instead of main?

@gabrieletijunaityte gabrieletijunaityte changed the base branch from main to develop April 7, 2026 13:25
@gabrieletijunaityte

Contributor

> Shouldn't this be sync'd with the develop branch, instead of main?

True! Switched it.

Comment thread src/models/components/geo_encoders/cnn_encoder.py Outdated
@vdplasthijs

Collaborator

Probably easiest to merge #72 first and fix conflicts here. (tagging @gabrieletijunaityte fyi)

@BachirNILU

Collaborator Author

Hi @vdplasthijs,

Actually, this is what worked for me. With batch.get("eo", {}) I got the following error:

AttributeError: 'dict' object has no attribute 'dtype'

batch[self.geo_data_name] is correct because, for coords/tabular, everything is under "eo":

batch = {
    "eo": {
        "coords":  tensor([14.6, -90.5]),
        "tabular": tensor([0.08, 12.0, ...])
    }
}

But tessera embeddings are stored at the top level:

batch = {
    "eo": { "coords": tensor([14.6, -90.5]) },
    "tessera": tensor([[[0.2, 0.8, ...]]])   # ← top level
}

So with geo_data_name="tessera", batch[self.geo_data_name] returns the tensor directly which is what the CNN encoder expects.
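
To make the two layouts concrete, here is a hedged sketch of a lookup that tolerates both. The helper get_modality is hypothetical and not part of the codebase, and plain lists stand in for tensors:

```python
def get_modality(batch, name, nested_key="eo"):
    """Return batch[name] if it sits at the top level, else batch[nested_key][name]."""
    if name in batch:
        return batch[name]
    nested = batch.get(nested_key, {})
    if name in nested:
        return nested[name]
    raise KeyError(f"modality {name!r} not found at top level or under {nested_key!r}")


# Plain lists stand in for tensors in this sketch.
batch = {
    "eo": {"coords": [14.6, -90.5]},
    "tessera": [[[0.2, 0.8]]],
}

tessera = get_modality(batch, "tessera")  # found at the top level
coords = get_modality(batch, "coords")    # found under "eo"
whole_eo = batch.get("eo", {})            # the old pattern returns the dict itself
```

The last line shows where the AttributeError comes from: the old pattern hands the encoder the whole "eo" dict, which has no dtype attribute.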

Should we open a separate issue to track the batch structure inconsistency? Tessera should probably live under "eo" like the other modalities.

@vdplasthijs

Collaborator

Hi @BachirNILU, great, thanks for your reply. I see, yes, you're right: tessera should be under eo too. It should be in the main version (right, @gabrieletijunaityte?), so perhaps it got out of sync? We can discuss tomorrow in the call. Thanks!

@vdplasthijs vdplasthijs mentioned this pull request May 21, 2026
4 tasks
# Conflicts:
#	src/data/base_dataset.py
#	src/data/heat_guatemala_dataset.py
#	src/models/components/geo_encoders/cnn_encoder.py
…e/uhi-tessera

@vdplasthijs vdplasthijs left a comment

Collaborator

Thanks for all the work @BachirNILU !

Some questions below. Main thing is we need to make sure your changes of BaseDataset are compatible with everything else, so just trying to understand exactly what they are needed for. thanks!

Collaborator

Maybe change the file name to indicate it's specific to the Guatemala UC?

Comment thread src/data/base_dataset.py

Collaborator

@gabrieletijunaityte can you double-check this please?

@BachirNILU it looks like you rewrote some stuff and then put the old code back in comments? Any reason to rewrite this, did the previous implementation not work? We'll need to make sure it's compatible with everything else as it's the base dataset. If it's all good and we want to merge, then it would be best to remove the commented code.

Contributor

Doesn't look like it would cause any issues, but I agree that for merging we need to remove the commented-out lines.

Collaborator Author

Hi @vdplasthijs,

This update was a fix while implementing the UHI use case.
I was passing explicit use_aux_data mappings through Hydra, but the auxiliary columns were silently disappearing and the datamodule ended up with use_aux_data=None. I used Claude to help identify the issue.

The cause was that Hydra passes configs as OmegaConf.DictConfig objects, while the existing implementation checked type(use_aux_data) is dict, which evaluates to False for DictConfig. As a result, explicit auxiliary-column configurations were being dropped. The fix converts DictConfig to a regular dictionary and uses isinstance(..., dict).
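
A self-contained sketch of that failure mode and one possible normalisation. FakeDictConfig is a hypothetical stand-in for a mapping-like config object that is not a dict subclass, and normalise_aux_config and the column names are illustrative, not the code in this PR:

```python
from collections.abc import Mapping, MutableMapping


class FakeDictConfig(MutableMapping):
    """Hypothetical stand-in: mapping-like, but not a dict subclass."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def normalise_aux_config(use_aux_data):
    """Convert any mapping-like config to a plain dict; pass None through."""
    if use_aux_data is None:
        return None
    if isinstance(use_aux_data, Mapping):
        use_aux_data = dict(use_aux_data)  # shallow copy into a real dict
    if not isinstance(use_aux_data, dict):
        raise TypeError(f"unsupported use_aux_data type: {type(use_aux_data).__name__}")
    return use_aux_data


cfg = FakeDictConfig({"aux": ["col_a", "col_b"]})  # hypothetical column names
old_check_passes = type(cfg) is dict  # the strict type check rejects the config
normalised = normalise_aux_config(cfg)
```

With OmegaConf available, OmegaConf.to_container(cfg, resolve=True) is the usual way to obtain a plain dict, including nested values, which this shallow sketch does not handle.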

The second change in setup_tessera() came from debugging missing-tile handling.

The commented code should of course be removed. If everything looks fine, I can proceed with deleting it.

Comment thread src/data/heat_guatemala_caption_builder.py
Comment thread src/data/base_dataset.py
from geotessera import GeoTessera

print("Downloading missing Tessera tiles...")
print("[Warning]: it may download tessera tiles filled with 0a")

Contributor

Please restore the warning
