[TLE-Raw] Add DSLRegion metadata for deferred vendor lowering and fix redundant-copy removal by i3wanna2 · Pull Request #700 · flagos-ai/FlagTree

i3wanna2 · 2026-06-15T08:26:37Z

Summary

This PR extends tle.dsl_region with metadata (region_dialect, arg_dialect, output_operand_indices, optional hint) to support vendor-specific DSL regions
and a deferred-lowering skeleton, while keeping the current eager CUDA path working.
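As a rough illustration of the metadata named in the summary, the sketch below models the four attributes as a plain Python record. Only the attribute names come from this PR; the class name `DSLRegionMetadata`, the field types, the example dialect strings, and the validation rule are assumptions for illustration and are not FlagTree's API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DSLRegionMetadata:
    """Hypothetical mirror of the tle.dsl_region attributes (names from the PR summary)."""

    region_dialect: str                      # assumed: dialect of the region body
    arg_dialect: str                         # assumed: dialect describing the arguments
    output_operand_indices: Tuple[int, ...]  # assumed: operands the region writes to
    hint: Optional[str] = None               # optional, as stated in the summary

    def validate(self, num_operands: int) -> None:
        """Reject repeated indices and indices that do not name an operand."""
        indices = self.output_operand_indices
        if len(set(indices)) != len(indices):
            raise ValueError("duplicate output operand index")
        for idx in indices:
            if not 0 <= idx < num_operands:
                raise ValueError(f"output operand index {idx} out of range")


# Example: a region with three operands whose last operand is an output.
meta = DSLRegionMetadata("cuda", "triton", (2,))
meta.validate(num_operands=3)
```

Keeping the record immutable is a design choice of this sketch only: it makes the metadata safe to share between an eager and a deferred creation path.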

Changes

• Refactor Python/C++ creation path (call / call_smem, CUDA runtime, source_store, deferred create API)
• Keep ConvertArgToMemDesc as a pure conversion pass
• Fix RemoveRedundantCopy to use output_operand_indices for shared-memory redundant-copy elimination
• Auto-derive alias indices from LLVM return analysis when output_indices is not provided
• Add minimal TOPS Python adapter for compatibility with the shared creation API
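The redundant-copy and alias-index items above can be pictured with a toy model. This is one possible reading, not the pass in this PR: it assumes that result `i` of a region aliases the operand at `output_operand_indices[i]`, so a copy from that result back into the same buffer does nothing. All function names and buffer names here are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def resolve_output_indices(
    explicit: Optional[Sequence[int]],
    derived_from_returns: Sequence[int],
) -> List[int]:
    """Use explicit indices when given; otherwise fall back to derived ones.

    `derived_from_returns` stands in for the result of an LLVM return
    analysis, which this sketch does not implement.
    """
    if explicit is not None:
        return list(explicit)
    # Drop repeats while preserving first-seen order.
    return list(dict.fromkeys(derived_from_returns))


def alias_map(
    operands: Sequence[str],
    results: Sequence[str],
    output_indices: Sequence[int],
) -> Dict[str, str]:
    """Map each region result to the operand buffer it is assumed to alias."""
    return {res: operands[idx] for res, idx in zip(results, output_indices)}


def redundant_copies(
    copies: Sequence[Tuple[str, str]],
    aliases: Dict[str, str],
) -> List[Tuple[str, str]]:
    """A (src, dst) copy is redundant when src already aliases dst."""
    return [(src, dst) for src, dst in copies if aliases.get(src) == dst]


operands = ["a_smem", "b_smem", "acc_smem"]
indices = resolve_output_indices(None, [2, 2])
aliases = alias_map(operands, ["res0"], indices)
removed = redundant_copies([("res0", "acc_smem"), ("res0", "tmp")], aliases)
```

In this model only the copy back into `acc_smem` is flagged; the copy into `tmp` targets a different buffer and is kept.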

Notes

• TOPS remains eager-only (no deferred logic)
• Deferred materialization is out of scope in this PR

Add output_operand_indices/hint attrs, eager and deferred create paths, source_store, and split ConvertArgToMemDesc from RemoveRedundantCopy so loop accumulator redundancy is eliminated in the dedicated pass.

The refactored core.py routes region creation through JIT function helpers; add the same method to MLIRJITFunction used by mlir tutorials.

Default deferred=False preserves eager behavior; callers opt in with deferred=True on the CUDA dialect decorator.
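The opt-in behavior described above can be sketched with a stand-in decorator. This is not FlagTree's `@dialect`; the function `dialect` below, the `PENDING` list, and the dictionary-shaped "region" are hypothetical and only show how a `deferred=False` default keeps the eager path unchanged while `deferred=True` records a stub to be filled later.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical store of sources awaiting later materialization.
PENDING: List[Tuple[str, str]] = []

Region = Dict[str, Optional[str]]


def dialect(name: str, *, deferred: bool = False) -> Callable:
    """Stand-in decorator: wraps a compile function into a region factory."""

    def decorate(compile_fn: Callable[[str], str]) -> Callable[[str], Region]:
        def create_region(source: str) -> Region:
            if deferred:
                # Deferred path: remember the source and return an empty stub.
                PENDING.append((name, source))
                return {"dialect": name, "body": None}
            # Eager path (the default): compile immediately.
            return {"dialect": name, "body": compile_fn(source)}

        return create_region

    return decorate


@dialect("cuda")
def eager_region(source: str) -> str:
    return f"compiled({source})"


@dialect("cuda", deferred=True)
def deferred_region(source: str) -> str:
    return f"compiled({source})"


eager = eager_region("kernel_a")
stub = deferred_region("kernel_b")
```

Making the flag a keyword-only decorator parameter, rather than a module global, lets eager and deferred regions coexist in one module, which matches the motivation stated for this change.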

Temporary commit for backup only; not intended for review or merge. Work in progress on NVIDIA deferred tle_raw materialize scaffolding.

• Move deferred_raw_materialize to make_llir (before dsl_region_inline)
• Keep convert_arg_to_memdesc/remove_redundant_copy in make_ttgir
• Add NVIDIA deferred_raw.py + MaterializeDeferredRaw pass skeleton
• Add deferred CUDA unit test fixture

Extract shared materialize logic into TritonTLERawUtils and wire the NVIDIA make_llir pass to compile pending sources and fill stub dsl_regions before inlining. Add deferred runtime hooks for CUDA/MLIR and deferred tutorials; remove in-repo unit test fixtures, which were moved to an external debug workspace.

i3wanna2 added 2 commits June 15, 2026 08:25

Restore tle-raw DSLRegion metadata and deferred skeleton.

033806d

Keep TOPS tle-raw region creation compatible.

d2b5e27

i3wanna2 requested review from sunnycase and zhzhcookie as code owners June 15, 2026 08:26

github-actions Bot added tle triton_v3.6.x labels Jun 15, 2026

flagtree-bot and others added 5 commits June 15, 2026 08:28

Apply code-format changes

8e16caa

Fix MLIR tle_raw backend missing create_region_by_llvm.

1970650

Move tle-raw deferred flag from module global to @dialect parameter.

b054253

Apply code-format changes

c3e61c4

github-actions Bot added the nvidia label Jun 17, 2026

flagtree-bot and others added 3 commits June 17, 2026 10:15

Apply code-format changes

67cf26d

Apply code-format changes

6193efe

i3wanna2 wants to merge 10 commits into triton_v3.6.x from lxy/tle-raw-restore

3 participants