📍 Tempe, AZ | 📧 lsuresh4@asu.edu | 🌐 United States
I build software, data, and AI systems for production environments. My work spans backend services, data infrastructure, ML systems, LLM workflows, and GPU-aware inference optimization.
I focus on turning ambiguous requirements into maintainable systems: clear interfaces, tested behavior, practical documentation, and code that can be reviewed and extended by a team.
- 🏗️ Engineering discipline — clean interfaces, tests, CI/CD, observability, reproducibility, and code quality.
- 🔬 AI systems — LLM memory, evaluation, retrieval, ranking, long-horizon behavior, and model reliability.
- 🛠️ Production systems — pipelines, APIs, containers, cloud infrastructure, validation gates, and operational diagnostics.
- ⚡ Performance engineering — CUDA kernels, C++ inference paths, GPU memory optimization, profiling, and low-latency serving.
- 🤝 Open-source mindset — contributing useful fixes, thoughtful tests, and documentation that makes projects easier to maintain.
| Focus | Practice |
|---|---|
| Clear problem framing | Understand constraints, define expected behavior, and make tradeoffs explicit |
| Maintainable systems | Prefer reusable pipelines, typed interfaces, versioned data, and automation |
| Reliability | Use tests, validation gates, observability, and reproducible workflows |
| Communication | Translate technical details into decisions, risks, and next steps |
| Code quality | Keep changes reviewable, documented, and easy to extend |
Recent public contributions across SDKs, ML infrastructure, GPU/runtime tooling, and HPC documentation.
| Project | What I Contributed | Status |
|---|---|---|
| temporalio/sdk-python#1556 | Exposed a public JSONTypeConverterUnhandled sentinel type and updated converter tests/docs. | ✅ Merged May 28, 2026 |
| triton-lang/triton#10425 | Fixed autotune disk-cache invalidation for custom do_bench callables. | 🔄 Open |
| NVIDIA/TensorRT#4779 | Fixed Polygraphy data to-input multi-iteration aliasing and added regression coverage. | 🔄 Open |
| microsoft/onnxruntime#28534 | Added WebGPU program reserve helpers and capacity hints to reduce reallocations. | 🔄 Open |
| llnl/RAJA#2032 | Documented reducer helper utilities and validated the generated Sphinx docs. | 🔄 Open |
Open source is where I practice careful engineering in public: fixes, performance work, tests, and documentation that make a project easier to use and maintain.
My work sits across a few connected areas:
| Area | Strengths |
|---|---|
| Software Engineer | Backend systems, APIs, testing, CI/CD, code quality, open-source fixes, production ownership |
| Infrastructure / DevOps | Docker, Linux, orchestration, automation, reliability, cloud-native deployment, IaC-style delivery |
| Data Engineer | Pipelines, warehouses, validation gates, Airflow/dbt-style orchestration, analytics systems |
| AI / LLM Engineer | RAG, agentic workflows, fine-tuning, prompt/context systems, evaluation harnesses, model reliability |
| ML Acceleration Engineer | CUDA kernels, GPU memory optimization, C++ inference, profiling, Python-to-C++ integration |
| Research / AI Systems | Memory governance, long-horizon agents, evaluation methodology, reproducibility, publications |
Nov 2024 – Present
Working on data reliability, cloud infrastructure, and operational tooling, with a focus on pipelines and services where correctness, access control, and repeatability matter.
Key Contributions
- 🏛️ Governed data pipelines — Built Airflow/dbt-style warehouse workflows with clear inputs/outputs, stable schemas, validation checks, and IAM-aware access boundaries.
- ✅ Data quality as a system — Designed validation and reconciliation logic to catch silent failures before downstream consumers rely on bad data.
- 🚦 Operational diagnostics — Developed Python/C++ diagnostics and test coverage to reduce time-to-detect and improve reliability of distributed processing tasks.
- 🔗 API-driven integrations — Built FastAPI service surfaces for pipeline health, data quality, and operational visibility.
Python · C++ · SQL · FastAPI · AWS · Airflow · dbt · Great Expectations · Docker · CI/CD · PyTest · UnitTest · IAM · OAuth
Sep 2021 – Nov 2023
Joined as the first ML/AI engineer to build the AI layer of an early-stage job-matching platform from the ground up: ranking, recommendations, data pipelines, model services, and deployment workflows.
Key Contributions
- 🔍 Retrieval and ranking — Fine-tuned Transformer/BERT-style models for job-candidate relevance and improved ranking quality through evaluation-driven iteration.
- 🔄 ML data pipelines — Built Python/Spark ETL workflows across relational and document stores to keep model data fresh and production-ready.
- 🎯 Recommendation systems — Built skill-gap and job recommendation systems using RAG-style retrieval, vector search, and feature-based ranking baselines.
- ⚡ Inference acceleration — Developed CUDA kernels and C++ preprocessing paths, profiled bottlenecks with NVIDIA tooling, and reduced inference latency from 250ms to 120ms.
- 🧠 Agentic and LLM systems — Built RAG-style retrieval pipelines with LanceDB/Pinecone, experimented with prompt strategies, and tracked model behavior across evaluation workflows.
- 🐳 Production integration — Packaged models with Docker, tracked 25+ model versions with MLflow, and integrated services into CI/CD-backed deployments.
Python · C++ · CUDA · PyTorch · TensorFlow · Keras · scikit-learn · XGBoost · Hugging Face · BERT · LangChain · RAG · LanceDB · Pinecone · FastAPI · Spark · Docker · Kubernetes · MLflow · pybind11
Dec 2023 – Jul 2024
Combined analytics, forecasting, automation, and reporting to support editorial and product decisions.
Key Contributions
- 📊 KPI ownership — Built dashboards and reporting workflows across editorial, product, growth, and operations stakeholders.
- 📈 Forecasting and planning — Used Python, Pandas, NumPy, and SciPy for time-series forecasting and scenario planning.
- ⚙️ Workflow automation — Automated recurring analysis and reporting cycles with Python/SQL and interactive reporting.
Python · SQL · Pandas · NumPy · SciPy · Power BI · Tableau · Google Analytics · D3.js · Forecasting
Oct 2023 – Jan 2024
Worked on prompt tasks and evaluation-style workflows for model behavior, instruction following, ambiguity handling, and consistency.
Focus areas: chain-of-thought prompting, instruction tuning, RLHF/SFT workflows, evaluation tasks, statistical validation, model behavior analysis, and inference efficiency tradeoffs.
OpenAI Models · Prompt Engineering · Context Management · Chain-of-Thought · RLHF · SFT · LangChain · Ragas · Python
Arizona State University · Jul 2025 – Present
Long-running agents do not just need more context. They need governed memory: what gets stored, what expires, what is allowed back into the prompt, how contradictions are resolved, and how token budgets are spent.
MemoryArchitect is a model-agnostic external memory governance layer for LLM agents. It treats memory as a constrained, auditable resource rather than a passive transcript or naive similarity-search log.
| Governance Stage | What It Controls |
|---|---|
| Write policy | Filters noise, duplicates, injection attempts, and low-value traces before storage |
| Metadata & provenance | Tracks source, time scope, trust, sensitivity, and retrieval eligibility |
| TTL / decay | Applies configurable forgetting behavior by memory type |
| Consolidation | Compresses episodic traces into compact semantic summaries |
| Contradiction handling | Flags conflicting facts before they reach the model context |
| Token budget arbitration | Selects useful memories under hard context-window limits |
| Compliance layer | Supports deletion cascades and "do not store" style policies |
Python · LangChain · LangGraph · OpenAI Models · Hugging Face · RAG · Pinecone · LanceDB · Ragas · MLflow
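
To make two of the governance stages above concrete, here is a minimal sketch of TTL-style decay combined with token budget arbitration. It is an illustration under stated assumptions, not MemoryArchitect's actual implementation: the `Memory` fields, the per-type half-lives in `HALF_LIFE_S`, the `min_score` cutoff, and the greedy score-per-token heuristic are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    kind: str         # hypothetical memory type: "episodic" or "semantic"
    relevance: float  # retrieval score in [0, 1]
    age_s: float      # seconds since the memory was written
    tokens: int       # token cost of placing it in the prompt


# Hypothetical half-lives per memory type, in seconds.
HALF_LIFE_S = {"episodic": 3600.0, "semantic": 7 * 24 * 3600.0}


def decayed_score(m: Memory) -> float:
    """Halve the relevance once per elapsed half-life for this memory type."""
    return m.relevance * 0.5 ** (m.age_s / HALF_LIFE_S[m.kind])


def select_memories(memories, budget_tokens, min_score=0.05):
    """Greedy selection under a hard token budget.

    Memories whose decayed score falls below min_score are treated as
    expired. The rest are ranked by score per token and added while they
    still fit in the remaining budget.
    """
    live = [(decayed_score(m), m) for m in memories]
    live = [(s, m) for s, m in live if s >= min_score]
    live.sort(key=lambda sm: sm[0] / max(sm[1].tokens, 1), reverse=True)

    chosen, used = [], 0
    for _, m in live:
        if used + m.tokens <= budget_tokens:
            chosen.append(m)
            used += m.tokens
    return chosen


if __name__ == "__main__":
    memories = [
        Memory("user prefers concise answers", "semantic", 0.6, 0.0, 30),
        Memory("tool call failed an hour ago", "episodic", 0.9, 3600.0, 50),
        Memory("stale scratch note", "episodic", 0.8, 36000.0, 20),
        Memory("long project summary", "semantic", 0.5, 0.0, 100),
    ]
    for m in select_memories(memories, budget_tokens=80):
        print(m.text)
```

Greedy score-per-token selection is a simple approximation to the underlying knapsack problem. It keeps the budget a hard limit, but it can miss the best combination when memory sizes vary widely.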
Multimodal AI-Based Workload Relocation Strategy for Reducing Carbon Emissions in Multi-Cloud Environments
IEEE Xplore · ICECONF 2025
DOI: 10.1109/ICECONF65644.2025.11379581
Research on carbon-aware workload relocation in multi-cloud environments using reinforcement learning, forecasting, real-time API signals, and constraint-based optimization.
Ray RLlib · PyTorch · Hugging Face Transformers · LSTM · Carbon-Aware Scheduling · Energy Modeling · Pandas · Python
Python · SQL · C/C++ · Java · Go · Rust · TypeScript · REST APIs · Microservices · Testing frameworks
PyTorch · TensorFlow · Keras · scikit-learn · XGBoost · Transformers · BERT · ONNX · RAG · LangChain · LangGraph · LlamaIndex · LanceDB · Pinecone · Prompt/context management · Fine-tuning strategy · Chain-of-thought prompting · Agentic workflows · Evaluation harnesses
CUDA kernel development · GPU memory optimization · CUDA stream management · custom neural network layers · C++ inference backends · low-latency model serving · tensor layout optimization · pre/post-processing acceleration · pybind11 · ctypes · GDB · Valgrind · nvprof · NVIDIA Nsight Systems
AWS · GCP · Docker · Kubernetes · Airflow · dbt-style workflows · Spark · PostgreSQL · MySQL · MongoDB · CI/CD · Linux · IaC-style automation
Automated testing · validation gates · MLflow · data quality checks · diagnostics · performance profiling · reproducible workflows · clear validation notes
- 📝 ICLR 2026 Reviewer — technical review experience across modern AI research and evaluation methodology.
- 🌱 Published researcher — carbon-aware multi-cloud workload relocation, sustainability, and AI-driven optimization.
- 🤝 Open-source contributor — practical fixes and documentation improvements across production-grade repositories.
- 🌐 Cross-functional work — experience translating technical systems into decisions, risks, and implementation plans.
I value readable code, reproducible workflows, and clear ownership. I treat tests and documentation as part of the product. I pay attention to constraints: latency, cost, infrastructure, and maintainability. I write with the next engineer in mind.

