Senior Product Manager at CoreWeave, building infrastructure that powers the future of intelligence. Leading SUNK (Slurm on Kubernetes) and Sandboxes.
Born in Brazil, grew up in a family business → Brooks School → Northwestern Engineering → Google Cloud → CoreWeave.
SUNK Self-service GA delivers one-click managed Slurm-on-Kubernetes clusters; SUNK Anywhere extends the same operating model to GKE, EKS, and other Kubernetes deployments.
Industry-first automation that synchronizes enterprise identities into SUNK clusters using the SCIM protocol.
Meta, Stanford, and CoreWeave partnership achieving post-training runs on 512 H100 GPUs with improved efficiency.
Comprehensive analysis of AI training performance benchmarks and optimization strategies for large-scale GPU clusters.
Seamless CoreWeave integration enabling ML teams to discover and evaluate open-source models within W&B.
Largest-ever MLPerf submission with 2,496 NVIDIA Blackwell GPUs achieving 2x faster training at 91% scaling efficiency.
One-handed video game controller for patients with hemiplegia; provisional patent filed.
One-click production Slurm-on-Kubernetes clusters: nodes flow into Slurm, IAM SSH access, right-sized control plane, shared filesystem, and managed lifecycle. Also deployable as a Kubernetes CR for GitOps workflows.
Extends SUNK to GKE, EKS, and other Kubernetes deployments for one operating model across providers. Agent skills handle deployment with shared storage, GPU/Slurm metrics, and dashboards wired in.
SUNK overview plus the launch of running SkyPilot on SUNK: self-service clusters, ~70–80 hour MTBF on 1k-GPU jobs, automatic node replacement and job requeue, topology-aware scheduling, and GPU straggler detection via a NCCL plugin.
Discussion of achieving 20% more throughput and 97–98% utilization through optimization techniques.
Demonstration of automated user provisioning for HPC workload management.
Technical deep dive on networking configurations and best practices for cloud infrastructure.
Comprehensive guide to setting up and managing cloud operations for production workloads.
In-depth exploration of API management patterns and implementation strategies using Apigee.