Deok Filho

Senior Product Manager at CoreWeave, building infrastructure that powers the future of intelligence. Leading SUNK (Slurm on Kubernetes) and Sandboxes.

Born in Brazil, grew up in a family business → Brooks School → Northwestern Engineering → Google Cloud → CoreWeave.

Projects & Publications

Product Launch

New CoreWeave SUNK Capabilities Help Teams Build Modern AI Research Clusters

April 30, 2026 • CoreWeave

SUNK Self-service GA delivers one-click managed Slurm-on-Kubernetes clusters; SUNK Anywhere extends the same operating model to GKE, EKS, and other Kubernetes deployments.

By Deok Filho

Product Launch

Automated User Provisioning for Slurm Clusters

November 20, 2025 • CoreWeave

Industry-first automation synchronizing enterprise identities into SUNK clusters using SCIM protocols.

By Deok Filho, Andy Manoske

Blog Post

Scaling Reinforcement Learning with torchforge on CoreWeave Cloud

October 22, 2025 • CoreWeave Blog

Meta, Stanford, and CoreWeave partnership achieving post-training runs on 512 H100 GPUs with improved efficiency.

By Deok Filho, Aaron Batilo

Whitepaper

CoreWeave Training Benchmarks Whitepaper

August 2025 • CoreWeave

Comprehensive analysis of AI training performance benchmarks and optimization strategies for large-scale GPU clusters.

Product Launch

W&B Inference powered by CoreWeave

June 17, 2025 • Weights & Biases

Seamless CoreWeave integration enabling ML teams to discover and evaluate open-source models within W&B.

Blog Post

MLPerf Record with NVIDIA GB200 Blackwell Cluster

June 4, 2025 • CoreWeave

Largest-ever MLPerf submission with 2,496 NVIDIA Blackwell GPUs achieving 2x faster training at 91% scaling efficiency.

Academic Project

inControl - Accessible Gaming Controller

Northwestern Engineering

One-handed video game controller for patients with hemiplegia; provisional patent filed.

Northwestern Engineering / Shirley Ryan AbilityLab

Talks, Presentations & Demos

Video

SUNK Self-service Demo

April 30, 2026 • CoreWeave

One-click production Slurm-on-Kubernetes clusters: nodes flow into Slurm, IAM SSH access, right-sized control plane, shared filesystem, and managed lifecycle. Also deployable as a Kubernetes CR for GitOps workflows.

By Deok Filho

Video

SUNK Anywhere Demo

April 30, 2026 • CoreWeave

Extends SUNK to GKE, EKS, and other Kubernetes deployments for one operating model across providers. Agent skills handle deployment with shared storage, GPU/Slurm metrics, and dashboards wired in.

By Deok Filho

Video

Training At Scale With Confidence: SkyPilot Integration with CoreWeave SUNK

March 25, 2026 • SkyPilot × CoreWeave AI Infra Meetup, W&B office, SF

SUNK overview plus the launch of running SkyPilot on SUNK: self-service clusters, ~70–80 hour MTBF on 1k-GPU jobs, automatic node replacement and job requeue, topology-aware scheduling, and GPU straggler detection via a NCCL plugin.

By Deok Filho

Video

How to Measure and Optimize AI Infrastructure for Large-Scale Training

August 28, 2025 • CoreWeave

Discussion on achieving 20% more throughput and 97–98% utilization through optimization techniques.

With Wes Brown, Distinguished Engineer at CoreWeave

Video

SUNK User Provisioning Demo

November 20, 2025 • CoreWeave

Demonstration of automated user provisioning for HPC workload management.

Video

Networking Configurations on Google Cloud

December 21, 2021 • Google Cloud

Technical deep dive on networking configurations and best practices for cloud infrastructure.

Video

Configuring Cloud Operations on Google Cloud

January 24, 2022 • Google Cloud

Comprehensive guide to setting up and managing cloud operations for production workloads.

Video

API management with Apigee

November 23, 2022 • Google Cloud

In-depth exploration of API management patterns and implementation strategies using Apigee.

Unofficial YouTube Stuff (in Portuguese 🇧🇷)