[Remote] Strategic Technical Account Manager GPU
Note: This is a remote position open to candidates in the USA. Vultr is a leading cloud infrastructure company focused on providing high-performance solutions for enterprises and AI innovators. The Strategic Technical Account Manager for GPU will lead the post-sales technical success of customers deploying large-scale AI workloads, acting as a trusted advisor to optimize performance and manage technical relationships.
Responsibilities
- Lead onboarding for customers deploying GPU clusters (bare metal, VMs, or hybrid)
- Advise on cluster design: multi-GPU topology, NVLink/NVSwitch considerations, RDMA, InfiniBand and RoCE Ethernet, networking throughput, and storage IOPS requirements
- Guide customers in selecting GPU types and configurations based on workload (training, fine-tuning, inference, embeddings, RAG pipelines)
- Support distributed frameworks: PyTorch, TensorFlow, DeepSpeed, Megatron, JAX, Ray, Mosaic, Hugging Face, etc.
- Advanced hands-on Kubernetes skills
- Advanced hands-on SLURM skills
- Identify bottlenecks (network, storage, memory bandwidth)
- Provide tuning recommendations for batch size, mixed precision, parallelization strategies, and checkpointing
- Help customers evaluate cost vs. performance tradeoffs (GPU mix, CPU pairing, instance types, cluster sizing)
- Own the long-term technical strategy across assigned GPU/AI accounts, including hyperscalers, labs, and high-growth AI startups
- Host recurring technical review meetings, roadmap reviews, and optimization sessions
- Define scaling plans, future GPU reservation needs, and capacity forecasting
- Partner with Support, SRE, Networking, NOC, and Product Management & Engineering to resolve high-urgency incidents
- Manage outage communications, corrective action plans, and postmortem reviews with customers
- Advocate for GPU reliability improvements and influence roadmap priorities
- Identify opportunities for expanded clusters, high-speed storage, or networking upgrades
- Support Sales with technical validation and architecture diagrams needed for expansion
- Provide structured feedback on existing and future GPU offerings, networking fabrics, storage platforms, and upcoming AI/ML platform features
- Partner with Product on early access programs (new GPUs, pipelines, orchestration, etc.)
Skills
- 2–5+ years as an AI/ML Engineer, AI/ML Ops, Technical Account Manager, HPC Engineer, Sales/Solutions Engineer, or in a relevant technical role
- Strong knowledge of GPU hardware architectures (NVIDIA/AMD), CUDA/ROCm, distributed training, and ML frameworks
- Experience with Linux tuning and networking (InfiniBand, RoCE fabrics)
- Experience with high-performance storage systems (DDN, NetApp, Vast, Weka, etc.)
- Ability to communicate complex concepts clearly to both executives and engineering teams
- Prior experience supporting hyperscale, AI labs, or large cluster deployments is a plus
- Cloud Native Computing Foundation Certified Kubernetes Administrator (CKA) certification is a plus
Benefits
- 100% company-paid insurance premiums for employee medical, dental, and vision plans
- 401(k) plan that matches 100% up to 4%, with immediate vesting
- Professional Development Reimbursement of $2,500 each year
- 11 Holidays + Paid Time Off Accrual + Rollover Plan
- Commitment matters to Vultr! Increased PTO at 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
- $500 stipend for remote office setup in first year + $400 each following year
- Internet reimbursement up to $75 per month
- Gym membership reimbursement up to $50 per month
- Company-paid Wellable subscription