Backed by top-tier venture and strategic investors, we are a stealth-mode startup pioneering system-level innovations for data center-scale inference, enabled by an entirely new category of SoC.
Full-time · Onsite · Santa Clara, CA.
Write and optimize high-performance GPU kernels for next-generation AI inference systems.
Full-time · Onsite · Santa Clara, CA.
Drive performance in vLLM, SGLang, and PyTorch. Own cluster scheduling for throughput and latency.
Full-time · Onsite · Santa Clara, CA.
Build compiler capabilities from PyTorch through Triton down to CUDA and machine IR on leading-edge accelerators.
Full-time · Onsite · Santa Clara, CA.
Build functional and performance models across abstraction levels to guide architecture and accelerate HW-SW co-design.
Full-time · Onsite · Santa Clara, CA.
Define compute blocks and data-movement algorithms across all interfaces on a leading-edge SoC.
Full-time · Onsite · Santa Clara, CA.
Implement RTL blocks in close collaboration with architects and micro-architects.
2445 Augustine Dr Ste 150
Santa Clara, CA 95054