Show HN: cuTile Rust: Safe, data-race-free GPU kernels in Rust
- Programming
- Hardware
- AI
- Developer Tools
- Open Source
cuTile Rust is an early-stage NVIDIA Labs project for writing GPU kernels in Rust without exposing users to the usual unsafe, race-prone kernel programming model. The core idea is to carry Rust’s ownership and borrowing rules across the CPU-to-GPU boundary. On the host, you split output tensors into disjoint mutable pieces and pass shared read-only inputs. In the kernel, you write code with single-threaded tile semantics, and the compiler lowers that to NVIDIA’s Tile IR, handling thread blocks and shared memory for you. The author claims that for the safe surface API this gives compile-time data-race freedom, while still hitting near-cuBLAS performance on some GEMM cases and strong bandwidth on elementwise kernels. The release is explicitly young. Some patterns still need raw pointers, low-precision support just landed, and the model trades away low-level SIMT control for safety and a higher-level tile abstraction.
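The ownership idea described above can be illustrated on the CPU with nothing but the Rust standard library. The sketch below is an analogy, not cuTile Rust code: `add_tiled` and `tile_len` are hypothetical names chosen for illustration, and scoped threads stand in for GPU tile blocks. It shows the host-side pattern of splitting an output buffer into disjoint mutable tiles while sharing read-only inputs, so that the borrow checker rules out overlapping writes at compile time.

```rust
use std::thread;

/// Elementwise `out = a + b`, computed one tile at a time.
/// CPU analogy only: `add_tiled` and `tile_len` are hypothetical names,
/// not part of the cuTile Rust API.
pub fn add_tiled(a: &[f32], b: &[f32], out: &mut [f32], tile_len: usize) {
    assert!(tile_len > 0, "tile_len must be non-zero");
    assert_eq!(a.len(), out.len());
    assert_eq!(b.len(), out.len());

    thread::scope(|s| {
        // `chunks_mut` yields non-overlapping `&mut [f32]` tiles, so no two
        // workers can hold a mutable reference to the same element.
        for (i, out_tile) in out.chunks_mut(tile_len).enumerate() {
            let start = i * tile_len;
            let end = start + out_tile.len();
            // Inputs are borrowed immutably; any number of workers may read them.
            let (a_tile, b_tile) = (&a[start..end], &b[start..end]);
            s.spawn(move || {
                // Within a tile, the body reads as plain sequential code.
                for ((o, x), y) in out_tile.iter_mut().zip(a_tile).zip(b_tile) {
                    *o = x + y;
                }
            });
        }
    });
}
```

Calling `add_tiled(&a, &b, &mut out, 2)` on five-element inputs produces three tiles of lengths 2, 2, and 1. Trying to hand the same output tile to two workers, or to write to `a` from inside a worker, is rejected by the compiler rather than showing up as a race at run time. That is the property the project reportedly extends across the CPU-to-GPU boundary.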
If you already ship Rust and need custom NVIDIA kernels, this looks like a credible new option for keeping kernels in one language and one binary without paying an obvious performance tax. The main limitation is strategic, not ergonomic: today it is tied to NVIDIA’s Tile IR, so adopting it means buying into that backend.