Machine learning, PyTorch, Problem solving, Quantization, Distributed training, Model parallelism, CUDA, AI safety
GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization
ML Compilers & Frameworks: PyTorch/JAX internals, torch.compile, XLA,…
