USA
4 days ago
Senior Principal Systems Software Engineer - AI Infrastructure Innovation

At Oracle Cloud Infrastructure (OCI), we build the future of the cloud for enterprises as a diverse team of fellow creators and inventors. We act with the speed and attitude of a start-up, with the scale and customer focus of the leading enterprise software company in the world. You are the builder here. We are at the forefront of AI innovation, exploring the next generation of AI accelerators and hardware solutions. As part of our growing team, you will work on the development, optimization, and enhancement of our virtualization platforms and kernel subsystems. You will collaborate with architects, systems engineers, and DevOps teams to design and implement robust, high-performance solutions that scale across large, distributed systems.

 

Responsibilities

Develop infrastructure software and tools for large-scale AI, LLM, and GenAI infrastructure.
Propose and evaluate groundbreaking hardware, system, and software innovations to significantly enhance AI training and inference performance and efficiency.
Guide strategic decisions around Oracle Cloud's AI Infra offerings.
Architect and implement new forward-looking driver features and APIs.
Profile and benchmark state-of-the-art technologies to find bottlenecks across the stack and push the boundaries of training and inference performance.

 

Qualifications

Bachelor of Science or Master of Science degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience).
10+ years of relevant systems software development experience.
Demonstrated ability to write great code using Java, GoLang, C#, C++, Python, etc.
Solid understanding of networking (TCP/IP, InfiniBand, RoCE, etc.).
Experience with operating systems and system software.
Knowledge of CPU and GPU architectures, and of memory coherence and consistency models.
Hands-on coding: ability to write efficient, production-quality code and debug complex distributed systems.
Experience with processor and system-level performance modeling.
Proven track record in building and scaling large-scale distributed systems.

 
