Mountain View, CA, 94039, USA
Senior ML Research Engineer – LLM Quantization & Model Optimization
Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft’s cloud growth? Are you seeking a unique career opportunity that combines technical depth and cross-team collaboration with business insight and strategy?

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to achieve our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Join the Strategic Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization — the team behind Microsoft’s expanding cloud infrastructure, powering Microsoft’s “Intelligent Cloud” mission. Microsoft delivers more than 200 online services to more than one billion individuals worldwide, and AHSI delivers the core infrastructure and foundational technologies for Microsoft’s cloud businesses, including Microsoft Azure, Bing, MSN, Office 365, OneDrive, Skype, Teams, and Xbox Live.

We are looking for a **Senior ML Research Engineer – LLM Quantization & Model Optimization** to join our team!

**Responsibilities**

+ Design and develop novel quantization techniques to enable efficient deployment of LLM inference and training in Microsoft’s Azure production environments.
+ Drive software development and model-optimization tooling proof-of-concept efforts to streamline deployment of quantized models.
+ Analyze performance bottlenecks in state-of-the-art LLM architectures and drive performance improvements.
+ Prototype and evaluate emerging low-precision data formats through proof-of-concept implementations.
+ Co-design model architectures optimized for low-precision deployment in close collaboration with companywide AI teams.
+ Work cross-functionally with data scientists and ML researchers/engineers to align on model accuracy and performance goals.
+ Partner with hardware architecture and AI software framework teams to ensure end-to-end system efficiency.

**Qualifications**

**Required/Minimum Qualifications**

+ Doctorate in a relevant field OR equivalent experience.
+ 4+ years of combined experience, including 2+ years of industry experience in low-precision model optimization and quantization for LLM workloads.

**Other Qualifications**

+ Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

**Preferred Qualifications**

+ Experience publishing academic papers as a lead author or essential contributor.
+ Experience participating in a top conference in a relevant research domain.
+ Proven track record in developing production-scale software for model compression and performance optimization.
+ Proficiency with deep learning frameworks such as PyTorch, TensorFlow, TensorRT, and ONNX Runtime.
+ In-depth understanding of Transformer and LLM architectures, including model optimization techniques such as quantization, pruning, neural architecture search (NAS), knowledge distillation, sharding/parallelism, KV cache optimization, and FlashAttention.
+ Hands-on experience setting up large-scale evaluation frameworks for state-of-the-art LLMs and fine-tuning large models.
+ Programming skills in Python, C, and C++.
+ Excellent communication skills and a team-oriented mindset.
+ Hands-on experience implementing and optimizing low-level linear algebra routines and custom BLAS kernels is a plus.
+ Deep knowledge of mixed-precision arithmetic unit microarchitecture is a plus.

Research Sciences IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area; the base pay range for this role in those locations is USD $158,400 - $258,000 per year. Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until June 27th, 2025.

#AHSI #SPARC

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).