Senior Data Engineer (PySpark, Hadoop, Scala, Hive) - Assistant Vice President

Chennai, IND

2 days ago

Citigroup

**Senior Data Engineer**

We are seeking a highly skilled and motivated **Senior Data Engineer** to design, develop, and implement cutting-edge data engineering solutions using modern big data and cloud technologies. In this role, you will collaborate with product owners, data scientists, analysts, and technologists to deliver scalable, high-performance data products in an agile and collaborative environment. You will also play a key role in migrating legacy workloads to the cloud, optimizing data pipelines, and mentoring team members on best practices in data engineering.

**Key Responsibilities**

+ Design and develop scalable big data solutions using platforms like Hadoop, Snowflake, or other modern data ecosystems.
+ Collaborate with domain experts, product managers, analysts, and data scientists to build robust and efficient data pipelines.
+ Lead the migration of legacy workloads to cloud platforms (AWS, Azure, or GCP) while ensuring seamless integration and optimization.
+ Develop and implement cloud-native solutions for data processing and storage.
+ Partner with data scientists to build data pipelines from heterogeneous sources and provide engineering support for data science applications.
+ Enable advanced analytics and machine learning workflows by delivering high-quality data pipelines.
+ Implement CI/CD pipelines to automate data engineering workflows across cloud and on-premises platforms.
+ Drive automation to improve efficiency and reduce manual intervention in data processes.
+ Research and evaluate open-source technologies and recommend their integration into the data platform to enhance functionality and scalability.
+ Act as a technical expert and mentor team members on big data and cloud technologies.
+ Define and enforce coding standards, reusable components, and consistent patterns for data engineering processes.
+ Convert SAS-based pipelines into modern frameworks like PySpark, Scala, or Java for execution on Hadoop and non-Hadoop ecosystems.
+ Optimize big data applications for performance and scalability across platforms.
+ Analyze evolving business requirements and recommend enhancements or alternatives to current systems.
+ Evaluate new IT developments and industry standards to ensure the data platform remains cutting-edge.
+ Foster a collaborative and high-performing team environment.
+ Ensure compliance with applicable laws, regulations, and organizational policies.
+ Apply sound ethical judgment and escalate control issues transparently.

**Qualifications**

+ 8+ years of experience with Hadoop (Cloudera) and big data technologies.
+ Advanced knowledge of the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, Impala, Spark, Kafka, Kudu, and Solr.
+ Proficiency in Java, Python, or Scala.
+ Hands-on experience with Spark programming (PySpark, Scala, or Java).
+ Familiarity with Apache Beam is a plus.
+ Experience with cloud platforms like AWS, Azure, or GCP.
+ Proven ability to deploy and manage data solutions on cloud platforms.
+ Expertise in designing and developing data pipelines for ingestion, transformation, and processing.
+ Experience with Snowflake or Delta Lake is a strong advantage.
+ Hands-on experience with containerization tools like Docker and Kubernetes.
+ Proficiency in DevOps practices, including source control, CI/CD, and automated deployments.
+ Experience with Python libraries for machine learning and data science workflows.
+ Strong knowledge of data structures, algorithms, distributed storage, and compute systems.
+ 1+ year of SAS experience preferred.
+ 1+ year of Hadoop administration experience preferred.
+ Strong problem-solving and analytical skills.
+ Excellent interpersonal and teamwork abilities.
+ Proven leadership experience, including mentoring and managing a team of data engineers and analysts.
+ A proactive, "can-do" attitude for solving complex business problems.


**Education**

+ Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).

------------------------------------------------------

**Job Family Group:** Technology

**Job Family:** Data Science

**Time Type:** Full time

**Most Relevant Skills** Please see the requirements listed above.

**Other Relevant Skills** For complementary skills, please see above and/or contact the recruiter.

------------------------------------------------------

_Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law._

_If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi (https://www.citigroup.com/citi/accessibility/application-accessibility.htm)._

_View Citi’s EEO Policy Statement (https://www.citigroup.com/global/eeo-aa-policy) and the Know Your Rights (https://www.eeoc.gov/sites/default/files/2023-06/22-088\_EEOC\_KnowYourRights6.12ScreenRdr.pdf) poster._

Citi is an equal opportunity and affirmative action employer. Minority/Female/Veteran/Individuals with Disabilities/Sexual Orientation/Gender Identity.
