Senior Data Engineer (PySpark, Hadoop, Scala, Hive) - Assistant Vice President
Citigroup
**Senior Data Engineer**
We are seeking a highly skilled and motivated **Senior Data Engineer** to design, develop, and implement cutting-edge data engineering solutions using modern big data and cloud technologies. In this role, you will collaborate with product owners, data scientists, analysts, and technologists to deliver scalable, high-performance data products in an agile and collaborative environment. You will also play a key role in migrating legacy workloads to the cloud, optimizing data pipelines, and mentoring team members on best practices in data engineering.
**Key Responsibilities**
+ Design and develop scalable big data solutions using platforms like Hadoop, Snowflake, or other modern data ecosystems.
+ Collaborate with domain experts, product managers, analysts, and data scientists to build robust and efficient data pipelines.
+ Lead the migration of legacy workloads to cloud platforms (AWS, Azure, or GCP) while ensuring seamless integration and optimization.
+ Develop and implement cloud-native solutions for data processing and storage.
+ Partner with data scientists to build data pipelines from heterogeneous sources and provide engineering support for data science applications.
+ Enable advanced analytics and machine learning workflows by delivering high-quality data pipelines.
+ Implement CI/CD pipelines to automate data engineering workflows across cloud and on-premises platforms.
+ Drive automation to improve efficiency and reduce manual intervention in data processes.
+ Research and evaluate open-source technologies and recommend their integration into the data platform to enhance functionality and scalability.
+ Act as a technical expert and mentor team members on big data and cloud technologies.
+ Define and enforce coding standards, reusable components, and consistent patterns for data engineering processes.
+ Convert SAS-based pipelines into modern frameworks like PySpark, Scala, or Java for execution on Hadoop and non-Hadoop ecosystems.
+ Optimize big data applications for performance and scalability across platforms.
+ Analyze evolving business requirements and recommend enhancements or alternatives to current systems.
+ Evaluate new IT developments and industry standards to ensure the data platform remains cutting-edge.
+ Foster a collaborative and high-performing team environment.
+ Ensure compliance with applicable laws, regulations, and organizational policies.
+ Apply sound ethical judgment and escalate control issues transparently.
**Qualifications**
+ 8+ years of experience with Hadoop (Cloudera) and big data technologies.
+ Advanced knowledge of the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, Impala, Spark, Kafka, Kudu, and Solr.
+ Proficiency in Java, Python, or Scala.
+ Hands-on experience with Spark programming (PySpark, Scala, or Java).
+ Familiarity with Apache Beam is a plus.
+ Experience with cloud platforms like AWS, Azure, or GCP.
+ Proven ability to deploy and manage data solutions on cloud platforms.
+ Expertise in designing and developing data pipelines for ingestion, transformation, and processing.
+ Experience with Snowflake or Delta Lake is a strong advantage.
+ Hands-on experience with containerization tools like Docker and Kubernetes.
+ Proficiency in DevOps practices, including source control, CI/CD, and automated deployments.
+ Experience with Python libraries for machine learning and data science workflows.
+ Strong knowledge of data structures, algorithms, distributed storage, and compute systems.
+ 1+ year of SAS experience preferred.
+ 1+ year of Hadoop administration experience preferred.
+ Strong problem-solving and analytical skills.
+ Excellent interpersonal and teamwork abilities.
+ Proven leadership experience, including mentoring and managing a team of data engineers and analysts.
+ A proactive, "can-do" attitude for solving complex business problems.
**Education**
+ Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
------------------------------------------------------
**Job Family Group:**
Technology
------------------------------------------------------
**Job Family:**
Data Science
------------------------------------------------------
**Time Type:**
Full time
------------------------------------------------------
**Most Relevant Skills**
Please see the requirements listed above.
------------------------------------------------------
**Other Relevant Skills**
For complementary skills, please see above and/or contact the recruiter.
------------------------------------------------------
_Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law._
_If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review_ _Accessibility at Citi (https://www.citigroup.com/citi/accessibility/application-accessibility.htm)_ _._
_View Citi’s_ _EEO Policy Statement (https://www.citigroup.com/global/eeo-aa-policy)_ _and the_ _Know Your Rights (https://www.eeoc.gov/sites/default/files/2023-06/22-088\_EEOC\_KnowYourRights6.12ScreenRdr.pdf)_ _poster._
Citi is an equal opportunity and affirmative action employer.
Minority/Female/Veteran/Individuals with Disabilities/Sexual Orientation/Gender Identity.