Senior Data Engineer - Python & PySpark
Citigroup
The Senior Data Engineer will be responsible for the architecture, design, development, and maintenance of our data platforms, with a strong focus on leveraging Python and PySpark for data processing and transformation. This role requires a strong technical leader who can work independently and as part of a team, contributing to the overall data strategy and helping to drive data-driven decision-making across the organization.
**Key Responsibilities**
+ **Data Architecture & Design:** Design, develop, and optimize data architectures, pipelines, and data models to support various business needs, including analytics, reporting, and machine learning.
+ **ETL/ELT Development (Python/PySpark Focus):** Build, test, and deploy highly scalable and efficient ETL/ELT processes using Python and PySpark to ingest, transform, and load data from diverse sources into data warehouses and data lakes. Develop and optimize complex data transformations using PySpark.
+ **Data Quality & Governance:** Implement best practices for data quality, data governance, and data security to ensure the integrity, reliability, and privacy of our data assets.
+ **Performance Optimization:** Monitor, troubleshoot, and optimize data pipeline performance, ensuring data availability and timely delivery, particularly for PySpark jobs.
+ **Infrastructure Management:** Collaborate with DevOps and MLOps teams to manage and optimize data infrastructure, including cloud resources (AWS, Azure, GCP), databases, and data processing frameworks, ensuring efficient operation of PySpark clusters.
+ **Mentorship & Leadership:** Provide technical guidance, mentorship, and code reviews to junior data engineers, particularly in Python and PySpark best practices, fostering a culture of excellence and continuous improvement.
+ **Collaboration:** Work closely with data scientists, analysts, product managers, and other stakeholders to understand data requirements and deliver solutions that meet business objectives.
+ **Innovation:** Research and evaluate new data technologies, tools, and methodologies to enhance our data capabilities and stay ahead of industry trends.
+ **Documentation:** Create and maintain comprehensive documentation for data pipelines, data models, and data infrastructure.
**Qualifications**
**Education**
+ Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field.
**Experience**
+ 5+ years of professional experience in data engineering, with a strong emphasis on building and maintaining large-scale data systems.
+ Extensive hands-on experience with Python for data engineering tasks.
+ Proven experience with PySpark for big data processing and transformation.
+ Proven experience with cloud data platforms (e.g., AWS Redshift, S3, EMR, Glue; Azure Data Lake, Databricks, Synapse; Google BigQuery, Dataflow).
+ Strong experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Cassandra).
+ Extensive experience with distributed data processing frameworks, especially Apache Spark.
**Technical Skills**
+ Programming Languages: Expert proficiency in Python is mandatory. Strong SQL mastery is essential. Familiarity with Scala or Java is a plus.
+ Big Data Technologies: In-depth knowledge and hands-on experience with Apache Spark (PySpark) for data processing, including Spark SQL, Spark Streaming, and the DataFrame API. Experience with Apache Kafka, Apache Airflow, Delta Lake, or similar technologies.
+ Data Warehousing: In-depth knowledge of data warehousing concepts, dimensional modeling, and ETL/ELT processes.
+ Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, GCP) and their data services, particularly those supporting Spark/PySpark workloads.
+ Containerization: Familiarity with Docker and Kubernetes is a plus.
+ Version Control: Proficient with Git and CI/CD pipelines.
**Soft Skills**
+ Excellent problem-solving and analytical abilities.
+ Strong communication and interpersonal skills, with the ability to explain complex technical concepts to non-technical stakeholders.
+ Ability to work effectively in a fast-paced, agile environment.
+ Proactive and self-motivated with a strong sense of ownership.
**Preferred Qualifications**
+ Experience with real-time data streaming and processing using PySpark Structured Streaming.
+ Knowledge of machine learning concepts and MLOps practices, especially integrating ML workflows with PySpark.
+ Familiarity with data visualization tools (e.g., Tableau, Power BI).
+ Contributions to open-source data projects.
------------------------------------------------------
**Job Family Group:**
Technology
------------------------------------------------------
**Job Family:**
Data Analytics
------------------------------------------------------
**Time Type:**
Full time
------------------------------------------------------
**Most Relevant Skills**
Please see the requirements listed above.
------------------------------------------------------
**Other Relevant Skills**
For complementary skills, please see above and/or contact the recruiter.
------------------------------------------------------
_Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law._
_If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi (https://www.citigroup.com/citi/accessibility/application-accessibility.htm)._
_View Citi’s EEO Policy Statement (https://www.citigroup.com/global/eeo-aa-policy) and the Know Your Rights (https://www.eeoc.gov/sites/default/files/2023-06/22-088\_EEOC\_KnowYourRights6.12ScreenRdr.pdf) poster._
Citi is an equal opportunity and affirmative action employer.
Minority/Female/Veteran/Individuals with Disabilities/Sexual Orientation/Gender Identity.