Chennai, IND
1 day ago
Senior Data Engineer - Python & PySpark
The Senior Data Engineer will be responsible for the architecture, design, development, and maintenance of our data platforms, with a strong focus on leveraging Python and PySpark for data processing and transformation. This role requires a strong technical leader who can work independently and as part of a team, contributing to the overall data strategy and helping to drive data-driven decision-making across the organization.

**Key Responsibilities**

+ **Data Architecture & Design:** Design, develop, and optimize data architectures, pipelines, and data models to support various business needs, including analytics, reporting, and machine learning.
+ **ETL/ELT Development (Python/PySpark Focus):** Build, test, and deploy highly scalable and efficient ETL/ELT processes using Python and PySpark to ingest, transform, and load data from diverse sources into data warehouses and data lakes. Develop and optimize complex data transformations using PySpark.
+ **Data Quality & Governance:** Implement best practices for data quality, data governance, and data security to ensure the integrity, reliability, and privacy of our data assets.
+ **Performance Optimization:** Monitor, troubleshoot, and optimize data pipeline performance, ensuring data availability and timely delivery, particularly for PySpark jobs.
+ **Infrastructure Management:** Collaborate with DevOps and MLOps teams to manage and optimize data infrastructure, including cloud resources (AWS, Azure, GCP), databases, and data processing frameworks, ensuring efficient operation of PySpark clusters.
+ **Mentorship & Leadership:** Provide technical guidance, mentorship, and code reviews to junior data engineers, particularly in Python and PySpark best practices, fostering a culture of excellence and continuous improvement.
+ **Collaboration:** Work closely with data scientists, analysts, product managers, and other stakeholders to understand data requirements and deliver solutions that meet business objectives.
+ **Innovation:** Research and evaluate new data technologies, tools, and methodologies to enhance our data capabilities and stay ahead of industry trends.
+ **Documentation:** Create and maintain comprehensive documentation for data pipelines, data models, and data infrastructure.

**Qualifications**

**Education**

+ Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field.

**Experience**

+ 5+ years of professional experience in data engineering, with a strong emphasis on building and maintaining large-scale data systems.
+ Extensive hands-on experience with Python for data engineering tasks.
+ Proven experience with PySpark for big data processing and transformation.
+ Proven experience with cloud data platforms (e.g., AWS Redshift, S3, EMR, Glue; Azure Data Lake, Databricks, Synapse; Google BigQuery, Dataflow).
+ Strong experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Cassandra).
+ Extensive experience with distributed data processing frameworks, especially Apache Spark.

**Technical Skills**

+ Programming Languages: Expert proficiency in Python is mandatory. Strong SQL mastery is essential. Familiarity with Scala or Java is a plus.
+ Big Data Technologies: In-depth knowledge and hands-on experience with Apache Spark (PySpark) for data processing, including Spark SQL, Spark Streaming, and DataFrame API. Experience with Apache Kafka, Apache Airflow, Delta Lake, or similar technologies.
+ Data Warehousing: In-depth knowledge of data warehousing concepts, dimensional modeling, and ETL/ELT processes.
+ Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, GCP) and their data services, particularly those supporting Spark/PySpark workloads.
+ Containerization: Familiarity with Docker and Kubernetes is a plus.
+ Version Control: Proficient with Git and CI/CD pipelines.

**Soft Skills**

+ Excellent problem-solving and analytical abilities.
+ Strong communication and interpersonal skills, with the ability to explain complex technical concepts to non-technical stakeholders.
+ Ability to work effectively in a fast-paced, agile environment.
+ Proactive and self-motivated with a strong sense of ownership.

**Preferred Qualifications**

+ Experience with real-time data streaming and processing using PySpark Structured Streaming.
+ Knowledge of machine learning concepts and MLOps practices, especially integrating ML workflows with PySpark.
+ Familiarity with data visualization tools (e.g., Tableau, Power BI).
+ Contributions to open-source data projects.

------------------------------------------------------

**Job Family Group:** Technology

------------------------------------------------------

**Job Family:** Data Analytics

------------------------------------------------------

**Time Type:** Full time

------------------------------------------------------

**Most Relevant Skills** Please see the requirements listed above.

------------------------------------------------------

**Other Relevant Skills** For complementary skills, please see above and/or contact the recruiter.
------------------------------------------------------

_Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law._

_If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi (https://www.citigroup.com/citi/accessibility/application-accessibility.htm)._

_View Citi’s EEO Policy Statement (https://www.citigroup.com/global/eeo-aa-policy) and the Know Your Rights (https://www.eeoc.gov/sites/default/files/2023-06/22-088\_EEOC\_KnowYourRights6.12ScreenRdr.pdf) poster._

Citi is an equal opportunity and affirmative action employer. Minority/Female/Veteran/Individuals with Disabilities/Sexual Orientation/Gender Identity.