Atlanta, GA, 30309, USA
Director, Data Engineer
**Location:** Atlanta, GA (Global HQ)
**Estimated Travel:** 0-20% (OU, Bottler meetings, industry conferences)
**Direct Reports:** None

The **Global Equipment Platforms (GEP)** team is seeking a highly skilled and experienced Lead Data Engineer to be a core technical contributor to the data backbone for The Coca-Cola Company's global fleet of 17MM+ connected equipment. Reporting to the Head of Data within GEP Digital, this role is pivotal in transforming raw telemetry data from beverage vending machines, dispensers, coolers, and retail racks into a strategic asset that fuels real-time market insights, predictive analytics, and operational efficiencies across our global ecosystem.

You will be responsible for designing, building, and maintaining robust, secure, and cost-optimized cloud-based data pipelines and platforms, primarily leveraging Microsoft Azure. This includes hands-on development of scalable data ingestion, transformation, and storage solutions capable of handling high-volume, real-time data from a diverse fleet of equipment running on the KO Operating System (KOS) and other embedded systems. This role demands a deep technical expert with a proven track record of solving complex data challenges, ensuring data integrity, scalability, and accessibility for both internal stakeholders and our 200+ global franchise bottlers and OEM partners in a multi-tenant environment.

**Key Responsibilities:**

**Data Pipeline & Platform Development (40%):**

+ Design, develop, and maintain highly scalable, secure, and resilient data pipelines (batch, streaming, real-time) and data platforms on Microsoft Azure (e.g., Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Event Hubs, IoT Hub, ADLS Gen2, Cosmos DB, Azure SQL).
+ Implement robust data ingestion processes to collect high-volume telemetry data from 17MM+ connected devices, ensuring data quality and reliability at source.
+ Develop efficient data transformation logic and data models to rationalize, cleanse, and enrich raw equipment data, making it ready for consumption by analytics and AI applications.
+ Optimize data storage solutions (data lakes, data warehouses) for performance, cost-efficiency, and accessibility.
+ Design and build scalable, resilient data pipelines for real-time telemetry, ensuring data quality and accessibility that directly fuel the analytical models developed by Data Scientists and the production AI systems deployed by the Lead AI Engineer.

**Data Foundation for AI/ML & Analytics (25%):**

+ Engineer the data foundation required to support advanced analytics, machine learning models (e.g., predictive maintenance, demand forecasting, personalization), and AI Agents.
+ Work closely with Data Scientists and Data Analysts to understand their data needs, ensure data quality, and optimize data structures for efficient model training and inference.
+ Develop and maintain curated datasets and data marts that simplify data consumption for business intelligence tools and internal/external analytics applications.
+ Ensure the seamless flow of rich, rationalized telemetry data from KOS-powered devices to the core analytics platform.

**Data Governance, Quality & Compliance (20%):**

+ Contribute to the implementation and enforcement of data governance frameworks, data quality standards, and data integrity checks for the equipment data.
+ Develop and implement solutions to monitor data pipeline health, identify data anomalies, and proactively address data quality issues.
+ Ensure the data platform adheres to global data privacy regulations (e.g., GDPR, CCPA) and TCCC's internal security protocols, especially for multi-tenant data access by bottlers and OEMs.
+ Implement robust logging, monitoring, and alerting for data pipelines and data platform components.
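To give candidates a concrete sense of the cleanse-and-enrich work described above, here is a minimal, illustrative Python sketch of validating and flagging raw telemetry records. All field names (`device_id`, `ts`, `temp_c`), thresholds, and record shapes are hypothetical assumptions for illustration, not TCCC's actual schema or pipeline code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class CleanReading:
    """One validated, enriched telemetry record (illustrative schema)."""
    device_id: str
    ts: datetime
    temp_c: float
    in_range: bool  # simple quality flag: plausible cooler temperature

def cleanse(raw: dict) -> Optional[CleanReading]:
    """Validate and enrich one raw record; return None if unusable."""
    device_id = raw.get("device_id")
    if not device_id:
        return None  # unidentifiable device: drop at source
    try:
        ts = datetime.fromisoformat(raw["ts"]).astimezone(timezone.utc)
        temp_c = float(raw["temp_c"])
    except (KeyError, ValueError):
        return None  # malformed timestamp or reading
    # Enrichment: flag readings outside an assumed plausible range (-10..50 °C),
    # keeping them for downstream anomaly review rather than discarding.
    return CleanReading(device_id, ts, temp_c, in_range=-10.0 <= temp_c <= 50.0)

readings = [
    {"device_id": "cooler-001", "ts": "2024-05-01T12:00:00+00:00", "temp_c": "4.2"},
    {"ts": "2024-05-01T12:00:05+00:00", "temp_c": "3.9"},           # missing device_id
    {"device_id": "cooler-002", "ts": "not-a-time", "temp_c": "5"},  # bad timestamp
    {"device_id": "cooler-003", "ts": "2024-05-01T12:01:00+00:00", "temp_c": "71"},
]
clean = [r for r in (cleanse(x) for x in readings) if r is not None]
# Two records survive; the 71 °C reading is kept but flagged (in_range=False).
```

In a production pipeline this kind of logic would typically run inside a streaming job (e.g., on Azure Databricks consuming from Event Hubs) rather than over an in-memory list.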
**Cross-Functional Collaboration & OEM Integration (15%):**

+ Collaborate closely with the Global Product Owner for Unified IoT, GEP Hardware/Software Engineering, Enterprise Digital Technology Solutions (IT), and Experience Design teams to ensure seamless data integration from connected devices.
+ Work with equipment OEMs (e.g., Lancer, Cornelius, True, Imbera) to integrate their telemetry systems and ensure data capture adheres to TCCC's standards.
+ Contribute to the data requirements and solutions for Over-The-Air (OTA) updates of firmware, software, and content to the equipment fleet.
+ Participate in code reviews, design discussions, and knowledge sharing within the data engineering team and broader GEP digital organization.
+ Work closely with the Product Owner, Unified Ecosystem, on new telemetry requirements driven by product features, ensuring seamless integration of new data sources.
+ Partner with Data Scientists to optimize data structures and access patterns for efficient model training, feature engineering, and inference, and with Lead AI Engineers to deploy data-intensive AI solutions.

**Key Deliverables:**

+ Well-architected, highly performant, and reliable data pipelines (batch and streaming).
+ Optimized data schemas, models, and curated datasets supporting GEP analytics and AI initiatives.
+ Automated data quality checks and monitoring solutions for critical equipment data.
+ Clean, accessible, and high-quality datasets for consumption by data scientists, analysts, and business users.
+ Robust and secure data integrations with various equipment types and OEM systems.
+ Comprehensive documentation of data pipelines, schemas, and data lineage.

**Decision Rights:**

+ Technical design and implementation details for assigned data pipelines and data solutions within established architectural guidelines.
+ Selection of specific data engineering tools and libraries (within the approved Azure ecosystem).
+ Prioritization of individual tasks and problem-solving approaches for specific data challenges.

**Required Experience & Qualifications:**

+ Bachelor's degree in Computer Science, Engineering, Information Systems, or a related quantitative field; Master's degree preferred.
+ 10+ years of hands-on experience in data engineering, with a strong focus on building and operating large-scale data platforms.
+ Expert-level proficiency in designing, building, and operating data pipelines and data solutions in Microsoft Azure (e.g., Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Event Hubs, IoT Hub, ADLS Gen2).
+ Deep experience with real-time data streaming architectures and technologies (e.g., Kafka, Azure Event Hubs/IoT Hub).
+ Strong programming skills in Python (preferred), Scala, or Java; expert in SQL.
+ Extensive experience with big data technologies and distributed computing frameworks (e.g., Spark).
+ Solid understanding of data warehousing concepts, dimensional modeling, and data lake architectures.
+ Proven track record of implementing robust data quality and security measures.
+ Experience with CI/CD practices for data pipelines and infrastructure as code (e.g., Terraform, ARM templates).
+ Familiarity with IoT, connected devices, embedded systems, and telemetry data.

**Competencies:**

+ **Technical Mastery**: Possesses deep, hands-on expertise in data engineering principles, tools, and best practices, continuously expanding knowledge.
+ **Complex Problem Solver**: Ability to analyze complex data challenges, identify root causes, and develop effective, scalable solutions.
+ **Results-Oriented Execution**: Delivers high-quality data solutions efficiently and effectively, with a strong focus on operational excellence.
+ **Collaborative Team Player**: Works seamlessly with cross-functional teams (data scientists, product managers, engineers) and external partners.
+ **Detail-Oriented & Accountable**: Ensures accuracy, integrity, and reliability of data assets, taking full ownership of assigned tasks and outcomes.
+ **Continuous Improvement Mindset**: Proactively identifies opportunities for process optimization, efficiency gains, and technical enhancements within the data ecosystem.

**Success is measured by:**

+ **Data Pipeline Performance & Reliability**: Achievement of SLAs for data pipeline latency, availability (e.g., 99.9% uptime), and data freshness.
+ **Data Quality & Integrity**: Accuracy and completeness of critical datasets; reduction in data-related incidents.
+ **Scalability & Efficiency**: Contribution to cost optimization of data processing and storage; seamless onboarding of new data sources and equipment types.
+ **Support for Insights & AI**: Timely and accurate delivery of data to enable the development and deployment of AI/ML models and analytics dashboards.
+ **Code Quality & Maintainability**: Adherence to coding standards, robust documentation, and ease of maintenance for developed data solutions.

**What We Can Do for You:**

+ **Iconic & Innovative Brands**: Our portfolio represents over 250 products, including some of the most popular brands in the world, such as Coca-Cola, Simply, Fairlife & Topo Chico.
+ **Expansive & Diverse Customers**: Each day we work with a diversified group of customers, ranging from retail & grocery outlets to theme parks, movie theatres, restaurants, and many more.

**Skills:** Python (Programming Language); Big Data Platforms; Microsoft Azure; Azure Data Factory; Data Engineering; Azure Synapse Analytics; Azure IoT Hub; Structured Query Language (SQL)

All persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification form (Form I-9) upon hire.

Pay Range: $149,000 - $173,000

Base pay offered may vary depending on geography, job-related knowledge, skills, and experience.
A full range of medical, financial, and/or other benefits, dependent on the position, is offered.

Annual Incentive Reference Value Percentage: 30

Annual Incentive reference value is a market-based competitive value for your role. It falls in the middle of the range for your role, indicating performance at target.

We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, national origin, religion, sexual orientation, gender identity and/or expression, status as a veteran, disability, or any other federal, state, or local protected class.