REMOTE, PHL
Spark Engineer
**Req number:** R6280
**Employment type:** Full time
**Worksite flexibility:** Remote

**Who we are**

CAI is a global technology services firm with more than 8,500 associates worldwide and annual revenue of over $1 billion. We have over 40 years of excellence in uniting talent and technology to power the possible for our clients, colleagues, and communities. As a privately held company, we have the freedom and focus to do what is right, whatever it takes. Our tailor-made solutions create lasting results across the public and commercial sectors, and we are trailblazers in bringing neurodiversity to the enterprise.

**Job Summary**

As a Spark Engineer, you will design, build, and optimize large-scale data processing systems using Apache Spark. You will collaborate with data scientists, analysts, and engineers to ensure scalable, reliable, and efficient data solutions.

**Job Description**

We are looking for a **Spark Engineer** with deep expertise in distributed data processing, ETL pipelines, and performance tuning for high-volume data environments. This position will be **full time** and **remote**.

**What You'll Do**

+ Design, develop, and maintain big data solutions using Apache Spark (batch and streaming).
+ Build data pipelines for processing structured, semi-structured, and unstructured data from multiple sources.
+ Optimize Spark jobs for performance and scalability across large datasets.
+ Integrate Spark with various data storage systems (HDFS, S3, Hive, Cassandra, etc.).
+ Collaborate with data scientists and analysts to deliver robust data solutions for analytics and machine learning.
+ Implement data quality checks, monitoring, and alerting for Spark-based workflows.
+ Ensure security and compliance of data processing systems.
+ Troubleshoot and resolve data pipeline and Spark job issues in production environments.

**What You'll Need**

Required:

+ Bachelor's degree in Computer Science, Engineering, or a related field (Master's preferred).
+ 3+ years of hands-on experience with Apache Spark (Core, SQL, Streaming).
+ Strong programming skills in Scala, Java, or Python (PySpark).
+ Solid understanding of distributed computing concepts and big data ecosystems (Hadoop, YARN, HDFS).
+ Experience with data serialization formats (Parquet, ORC, Avro).
+ Familiarity with data lake and cloud environments (AWS EMR, Databricks, GCP Dataproc, or Azure Synapse).
+ Knowledge of SQL and experience with data warehouses (Snowflake, Redshift; BigQuery is a plus).
+ Strong background in performance tuning and Spark job optimization.
+ Experience with CI/CD pipelines and version control (Git).
+ Familiarity with containerization (Docker, Kubernetes) is an advantage.

Preferred:

+ Experience with stream processing frameworks (Kafka, Flink).
+ Exposure to machine learning workflows with Spark MLlib.
+ Knowledge of workflow orchestration tools (Airflow, Luigi).

**Physical Demands**

+ Ability to safely and successfully perform the essential job functions.
+ Sedentary work that involves sitting or remaining stationary most of the time, with occasional need to move around the office to attend meetings, etc.
+ Ability to conduct repetitive tasks on a computer, utilizing a mouse, keyboard, and monitor.

**Reasonable accommodation statement**

If you require a reasonable accommodation in completing this application, interviewing, completing any pre-employment testing, or otherwise participating in the employment selection process, please direct your inquiries to application.accommodations@cai.io or (888) 824-8111.