Spark Engineer
CAI
**Req number:**
R6280
**Employment type:**
Full time
**Worksite flexibility:**
Remote
**Who we are**
CAI is a global technology services firm with over 8,500 associates worldwide and a yearly revenue of $1 billion+. We have over 40 years of excellence in uniting talent and technology to power the possible for our clients, colleagues, and communities. As a privately held company, we have the freedom and focus to do what is right—whatever it takes. Our tailor-made solutions create lasting results across the public and commercial sectors, and we are trailblazers in bringing neurodiversity to the enterprise.
**Job Summary**
As a Spark Engineer, you will design, build, and optimize large-scale data processing systems using Apache Spark. You will collaborate with data scientists, analysts, and engineers to ensure scalable, reliable, and efficient data solutions.
**Job Description**
We are looking for a **Spark Engineer** with deep expertise in distributed data processing, ETL pipelines, and performance tuning for high-volume data environments. This position will be **full-time** and **remote**.
**What You'll Do:**
+ Design, develop, and maintain big data solutions using Apache Spark (Batch and Streaming).
+ Build data pipelines for processing structured, semi-structured, and unstructured data from multiple sources.
+ Optimize Spark jobs for performance and scalability across large datasets.
+ Integrate Spark with various data storage systems (HDFS, S3, Hive, Cassandra, etc.).
+ Collaborate with data scientists and analysts to deliver robust data solutions for analytics and machine learning.
+ Implement data quality checks, monitoring, and alerting for Spark-based workflows.
+ Ensure security and compliance of data processing systems.
+ Troubleshoot and resolve data pipeline and Spark job issues in production environments.
**What You'll Need**
Required:
+ Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred).
+ 3+ years of hands-on experience with Apache Spark (Core, SQL, Streaming).
+ Strong programming skills in Scala, Java, or Python (PySpark).
+ Solid understanding of distributed computing concepts and big data ecosystems (Hadoop, YARN, HDFS).
+ Experience with data serialization formats (Parquet, ORC, Avro).
+ Familiarity with data lake and cloud environments (AWS EMR, Databricks, GCP DataProc, or Azure Synapse).
+ Knowledge of SQL and experience with data warehouses (Snowflake, Redshift, or BigQuery a plus).
+ Strong background in performance tuning and Spark job optimization.
+ Experience with CI/CD pipelines and version control (Git).
+ Familiarity with containerization (Docker, Kubernetes) is an advantage.
Preferred:
+ Experience with stream processing frameworks (Kafka, Flink).
+ Exposure to machine learning workflows with Spark MLlib.
+ Knowledge of workflow orchestration tools (Airflow, Luigi).
**Physical Demands**
+ Ability to safely and successfully perform the essential job functions
+ Sedentary work that involves sitting or remaining stationary most of the time with occasional need to move around the office to attend meetings, etc.
+ Ability to conduct repetitive tasks on a computer, utilizing a mouse, keyboard, and monitor
**Reasonable accommodation statement**
If you require a reasonable accommodation in completing this application, interviewing, completing any pre-employment testing, or otherwise participating in the employment selection process, please direct your inquiries to application.accommodations@cai.io or (888) 824 – 8111.