Thales is a global technology leader trusted by governments, institutions, and enterprises to tackle their most demanding challenges. From quantum applications and artificial intelligence to cybersecurity and 6G innovation, our solutions empower critical decisions rooted in human intelligence. Operating at the forefront of aerospace and space, cybersecurity and digital identity, we’re driven by a mission to build a future we can all trust.
In Singapore, Thales has been a trusted partner since 1973, originally focused on aerospace activities in the Asia-Pacific region. With 2,000 employees across three local sites, we deliver cutting-edge solutions across aerospace (including air traffic management), defence and security, and digital identity and cybersecurity sectors. Together, we’re shaping the future by enabling customers to make pivotal decisions that safeguard communities and power progress.
KEY ACTIVITIES AND RESPONSIBILITIES
As a Level 2 Engineer, you are accountable for:
Operational Support
- Lead and coordinate level 2 support operations for mission-critical applications and infrastructure
- Provide troubleshooting and diagnostics for incidents escalated from level 1
- Ensure SLA adherence and system availability
Incident & Problem Management
- Act as incident manager for P1/P2 issues
- Coordinate resolution and communications
- Perform root cause analysis and recommend permanent fixes
- Escalate unresolved issues that require software coding to Level 3 or engineering teams
Change Management
- Perform operational impact assessment
- Participate in the CAB to review and approve changes
- Carry out pre-change preparation, such as reviewing the Change Request and Release Plan
- Supervise post-change production verification
- Documentation update and knowledge transfer
- Post-change review and feedback
Patch Management
- Perform patch management readiness
- Stakeholder and team coordination
- System readiness and post-patch validation
- Documentation update and knowledge transfer
- Compliance and audit readiness
Documentation and Compliance
- Operational documentation: SOPs, incident response checklist, RCA, PIR, monitoring and alert guidebook
- Configuration & infrastructure documentation: system configuration baseline, application dependency maps, environment inventories such as hosts, services, accounts
- Knowledge Base articles for level 2 enablement and faster resolution, e.g. Known Errors and Fixes, Frequent How-To Guides, Script Repositories, Lessons Learned
- Knowledge Management
Configuration Management
- Validate the accuracy of configurations
- Maintain readiness of operational documentation
- Perform audits to confirm compliance of configurations
- CMDB asset verification
- Change-linked configuration tracking
- Ensure environment consistency across DEV, IVVQ, ISO-PROD, UAT and PROD
Testing and Verification
- Ensure operational readiness testing before production rollout
- Ensure post-change verification coordination
- Perform regression and sanity tests following patching or upgrades, in UAT and PROD
- Participation in user acceptance testing
Knowledge Management
- Documentation of resolution
- Knowledge Base contribution
- Validation of knowledge
- Subject matter expertise sharing
Root Cause Analysis
- Gather logs and system metrics from the time of failure
- Reproduction of issues in a controlled environment to understand the conditions under which they occur
- Determine the scope and severity in terms of the systems affected, downtime duration and business impact
- Narrow down the possible sources of the failure
- Use of diagnostic tools to analyse application behaviour
- Correlation of events to sequence the chain of events leading up to the failure and identify the dependencies
KAST (Kubernetes Analytics Stack)
Thales proprietary Kubernetes-based platform that provides a foundational digital infrastructure across Thales business domains.
Kubernetes
Kubernetes is an open-source platform, originally developed by Google, for automating the deployment, scaling, and management of containerized applications (typically Docker containers).
Docker
Docker Compose is a tool for defining and running multi-container Docker applications using a single configuration file (docker-compose.yml). It allows you to define, manage, and run multiple interconnected Docker containers as a single service stack.
Kafka
Apache Kafka is a high-performance distributed streaming platform used for building real-time data pipelines, stream processing, and event-driven architectures.
EMQX
EMQX is an MQTT broker that acts as a message middleware between publishers (e.g., sensors, devices) and subscribers (e.g., apps, dashboards, databases) using the MQTT protocol, which is a lightweight publish-subscribe messaging protocol ideal for low-bandwidth, high-latency, or constrained devices.
Elasticsearch
Elasticsearch is a distributed, open-source search and analytics engine built on top of Apache Lucene. It is widely used for full-text search, log and event data analysis, and real-time data exploration.
MinIO
MinIO is a high-performance, distributed object storage system that stores data as objects (like files, images, videos, backups) in buckets.
ZooKeeper
Apache ZooKeeper is an open-source coordination service for distributed applications. It provides a highly reliable, consistent, and available mechanism to store metadata, configuration, and state information. It complements Apache Kafka by acting as a metadata management and coordination layer in Kafka’s traditional architecture. ZooKeeper ensures reliability, consistency, and fault-tolerance in Kafka’s distributed setup.
Spark
Apache Spark is an open-source, distributed computing system designed for fast, large-scale data processing. It was built for performance, especially for iterative algorithms in data science and machine learning.
RHEL
Red Hat Enterprise Linux (RHEL) is a certified Linux operating system optimized for reliability, scalability, and security in business and production environments.
Ansible
Ansible is an open-source IT automation tool, maintained by Red Hat, that simplifies the management of servers, applications, and infrastructure. It allows DevOps and system administrators to automate tasks such as configuration management, software deployment, and orchestration. It uses simple, human-readable YAML files (called playbooks) and typically connects to managed hosts over SSH.
Prometheus
Open-source monitoring and alerting toolkit that is used to collect, store and query metrics, for the monitoring of infrastructure, services, containers and microservices.
Grafana
Open-source analytics and visualization platform used for monitoring, observability, and alerting. Commonly used with Prometheus.
KEY KNOWLEDGE AND EXPERIENCE
To be successful in your role, you will have demonstrated and/or acquired the following knowledge and experience:
Education and Experience
- Bachelor's degree in Information Technology, Computer Science, Engineering, or a closely related discipline
- At least 5 years of Level 2 support experience in mission-critical 24x7 production environments, preferably in the public sector
- At least 2 years in a team lead or supervisory role, coordinating tasks and mentoring junior engineers
- Proven experience in handling P1/P2 incidents, managing post-incident reviews (PIRs) and root cause analysis
- Certification in Red Hat Enterprise Linux or Kubernetes preferred
Knowledge / Skills
- Operating Systems: RHEL (90%) and Windows Server (10%)
- Networking Fundamentals
- Middleware & Infrastructure: Web Server (Nginx), App Servers (Kubernetes with containers: Docker + Spring Boot)
- Message Queues (IBM MQ, Kafka)
- Databases (SQL Server, PostgreSQL)
- ITIL/ITSM Process Knowledge
- Security Awareness
- DR and HA concepts
- Strong Technical Skills
- Leadership & Coordination
- Communication & Collaboration
- Operational Governance
At Thales, we’re committed to fostering a workplace where respect, trust, collaboration, and passion drive everything we do. Here, you’ll feel empowered to bring your best self, thrive in a supportive culture, and love the work you do. Join us, and be part of a team reimagining technology to create solutions that truly make a difference – for a safer, greener, and more inclusive world.