Service Reliability Engineer

Posted 2 months ago by Cyrus Bandani

Job Description

As a member of the team at ECARX, you’ll play a pivotal role in the operational management and maintenance of cutting-edge telematics solutions. This role demands a blend of technical prowess, collaborative skills, and a strong commitment to operational excellence to ensure the smooth running and continuous improvement of the telematics platform.

Key Responsibilities:

Configure and manage cloud infrastructure across multiple providers.
Deploy, maintain, and operate telematics applications within a Kubernetes environment.
Collaborate with cross-functional teams and platform users to gather requirements, ensuring seamless operations and service delivery.
Implement and support the continuous integration and deployment (CI/CD) pipeline.
Provide on-call support as part of the role.
Configure monitoring, alerting, and visualization for the IaaS and PaaS layers to ensure optimal performance and reliability.
Develop automation scripts and tools, using the GCP and AWS suites of services, to streamline operational tasks and improve efficiency.
Ensure the reliability and availability of services running on GCP and AWS through best practices in monitoring, alerting, and incident resolution.
Optimize system performance and efficiency by identifying and addressing performance bottlenecks.
Manage costs effectively through resource optimization, committed use discounts, and cost allocation tags.
Implement security best practices and compliance requirements across GCP and AWS environments.
Design and implement disaster recovery solutions for high availability and data protection.
Define and monitor Service Level Objectives (SLOs) to ensure reliability and performance targets are met.

Technologies at ECARX:

Programming Languages: Python, Go
Cloud Technologies: GCP, AWS, Rancher, Kubernetes, Istio, Prometheus, Grafana, Vault
DevOps Tools: Terraform, GitHub Actions, ArgoCD, Helm
Platform as a Service (PaaS): Databricks, Snowflake

Who You Are:

ECARX is on the lookout for a proactive, solution-focused individual with expertise in:

Cloud services (GCP, AWS, or similar) and the implementation of scalable, reliable infrastructure using an Infrastructure as Code (IaC) approach.
Kubernetes for deploying and managing applications.
CI/CD methodologies and software development practices.

Preferred Experience:

Knowledge of networking, security, and reliability in enterprise cloud solutions.
Proficiency in Continuous Deployment tools (GitHub Actions, ArgoCD, Helm).
Experience with Big Data technologies (e.g., Kafka, MQTT, Spark, Ray, Arrow) and storage solutions (e.g., Data Lake, Data Warehouse).


At least 5 years of relevant experience.
Familiarity with Agile methodologies.
Fluency in English, both written and spoken.
A proactive problem-solver and a team player with a flexible approach to work.