Site Reliability Engineering Manager

Berlin, Permanent

This established tech company has made a name for itself in the navigation and location technology space. It has built innovative services across the driving experience, helping companies with 3D maps, autonomous driving, navigation software, and traffic analysis to avoid congestion. Its vast client base includes Uber, Apple, and leading car manufacturers. The engineering team leverages Big Data, Machine Learning, and Computer Vision to build and deliver innovative services for its users.

Industry:

Mobility
Software
Tech

What to expect:

You’ll be joining the Service Platform unit, where you will manage the SRE team and transform it from a group of talented individuals into a high-performing team. You’ll report directly to the Director of Service Platform and will ensure your team builds reliable infrastructure, backed by automation and the relevant tooling. The SRE team works in a multi-cloud setup (AWS, Azure). The company code base is primarily Java and C++, and the SRE team also uses Golang and Python. Your team will focus on observability, SLIs, SLAs, and SLOs. The long-term goal is to embed SRE engineers in product teams for short-term projects, so they can code alongside the developers and share their expertise.

 

 


Perks:

  • Competitive package + bonus
  • Be part of an organisation that takes Site Reliability Engineering seriously
  • Join a team where you can have a real impact and help shape its trajectory
  • Work as part of an established company with ambitious goals

Requirements:

  • You have experience leading an SRE team
  • You have a proven track record of mentoring, talent building, and creating high-performing teams
  • You have experience with Incident Management processes, debriefing incidents, and establishing corrective measures
  • You have a working knowledge of container orchestration technologies (K8s)
  • You have extensive knowledge of observability and related tools (Prometheus, Grafana, Scalyr, OpenTelemetry)
  • You have a strong understanding of SLI, SLA, and SLO concepts
  • Programming experience in Java is a big plus
  • You have worked in AWS or Azure environments

Sounds good?

Apply now
For more information, connect with our specialised team member Ali Morgan on LinkedIn.