Expert Site Reliability Engineer

Berlin, Permanent

This established tech company has made a name for itself in the navigation and location technology space. They build services that support the driving process, helping companies with 3D maps, autonomous driving, navigation software, and traffic assessment to avoid congestion. Their vast client base includes Uber and Apple, as well as leading car manufacturers. Their engineering team leverages Big Data, Machine Learning and Computer Vision to build and provide innovative services for their users.

Industry:

Software
Tech

What to expect:

You’ll be joining at a time when the concept of SRE is being revolutionized within the company. You will be part of a team of 5 and will own the products around observability and monitoring. You will work with SLIs, SLOs and SLAs, and will help define the SRE strategy and roadmap. You’ll initially be working with K8s on AWS/Azure, and will build an internal cluster to provide distributed tracing. Part of your role will be to ensure that critical services have a well-configured monitoring and logging system. You’ll share your expertise with the development teams and write code to improve the performance of services. You will act as Incident Commander for services provided to the end customer on a large scale: you will steer the investigation, run debriefs, analyse preventive measures, and get hands-on to improve the TTR. The SRE setup in this company is fairly mature, but there are still improvements to be made before they reach their goal of embedding SREs into product teams on a project basis. You’ll have the opportunity to take the initiative to build things, have an impact, and play a pivotal role in helping achieve this goal.

Perks:

  • Work as part of an established company with ambitious goals
  • Join a team where you can have a real impact, and help shape the trajectory
  • Be part of an organisation that takes Site Reliability Engineering seriously
  • Competitive package + bonus

Requirements:

  • You have built and run large-scale infrastructure on Kubernetes in the public cloud (AWS or Azure)
  • You have proven experience with SLI, SLO and SLA concepts
  • You have experience of incident management processes, including incident debriefs and establishing corrective and preventive measures
  • You understand the three pillars of observability (monitoring, logging, tracing)
  • You have experience working as a backend developer (JVM, Python, Go)
  • You are familiar with Infrastructure as Code, as well as configuration management technologies
  • You have strong Linux/Unix experience

Sounds good?

Apply now
For more information, connect with our specialised team member, Poppy Ashmore, on LinkedIn.