Site Reliability Engineer

We at Orange Quarter are working with a client that has built a world-leading metasearch engine that is fast and constantly optimized, enabling millions of travelers to compare hotel prices from hundreds of booking sites and find great deals in just a few clicks. They pride themselves on using cutting-edge technology, real-time auctions, and machine learning techniques with petabytes of data to create an experience that saves travelers time and money.

Industry:

Travel

What to expect:

You will work closely with engineering teams, influencing systems design, implementation, and the creation of release flows.
You will define SLOs and configure monitoring and alerting of new services, to ensure stable and reliable operations.
Most of our applications are containerised. Nevertheless, we see room to standardise release processes and configuration management across our applications. Your objective will be to support us with that.
Educate the teams on how to benefit from the standards agreed on across the organisation. Promote golden path solutions and discourage the use of inefficient custom-made implementations.
Collaborate with Platform and DX teams on proofs of concept that introduce new elements to our tech stack, like Kyverno, Config Connector, Knative, KEDA and other components that can improve our ecosystem.
Use Terraform (infrastructure as code) to provision cloud components needed by our services. This applies to new services as well as some legacy systems that were provisioned using the GCP console.

Perks:

Flexible working hours
Lunch budget
Relocation budget suited to you
Subsidized ticket for public transport
Free on-campus gym
Learning and Development budget

Requirements:

Solid Google Cloud Platform knowledge and experience
IaC experience – Terraform
Experience in using Istio Service Mesh
Understanding of at least one major programming language
Understanding of key concepts described in the SRE book by Google: Toil Reduction, Monitoring and Alerting, SLIs/SLOs/SLAs, Incident Management, Capacity Planning, Postmortem/Root cause analysis

Sounds good?

Apply now

For more information, connect with our specialised team member, Alex Price, on LinkedIn.
