Position: Site Reliability Engineer (SRE)
Location: Fully Remote (Offices in Limassol, Kyiv, London, Tbilisi)
Working Hours: Availability to work between 5 PM and 8 AM CET, in one of the following shifts: 17:00–01:00 or 00:00–08:00.

Company Overview:

Our client is one of the fastest-growing B2B iGaming solutions providers in Europe, with over 100 remote team members across the continent. They specialize in delivering high-quality software platforms, payment solutions integrations, marketing tools, and technical support to clients in the online casino and betting sectors. As they continue to expand, they are looking for a talented and growth-oriented individual to help enhance and streamline their infrastructure.

The company offers a dynamic and supportive environment where your input is valued and your professional growth is encouraged. Don’t miss the opportunity to join their exciting journey!

Role Overview:

As a Site Reliability Engineer (SRE), you will bridge the gap between development and operations to ensure that services and platforms remain reliable, scalable, and performant, even under high transaction volumes and while meeting regulatory requirements.

You will work closely with backend engineers, DevOps, InfoSec, and operational teams to build automation, improve observability, and respond to incidents.

Key Requirements:

Experience with AWS or hybrid data center setups

Ability to read logs and stack traces to determine the root cause of incidents

Infrastructure as Code: experience with Terraform, Helm, and Ansible (Werf is optional)

Linux administration and container orchestration (K8s) skills

Experience with monitoring/observability stacks: Prometheus, Grafana, ELK, Loki, etc.

Strong understanding of TCP/IP, DNS, and load balancers

Familiarity with incident response, postmortems, and blameless culture

Availability to work between 5 PM and 8 AM CET, in one of the following shifts: 17:00–01:00 or 00:00–08:00

Bonus Skills:

Background in high-throughput environments (e.g., financial, trading, iGaming)

Experience with CDNs and real-time log aggregation

Proficiency in one or more scripting languages (Python, Bash, Go)

Knowledge of Java and PHP, along with their respective web development frameworks

Hands-on experience with MSSQL, PostgreSQL, MongoDB, etc.

Exposure to Kafka, Redis, or other event-driven systems

Key Responsibilities:

Maintain and improve SLA/SLO/SLI metrics for critical systems (e.g., live games, sports betting, KYC, payments)

Manage and support highly available, scalable infrastructure (K8s, cloud, and bare metal)

Implement and manage monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Loki, ELK)

Automate deployments and operations using CI/CD pipelines (e.g., Jenkins, ArgoCD, Helm)

Conduct post-incident reviews, define action items, and reduce mean time to recovery (MTTR)

Participate in on-call rotation to ensure 24/7 system reliability

Secure infrastructure in line with regulations (e.g., player data integrity, jurisdictional compliance)

Collaborate with Dev, QA, DevOps, and Ops to improve services' stability and uptime

Success Metrics:

Less than 1% downtime for any user- or partner-facing service

SLO 99.95%

95% of infrastructure managed via code and automation

Documented runbooks and alert playbooks per service group

Why You'll Love Working Here:

International Team: Be part of a respectful, supportive, and goal-driven team.

Freedom & Responsibility: We trust you to take ownership of your work.

Competitive Salary: We offer competitive compensation based on your skills and experience.

Fully Remote: Work from anywhere, with optional access to our offices in Limassol, Kyiv, London, or Tbilisi.

Flexible Schedule: We measure performance, not time.

Unlimited Paid Time Off: Enjoy paid vacation and sick leave days for a great work-life balance.

Career Development: Opportunities for continuous learning and growth.

Team-Building & Fun: Enjoy awesome corporate parties and team-building events throughout the year.

Referral Bonuses: Earn rewards when you refer talented friends to join us.

Private Medical Insurance: Choose the right coverage for you, with full/partial compensation based on cost.

Flexible Benefits: Get compensated for activities and expenses like gym subscriptions, language courses, Netflix, spa days, etc.

Learning Foundation: Participate in our biannual raffle for the chance to learn something new outside of your role.
