
Site Reliability Engineer (SRE)

OnHires
Mexico
Full-time
2 days ago

Position: Site Reliability Engineer (SRE) 
Location: Fully Remote (Offices in Limassol, Kyiv, London, Tbilisi)
Working Hours: Availability to work between 5 PM and 8 AM CET, in one of the following shifts: 17:00–01:00 or 00:00–08:00.

Company Overview:

Our client is one of the fastest-growing B2B iGaming solutions providers in Europe, with over 100 remote team members across the continent. They specialize in delivering high-quality software platforms, payment solutions integrations, marketing tools, and technical support to clients in the online casino and betting sectors. As they continue to expand, they are looking for a talented and growth-oriented individual to help enhance and streamline their infrastructure.

The company offers a dynamic and supportive environment where your input is valued and your professional growth is encouraged. Don’t miss the opportunity to join their exciting journey!

Role Overview:

As a Site Reliability Engineer (SRE), you will bridge the gap between development and operations to ensure that services and platforms remain reliable, scalable, and performant, even under high transaction volumes and regulatory requirements.

You will work closely with backend engineers, DevOps, InfoSec, and operational teams to build automation, improve observability, and respond to incidents.

Key Requirements:

Experience with AWS or hybrid data center setups

Reading logs and stacktraces to determine the root cause of incidents

Infrastructure as Code: experience with Terraform, Helm, and Ansible (optional: Werf)

Linux administration and container orchestration (K8s) skills

Experience with monitoring/observability stacks: Prometheus, Grafana, ELK, Loki, etc.

Strong understanding of TCP/IP, DNS, and load balancers

Familiarity with incident response, postmortems, and blameless culture

Availability to work between 5 PM and 8 AM CET, in one of the following shifts: 17:00–01:00 or 00:00–08:00

Bonus Skills:

Background in high-throughput environments (e.g., financial, trading, iGaming)

Experience with CDNs and real-time log aggregation

Proficiency in one or more scripting languages (Python, Bash, Go)

Knowledge of Java or PHP and their respective web-development frameworks

Hands-on experience with MSSQL, PostgreSQL, MongoDB, etc.

Exposure to Kafka, Redis, or other event-driven systems

Key Responsibilities:

Maintain and improve SLA/SLO/SLI metrics for critical systems (e.g., live games, sports betting, KYC, payments)

Manage and support highly available, scalable infrastructure (K8s, cloud, and bare metal)

Implement and manage monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Loki, ELK)

Automate deployments and operations using CI/CD pipelines (e.g., Jenkins, ArgoCD, Helm)

Conduct post-incident reviews, define action items, and reduce mean time to recovery (MTTR)

Participate in on-call rotation to ensure 24/7 system reliability

Secure infrastructure in line with regulations (e.g., player data integrity, jurisdictional compliance)

Collaborate with Dev, QA, DevOps, and Ops to improve services' stability and uptime
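
For readers unfamiliar with the MTTR figure mentioned in the responsibilities above, the following is a minimal sketch of how it is commonly computed: total recovery time divided by the number of incidents. The function name and the sample incident timestamps are hypothetical and are not taken from the posting.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average recovery duration over (detected_at, resolved_at) datetime pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the individual recovery durations, starting from a zero timedelta.
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    # Hypothetical incident log: one 30-minute and one 90-minute outage.
    sample = [
        (datetime(2024, 1, 5, 17, 0), datetime(2024, 1, 5, 17, 30)),
        (datetime(2024, 1, 9, 2, 15), datetime(2024, 1, 9, 3, 45)),
    ]
    print(mean_time_to_recovery(sample))
```

Tracking this value per service group over time is one way to see whether post-incident action items are actually shortening recovery.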

Success Metrics:

< 1% downtime for any user-/partner-facing services

SLO 99.95%

95% of infrastructure managed via code and automation

Documented runbooks and alert playbooks per service group
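
To put the availability targets above in concrete terms, the sketch below converts an availability percentage into a downtime budget. It assumes a 30-day measurement window, which the posting does not specify, and the function name is hypothetical. Note that "< 1% downtime" corresponds to 99% availability, a much looser bound than the 99.95% SLO.

```python
def allowed_downtime_minutes(availability_percent: float, window_days: int = 30) -> float:
    """Return the downtime budget in minutes for an availability target.

    window_days defaults to 30; this is an assumption, not a figure from the posting.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    for target in (99.0, 99.95):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows about {budget:.1f} minutes of downtime")
```

Under these assumptions, 99% availability allows roughly 432 minutes of downtime per 30 days, while 99.95% allows roughly 21.6 minutes.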

Why You'll Love Working Here:

International Team: Be part of a respectful, supportive, and goal-driven team.

Freedom & Responsibility: We trust you to take ownership of your work.

Competitive Salary: We offer competitive compensation based on your skills and experience.

Fully Remote: Work from anywhere, with optional access to our offices in Limassol, Kyiv, London, or Tbilisi.

Flexible Schedule: We measure performance, not time.

Unlimited Paid Time Off: Enjoy paid vacation and sick leave days for a great work-life balance.

Career Development: Opportunities for continuous learning and growth.

Team-Building & Fun: Enjoy awesome corporate parties and team-building events throughout the year.

Referral Bonuses: Earn rewards when you refer talented friends to join us.

Private Medical Insurance: Choose the right coverage for you, with full/partial compensation based on cost.

Flexible Benefits: Get compensated for activities and expenses like gym subscriptions, language courses, Netflix, spa days, etc.

Learning Foundation: Participate in our biannual raffle for the chance to learn something new outside of your role.
