Important Information:
- Years of Experience: 5+ years of progressive experience in IT infrastructure engineering, software development, or equivalent technical fields; 2+ years of enterprise AWS cloud experience.
- Job Mode: Full-time.
- Work Mode: Remote.
Job Summary:
We are seeking a skilled Site Reliability Engineer (SRE) to join our team. In this role, you will focus on enhancing the reliability, scalability, and performance of our systems. The ideal candidate will possess strong expertise in monitoring, troubleshooting, automation, and AWS cloud services. You will work closely with cross-functional teams to ensure the smooth operation and continuous improvement of our infrastructure and services.
Responsibilities and Duties:
- Design, implement, and maintain monitoring and alerting systems using tools like OpenSearch, Prometheus, and Grafana.
- Analyze and address system performance issues, ensuring reliability and uptime.
- Automate operational processes, including deployments, incident response, and system maintenance.
- Collaborate with engineering teams to define Service Level Indicators (SLIs) and to set and meet Service Level Objectives (SLOs).
- Conduct incident postmortems, identify root causes, and implement improvements to prevent recurrence.
- Manage cloud infrastructure and services, primarily on AWS (e.g., EC2, CloudWatch, Lambda).
- Use scripting languages such as Shell to build tools and improve operational efficiency.
Qualifications and Skills:
Must Have:
- Strong knowledge of AWS services, including EC2 and CloudWatch, and of related monitoring tools.
- Experience with monitoring and logging systems (e.g., Prometheus, Grafana, OpenSearch).
- Proficiency in scripting (e.g., Shell) for automation and tooling.
- Solid understanding of Linux systems for log analysis and troubleshooting.
- Experience with system reliability, performance tuning, and capacity planning.
Optional but Preferred:
- Familiarity with infrastructure as code (e.g., Terraform, CloudFormation).
- Experience in Java backend development.
Ideal Candidate Traits:
- Quick learner with the ability to adapt to new technologies and processes.
- Proactive mindset, anticipating and addressing issues before they become critical.
- Action-oriented, with a hands-on approach to problem-solving and continuous improvement.
Encora is an equal opportunity employer, committed to fostering, cultivating, and preserving a culture of diversity, equality, and inclusion. We embrace and encourage the differences among our employees and applicants in age, color, disability, ethnicity, family or marital status, gender identity and/or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socioeconomic status, veteran status, and other characteristics that make our employees unique.
Every individual has the right to work in a professional atmosphere that promotes equal employment opportunities and prohibits discriminatory practices, including harassment of any kind.