Site Reliability Engineer at Andela

Company: Andela

Experience: 5 Years

Location: Nairobi, Kenya

Posted: 11 hours ago By: Hiring Kenya

Company Details

Name: Andela

Industry: Information Technology and Services

Website: https://andela.com/

Description: Andela provides companies with access to the top 1% of global tech talent. We identify high-potential developers on the African continent, shape them into world-class technical leaders, and pair them with companies as full-time, distributed team members. Accelerate your product roadmap while minimizing time spent interviewing, onboarding, and training new hires. Andela is backed by investors including Chan Zuckerberg Initiative, GV, Spark Capital, Omidyar Network, Susa Ventures, Steve Case, Founder Collective, Learn Capital, and more. Learn more about us at www.andela.com.

Job Description

The Senior Site Reliability Engineer is a technical leadership role responsible for designing, implementing, and maintaining highly available, scalable, and secure on-premise infrastructure for banking applications, including Mobile Banking and Internet Banking platforms. The role leads SRE initiatives, mentors junior engineers, drives continuous improvement in production support, and owns the observability strategy, using OpenShift, Kubernetes, Prometheus, Grafana, and the ELK Stack in an on-premise data center.

Key Responsibilities

Design and architect highly available and scalable OpenShift/Kubernetes infrastructure for banking applications on on-premise servers
Lead and implement a comprehensive monitoring and observability strategy using Prometheus and Grafana
Design and oversee centralized logging infrastructure using ELK Stack (Elasticsearch, Logstash, Kibana)
Lead SRE best practices implementation and adoption of production support standards across teams
Mentor and coach junior SRE and DevOps engineers on OpenShift, Kubernetes, monitoring, and production support
Define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) with measurable metrics
Lead incident response strategy, post-incident reviews, and drive continuous improvement in production stability
Architect and implement advanced alerting, monitoring dashboards, and visualization strategies using Prometheus and Grafana
Design automation frameworks and tools to reduce operational toil and improve production efficiency
Lead OpenShift/Kubernetes cluster upgrades, security patches, and infrastructure modernization on-premise
Establish production support procedures, on-call rotation policies, and escalation frameworks
Optimize system performance, cost, and resource utilization across containerized on-premise infrastructure
Conduct capacity planning, performance optimization, and infrastructure scaling initiatives
Lead technical architecture reviews and infrastructure design decisions for banking applications
Manage on-premise data center resources and infrastructure planning
Participate in 24/7 on-call rotation and escalation for critical production incidents
Ensure compliance, security hardening, and disaster recovery procedures for financial systems

Qualifications

BSc in Computer Science, Information Technology, Software Engineering, or related field
5+ years of hands-on SRE, DevOps, or Production Engineering experience
3+ years of experience leading SRE teams or managing production support operations
3+ years of hands-on experience managing on-premise OpenShift and Kubernetes infrastructure
Expert-level experience with Prometheus for monitoring and alerting in production
Expert-level experience with Grafana for creating comprehensive monitoring dashboards
Advanced experience with ELK Stack (Elasticsearch, Logstash, Kibana) for logging and log analysis
Proven experience designing and scaling production systems for high-traffic banking applications
Deep expertise in Linux/Unix system administration and container networking
Advanced knowledge of CI/CD automation and deployment strategies
Hands-on experience with database management, tuning, and optimization on-premises
Strong experience with infrastructure automation and Infrastructure as Code
Proven 24/7 production support experience in mission-critical environments
Experience managing on-premise data center infrastructure
Proven leadership skills and ability to mentor junior engineers
Excellent communication skills and ability to present to executive stakeholders
Experience in financial services or banking sector is highly preferred

Salary: Discuss During Interview

Education: Diploma

Employment Type: Remote

Key Skills

Information Technology
