About the Position:
As a Site Reliability Engineer (SRE), you will play a crucial role in maintaining the reliability and performance of our infrastructure and applications. You will collaborate with cross-functional teams to ensure the availability, scalability, and security of our services. This role requires a strong background in DevOps practices, cloud environments, and programming languages. You will also provide mentorship and support to less experienced engineers, fostering a culture of continuous improvement and innovation.
- Role: Site Reliability Engineer
- Location: Hyderabad
- Experience: 6+ Years
- Job Type: Full Time
What You’ll Do:
- Establish and implement production readiness practices to minimize risks and enhance reliability throughout the software development lifecycle.
- Develop and enforce best practices to ensure robust, high-performance systems.
- Work closely with application development teams to incorporate their feedback and improve the developer experience.
- Present plans and proposals to Engineering Leadership, clearly communicating strategies and recommendations.
- Develop and maintain tools and frameworks to manage infrastructure and deployments, primarily using Python.
- Build tools to automate processes, reduce manual toil, and increase operational efficiency.
- Lead projects and initiatives within the SRE team, ensuring alignment with shared objectives and roadmaps.
- Provide guidance and mentorship to less senior engineers, fostering a collaborative and supportive environment.
- Manage and optimize cloud environments across multiple platforms such as Azure, AWS, Google Cloud, or IBM Cloud.
- Implement infrastructure as code using Terraform and maintain infrastructure fundamentals in a multi-region microservice architecture.
- Participate in on-call rotations to provide elevated support to application teams, ensuring the rapid resolution of production incidents.
- Implement and manage observability platforms such as New Relic or Datadog to ensure system health and performance.
- Use monitoring tools to proactively identify and address issues before they impact users.
Expertise You’ll Bring:
- High level of competency with at least two cloud platforms, ideally more, such as Azure, AWS, Google Cloud, or IBM Cloud.
- Production-level experience with Terraform, Shell/Bash, and Python (or another programming language such as Go, Java, or Ruby).
- Strong knowledge of Linux, Kubernetes, Docker, and general networking (VPC, subnets, VPN, firewalls, etc.).
- Detailed understanding of DevOps capabilities and their importance in enabling high-performing teams.
- Experience with GitHub Actions and Groovy/Jenkins pipelines.
- Familiarity with observability platforms such as New Relic or Datadog.
- Previous experience in on-call support for production SaaS environments.
- Comfortable asking questions, voicing opinions respectfully, and providing recommendations after evaluating multiple solutions.
Benefits:
- Competitive salary and benefits package
- Culture focused on talent development with quarterly promotion cycles and company-sponsored higher education and certifications.
- Opportunity to work with cutting-edge technologies.
- Employee engagement initiatives such as project parties, flexible work hours, and Long Service awards
- Annual health check-ups
- Insurance coverage: group term life, personal accident, and Mediclaim hospitalization for self, spouse, two children, and parents
Our company fosters a values-driven and people-centric work environment that enables our employees to:
- Accelerate growth, both professionally and personally
- Impact the world in powerful, positive ways, using the latest technologies
- Enjoy collaborative innovation, with diversity and work-life wellbeing at the core
- Unlock global opportunities to work and learn with the industry’s best
Let’s unleash your full potential at Persistent
For more details, please contact: [email protected]
“Persistent is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind.”