Sr. Site Reliability Engineer - Operational Readiness (Hybrid)

Software Engineering
Bengaluru, Karnataka, India
Posted on Monday, April 15, 2024

About HashiCorp

HashiCorp helps solve development, operations, and security challenges in infrastructure so organizations can focus on business-critical tasks. We build products to give organizations a consistent way to manage their move to cloud-based IT infrastructures for running their applications.

We use the Tao of HashiCorp as our guiding principles for product development and operate according to a strong set of company principles for how we interact with each other. We value top-notch collaboration and communication skills, both among internal teams and in how we interact with our users.

The Role

As a Senior Site Reliability Engineer for the Operational Readiness team, you will play a critical role in enhancing the scalability, performance, and reliability of HashiCorp's cloud products. Drawing on at least 6 years of experience in site reliability engineering or a related field, you will lead efforts to identify and mitigate operational challenges before they impact our customers. Your expertise in load testing, performance analysis, and system hardening will ensure that our services meet the highest standards of operational excellence.

Key Responsibilities

  • Design and execute comprehensive load testing strategies to identify performance bottlenecks and scalability limits across our cloud products.
  • Implement best practices and technologies to improve system resilience, ensuring high availability and fault tolerance.
  • Work closely with engineering and product teams to integrate operational readiness into the development lifecycle, enhancing product stability and user satisfaction.
  • Build and refine tools and frameworks for automated testing, environment simulation, and incident reproduction, reducing manual effort and increasing test coverage.
  • Conduct in-depth analysis of testing results, documenting findings and making actionable recommendations for system enhancements.
  • Share your knowledge and expertise with team members, fostering a culture of learning and continuous improvement.

Ideal Candidate

  • 6+ years of experience in site reliability engineering, systems engineering, or software development, with a focus on operational readiness, performance testing, or system scalability.
  • Proven track record of leading successful load testing and performance optimization initiatives in cloud environments.
  • Strong technical foundation in cloud technologies (AWS, Azure, GCP) and experience with infrastructure as code (Terraform, CloudFormation).
  • Excellent problem-solving skills, with the ability to analyze complex systems, identify points of failure, and implement solutions.
  • Effective communication and collaboration skills, capable of working with cross-functional teams and articulating technical concepts to diverse audiences.
  • Familiarity with HashiCorp products and tools is a plus.