Drata is looking for a Senior Site Reliability Engineer to join our rapidly growing team. At Drata, Site Reliability Engineering is a role that requires strong skills in both software engineering and systems engineering for cloud infrastructure. Site Reliability Engineers work closely with Application Development Teams throughout the entire project lifecycle to ensure that the features and services they build are reliable and performant. The ideal candidate is self-motivated, eager to learn and grow, takes ownership of their work, and thrives in a fast-paced, collaborative environment. The ideal candidate also has experience both as a software engineer and as an infrastructure engineer. Automation is one of Drata's core value propositions, and it is on display throughout the platform that this role enables.

Our infrastructure runs on AWS across multiple accounts, with everything defined in Terraform. You will help us scale our business to meet the needs of our growing customer base and develop products and tools that delight our users. You'll play a critical role in our fast-growing company while working alongside engineers with diverse skill sets and following best practices in technology, architecture, and process.

What you’ll do:

Design, implement and maintain the tools and systems that support service reliability, monitoring, and alerting
Maintain infrastructure as code (IaC) using Terraform, and design and implement solutions for scaling the IaC codebase
Enable engineering teams to deploy, maintain, and scale their services and ensure that they are correctly monitored, secure and resilient
Work with product owners and engineers to implement SLOs and related SLIs for Drata’s services
Build, practice, and maintain a DevOps culture and principles centered on efficiency, using automation to uphold reliability
Identify and develop processes, tools, automation, infrastructure improvements and software changes to address top operational issues
Enforce architectural governance, deployment standards and infrastructure best practices
Participate in an on-call rotation and incident management to resolve active incidents
Partner with and provide guidance to the rest of the SRE and Cloud Engineering teams to ensure smooth and continuous delivery of Drata services to customers
Engage in design and code reviews of the product, and deepen your understanding of customers’ experiences to prevent future problems
Create clean, comprehensive technical documentation and keep it up to date
Continuously evaluate new technologies and industry best practices to improve our SRE tooling and incident response procedures
Occasionally dive into the main Drata application code to better discern (and sometimes fix) behavior in production
Communicate team updates, priorities and strategy to engineering and cross-functional leadership teams
Participate in additional activities including interviewing, designing and reviewing technical specifications
Lead and contribute to large projects aimed at improving system reliability, scalability & efficiency
Work with third-party vendors to review, implement, and integrate with their services
Design, create and manage CI/CD pipelines to ensure rapid, reliable, and repeatable deployment of our cloud-based applications

What you’ll bring:

6+ years of experience in Site Reliability Engineering, Cloud Engineering, and building and maintaining scalable, resilient services
Strong knowledge of cloud and infrastructure technologies such as Terraform, Docker, Ansible, Git, and Linux
Experience in building software systems as a software engineer
Experience with developing tooling and automation in Bash and Python
Experience with CI/CD pipeline automation using Jenkins
Experience working with relational databases (proficiency in MySQL is a plus)
Experience with disaster recovery practices and incident management
Good understanding of observability concepts (monitoring, logging, tracing, metrics)
Experience with Javascript/Node.js ecosystem is a plus
Experience with various container orchestration and deployment technologies including AWS ECS Fargate and Kubernetes
Certified Kubernetes Administrator (CKA) certification is a plus
Ability to take ownership of problems and act on them independently in a constantly evolving environment

Benefits:

Healthcare: 90-100% paid premiums for medical, dental, and vision plans for employees and dependents, plus an on-demand health care concierge
HSA, FSA, & DCFSA: Pre-tax savings plans for healthcare and dependent care, with up to a $600 annual employer contribution to the HSA plan (if enrolled in HSA medical plan)
100% paid short- and long-term disability, plus life and AD&D benefits
Learning & Development: $500 annually towards professional development opportunities + $250 annually towards personal development opportunities
Flexible Time Off: Flexible vacation policy for strong, fully charged batteries
16 Weeks Paid Parental Leave: An inclusive policy to ensure you have time with your newborn, newly adopted, or foster child
Work Remotely: Flexible hours and work from home, plus $1,000 annually to cover necessary business-related items for your home office
401K: Reach your financial goals while reducing your taxes

This role will receive a competitive base salary, benefits, and equity. The applicable salary range for each US-based role depends on where the employee works and is aligned to one of three tiers based on the cost of labor in that geographic area. The expected salary ranges for this role are set forth below.

Tier 1: $198,900 - $245,700

Tier 2: $179,010 - $221,130

Tier 3: $161,109 - $199,017

Here you can view which geographic pay tier applies to you, based on where you permanently reside and work. A variety of factors are considered when determining someone’s leveling and compensation, including a candidate’s professional background and experience. The tier you are aligned to is non-negotiable and depends solely on where you permanently reside. These ranges and tier alignments may be modified in the future, and final offer amounts may vary from the amounts listed above.

Drata is on a mission to help build trust across the internet.

Drata is a security and compliance automation platform that continuously monitors and collects evidence of a company's security controls, while streamlining compliance workflows end-to-end to ensure audit readiness.

We all recognize the importance of earning and keeping the trust of our customers when it comes to protecting their data. We've felt firsthand how burdensome achieving and maintaining a strong security and compliance posture can be at a fast-growing company. It’s a manual, redundant, error-prone, and unscalable process, and it only grows more complex and expensive over time.

Our team of SaaS, security, compliance, and audit experts has built a better way: automation.

Employment at Drata is based solely upon individual merit and qualifications directly related to professional competence. We strictly prohibit unlawful discrimination or harassment on the basis of race, color, religion, veteran status, national origin, ancestry, pregnancy status, sex, gender identity or expression, age, marital status, mental or physical disability, medical condition, sexual orientation, or any other characteristics protected by law. We also make reasonable accommodations to meet our obligations under laws protecting the rights of the disabled.
