Senior Site Reliability Engineer (23-00125)

Posted February 24, 2023 (updated May 30, 2023)

Location: 100% Remote 
Position Type: Direct Hire / Perm

As a Senior SRE, you should have experience bringing architectural design proposals to the table for consideration by your colleagues on the platform and infrastructure development teams. You will be one of the principal technical designers helping move the cloud-native platform forward. You will drive the implementation of flexible cloud architectures with an automation-first emphasis, and you will be deeply immersed in Go and Python observability stacks, as well as AWS and Terraform.

This is a very hands-on senior engineering role in which your days will be spent building solutions to technical challenges in the observability and availability of online services. You will help manage and orchestrate these services, leaning heavily on technologies such as Go, Terraform, Docker, and Bash.

Responsibilities:

  • Design, engineer, and develop solutions for ensuring the observability and reliability of the online platform
  • Be a trusted advocate for reliability engineering across the team, with an eagerness to mentor other developers
  • Help define and oversee short- and mid-term project roadmaps for the future of the SRE team
  • Participate in after-hours on-call support rotations

Requirements:

  • Minimum of 4 years of professional experience instrumenting complex observability stacks in object-oriented programming languages, preferably Go
  • Proficiency in AWS container management, orchestration, and observability features (ECS, Fargate, Aurora, AppConfig, CloudWatch, etc.)
  • Experience managing AWS access and security services (IAM, KMS, Secrets Manager, WAFv2, etc.)
  • Minimum of 2 years of experience with containers in a professional setting, preferably Docker
  • Strong understanding of observability stack management (OpenTelemetry, tracing, monitoring, alerting, structured logging, APM, etc.)
  • Confident communicator, able to clearly explain designs and implementations both one-on-one and in large group settings

Preferred Qualifications:

  • Extensive hands-on experience with OpenTelemetry
  • Hands-on experience developing and maintaining CI/CD pipelines, preferably with Git/GitLab
  • Understanding of RESTful and WebSocket-based APIs
  • Familiarity with Datadog
  • Familiarity with Atlassian products (OpsGenie, JIRA, Confluence)
  • Experience working with developers in an agile environment
  • Experience in the games industry, preferably launching multiple online-enabled AAA titles
  • Knowledge of Gearbox-owned IPs