Site Reliability Engineer

USA

CIQ

October 19, 2022

Job Description

CIQ OVERVIEW

CIQ believes in helping people do great things. We do this by building strong communities for open-source software, innovating software infrastructure, and building the next generation of performance computing. Our software stack consists of Rocky Linux, the CentOS replacement; Apptainer, the container solution of choice for HPC; Warewulf, a provisioning and cluster management solution; and Fuzzball, our next-generation performance computing platform, which is multi-cloud, multi-site, multi-cluster, and multi-node.

If you are interested in an environment built on ownership, diversity of thought, and pushing the limits of what is possible, then we would be interested in you.

POSITION SUMMARY 

As a Site Reliability Engineer (SRE), you will work within the development team to combine software and systems engineering and run large-scale distributed systems. You will also maintain our systems’ capacity and performance. Additional responsibilities include, but are not limited to:

  • Taking part in architecture-level discussions, planning, and implementation (writing GoLang, Terraform, and Pulumi code).

  • Researching to ensure what we are building is always the best path forward.

  • Documenting each project to facilitate integration for users.

  • Driving proofs of concept and minimum viable products for demonstration.

  • Delivering Infrastructure as Code.

  • Supporting multiple services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.

NEEDED TO SUCCEED

To succeed in this role, candidates must have strong foundational knowledge of Linux and cloud platforms (AWS, Azure, GCP, etc.). Kubernetes knowledge is required. Familiarity with GoLang, Terraform, and Pulumi is preferred but not required, so long as you are willing to learn. You have strong problem-solving skills and excellent communication skills. You are able to work independently as well as collaboratively in a remote team environment. This is a fast-paced tech startup, and we prefer candidates who are eager to learn, dedicated to the cause, and able to pivot quickly and efficiently. Last but most certainly not least, you are friendly, collaborative, humble, honest, and always striving to be better.

EDUCATION AND EXPERIENCE

At least three years of SRE or similar experience. At least two years of programming experience in a conventional programming language. Demonstrated proficiency with Linux and cloud platforms (AWS/Azure/GCP).

BENEFITS

  • Medical, dental, and vision insurance (80% employer/20% employee)

  • Flexible paid time off

  • Employee stock options

  • Remote work, with no required travel for most positions