Site Reliability Engineer

Location: Houston, Texas, United States


Reliability Engineer - Houston, TX (multiple positions of varying seniority)

Two Sigma is a financial sciences company, combining data analysis, invention, and rigorous inquiry to help solve the toughest challenges in investment management, insurance technology, securities, private equity, and venture capital. 

Our team of scientists, technologists, and academics looks beyond the traditional to develop creative solutions to some of the world’s most complex economic problems.

Our Global Reliability Engineering group consists of multiple teams of versatile full-stack engineers who drive the expansion and maintenance of Two Sigma’s many and varied systems. The group sits between traditional systems engineering and development, and seeks to merge the capabilities of both disciplines.

You will take on the following responsibilities:
  • Leading engineering and operational support for multiple large, distributed open-source software applications, including much of our foundational infrastructure
  • Improving all aspects of software reliability, including better monitoring, alerting and documentation
  • Engaging with our software engineering teams on support issues and improvements to our tools, processes, and software
  • Acting as a conduit between infrastructure and development teams to ensure synergies are maintained and priorities are appropriately aligned
  • Gathering and analyzing metrics from both operating systems and applications to assist in performance tuning and fault finding

You will gain exposure to:
  • Public cloud technologies (AWS, GCP, or Azure)
  • Authentication and encryption technologies like TLS, Kerberos, and GSSAPI
  • Enterprise messaging systems and concepts (e.g., Kafka, JMS, MQ Series)
  • Dynamic resource management frameworks (e.g., Kubernetes, Docker)
  • Elasticsearch, Cassandra, ZooKeeper, and Prometheus

You should possess the following qualifications:
  • A bachelor’s degree in computer science or another highly technical, scientific discipline
  • The ability to leverage off-the-shelf and open-source systems and utilities to provision production systems in a variety of domains, especially for multi-tenant use
  • The ability to program (structured and OO) in one or more high-level languages (such as Python, Java, C/C++, or Go), with a proven track record of automation and an algorithmic approach to solving problems
  • In-depth knowledge of and experience in at least one of the following, along with a desire to learn more: host-based networking, Linux or UNIX engineering, systems programming, distributed systems, databases, or cloud computing
  • Experience with automated configuration management tools such as Ansible, Chef, Puppet, or SaltStack
  • Prior experience in a similar Site Reliability Engineering (SRE), DevOps, distributed computing, systems engineering/administration, or related function

You will enjoy the following benefits:
  • Core Benefits: Fully paid medical and dental insurance premiums for employees and dependents, competitive 401k match, employer-paid life & disability insurance
  • Perks: Onsite gyms with laundry service, wellness activities, casual dress, snacks, game rooms
  • Learning: Tuition reimbursement, conference and training sponsorship
  • Time Off: Generous vacation and unlimited sick days, competitive paid caregiver leave
  • Hybrid Work Policy: Flexible in-office days with budget for home office setup

We are proud to be an equal opportunity workplace. We do not discriminate based upon race, religion, color, national origin, sex, sexual orientation, gender identity/expression, age, status as a protected veteran, status as an individual with a disability, or any other applicable legally protected characteristics.