Senior Site Reliability Engineer

Houston, Texas, United States


Reliability Engineer - Houston, TX (multiple positions of varying seniority)

Two Sigma is a different kind of investment manager. Since 2001, we have used data science and technology to derive insights that forecast the future and discover value in markets worldwide. Our team of scientists, technologists and academics looks beyond traditional finance to understand the bigger picture and develop creative solutions to some of the world’s most challenging economic problems. Our work spans markets and industries, from insurance and securities to private investments and new ventures.

Our Global Reliability Engineering group consists of multiple teams of versatile full-stack engineers who drive the expansion and maintenance of Two Sigma’s many and varied systems. The team exists in the space between traditional systems engineering and development, and seeks to merge the capabilities of both disciplines.

You will take on the following responsibilities:
  • Leading engineering and operational support for multiple large, distributed open-source software applications, including much of our foundational infrastructure
  • Improving all aspects of software reliability, including better monitoring, alerting and documentation
  • Engaging with our software engineering teams on support issues and improvements to our tools, processes, and software
  • Acting as a conduit between infrastructure and development teams to ensure synergies are maintained and priorities are appropriately aligned
  • Gathering and analyzing metrics from both operating systems and applications to assist in performance tuning and fault finding
You will gain exposure to:
  • Public cloud technologies (AWS, GCP, or Azure)
  • Authentication and encryption technologies such as TLS, Kerberos, and GSSAPI
  • Enterprise messaging systems and concepts (e.g., Kafka, JMS, MQ Series)
  • Dynamic resource management frameworks (e.g., Kubernetes, Docker)
  • Elasticsearch, Cassandra, ZooKeeper, and Prometheus
You should possess the following qualifications:
  • A bachelor’s degree in computer science or another highly technical, scientific discipline
  • The ability to leverage off-the-shelf and open-source systems and utilities to provision production systems in a variety of domains, especially for multi-tenant use
  • The ability to program (structured and object-oriented) in one or more high-level languages (such as Python, Java, C/C++, or Go), with a proven track record of automation and an algorithmic approach to solving problems
  • In-depth knowledge of and experience in at least one of the following: host-based networking, Linux or UNIX engineering, systems programming, distributed systems, databases, or cloud computing, along with a desire to learn more
  • Experience with automated configuration management tools such as Ansible, Chef, Puppet, or SaltStack
You will enjoy the following benefits:
  • Core Benefits: Fully paid medical and dental insurance premiums for employees and dependents, competitive 401k match, employer-paid life & disability insurance
  • Learning: Tuition reimbursement, conference and training sponsorship
  • Time Off: Generous vacation, unlimited sick days, and competitive paid caregiver leave
We are proud to be an equal opportunity workplace. We do not discriminate based upon race, religion, color, national origin, sex, sexual orientation, gender identity/expression, age, status as a protected veteran, status as an individual with a disability, or any other applicable legally protected characteristics.