Reliability Engineer

New York, New York, United States

Two Sigma is a financial sciences company, combining data analysis, invention, and rigorous inquiry to help solve the toughest challenges in investment management, insurance technology, securities, private equity, and venture capital.

Our team of scientists, technologists, and academics looks beyond the traditional to develop creative solutions to some of the world’s most complex economic problems.

Our Global Reliability Engineering group consists of multiple teams of versatile full-stack engineers who drive the expansion and maintenance of Two Sigma’s many and varied systems. The group works in the space between traditional systems engineering and development, and seeks to merge the capabilities of both disciplines.

You will take on the following responsibilities:
  • Improving all aspects of software reliability, including better monitoring, alerting, and documentation
  • Facilitating the adoption of reliability engineering best practices by our software engineering partner teams
  • Serving on a business-hours on-call rotation, investigating and remediating alerts using service-specific runbooks
  • Recommending, developing, and automating production and research data pipelines using industry best practices
  • Creating acceptance and quality criteria to guarantee the integrity of production and research data sets
  • Automating validation processes for our data sets
  • Collaborating with business partners, software engineers, and quantitative modelers to understand and improve business processes

You will gain exposure to:
  • Dynamic resource management frameworks (e.g., Kubernetes, Docker)
  • Large-scale batch processing using public cloud technologies (AWS, GCP, or Azure)
  • Building resilient data pipelines using open source technologies (e.g., Airflow, Prefect)
  • Observability in distributed systems (e.g., Elasticsearch, Logstash, Kibana, Datadog, Prometheus, Grafana)

You should possess the following qualifications:
  • A bachelor’s degree in computer science or another highly technical, scientific discipline
  • Experience with automating technical processes
  • Experience with Bash, Python, or other similar languages
  • Familiarity with the UNIX command line
  • Prior experience in Site Reliability Engineering (SRE), DevOps, distributed computing, systems engineering/administration, or a related function

You will enjoy the following benefits:
  • Core Benefits: Fully paid medical and dental insurance premiums for employees and dependents, competitive 401k match, employer-paid life & disability insurance
  • Perks: Onsite gyms with laundry service, wellness activities, casual dress, snacks, game rooms
  • Learning: Tuition reimbursement, conference and training sponsorship
  • Time Off: Generous vacation and unlimited sick days, competitive paid caregiver leave
  • Hybrid Work Policy: Flexible in-office days with budget for home office setup

We are proud to be an equal opportunity workplace. We do not discriminate based upon race, religion, color, national origin, sex, sexual orientation, gender identity/expression, age, status as a protected veteran, status as an individual with a disability, or any other applicable legally protected characteristics.