Reliability Engineer

New York, New York, United States

Two Sigma is a financial sciences company, combining data analysis, invention, and rigorous inquiry to help solve the toughest challenges in investment management, insurance technology, securities, private equity, and venture capital.

Our team of scientists, technologists, and academics looks beyond the traditional to develop creative solutions to some of the world’s most complex economic problems.

Our Global Reliability Engineering group consists of multiple teams of versatile full-stack engineers who drive the expansion and maintenance of Two Sigma’s many and varied systems. The group works in the space between traditional systems engineering and software development, and seeks to merge the capabilities of both disciplines.

You will take on the following responsibilities:
  • Improving all aspects of software reliability, including better monitoring, alerting and documentation
  • Facilitating the adoption of reliability engineering best practices by our software engineering partner teams
  • Serving on a business-hours on-call rotation, investigating and remediating alerts using service-specific runbooks
  • Recommending, developing, and automating production and research data pipelines using industry best practices
  • Creating acceptance and quality criteria to guarantee the integrity of production and research data sets
  • Automating validation processes for our data sets
  • Collaborating with business partners, software engineers, and quantitative modelers to understand and improve business processes

You will gain exposure to:
  • Dynamic resource management frameworks (e.g., Kubernetes, Docker)
  • Large-scale batch processing using public cloud technologies (AWS, GCP, or Azure)
  • Building resilient data pipelines using open source technologies (e.g., Airflow, Prefect)
  • Observability in distributed systems (e.g., Elasticsearch, Logstash, Kibana, Datadog, Prometheus, Grafana)

You should possess the following qualifications:
  • BS/BA in computer science or another highly technical, scientific discipline
  • At least 1 year of experience required; 1-10 years of experience automating technical processes preferred
  • Experience with Bash, Python, or other similar languages
  • Familiarity with the UNIX command line
  • Prior experience in Site Reliability Engineering (SRE), DevOps, distributed computing, systems engineering/administration, or a related function

You will enjoy the following benefits:
  • Core Benefits: Fully paid medical and dental insurance premiums for employees and dependents, competitive 401k match, employer-paid life & disability insurance
  • Perks: Onsite gyms with laundry service, wellness activities, casual dress, snacks, game rooms
  • Learning: Tuition reimbursement, conference and training sponsorship
  • Time Off: Generous vacation and unlimited sick days, competitive paid caregiver leave
  • Hybrid Work Policy: Flexible in-office days with budget for home office setup

The base pay for this role will be between $165,000 and $325,000. This role may also be eligible for other forms of compensation and benefits, such as a discretionary bonus, health, dental and other wellness plans and 401(k) contributions. Discretionary bonus can be a significant portion of total compensation. Actual compensation for successful candidates will be carefully determined based on a number of factors, including their skills, qualifications and experience.

We are proud to be an equal opportunity workplace. We do not discriminate based upon race, religion, color, national origin, sex, sexual orientation, gender identity/expression, age, status as a protected veteran, status as an individual with a disability, or any other applicable legally protected characteristics.