Two Sigma is a financial sciences company, combining data analysis, invention, and rigorous inquiry to help solve the toughest challenges in investment management, insurance technology, securities, private equity, and venture capital.
Our team of scientists, technologists, and academics looks beyond the traditional to develop creative solutions to some of the world’s most complex economic problems.
Our Distributed Open-Source Systems Reliability Engineering group consists of versatile engineers who drive the expansion and maintenance of Two Sigma’s multidimensional systems. The team exists in the space between systems engineering and development, building sophisticated software solutions and integrating innovative technologies to optimize the performance, efficiency, and scalability of our production environment.
You will take on the following responsibilities:
- Lead engineering and operational support for multiple large distributed open-source software applications (Elasticsearch, Kafka, and ZooKeeper), including much of the foundational infrastructure used by the Engineering and Research functions at Two Sigma
- Improve all aspects of software reliability, including better monitoring, alerting, and documentation
- Collaborate across infrastructure and development teams to ensure strategic priorities are aligned, fix priority support issues, and improve vital software, tools, and processes
- Collect and analyze metrics from operating systems and applications to assist in performance tuning and fault finding
- Participate in a 24x7 on-call rotation for our hosted services
You should possess the following qualifications:
- At least 1 year of experience (3-10 years preferred) in a Site Reliability Engineering (SRE), DevOps, Platform Engineering, Systems Engineering/Administration, or related function
- BS in Computer Science or another highly technical, scientific field
- Ability to apply open-source systems (Elasticsearch, Kafka, and ZooKeeper) and utilities to provision production systems in a variety of domains, especially for multi-tenant use
- Ability to program (structured and OO) in one or more high-level languages (such as Python, Java, C/C++, or Go), with a proven track record of automation and an algorithmic approach to problem solving
- In-depth knowledge and experience with on-prem (Linux/Unix) and cloud-based (GCP, AWS, etc.) systems
- Experience with automated configuration management tools such as Ansible, Chef, Puppet, or SaltStack