Two Sigma is a financial sciences company, combining data analysis, invention, and rigorous inquiry to help solve the toughest challenges in investment management, insurance technology, securities, private equity, and venture capital.
Our team of scientists, technologists, and academics looks beyond the traditional to develop creative solutions to some of the world’s most complex economic problems.
Storage Reliability Engineering is a versatile group of full-stack engineers at the front line of maintaining and rapidly growing the capabilities of Two Sigma’s diverse distributed storage ecosystem. The team works in the space between software development and systems administration, and seeks to merge the capabilities of both specialties.
You will take on the following responsibilities:
- Developing and supporting multiple, large distributed software applications and storage infrastructure systems
- Improving all aspects of system, infrastructure, and software reliability, including better monitoring, alerting, and documentation
- Developing tools and automation to support and scale our storage infrastructure environment
- Evaluating and implementing both homegrown and commercial storage services for use within Two Sigma, including NAS appliances, object storage, block storage environments, and other technologies
- Capturing and analyzing metrics from operating systems, applications, and storage environments to assist in performance tuning and fault-finding
You should possess the following qualifications:
- BS in Computer Science or another highly technical, scientific subject area
- Minimum 1 year of experience required; 1-10 years of experience preferred with host-based networking, Linux/UNIX administration, storage technologies (NFS, HDFS, Ceph, S3), systems programming, distributed systems, databases, and cloud computing (EC2, GCS)
- Ability to program (structured and OO) in one or more high-level languages such as Python, Java, C/C++, or Go
- Ability to use off-the-shelf and open-source systems and utilities to provision production systems in a variety of domains, especially for multi-tenant use
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Experience with automated configuration management tools such as Ansible, Chef, Puppet, or SaltStack
- Experience with observability and monitoring technologies such as New Relic, Datadog, Prometheus, Nagios, VictorOps, or Splunk