Software Engineer: Open Source (Apache Iceberg)

New York, New York, United States


Two Sigma is a financial sciences company, combining data analysis, invention, and rigorous inquiry to help solve the toughest challenges in investment management, insurance technology, securities, private equity, and venture capital.

Our team of scientists, technologists, and academics looks beyond the traditional to develop creative solutions to some of the world’s most complex economic problems.

We are seeking an open-source software engineer for the Time Series Storage team! The team is responsible for the architecture, design, development, and support of a petabyte-scale time-series storage solution that empowers Two Sigma’s workflows across research, simulation, and live trading. We are evolving our time-series data access layer towards a modern hybrid-cloud architecture built on an open table format (Apache Iceberg) to sustain the continuous growth of our business. As an open-source software engineer on the team, you will have an opportunity to work on Apache Iceberg, a cutting-edge table format for huge analytic datasets, and other open-source projects that integrate with Apache Iceberg (e.g., Parquet, Arrow, Spark). Your work will have an impact on many highly visible open-source projects and on Two Sigma’s new flagship time-series storage offering.

You will take on the following responsibilities:
  • Design and develop new Iceberg features and fix bugs as required by Two Sigma’s time-series data use cases
  • Maintain and upgrade the Iceberg library in our codebase
  • Provide consultation regarding our time-series storage solution and the downstream users that leverage Iceberg
  • Build and maintain the Iceberg CI/CD pipeline
  • Contribute changes to the Iceberg community
  • Mentor engineers on the team and help them to contribute to the Iceberg codebase
  • Foster the relationship between Two Sigma and the Iceberg community

You should possess the following qualifications:
  • 3+ years of full-time work experience in a relevant domain (storage, data lake, data warehouse) 
  • Hands-on experience integrating Apache Iceberg with a production system
  • Deep knowledge of Apache Iceberg’s implementation (including table spec - v1/v2, reader and writer, table compaction, snapshot, etc.)
  • Strong software engineering skills in developing, testing, and troubleshooting code in one or more of the following programming languages: Java, C++, Python
  • BA/BS degree in a technical subject area

The following qualifications are a plus:
  • Experience using data processing engines (e.g., Spark, Flink)
  • Familiarity with the Pandas ecosystem and Python data science and data analytics libraries
  • Active Apache Iceberg committer
  • Familiarity with common columnar data formats (e.g., Parquet) and in-memory formats (e.g., Arrow)
  • Experience contributing to the Arrow or Parquet communities

You will enjoy the following benefits:
  • Core Benefits: Fully paid medical and dental insurance premiums for employees and dependents, competitive 401k match, employer-paid life & disability insurance
  • Perks: Onsite gyms with laundry service, wellness activities, casual dress, snacks, game rooms
  • Learning: Tuition reimbursement, conference and training sponsorship
  • Time Off: Generous vacation and unlimited sick days, competitive paid caregiver leave
  • Hybrid Work Policy: Flexible in-office days with budget for home office setup

We are proud to be an equal opportunity workplace. We do not discriminate based upon race, religion, color, national origin, sex, sexual orientation, gender identity/expression, age, status as a protected veteran, status as an individual with a disability, or any other applicable legally protected characteristics.