Apache Parquet Graduates to a Top-Level Project

It’s an exciting day. Apache Parquet, the de-facto standard columnar format for Hadoop, has graduated to an Apache top-level project.

The Drill project supports a variety of file formats, but Parquet is the highest performing format, and it’s the one we recommend to anyone who wants to maximize the performance of their queries. We’ve had the pleasure of working closely with the Parquet community for over two years, and it’s exciting to see how much the project has evolved.

We’ve made a number of contributions to the project, including support for self-describing data. We just implemented off-heap memory management for the Parquet readers and writers, which will improve Parquet’s memory handling. (This enhancement will be available in Parquet 1.8.)

I wanted to congratulate Twitter’s Julien Le Dem (@j_), VP of Apache Parquet, and the entire Parquet community on the graduation milestone. Oh, and how can I get a two-letter Twitter handle?

Happy Drilling!
Tomer Shiran