Top 10 Reasons to Use Drill
It only takes a couple minutes to start working with Drill. Untar it on your Mac or Windows laptop and run a query on a local file. No need to set up any infrastructure. No need to define schemas. Just point at the data and drill!
Drill is the world's first and only distributed SQL engine that doesn't require schemas. It shares the same schema-free JSON model as MongoDB and Elasticsearch. Instead of spending weeks or months defining schemas, transforming data (ETL) and maintaining those schemas, simply point Drill at your data (file, directory, HBase table, etc.) and run your queries. Drill automatically understands the structure of the data. Drill's self-service approach reduces the burden on IT and increases the productivity and agility of analysts and developers.
Drill's schema-free JSON model allows you to query complex, semi-structured data in situ. No need to flatten or transform the data prior to or during query execution. Drill also provides intuitive extensions to SQL to work with nested data. Here's a simple query on a JSON file demonstrating how to access nested elements and arrays:
Drill supports the standard SQL:2003 syntax. No need to learn a new "SQL-like" language or struggle with a semi-functional BI tool. Drill supports many data types including DATE, INTERVAL, TIMESTAMP, VARCHAR and DECIMAL, as well as complex query constructs such as correlated sub-queries and joins in WHERE clauses. Here is an example of a TPC-H standard query that runs in Drill "as is":
Drill works with standard BI tools. You can keep using the tools you love, such as Tableau, MicroStrategy, QlikView and Excel. No need to introduce yet another visualization or dashboard tool. Combine a self-service BI tool with the only self-service SQL engine to enable true self-service data exploration.
Apache Drill lets you leverage your investments in Hive. You can run interactive queries with Drill on your Hive tables and access all Hive input/output formats (including custom SerDes). You can join tables associated with different Hive metastores, and you can join a Hive table with an HBase table or a directory of log files. Here's a simple query in Drill on a Hive table:
Drill is designed with extensibility in mind. It provides out-of-the-box connectivity to file systems (local or distributed file systems such as S3, HDFS and MapR-FS), HBase and Hive. You can implement a storage plugin to make Drill work with any other data source. Drill can combine data from multiple data sources on the fly in a single query, with no centralized metadata definitions. Here's a query that combines data from a Hive table, an HBase table (view) and a JSON file:
Drill exposes a simple and high-performance Java API to build custom functions (UDFs and UDAFs) so that you can add your own business logic. If you have already built UDFs in Hive, you can reuse them with Drill with no modifications. Refer to Developing Custom Functions for more information.
Drill is designed fround the ground up for high throughput and low latency. It doesn't use a general purpose execution engine like MapReduce, Tez or Spark. As a result, Drill is able to deliver its unparalleled flexibility (schema-free JSON model) without compromising performance. Drill's optimizer leverages rule- and cost-based techniques, as well as data locality and operator push-down (the ability to push down query fragments into the back-end data sources). Drill also provides a columnar and vectorized execution engine, resulting in higher memory and CPU efficiency.
Drill is available as a simple download you can run on your laptop. When you're ready to analyze larger datasets, simply deploy Drill on your Hadoop cluster (up to 1000 commodity servers). Drill leverages the aggregate memory in the cluster to execute queries using an optimistic pipelined model, and automatically spills to disk when the working set doesn't fit in memory.