Azure Blob Storage Plugin
Drill works well with Azure Blob Storage thanks to the Hadoop-compatible layer that makes Azure Blob Storage usable by any tool that supports HDFS, including Apache Drill.
Install Azure Jars
The first step is to download the jars from Maven. The ones that work with the current version of Drill are the following:
The first one is the HDFS wrapper around Azure Blob Storage and the second provides access to Azure Blob Storage from Java. Download the jars and save them into the $DRILL_HOME/jars/3rdparty folder.
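The copy step can be scripted. The following is a minimal shell sketch, assuming the jars were already downloaded into a local directory. The function name install_azure_jars and the download directory are hypothetical and not part of Drill. The function copies every .jar file it finds, so the download directory should contain only the jars you want Drill to load.

```shell
#!/usr/bin/env bash
# Sketch: copy downloaded jars into Drill's third-party jar folder.
# install_azure_jars is a hypothetical helper, not a Drill command.
# Usage: install_azure_jars <download-dir> [drill-home]
install_azure_jars() {
  local src_dir="$1"
  local drill_home="${2:-${DRILL_HOME:-}}"   # fall back to $DRILL_HOME
  local dest="$drill_home/jars/3rdparty"
  local jar
  local copied=0

  # Refuse to guess: the target folder must already exist in the Drill install.
  if [ -z "$drill_home" ] || [ ! -d "$dest" ]; then
    echo "cannot find jars/3rdparty; pass the Drill home or set DRILL_HOME" >&2
    return 1
  fi

  for jar in "$src_dir"/*.jar; do
    [ -e "$jar" ] || continue        # the glob matched nothing
    cp "$jar" "$dest/" || return 1
    copied=$((copied + 1))
  done

  if [ "$copied" -eq 0 ]; then
    echo "no .jar files found in $src_dir" >&2
    return 1
  fi
  echo "copied $copied jar(s) into $dest"
}
```

For example, `install_azure_jars ~/Downloads/azure-jars` copies into $DRILL_HOME/jars/3rdparty. Drill typically needs a restart before it picks up newly added jars.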
Providing Azure Blob Storage Credentials
Your environment determines where you provide your Azure Blob Storage credentials. You can define your Azure Blob Storage credentials in one of three ways:
- Directly in the Azure Blob Storage storage plugin. Note that this method is the least secure, but sufficient for use on a single machine, such as a laptop.
- In a non-Hadoop environment, you can use the Drill-specific core-site.xml file to provide the Azure Blob Storage credentials.
- In a Hadoop environment, you can use the existing Azure Blob Storage configuration for Hadoop. The Azure Blob Storage credentials should already be defined. All you need to do is configure the Azure Blob Storage storage plugin.
Defining Access Keys in the Azure Blob Storage Plugin
Refer to Configuring the Azure Blob Storage Plugin.
Defining Access Keys in the Drill core-site.xml File
To configure Drill to access the Azure Blob Storage account that contains the data you want to query, you must provide the authentication key. To get the authentication key, you can use the Azure CLI:
az storage account keys list -g <resource-group> -n <storage-account-name>
Pick the primary or secondary key and put it in the core-site.xml file, which you can find in the $DRILL_HOME/conf or $DRILL_SITE folder. If the file does not exist already, create it (you can also copy the core-site-example.xml file to core-site.xml and start from there):
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <property>
    <name>fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net</name>
    <value>AUTHENTICATION_KEY</value>
  </property>
</configuration>
Note: When you rename the file, Hadoop support breaks if $HADOOP_HOME was in the path because Drill pulls in the Drill core-site.xml file instead of the Hadoop core-site.xml file. In this situation, make the changes in the Hadoop core-site.xml file. Do not create a core-site.xml file for Drill.
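The key lookup and the file creation can be combined in a small script. The following is a sketch under stated assumptions, not an official procedure: get_storage_key and write_core_site are hypothetical helper names, the --query and -o tsv options are the Azure CLI's general output options, and the first key returned is assumed to be the primary one. The script refuses to overwrite an existing file, so an existing Drill or Hadoop core-site.xml must still be edited by hand.

```shell
#!/usr/bin/env bash
# Sketch: fetch a storage account key and write a minimal core-site.xml.
# Both function names are hypothetical helpers, not Drill or Azure commands.

# Print the first key listed for the account (assumed to be the primary key).
# Usage: get_storage_key <resource-group> <storage-account-name>
get_storage_key() {
  local resource_group="$1" account="$2"
  az storage account keys list -g "$resource_group" -n "$account" \
    --query "[0].value" -o tsv
}

# Write a core-site.xml that holds the account key.
# Usage: write_core_site <storage-account-name> <key> <output-file>
write_core_site() {
  local account="$1" key="$2" out="$3"

  # Never clobber an existing configuration file.
  if [ -e "$out" ]; then
    echo "$out already exists; add the property to it by hand" >&2
    return 1
  fi

  # The key is inserted as-is; this assumes it contains no XML special characters.
  cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <property>
    <name>fs.azure.account.key.${account}.blob.core.windows.net</name>
    <value>${key}</value>
  </property>
</configuration>
EOF
  chmod 600 "$out"   # the file contains a secret; restrict it to the owner
}

# Example (placeholder names):
#   key="$(get_storage_key my-resource-group mystorageaccount)"
#   write_core_site mystorageaccount "$key" "$DRILL_HOME/conf/core-site.xml"
```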
Configuring the Azure Blob Storage Plugin
The Storage page in the Drill Web UI provides an Azure Blob Storage plugin that you configure to connect Drill to the Azure Blob Storage file system registered in core-site.xml. If you did not define your Azure Blob Storage credentials in the core-site.xml file, you can define them in the storage plugin configuration.
To configure the Azure Blob Storage plugin, log in to the Drill Web UI and then update the Azure Blob Storage configuration with the container name, as described in the following steps:
1. To access the Drill Web UI, enter the following URL in the address bar of your web browser, where <drill-hostname> is the hostname of a node on which Drill is running:
http://<drill-hostname>:8047
2. To configure the Azure Blob Storage plugin in Drill, complete the following steps:
a. Open the Storage page.
b. Find the CP option on the page and then click Update next to the option.
c. Copy the entire configuration to the clipboard and then click Back.
d. In the “New Storage Plugin” section at the bottom of the page, type AZ in the text box and click Create.
e. Paste the text copied from the CP plugin.
f. Configure the Azure Blob Storage plugin, specifying the container you want to access in the "connection" property, as shown in the following example:
Note: The "config" block in the following Azure Blob Storage plugin configuration contains the access key property, which is required only if you want to define your Azure Blob Storage credentials here. Do not include the "config" block in your Azure Blob Storage plugin configuration if you defined your Azure Blob Storage credentials in the core-site.xml file.
"type": "file",
"enabled": true,
"connection": "wasbs://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net",
"config": {
  "fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net": "AUTHENTICATION_KEY"
},
"workspaces": {
  "root": {
    "location": "/user/robot/drill",
    "writable": true,
    "defaultInputFormat": null
  },
  "tmp": {
    "location": "/tmp",
    "writable": true,
    "defaultInputFormat": null
  }
}
3. Click Update to save the configuration.
4. Navigate back to the Storage page.
5. On the Storage page, the newly created AZ option should be automatically enabled.
Drill should now be able to access the data in your Azure Blob Storage container and query it.
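One way to check the new plugin is to list the files in one of its workspaces. The sketch below sends a SHOW FILES statement to Drill's REST query endpoint with curl. It assumes the plugin was created with the name AZ, that Drill is reachable at the same address as the Web UI, and that user authentication is disabled. query_payload and drill_query are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: run a SQL statement against Drill over REST to verify the plugin.
# query_payload and drill_query are hypothetical helpers, not Drill commands.

# Print the JSON request body for a SQL statement.
# No JSON escaping is performed, so statements containing double quotes
# or backslashes are rejected rather than sent in a broken request.
# Usage: query_payload <sql>
query_payload() {
  local sql="$1"
  case "$sql" in
    *\"*|*\\*)
      echo "statement must not contain double quotes or backslashes" >&2
      return 1
      ;;
  esac
  printf '{"queryType": "SQL", "query": "%s"}' "$sql"
}

# POST the statement to a running Drillbit and print the JSON response.
# Usage: drill_query <drill-url> <sql>
drill_query() {
  local drill_url="$1" sql="$2" body
  body="$(query_payload "$sql")" || return 1
  printf '%s' "$body" |
    curl -sS -X POST -H "Content-Type: application/json" \
      --data-binary @- "${drill_url}/query.json"
}

# Example (placeholder hostname; plugin name AZ as created in the steps above):
#   drill_query "http://<drill-hostname>:8047" 'SHOW FILES IN AZ.root'
```

If the statement returns a list of files, the connection string and the credentials are working. An error that mentions the file system or authentication usually points back to the jars or to the account key.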
https://vimeo.com/286972298