Class GoogleSheetsUtils

java.lang.Object
org.apache.drill.exec.store.googlesheets.utils.GoogleSheetsUtils

public class GoogleSheetsUtils extends Object
  • Constructor Details

    • GoogleSheetsUtils

      public GoogleSheetsUtils()
  • Method Details

    • authorize

      public static com.google.api.client.auth.oauth2.Credential authorize(GoogleSheetsStoragePluginConfig config, com.google.api.client.util.store.DataStore<com.google.api.client.auth.oauth2.StoredCredential> dataStore, String queryUser) throws IOException, GeneralSecurityException
      Creates an authorized Credential for use in GoogleSheets queries.
      Parameters:
      config - The GoogleSheetsStoragePluginConfig to be authorized
      dataStore - A DrillDataStore containing the user's tokens
      queryUser - The current query user's ID. This should be set to anonymous if user translation is disabled.
      Returns:
      A validated Credential object.
      Throws:
      IOException - If anything goes wrong
      GeneralSecurityException - If the credentials are invalid
    • loadCredential

      public static com.google.api.client.auth.oauth2.Credential loadCredential(String userId, com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow flow, com.google.api.client.util.store.DataStore<com.google.api.client.auth.oauth2.StoredCredential> credentialDataStore)
    • getSheetsService

      public static com.google.api.services.sheets.v4.Sheets getSheetsService(GoogleSheetsStoragePluginConfig config, com.google.api.client.util.store.DataStore<com.google.api.client.auth.oauth2.StoredCredential> dataStore, String queryUser) throws IOException, GeneralSecurityException
      Throws:
      IOException
      GeneralSecurityException
    • getDriveService

      public static com.google.api.services.drive.Drive getDriveService(GoogleSheetsStoragePluginConfig config, com.google.api.client.util.store.DataStore<com.google.api.client.auth.oauth2.StoredCredential> dataStore, String queryUser) throws IOException, GeneralSecurityException
      Returns an authenticated Drive service.
      Parameters:
      config - The GoogleSheetsStoragePluginConfig for the plugin
      dataStore - A DrillDataStore for the stored credentials
      queryUser - A String containing the current query user.
      Returns:
      An authenticated Drive service.
      Throws:
      IOException - If anything goes wrong throw an IOException
      GeneralSecurityException - If the creds are invalid or
    • getTokenToNameMap

      public static Map<String,String> getTokenToNameMap(com.google.api.services.drive.Drive driveService) throws IOException
      In GoogleSheets, the file is uniquely identified by a non-human readable token. The Sheets SDK does not provide a means to map tokens to file names. To do this, we need to use the Google Drive SDK. Google Drive's concept of folders is more similar to that of S3 which doesn't really have folders or directories. So the user is not able to include any file paths in the query string. More importantly however, is that Google Drive allows duplicate file names, even within the same directory. Thus, it is entirely possible to have an entire directory of files with the same name. The tokens for these files will be different, but the human-readable names can be the same. This creates an obvious problem for Drill as we need unique file names to include in a query.
      Parameters:
      driveService - An authenticated Drive service.
      Returns:
      A HashMap containing the tokens as keys and the file names as values.
      Throws:
      IOException - If anything goes wrong, throw an IOException.
    • isProbableFileToken

      public static boolean isProbableFileToken(String id)
      Google Sheets tokens are strings of length 44 that contain upper and lower case letters, numbers and underscores. This function will attempt to identify file tokens.

      Given that Google's spec for file IDs is not officially published, and can change at any time, we will keep the validation as light as possible to prevent future issues, in the event Google changes their file Id structure.

      Parameters:
      id - A String containing an unknown identifier
      Returns:
      True if the string is a file probable file token, false if not.
    • getTabList

      public static List<com.google.api.services.sheets.v4.model.Sheet> getTabList(com.google.api.services.sheets.v4.Sheets service, String sheetID) throws IOException
      Returns a list of the titles of the available spreadsheets within a given Google sheet.
      Parameters:
      service - The Google Sheets service
      sheetID - The sheetID for the Google sheet. This can be obtained from the URL of your Google sheet
      Returns:
      A list of spreadsheet names within a given Google Sheet
      Throws:
      IOException - If the Google sheet is unreachable or invalid.
    • columnToLetter

      public static String columnToLetter(int column)
      Converts a column index to A1 notation. Google sheets has a limitation of approx 18k columns, but that is not enforced here. The column index must be greater than zero or the function will return null. References code found here: Stack Overflow Article
      Parameters:
      column - The column index for the desired column. Must be greater than zero
      Returns:
      The A1 representation of the column index.
    • letterToColumnIndex

      public static int letterToColumnIndex(String letter)
      Given a column reference in A1 notation, this function will return the column numeric index. GoogleSheets has a limit of approx 18k columns, but that is not enforced here. References code found here: Stack Overflow Article
      Parameters:
      letter - The desired column in A1 notation
      Returns:
      The index of the supplied column
    • getFirstRows

      public static List<List<Object>> getFirstRows(com.google.api.services.sheets.v4.Sheets service, String sheetID, String tabName) throws IOException
      This function will be used to build the schema for Drill. As Google sheets does
      Parameters:
      service - An authenticated Google Sheets Service
      sheetID - The Sheet ID for the Google Sheet (Can be found in the Sheet URL)
      tabName - The tab name of the actual spreadsheet you want to query
      Returns:
      A nested list of the first five rows of the dataset.
      Throws:
      IOException - If the request fails, throw an IOException
    • getDataFromRange

      public static List<List<Object>> getDataFromRange(com.google.api.services.sheets.v4.Sheets service, String sheetID, String range) throws IOException
      Returns a 2D table of Objects representing the given range in A1 notation. Note that this function cannot be used with multiple ranges. If you are trying to retrieve multiple groups of columns, you must use the getBatchData function.
      Parameters:
      service - The Authenticated GoogleSheets service
      sheetID - The GoogleSheet ID. This can be found in the Sheet URL
      range - The range in A1 notation.
      Returns:
      A 2D table of Objects representing the given range.
      Throws:
      IOException - If the request fails, throw an IOException.
    • getSheetFromTabList

      public static com.google.api.services.sheets.v4.model.Sheet getSheetFromTabList(String tabName, List<com.google.api.services.sheets.v4.model.Sheet> tabList)
      Finds a Sheet from a list of tabs with a given title. If the sheet is not present, the function will throw a User Exception.
      Parameters:
      tabName - The name of the desired sheet.
      tabList - A List of Sheet objects
      Returns:
      The desired Sheet.
    • getBatchData

      public static List<List<Object>> getBatchData(com.google.api.services.sheets.v4.Sheets service, String sheetID, List<String> ranges) throws IOException
      This function is used to get data when projection is pushed down to Google Sheets.
      Parameters:
      service - The Authenticated GoogleSheets service
      sheetID - The GoogleSheet ID. This can be found in the Sheet URL
      ranges - The list of ranges
      Throws:
      IOException - If anything goes wrong, IOException will be thrown
    • getColumnMap

      public static Map<String,GoogleSheetsColumn> getColumnMap(List<List<Object>> sampleData, List<SchemaPath> projectedColumns, boolean allTextMode)
      Parameters:
      sampleData - This represents a sample of the first few rows of data which will be used to build the schema.
      projectedColumns - A list of projected columns
      allTextMode - If true, the columns will all be of the VARCHAR type
      Returns:
      A map of the column name and GoogleSheetsColumn column for every projected column.
    • getProjectedRanges

      public static List<GoogleSheetsColumnRange> getProjectedRanges(String sheetName, Map<String,GoogleSheetsColumn> columnMap)
    • isProjected

      public static boolean isProjected(List<SchemaPath> projectedColumns, String columnName)
      Returns true if the column is projected, false if not.
      Parameters:
      projectedColumns - A list of projected columns AKA the haystack.
      columnName - The column name AKA the needle
      Returns:
      True if the needle is in the haystack, false if not.
    • buildSchema

      public static TupleMetadata buildSchema(Map<String,GoogleSheetsColumn> columnMap)
      Builds a Drill Schema from a Map of GoogleSheetsColumns.
      Parameters:
      columnMap - A map of GoogleSheetsColumn containing the schema info.
      Returns:
      A populated TupleMetadata schema
    • inferDataType

      public static GoogleSheetsUtils.DATA_TYPES inferDataType(String data)
      Infers the datatype of an unknown string.
      Parameters:
      data - An input String of unknown type.
      Returns:
      The GoogleSheetsUtils.DATA_TYPES of the unknown string.
    • updateDataType

      public static void updateDataType(String columnName, Map<String,GoogleSheetsUtils.DATA_TYPES> dataTypesMap, String value)
    • addTabToGoogleSheet

      public static void addTabToGoogleSheet(com.google.api.services.sheets.v4.Sheets service, String sheetName, String tabName) throws IOException
      Adds a new tab to an existing GoogleSheet document.
      Parameters:
      service - An authenticated GoogleSheet service
      sheetName - The GoogleSheet name of the document
      tabName - The name of the tab you wish to add to the GoogleSheet document
      Throws:
      IOException - Throws an IOException if anything goes wrong.
    • removeTabFromGoogleSheet

      public static void removeTabFromGoogleSheet(com.google.api.services.sheets.v4.Sheets service, String fileToken, com.google.api.services.sheets.v4.model.Sheet deletedTab) throws IOException
      Removes a sheet from an existing GoogleSheets document. This method should only be used if the GoogleSheets document has more than one tab.
      Parameters:
      service - An authenticated GoogleSheet Sheets
      fileToken - The File token of the GoogleSheet containing the sheet to be deleted
      deletedTab - A Sheet which will be removed
      Throws:
      IOException - If anything goes wrong.
    • writeDataToGoogleSheet

      public static void writeDataToGoogleSheet(com.google.api.services.sheets.v4.Sheets service, String sheetID, String tabName, List<List<Object>> data) throws IOException
      Accepts a list of data and writes this data to a GoogleSheet document.
      Parameters:
      service - An authenticated GoogleSheet service
      sheetID - The SheetID. This can be obtained from the URL of the GoogleSheet Document
      tabName - The tab name within the aforementioned GoogleSheet
      data - A list of rows of the data to be inserted.
      Throws:
      IOException - If anything goes wrong, throw an IO exception
    • appendDataToGoogleSheet

      public static void appendDataToGoogleSheet(com.google.api.services.sheets.v4.Sheets service, String sheetID, String tabName, List<List<Object>> data) throws IOException
      Accepts a list of data and writes this data to a GoogleSheet document.
      Parameters:
      service - An authenticated GoogleSheet service
      sheetID - The SheetID. This can be obtained from the URL of the GoogleSheet Document
      tabName - The tab name within the aforementioned GoogleSheet
      data - A list of rows of the data to be inserted.
      Throws:
      IOException - If anything goes wrong, throw an IO exception