Class StoragePluginRegistryImpl

java.lang.Object
org.apache.drill.exec.store.StoragePluginRegistryImpl
All Implemented Interfaces:
AutoCloseable, Iterable<Map.Entry<String,StoragePlugin>>, StoragePluginRegistry

public class StoragePluginRegistryImpl extends Object implements StoragePluginRegistry
Plugin registry. Caches plugin instances which correspond to configurations stored in persistent storage. Synchronizes the instances and storage.

Allows multiple "locators" to provide plugin classes, such as the "classic" locator for classes in the same class loader and the "system" locator for system-defined plugins.

The registry provides multiple layers of abstraction:

  • A plugin config/implementation pair (called a "connector" here) is located by
  • A connector locator, which also provides bootstrap plugins and can create a plugin instance from a configuration, which are cached in
  • The plugin cache, which holds stored, system and ad-hoc plugins. The stored plugins are backed by
  • A persistent store: the file system for tests and embedded mode, ZK for a distributed server, or
  • An ephemeral cache for unnamed configs, such as those created by a table function.
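The lookup path through these layers can be sketched as follows. This is a minimal single-threaded model under stated assumptions: `ConnectorLocator`, `LayeredRegistry`, `getPlugin` and `getEphemeral` are hypothetical names rather than Drill's API, configs are plain strings, a `Map` stands in for the persistent store, and refresh is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical locator: turns a config into a plugin instance (not Drill's interface).
interface ConnectorLocator<P> {
  P create(String config);
}

// Hypothetical, single-threaded model of the layers; refresh is omitted.
final class LayeredRegistry<P> {
  private final ConnectorLocator<P> locator;
  private final Map<String, String> persistentStore;            // stands in for ZK or the file system
  private final Map<String, P> pluginCache = new HashMap<>();    // named (stored) plugins
  private final Map<String, P> ephemeralCache = new HashMap<>(); // unnamed configs, keyed by config

  LayeredRegistry(ConnectorLocator<P> locator, Map<String, String> persistentStore) {
    this.locator = locator;
    this.persistentStore = persistentStore;
  }

  // Named lookup: use the cached instance, else build one from the stored config.
  Optional<P> getPlugin(String name) {
    P cached = pluginCache.get(name);
    if (cached != null) {
      return Optional.of(cached);
    }
    String config = persistentStore.get(name);
    if (config == null) {
      return Optional.empty();
    }
    P plugin = locator.create(config);
    pluginCache.put(name, plugin);
    return Optional.of(plugin);
  }

  // Unnamed lookup (e.g. from a table function): never touches the persistent store.
  P getEphemeral(String config) {
    return ephemeralCache.computeIfAbsent(config, locator::create);
  }
}
```

A named lookup builds the plugin once and then serves it from the cache, while an unnamed config is resolved entirely in the ephemeral layer.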

The idea is to push most functionality into the above abstractions, leaving overall coordination here.

Plugins themselves have multiple levels of definitions:

  • The config and plugin classes, provided by the locator.
  • The ConnectorHandle which defines the config class and the locator which can create instances of that class.
  • A config instance which is typically deserialized from JSON independent of the implementation class.
  • A PluginHandle which pairs the config with a name as the unit that the user thinks of as a "plugin." The plugin entry links to the ConnectorEntry to create the instance lazily when first requested.
  • The plugin class instance, which holds long-term state and provides the logic for the plugin.
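The relationship between these levels, in particular the lazy creation of the instance, might look like the sketch below. `Connector` and `LazyPluginHandle` are hypothetical simplifications of ConnectorHandle and PluginHandle; the real classes carry more state and may differ in detail.

```java
import java.util.function.Function;

// Hypothetical stand-in for a connector: the config class plus a way to build instances.
final class Connector<C, P> {
  final Class<C> configClass;     // the class configs are deserialized into (unused in this sketch)
  final Function<C, P> factory;   // plays the role of the locator's "create instance" step

  Connector(Class<C> configClass, Function<C, P> factory) {
    this.configClass = configClass;
    this.factory = factory;
  }
}

// Hypothetical stand-in for a plugin handle: a name, a config, and a lazily built instance.
final class LazyPluginHandle<C, P> {
  private final String name;
  private final C config;
  private final Connector<C, P> connector;
  private P instance;   // null until first requested

  LazyPluginHandle(String name, C config, Connector<C, P> connector) {
    this.name = name;
    this.config = config;
    this.connector = connector;
  }

  String name() {
    return name;
  }

  C config() {
    return config;
  }

  // Builds the plugin on first use and returns the same instance afterwards.
  synchronized P plugin() {
    if (instance == null) {
      instance = connector.factory.apply(config);
    }
    return instance;
  }
}
```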

Concurrency

Drill is a concurrent system; multiple users can attempt to add, remove and update plugin configurations at the same time. The only good solution would be to version the plugin configs. Instead, we rely on the fact that configs change infrequently.

The code syncs the in-memory cache with the persistent store on each access (which is inefficient and should be reviewed).

During refresh, it could be that another thread is doing exactly the same thing, or even fighting us by changing the config. It is impossible to ensure a totally consistent answer. The goal is to make sure that the cache ends up agreeing with the persistent store as it was at some point in time.
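A refresh that converges on the persistent store's state might be sketched like this. `RefreshingCache` is hypothetical and is not Drill's StoragePluginMap: it keeps an instance when its config is unchanged, builds a new one when the config differs, and drops names that have disappeared from the store snapshot.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: reconcile a name -> (config, instance) cache with one snapshot
// of the persistent store.
final class RefreshingCache<P> {
  private static final class Entry<T> {
    final String config;
    final T plugin;

    Entry(String config, T plugin) {
      this.config = config;
      this.plugin = plugin;
    }
  }

  private final Map<String, Entry<P>> cache = new ConcurrentHashMap<>();
  private final Function<String, P> factory;

  RefreshingCache(Function<String, P> factory) {
    this.factory = factory;
  }

  // compute() is atomic per key, so two threads refreshing from the same snapshot
  // end in the same state. The factory runs inside compute() for brevity.
  void refresh(Map<String, String> snapshot) {
    snapshot.forEach((name, config) -> cache.compute(name, (n, old) ->
        old != null && old.config.equals(config)
            ? old                                            // unchanged: keep the instance
            : new Entry<P>(config, factory.apply(config)))); // new or changed: new instance
    // Forget names that are no longer in the store.
    cache.keySet().removeIf(name -> !snapshot.containsKey(name));
  }

  P get(String name) {
    Entry<P> e = cache.get(name);
    return e == null ? null : e.plugin;
  }
}
```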

The StoragePluginMap class provides in-memory synchronization of the name and config maps. Careful coding is needed when handling refresh since another thread could make the same changes.

Once the planner obtains a plugin, another user could come along and change the config for that plugin. Drill treats that change as another plugin: the original one continues to be used by the planner (but see below), while new queries use the new version.

Since the config on remote servers may differ from the one this Foreman used for planning, the plan includes the plugin config itself (not just a reference to it). This works because configs are usually small.
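The effect of shipping the config inside the plan can be sketched as an executor-side lookup keyed by config value rather than by name, which also reflects treating a changed config as another plugin. `PlanFragment` and `FragmentPluginResolver` are hypothetical names, and configs are plain strings standing in for serialized JSON.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical: a plan fragment carries the full plugin config, not just its name.
final class PlanFragment {
  final String pluginName;
  final String pluginConfig;   // serialized config copied into the plan by the Foreman

  PlanFragment(String pluginName, String pluginConfig) {
    this.pluginName = pluginName;
    this.pluginConfig = pluginConfig;
  }
}

// Hypothetical executor-side lookup keyed by config value, so a fragment planned
// against an older config still gets a plugin built from that older config.
final class FragmentPluginResolver<P> {
  private final Map<String, P> pluginsByConfig = new ConcurrentHashMap<>();
  private final Function<String, P> factory;

  FragmentPluginResolver(Function<String, P> factory) {
    this.factory = factory;
  }

  P pluginFor(PlanFragment fragment) {
    // The name is informative only; the config identifies the plugin.
    return pluginsByConfig.computeIfAbsent(fragment.pluginConfig, factory);
  }
}
```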

Ephemeral Plugins

An ephemeral plugin handles table functions, which create a temporary, unnamed configuration that is needed only to execute a single query but may be used across many threads. If the same table function is used multiple times, the same ephemeral plugin is reused across queries. Ephemeral plugins are based on the same connectors as stored plugins, but are not visible to the planner. They expire after some time, or once the number of cached ephemeral plugins reaches a limit.
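A count-bounded ephemeral cache keyed by config could be sketched with `LinkedHashMap` in access order. This assumes the limit is a simple entry count and omits time-based expiry; `EphemeralPluginCache` is a hypothetical name, not Drill's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a count-bounded cache of ephemeral plugins keyed by config.
// Time-based expiry is not modeled.
final class EphemeralPluginCache<C, P> {
  private final Map<C, P> lru;
  private final Function<C, P> factory;

  EphemeralPluginCache(int maxEntries, Function<C, P> factory) {
    this.factory = factory;
    // accessOrder = true: iteration runs from least to most recently used entry.
    this.lru = new LinkedHashMap<C, P>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<C, P> eldest) {
        // A real registry would be tempted to close() the evicted plugin here.
        return size() > maxEntries;
      }
    };
  }

  // The same config yields the same plugin, across queries, until it is evicted.
  synchronized P get(C config) {
    return lru.computeIfAbsent(config, factory);
  }

  synchronized boolean contains(C config) {
    return lru.containsKey(config);
  }

  synchronized int size() {
    return lru.size();
  }
}
```

The eviction hook is exactly where a registry might call close() on a plugin that another query is still using, which is the hazard described under Caveats.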

The ephemeral store also acts as a graveyard for deleted or changed plugins. When removing a plugin, the old plugin is moved to ephemeral storage to allow running queries to locate it. Similarly, when a new configuration is stored, the corresponding plugin is retrieved from ephemeral storage, if it exists. This avoids odd cases where the same plugin exists in both normal and ephemeral storage.
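The graveyard behavior can be sketched with three maps. `GraveyardRegistry` and its methods are hypothetical; for brevity, instances are created eagerly on `put`, whereas the registry described above creates them lazily.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the "graveyard": retired plugins move to the ephemeral map
// (keyed by config) and are revived if the same config is stored again.
final class GraveyardRegistry<P> {
  private final Map<String, String> configs = new HashMap<>(); // name -> config
  private final Map<String, P> named = new HashMap<>();        // name -> instance
  private final Map<String, P> ephemeral = new HashMap<>();    // config -> instance
  private final Function<String, P> factory;

  GraveyardRegistry(Function<String, P> factory) {
    this.factory = factory;
  }

  synchronized void put(String name, String config) {
    retire(name);                           // the old version goes to the graveyard
    P revived = ephemeral.remove(config);   // never keep one plugin in both places
    configs.put(name, config);
    named.put(name, revived != null ? revived : factory.apply(config));
  }

  synchronized void remove(String name) {
    retire(name);
  }

  // Running queries locate a plugin by the config they were planned with.
  synchronized P findByConfig(String config) {
    for (Map.Entry<String, String> e : configs.entrySet()) {
      if (e.getValue().equals(config)) {
        return named.get(e.getKey());
      }
    }
    return ephemeral.get(config);
  }

  synchronized boolean inGraveyard(String config) {
    return ephemeral.containsKey(config);
  }

  private void retire(String name) {
    String oldConfig = configs.remove(name);
    P old = named.remove(name);
    if (old != null) {
      ephemeral.put(oldConfig, old);
    }
  }
}
```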

Caveats

The main problem with synchronization at present is that plugins provide a close() method that, if called, could render the plugin unusable. Suppose a Cassandra plugin maintains a connection to a server that is shared across multiple queries and threads. Any change to the config immediately calls close() on the plugin, even though the plugin may be in use planning a query on another thread. Random failures will result.

The same issue can affect ephemeral plugins: if the number in the cache reaches the limit, the registry will start closing old ones without knowing whether a given plugin is actually in use.

The workaround is to not actually honor the close() call. Longer term, a reference count is needed.
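The reference count mentioned above could take roughly this shape: the registry's close request is recorded, and the underlying plugin is closed only once no query holds it. `RefCountedPlugin` and `ClosablePlugin` are hypothetical names; this is not Drill's current behavior, since the current workaround is not to honor close() at all.

```java
// Hypothetical plugin type whose close() may release shared resources (e.g. a connection).
interface ClosablePlugin {
  void close();
}

// Hypothetical sketch of reference counting: the registry's close() is deferred
// until the last query using the plugin releases it.
final class RefCountedPlugin implements AutoCloseable {
  private final ClosablePlugin plugin;
  private int refs = 0;
  private boolean closeRequested = false;
  private boolean closed = false;

  RefCountedPlugin(ClosablePlugin plugin) {
    this.plugin = plugin;
  }

  // Called by a query before it plans or executes with the plugin.
  synchronized ClosablePlugin acquire() {
    if (closed) {
      throw new IllegalStateException("plugin already closed");
    }
    refs++;
    return plugin;
  }

  // Called when the query no longer needs the plugin.
  synchronized void release() {
    if (refs == 0) {
      throw new IllegalStateException("release() without matching acquire()");
    }
    refs--;
    closeIfIdle();
  }

  // Called by the registry on a config change or eviction; the real close may be deferred.
  @Override
  public synchronized void close() {
    closeRequested = true;
    closeIfIdle();
  }

  synchronized boolean isClosed() {
    return closed;
  }

  private void closeIfIdle() {
    if (closeRequested && refs == 0 && !closed) {
      closed = true;
      plugin.close();
    }
  }
}
```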

Error Handling

Error handling needs review. Problems that result from user actions should be raised as a UserException; those that violate invariants should be raised as other forms of exception.
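The intended split might look like the sketch below. `UserError` is a hypothetical stand-in for Drill's UserException (whose real API is not reproduced here), and `PluginLookup` with its two checks is illustrative only; treating "every cached plugin has a stored config" as an invariant is an assumption made for the example.

```java
import java.util.Map;

// Hypothetical stand-in for Drill's UserException: an error caused by the user's request.
final class UserError extends RuntimeException {
  UserError(String message) {
    super(message);
  }
}

// Illustrative checks showing which kind of exception each failure should raise.
final class PluginLookup {
  // A bad or unknown name comes from the user, so it is reported as a user error.
  static String configFor(String name, Map<String, String> storedConfigs) {
    if (name == null || name.trim().isEmpty()) {
      throw new UserError("Storage plugin name must not be empty");
    }
    String config = storedConfigs.get(name);
    if (config == null) {
      throw new UserError("Unknown storage plugin: " + name);
    }
    return config;
  }

  // A cached instance with no stored config means the registry's own state is
  // inconsistent (under this sketch's assumed invariant): a bug, not a user mistake.
  static void checkCacheMatchesStore(Map<String, String> storedConfigs, Map<String, ?> cachedPlugins) {
    for (String name : cachedPlugins.keySet()) {
      if (!storedConfigs.containsKey(name)) {
        throw new IllegalStateException("Cached plugin has no stored config: " + name);
      }
    }
  }
}
```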