Class ObjectParser

java.lang.Object
org.apache.drill.exec.store.easy.json.parser.AbstractElementParser
org.apache.drill.exec.store.easy.json.parser.ObjectParser
All Implemented Interfaces:
ElementParser
Direct Known Subclasses:
TupleParser

public abstract class ObjectParser extends AbstractElementParser
Parses a JSON object: { name : value ... }

The object value may be the root object (the row), a top-level field, or the element of an array. The event methods are called when an object is started and ended, as well as when a new field is discovered.

Creates a map of known fields. Each time a field is parsed, looks up the field in the map. If not found, the value is "sniffed" to determine its type, and a matching parser and listener are created. Thereafter, that parser is reused.
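The lookup-then-create pattern described above can be sketched as follows. This is a minimal illustration, not Drill's implementation: the class FieldMapSketch, its nested FieldParser, and the sniff() helper are hypothetical names, and the sketch sniffs a Java object rather than a JSON token stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and types are assumptions, not Drill classes.
class FieldMapSketch {

  /** Stand-in for a per-field parser; it only records the sniffed type. */
  static final class FieldParser {
    final String type;
    FieldParser(String type) { this.type = type; }
  }

  // Known fields, indexed by name.
  private final Map<String, FieldParser> fields = new HashMap<>();

  // Names of fields for which a parser had to be created, in discovery order.
  final List<String> created = new ArrayList<>();

  /** Guesses a type name from the first value seen for a field. */
  static String sniff(Object value) {
    if (value instanceof Boolean) return "BOOLEAN";
    if (value instanceof Integer || value instanceof Long) return "INTEGER";
    if (value instanceof Float || value instanceof Double) return "FLOAT";
    if (value instanceof String) return "STRING";
    return "UNKNOWN"; // includes null: no type hint is available
  }

  /** Returns the existing parser for the field, creating one on first sight. */
  FieldParser parserFor(String key, Object firstValue) {
    FieldParser parser = fields.get(key);
    if (parser == null) {
      parser = new FieldParser(sniff(firstValue));
      fields.put(key, parser);
      created.add(key);
    }
    return parser;
  }

  public static void main(String[] args) {
    FieldMapSketch sketch = new FieldMapSketch();
    FieldParser first = sketch.parserFor("a", 10L);
    FieldParser second = sketch.parserFor("a", "text");
    System.out.println(first == second); // true: the parser is reused
    System.out.println(first.type);      // INTEGER
  }
}
```

Note that the second value for "a" is a string, yet the parser created for the first value is still returned. In this sketch, as in the description above, deciding whether that later value is acceptable is left to whatever plays the listener role.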

The object listener provides semantics. One key decision is whether to project a field or not. An unprojected field is parsed with a "dummy" parser that "free-wheels" over all valid JSON structures. Otherwise, the listener is given whatever type information the parser can discover when creating the field.
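To make "free-wheeling" concrete, here is a minimal sketch that skips an unprojected value by tracking nesting depth only. It assumes a simplified token model in which tokens are plain strings, structural tokens are "{", "}", "[" and "]", and projected fields hold scalars. FreeWheelSketch, skipValue() and readProjected() are hypothetical names, not Drill APIs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the string-token model is an assumption for illustration.
class FreeWheelSketch {

  /** Skips one complete value starting at pos; returns the position after it. */
  static int skipValue(List<String> tokens, int pos) {
    int depth = 0;
    do {
      String token = tokens.get(pos++);
      if (token.equals("{") || token.equals("[")) {
        depth++;
      } else if (token.equals("}") || token.equals("]")) {
        depth--;
      }
    } while (depth > 0);
    return pos;
  }

  /** Reads one object, keeping projected scalar fields and skipping the rest. */
  static Map<String, String> readProjected(List<String> tokens, Set<String> projected) {
    Map<String, String> row = new LinkedHashMap<>();
    int pos = 1; // position just after the opening "{"
    while (!tokens.get(pos).equals("}")) {
      String key = tokens.get(pos++);
      if (projected.contains(key)) {
        row.put(key, tokens.get(pos++)); // sketch assumes a scalar value here
      } else {
        pos = skipValue(tokens, pos);    // free-wheel over any structure
      }
    }
    return row;
  }

  public static void main(String[] args) {
    // Tokens for: { a: 1, b: [ { x: 2 } ], c: 3 }
    List<String> tokens = Arrays.asList(
        "{", "a", "1", "b", "[", "{", "x", "2", "}", "]", "c", "3", "}");
    Set<String> projected = new HashSet<>(Arrays.asList("a", "c"));
    System.out.println(readProjected(tokens, projected)); // {a=1, c=3}
  }
}
```

The skipper never inspects the content of "b", so it creates no listeners and makes no type decisions for it.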

Work is divided between this class, which discovers fields, and the listeners which determine the meaning of field values. A field, via a properly-defined listener, can accept one or more different value kinds.

The parser accepts JSON tokens as they appear in the file. Whether those tokens make sense for a particular column is left to the listeners. A listener should report a clear error if a token is not valid for its column.

Fields

The structure of an object is:
  • ObjectListener which represents the object (tuple) as a whole. Each field, indexed by name, is represented as a
  • ValueListener which represents the value "slot". That value can be scalar, or can be structured, in which case the value listener contains either a
  • ArrayListener for an array, or a
  • ObjectListener for a nested object (tuple).

Nulls

Null values are handled at the semantic level, not the syntax level. If the first appearance of a field contains a null value, then the parser can provide no hints about the expected field type. The listener must implement a solution such as referring to a schema or waiting for a non-null value to appear.

Since the parser classes handle syntax, they are blissfully ignorant of any fancy logic needed for null handling. Each field is represented by a field parser whether that field is null or not. It is the listener that may have to swap out one mechanism for another as types are discovered.
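One of the strategies mentioned above, waiting for a non-null value, might look like the following on the listener side. This is a sketch under stated assumptions: DeferredTypeSketch and its methods are hypothetical, type names are plain strings, and a real listener would also need to back-fill the nulls it has counted.

```java
// Hypothetical sketch of a listener that defers its type decision.
class DeferredTypeSketch {

  private String type;      // null until a non-null value has been seen
  private int pendingNulls; // nulls seen before the type was known

  /** Called for a null value; only counted while the type is still unknown. */
  void onNull() {
    if (type == null) {
      pendingNulls++;
    }
  }

  /** Called for a non-null value whose type the parser has sniffed. */
  void onValue(String valueType) {
    if (type == null) {
      type = valueType; // first non-null value fixes the column type
    } else if (!type.equals(valueType)) {
      // The listener, not the parser, rejects tokens that do not fit.
      throw new IllegalStateException(
          "Column expects " + type + " but found " + valueType);
    }
  }

  String type() { return type; }

  int pendingNulls() { return pendingNulls; }

  public static void main(String[] args) {
    DeferredTypeSketch listener = new DeferredTypeSketch();
    listener.onNull();
    listener.onNull();
    listener.onValue("INTEGER");
    System.out.println(listener.type());         // INTEGER
    System.out.println(listener.pendingNulls()); // 2
  }
}
```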

Complex Types

Parsers handle arrays and objects using a two-level system. Each field is always driven by a field parser. If the field is discovered to be an array, then we add an array parser to the field parser to handle array contents. The same is true of objects.

Both objects and arrays are collections of values, and a value can optionally contain an array or object. (JSON allows any given field name to map to both objects and arrays in different rows. The parser structure reflects this syntax. The listeners can enforce more relational-like semantics.)

If an array is single-dimension, then the field parser contains an array parser which contains another value parser for the array contents. If the array is multi-dimensional, there will be multiple array/value parser pairs: one for each dimension.
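The array/value pairing can be pictured as a chain that alternates between the two parser kinds. The sketch below builds such a chain for a given number of dimensions. ArrayNestingSketch, ValueParser and ArrayParser are simplified stand-ins invented for illustration and do not mirror Drill's class signatures.

```java
// Hypothetical sketch of one array/value parser pair per array dimension.
class ArrayNestingSketch {

  /** A value slot; holds an array parser only when the value is an array. */
  static final class ValueParser {
    final ArrayParser arrayParser; // null for a scalar value
    ValueParser(ArrayParser arrayParser) { this.arrayParser = arrayParser; }
  }

  /** Parses array contents by delegating each element to a value parser. */
  static final class ArrayParser {
    final ValueParser elementParser;
    ArrayParser(ValueParser elementParser) { this.elementParser = elementParser; }
  }

  /** Builds the parser chain for a field with the given array depth. */
  static ValueParser build(int dims) {
    ValueParser value = new ValueParser(null); // innermost scalar value
    for (int i = 0; i < dims; i++) {
      value = new ValueParser(new ArrayParser(value)); // wrap one dimension
    }
    return value;
  }

  /** Counts the array/value pairs reachable from a field's value parser. */
  static int arrayLevels(ValueParser field) {
    int levels = 0;
    while (field.arrayParser != null) {
      levels++;
      field = field.arrayParser.elementParser;
    }
    return levels;
  }

  public static void main(String[] args) {
    System.out.println(arrayLevels(build(0))); // 0: plain scalar field
    System.out.println(arrayLevels(build(2))); // 2: one pair per dimension
  }
}
```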

  • Field Details

    • logger

      protected static final org.slf4j.Logger logger
  • Constructor Details

  • Method Details

    • fieldParser

      public ElementParser fieldParser(String key)
    • onStart

      protected void onStart()
      Called at the start of a set of values for an object. That is, called when the structure parser accepts the { token.
    • onField

      protected abstract ElementParser onField(String key, TokenIterator tokenizer)
      The structure parser has just encountered a new field for this object. This method returns a parser for the field, along with an optional listener to handle events within the field. The field typically uses a value parser created by the FieldParserFactory class. However, special cases (such as Mongo extended types) can create a custom parser.

      If the field is not projected, the method should return a dummy parser from FieldParserFactory.ignoredFieldParser(). The dummy parser will "free-wheel" over whatever values the field contains. (This is one way to avoid structure errors in a JSON file: just ignore them.) Otherwise, the parser will look ahead to guess the field type and will call one of the "add" methods, each of which should return a value listener for the field itself.

      A normal field will respond to the structure of the JSON file as it appears. The associated value listener receives events for the field value. The value listener may be asked to create additional structure, such as arrays or nested objects.

      Parse position: { ... field : ^ ? for a newly-seen field. Constructs a value parser and its listeners by looking ahead some number of tokens to "sniff" the type of the value. For example:

      • foo: <value> - Field value
      • foo: [ <value> ] - 1D array value
      • foo: [ [ <value> ] ] - 2D array value
      • Etc.

      There are two cases in which no type estimation is possible:

      • foo: null
      • foo: []
      Parameters:
      key - name of the field
      tokenizer - an instance of a token iterator
      Returns:
      a parser for the newly-created field
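The look-ahead that onField describes can be sketched as counting leading [ tokens and then inspecting the first token that follows. This is an illustration under a simplified model: tokens are plain strings, the literal null is the string "null", and the input is well formed. SniffSketch, Sniffed and sniff() are hypothetical names and do not use Drill's TokenIterator.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of type sniffing by look-ahead over string tokens.
class SniffSketch {

  /** Look-ahead result: array depth plus the first usable token, if any. */
  static final class Sniffed {
    final int dims;          // number of leading "[" tokens
    final String firstToken; // null when no type estimation is possible
    Sniffed(int dims, String firstToken) {
      this.dims = dims;
      this.firstToken = firstToken;
    }
  }

  /** Looks ahead from pos, just after "field :", without consuming tokens. */
  static Sniffed sniff(List<String> tokens, int pos) {
    int dims = 0;
    while (tokens.get(pos).equals("[")) {
      dims++;
      pos++;
    }
    String first = tokens.get(pos);
    // foo: null and foo: [] give no hint about the value type.
    boolean unknown = first.equals("null") || first.equals("]");
    return new Sniffed(dims, unknown ? null : first);
  }

  public static void main(String[] args) {
    Sniffed twoDims = sniff(Arrays.asList("[", "[", "10", "]", "]"), 0);
    System.out.println(twoDims.dims);       // 2
    System.out.println(twoDims.firstToken); // 10

    Sniffed empty = sniff(Arrays.asList("[", "]"), 0);
    System.out.println(empty.dims);         // 1
    System.out.println(empty.firstToken);   // null
  }
}
```

In the two unknown cases the sketch still reports the array depth it saw, which leaves it to the caller to decide how to proceed without a value type.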
    • onEnd

      protected void onEnd()
      Called at the end of a set of values for an object. That is, called when the structure parser accepts the } token.
    • parse

      public void parse(TokenIterator tokenizer)
      Parses { ^ ... }
      Parameters:
      tokenizer - an instance of a token iterator
    • replaceFieldParser

      public ElementParser replaceFieldParser(String key, ElementParser fieldParser)