org.apache.drill.exec.store.easy.json.parser.AbstractElementParser

org.apache.drill.exec.store.easy.json.parser.ObjectParser

All Implemented Interfaces: ElementParser

Direct Known Subclasses: TupleParser

public abstract class ObjectParser extends AbstractElementParser

Parses a JSON object: { name : value ... }

The object value may be the root object (the row), a top-level field, or the element of an array. The event methods are called when an object is started and ended, as well as when a new field is discovered.

Creates a map of known fields. Each time a field is parsed, looks up the field in the map. If not found, the value is "sniffed" to determine its type, and a matching parser and listener are created. Thereafter, that parser is reused for every later occurrence of the field.
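The lookup-then-create behavior described above can be sketched as follows. This is a minimal illustration, not Drill's implementation: SketchParser, FieldCacheSketch, parserFor and createCount are hypothetical names, and the Function passed to the constructor stands in for the abstract onField() hook.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for an element parser; not Drill's ElementParser.
interface SketchParser {
  String describe();
}

// Hypothetical cache that mirrors the "map of known fields" idea.
final class FieldCacheSketch {
  private final Map<String, SketchParser> fieldParsers = new HashMap<>();
  private final Function<String, SketchParser> onField; // plays the role of onField()
  int createCount = 0; // how many parsers were created, for illustration only

  FieldCacheSketch(Function<String, SketchParser> onField) {
    this.onField = onField;
  }

  // Returns the cached parser for a field, creating one on first sight.
  SketchParser parserFor(String key) {
    SketchParser parser = fieldParsers.get(key);
    if (parser == null) {
      parser = onField.apply(key); // first appearance: build a parser
      fieldParsers.put(key, parser);
      createCount++;
    }
    return parser; // later appearances reuse the same parser
  }
}
```

In this sketch, calling parserFor("a") twice returns the same object and invokes the creation hook only once.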

The object listener provides semantics. One key decision is whether to project a field or not. An unprojected field is parsed with a "dummy" parser that "free-wheels" over all valid JSON structures. Otherwise, the listener is given whatever type information the parser can discover when creating the field.

Work is divided between this class, which discovers fields, and the listeners which determine the meaning of field values. A field, via a properly-defined listener, can accept one or more different value kinds.

The parser accepts JSON tokens as they appear in the file. Whether those tokens make sense for a particular column is left to the listeners. A listener should report a clear error if a particular token is not valid for its column.

Fields

The structure of an object is:

ObjectListener, which represents the object (tuple) as a whole. Each field, indexed by name, is represented as a
ValueListener, which represents the value "slot". That value can be scalar, or can be structured, in which case the value listener contains either an
ArrayListener for an array, or an
ObjectListener for a nested object (tuple).

Nulls

Null values are handled at the semantic level, not the syntax level. If the first appearance of a field contains a null value, then the parser can provide no hints about the expected field type. The listener must implement a solution such as referring to a schema or waiting for a non-null value to appear.

Since the parser classes handle syntax, they are blissfully ignorant of any fancy logic needed for null handling. Each field is represented by a field parser whether that field is null or not. It is the listener that may have to swap out one mechanism for another as types are discovered.
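One of the strategies mentioned above, waiting for a non-null value before committing to a type, might look roughly like this. The class and its methods (DeferredTypeListenerSketch, onNull, onValue) are hypothetical and do not correspond to Drill's listener interfaces; the type names are illustrative strings only.

```java
// Hypothetical listener that defers its type decision until a non-null value arrives.
final class DeferredTypeListenerSketch {
  private String resolvedType; // null until the first non-null value is seen
  private int pendingNulls;    // nulls received while the type was still unknown

  // The parser reports a null; no type hint is available.
  void onNull() {
    if (resolvedType == null) {
      pendingNulls++;
    }
  }

  // The parser reports a concrete value; fix the type or check it.
  void onValue(Object value) {
    String seen = typeName(value);
    if (resolvedType == null) {
      resolvedType = seen; // swap from "unknown" to a concrete type
    } else if (!resolvedType.equals(seen)) {
      // A clear error when a value does not fit the column.
      throw new IllegalStateException(
          "Expected " + resolvedType + " but found " + seen);
    }
  }

  private static String typeName(Object value) {
    if (value instanceof String) return "string";
    if (value instanceof Long || value instanceof Integer) return "integer";
    if (value instanceof Double) return "float";
    if (value instanceof Boolean) return "boolean";
    throw new IllegalArgumentException("Unsupported value: " + value);
  }

  String resolvedType() { return resolvedType; }
  int pendingNulls() { return pendingNulls; }
}
```

The parser side stays unchanged in this sketch: it simply forwards null and value events, and all of the type bookkeeping lives in the listener.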

Complex Types

Parsers handle arrays and objects using a two-level system. Each field is always driven by a field parser. If the field is discovered to be an array, then an array parser is added to the field parser to handle the array contents. The same is true of objects.

Both objects and arrays are collections of values, and a value can optionally contain an array or object. (JSON allows any given field name to map to both objects and arrays in different rows. The parser structure reflects this syntax. The listeners can enforce more relational-like semantics).

If an array is single-dimensional, then the field parser contains an array parser, which contains another value parser for the array contents. If the array is multi-dimensional, there will be multiple array/value parser pairs: one for each dimension.
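The nesting described above, one array/value parser pair per dimension, can be made concrete with a small sketch that only prints the shape of the chain. The class ArrayNestingSketch and the labels it emits are hypothetical and exist purely for illustration.

```java
// Hypothetical helper that describes the parser chain for an array field.
final class ArrayNestingSketch {
  // Builds a textual picture of the chain: a field parser followed by one
  // array parser / value parser pair for each array dimension.
  static String parserChain(int dimensions) {
    StringBuilder chain = new StringBuilder("fieldParser");
    for (int i = 0; i < dimensions; i++) {
      chain.append(" -> arrayParser -> valueParser");
    }
    return chain.toString();
  }
}
```

For a two-dimensional array, parserChain(2) yields a field parser followed by two array/value pairs, which matches the description of multi-dimensional arrays above.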

Field Summary

Fields

Modifier and Type

Field

Description

protected static final org.slf4j.Logger

logger

Constructor Summary

Constructors

Constructor

Description

ObjectParser(JsonStructureParser structParser)

Method Summary

Modifier and Type

Method

Description

ElementParser

fieldParser(String key)

protected void

onEnd()

Called at the end of a set of values for an object.

protected abstract ElementParser

onField(String key, TokenIterator tokenizer)

The structure parser has just encountered a new field for this object.

protected void

onStart()

Called at the start of a set of values for an object.

void

parse(TokenIterator tokenizer)

Parses { ^ ... }

ElementParser

replaceFieldParser(String key, ElementParser fieldParser)

Methods inherited from class org.apache.drill.exec.store.easy.json.parser.AbstractElementParser

errorFactory, structParser

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait







Field Details



logger
protected static final org.slf4j.Logger logger








Constructor Details



ObjectParser
public ObjectParser(JsonStructureParser structParser)








Method Details



fieldParser
public ElementParser fieldParser(String key)




onStart
protected void onStart()
Called at the start of a set of values for an object. That is, called
 when the structure parser accepts the { token.




onField
protected abstract ElementParser onField(String key,
 TokenIterator tokenizer)
The structure parser has just encountered a new field for this
 object. This method returns a parser for the field, along with
 an optional listener to handle events within the field. The field typically
 uses a value parser created by the FieldParserFactory class.
 However, special cases (such as Mongo extended types) can create a
 custom parser.
 
 If the field is not projected, the method should return a dummy parser
 from FieldParserFactory.ignoredFieldParser().
 The dummy parser will "free-wheel" over whatever values the
 field contains. (This is one way to avoid structure errors in a JSON file:
 just ignore them.) Otherwise, the parser will look ahead to guess the
 field type and will call one of the "add" methods, each of which should
 return a value listener for the field itself.
 

 A normal field will respond to the structure of the JSON file as it
 appears. The associated value listener receives events for the
 field value. The value listener may be asked to create additional
 structure, such as arrays or nested objects.
 

 Parse position: { ... field : ^ ? for a newly-seen field.
 Constructs a value parser and its listeners by looking ahead
 some number of tokens to "sniff" the type of the value. For
 example:
 

 foo: <value> - Field value
 foo: [ <value> ] - 1D array value
 foo: [ [<value> ] ] - 2D array value
 Etc.
 
 
 There are two cases in which no type estimation is possible:
 

 foo: null
 foo: []
 

Parameters:
key - name of the field
tokenizer - an instance of a token iterator
Returns:
a parser for the newly-created field
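The look-ahead "sniffing" described for onField can be sketched as below. This is an illustration under stated assumptions: the Token enum, the Sniffed result class and the sniff method are hypothetical, and the look-ahead is modeled as a plain list rather than Drill's TokenIterator.

```java
import java.util.List;

// Hypothetical sketch of type sniffing over a short look-ahead of tokens.
final class TypeSniffSketch {
  // Simplified token kinds; a real tokenizer distinguishes more cases.
  enum Token { START_ARRAY, END_ARRAY, START_OBJECT, NULL, STRING, NUMBER, BOOLEAN }

  // Result of the look-ahead.
  static final class Sniffed {
    final int arrayDims;    // number of leading [ tokens
    final Token valueToken; // null when no type estimate is possible

    Sniffed(int arrayDims, Token valueToken) {
      this.arrayDims = arrayDims;
      this.valueToken = valueToken;
    }
  }

  // Counts leading array starts, then reports the first value token.
  static Sniffed sniff(List<Token> lookAhead) {
    int dims = 0;
    for (Token token : lookAhead) {
      switch (token) {
        case START_ARRAY:
          dims++; // foo: [ ... adds one dimension
          break;
        case NULL:
        case END_ARRAY:
          // foo: null or foo: [] gives no type information
          return new Sniffed(dims, null);
        default:
          return new Sniffed(dims, token);
      }
    }
    return new Sniffed(dims, null); // ran out of look-ahead
  }
}
```

In this sketch the two cases with no type estimate, a null value and an empty array, both come back with a null valueToken, leaving the decision to the listener.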





onEnd
protected void onEnd()
Called at the end of a set of values for an object. That is, called
 when the structure parser accepts the } token.




parse
public void parse(TokenIterator tokenizer)
Parses { ^ ... }

Parameters:
tokenizer - an instance of a token iterator





replaceFieldParser
public ElementParser replaceFieldParser(String key,
 ElementParser fieldParser)
