Class ObjectParser
- All Implemented Interfaces: ElementParser
- Direct Known Subclasses: TupleParser
Parses a JSON object of the form { name : value ... }
The object value may be the root object (the row), a top-level field, or the element of an array. The event methods are called when an object is started and ended, as well as when a new field is discovered.
Creates a map of known fields. Each time a field is parsed, looks up the field in the map. If the field is not found, the value is "sniffed" to determine its type, and a matching parser and listener are created. Thereafter, that parser is reused for every later occurrence of the field.
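The lookup-then-create behavior described above can be sketched as follows. This is a minimal illustration, not Drill's implementation: FieldParserCacheSketch, SketchParser, and sniff() are hypothetical names, and values are plain Java objects rather than JSON tokens.

```java
import java.util.HashMap;
import java.util.Map;

final class FieldParserCacheSketch {
  /** Hypothetical stand-in for a field parser; it only remembers the sniffed kind. */
  static final class SketchParser {
    final String kind;
    int valuesParsed;

    SketchParser(String kind) {
      this.kind = kind;
    }

    void parse(Object value) {
      valuesParsed++;
    }
  }

  // Map of known fields, keyed by field name.
  private final Map<String, SketchParser> fieldParsers = new HashMap<>();
  private int parsersCreated;

  /** Guesses a coarse kind from the first value seen (assumed logic). */
  static String sniff(Object value) {
    if (value == null) return "unknown";
    if (value instanceof Number) return "number";
    if (value instanceof Boolean) return "boolean";
    return "string";
  }

  /** Looks up the parser for the field, creating it on first sight only. */
  void parseField(String key, Object value) {
    SketchParser parser = fieldParsers.get(key);
    if (parser == null) {
      parser = new SketchParser(sniff(value));
      fieldParsers.put(key, parser);
      parsersCreated++;
    }
    parser.parse(value);
  }

  int parsersCreated() {
    return parsersCreated;
  }

  /** Returns the sniffed kind, or null if the field has not been seen. */
  String kindOf(String key) {
    SketchParser parser = fieldParsers.get(key);
    return parser == null ? null : parser.kind;
  }
}
```

Parsing the same field name twice creates one parser; the second occurrence reuses the parser chosen when the field was first seen.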
The object listener provides semantics. One key decision is whether to project a field or not. An unprojected field is parsed with a "dummy" parser that "free-wheels" over all valid JSON structures. For a projected field, the listener is given whatever type information the parser can discover when creating the field.
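One way to picture the "free-wheeling" dummy parser is a routine that consumes exactly one value by tracking nesting depth and ignoring everything else. The sketch below is an assumption-laden illustration: IgnoredValueSketch and skipValue() are hypothetical, and tokens are simplified to strings (structural tokens are "{", "}", "[", "]"; field names and scalars are any other string, so a scalar whose text is a bracket would be misread).

```java
import java.util.Iterator;

final class IgnoredValueSketch {
  /**
   * Consumes exactly one JSON value from the token stream without
   * interpreting it, and returns the number of tokens consumed.
   */
  static int skipValue(Iterator<String> tokens) {
    int depth = 0;
    int consumed = 0;
    do {
      String token = tokens.next();
      consumed++;
      if (token.equals("{") || token.equals("[")) {
        depth++;   // entering a nested object or array
      } else if (token.equals("}") || token.equals("]")) {
        depth--;   // leaving a nested object or array
      }
      // Scalars and field names leave the depth unchanged.
    } while (depth > 0);
    return consumed;
  }
}
```

A scalar is skipped in one step; a nested structure is skipped up to and including its matching close token, leaving the iterator positioned at the next field.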
Work is divided between this class, which discovers fields, and the listeners which determine the meaning of field values. A field, via a properly-defined listener, can accept one or more different value kinds.
The parser accepts JSON tokens as they appear in the file. Whether those tokens make sense for a particular column is left to the listeners. A listener should provide a clear error if a particular token is not valid for it.
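As an illustration of this division of work, the sketch below shows a listener for an integer column that accepts more than one value kind and rejects the rest with a descriptive error. The class and its method names are hypothetical and do not mirror Drill's ValueListener interface.

```java
final class BigintColumnListenerSketch {
  private final String column;
  private long lastValue;

  BigintColumnListenerSketch(String column) {
    this.column = column;
  }

  /** Integer tokens are accepted directly. */
  void onInt(long value) {
    lastValue = value;
  }

  /** String tokens are accepted only if they contain an integer. */
  void onString(String value) {
    try {
      lastValue = Long.parseLong(value.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "Column `" + column + "` expects an integer, but found the string \""
              + value + "\"", e);
    }
  }

  /** Boolean tokens make no sense for this column, so say so clearly. */
  void onBoolean(boolean value) {
    throw new IllegalArgumentException(
        "Column `" + column + "` expects an integer, but found the boolean " + value);
  }

  long lastValue() {
    return lastValue;
  }
}
```

The parser side would simply forward whichever token it sees; the decision to convert or reject lives entirely in the listener.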
Fields
The structure of an object is:
- ObjectListener, which represents the object (tuple) as a whole.
- Each field, indexed by name, is represented as a ValueListener, which represents the value "slot".
- That value can be scalar, or can be structured, in which case the value listener contains either an ArrayListener for an array, or an ObjectListener for a nested object (tuple).
Nulls
Null values are handled at the semantic, not syntax level. If the first appearance of a field contains a null value, then the parser can provide no hints about the expected field type. The listener must implement a solution such as referring to a schema, waiting for a non-null value to appear, etc.
Since the parser classes handle syntax, they are blissfully ignorant of any fancy logic needed for null handling. Each field is represented by a field parser whether that field is null or not. It is the listener that may have to swap out one mechanism for another as types are discovered.
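The "wait for a non-null value" strategy might look like the following sketch. It is only one possible approach under stated assumptions: DeferredTypeListenerSketch is a hypothetical class, types are reduced to "number" and "string", and the column is an in-memory list rather than a Drill vector.

```java
import java.util.ArrayList;
import java.util.List;

final class DeferredTypeListenerSketch {
  private String type;        // null until a non-null value fixes the type
  private int pendingNulls;   // nulls seen before the type was known
  private final List<Object> column = new ArrayList<>();

  /** A null carries no type hint, so it is only counted while the type is unknown. */
  void onNull() {
    if (type == null) {
      pendingNulls++;
    } else {
      column.add(null);
    }
  }

  /** The first non-null value fixes the type and back-fills the deferred nulls. */
  void onValue(Object value) {
    String seen = (value instanceof Number) ? "number" : "string";
    if (type == null) {
      type = seen;
      for (int i = 0; i < pendingNulls; i++) {
        column.add(null);
      }
      pendingNulls = 0;
    } else if (!type.equals(seen)) {
      throw new IllegalStateException(
          "Field was resolved as " + type + " but a " + seen + " value appeared");
    }
    column.add(value);
  }

  String type() {
    return type;
  }

  List<Object> column() {
    return column;
  }
}
```

The parser calls onNull() and onValue() the same way regardless of state; only the listener changes its internal mechanism once the type is discovered.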
Complex Types
Parsers handle arrays and objects using a two-level system. Each field is always driven by a field parser. If the field is discovered to be an array, then we add an array parser to the field parser to handle array contents. The same is true of objects.
Both objects and arrays are collections of values, and a value can optionally contain an array or object. (JSON allows any given field name to map to both objects and arrays in different rows. The parser structure reflects this syntax. The listeners can enforce more relational-like semantics.)
If an array is single-dimensional, then the field parser contains an array parser, which contains another value parser for the array contents. If the array is multi-dimensional, there will be multiple array/value parser pairs: one for each dimension.
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description:
- fieldParser(String key)
- protected void onEnd(): Called at the end of a set of values for an object.
- protected abstract ElementParser onField(String key, TokenIterator tokenizer): The structure parser has just encountered a new field for this object.
- protected void onStart(): Called at the start of a set of values for an object.
- void parse(TokenIterator tokenizer): Parses { ^ ...
- replaceFieldParser(String key, ElementParser fieldParser)
Methods inherited from class org.apache.drill.exec.store.easy.json.parser.AbstractElementParser
errorFactory, structParser
-
Field Details
-
logger
protected static final org.slf4j.Logger logger
-
-
Constructor Details
-
ObjectParser
-
-
Method Details
-
fieldParser
-
onStart
protected void onStart()
Called at the start of a set of values for an object. That is, called when the structure parser accepts the { token.
onField
The structure parser has just encountered a new field for this object. This method returns a parser for the field, along with an optional listener to handle events within the field. The field typically uses a value parser created by the FieldParserFactory class. However, special cases (such as Mongo extended types) can create a custom parser.
If the field is not projected, the method should return a dummy parser from FieldParserFactory.ignoredFieldParser(). The dummy parser will "free-wheel" over whatever values the field contains. (This is one way to avoid structure errors in a JSON file: just ignore them.) Otherwise, the parser will look ahead to guess the field type and will call one of the "add" methods, each of which should return a value listener for the field itself.
A normal field will respond to the structure of the JSON file as it appears. The associated value listener receives events for the field value. The value listener may be asked to create additional structure, such as arrays or nested objects.
Parse position: { ... field : ^ ? for a newly-seen field. Constructs a value parser and its listeners by looking ahead some number of tokens to "sniff" the type of the value. For example:
- foo: <value> - Field value
- foo: [ <value> ] - 1D array value
- foo: [ [<value> ] ] - 2D array value
- Etc.
There are two cases in which no type estimation is possible:
foo: null
foo: []
- Parameters:
key - name of the field
tokenizer - an instance of a token iterator
- Returns:
a parser for the newly-created field
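The look-ahead described for onField can be sketched as counting leading array-start tokens and then inspecting the first token after them. This is a hedged illustration only: ValueSniffSketch, Sniffed, and sniff() are hypothetical, and the token stream is simplified to a list of strings that is inspected by index so nothing is consumed.

```java
import java.util.List;

final class ValueSniffSketch {
  /** Result of the look-ahead: array depth plus a coarse guess at the element kind. */
  static final class Sniffed {
    final int dims;
    final String kind; // "scalar", "object", or "unknown"

    Sniffed(int dims, String kind) {
      this.dims = dims;
      this.kind = kind;
    }
  }

  /** Peeks at the tokens starting at pos without consuming them. */
  static Sniffed sniff(List<String> tokens, int pos) {
    int dims = 0;
    int i = pos;
    // Each leading "[" adds one array dimension.
    while (i < tokens.size() && tokens.get(i).equals("[")) {
      dims++;
      i++;
    }
    String kind;
    if (i >= tokens.size()) {
      kind = "unknown";
    } else {
      String token = tokens.get(i);
      if (token.equals("null") || token.equals("]")) {
        kind = "unknown";   // foo: null and foo: [] give no type hint
      } else if (token.equals("{")) {
        kind = "object";
      } else {
        kind = "scalar";
      }
    }
    return new Sniffed(dims, kind);
  }
}
```

With this sketch, a bare value yields zero dimensions, [ [ value ] ] yields two, and both null and an empty array report an unknown kind, matching the two cases in which no type estimation is possible.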
-
onEnd
protected void onEnd()
Called at the end of a set of values for an object. That is, called when the structure parser accepts the } token.
parse
Parses { ^ ... }
- Parameters:
tokenizer - an instance of a token iterator
-
replaceFieldParser
-