Package org.apache.drill.exec.store.easy.json.extended
Provides parsing for Mongo extended JSON types, which generally take the form { "$type": value }.
Supports both V1 and V2 names, and both the Canonical and Relaxed formats.
Not all types are supported, as some appear to be internal to Mongo. Supported types:
- Array (https://docs.mongodb.com/manual/reference/mongodb-extended-json/#bson.Array)
- Binary, translated to a Drill VARBINARY. The data must be encoded in the default Jackson Base64 format. The subType field, if present, is ignored.
- Date, translated to a Drill TIMESTAMP. Drill's times are in the server local time. The UTC date in Mongo will be shifted to the local time zone on read.
- Decimal (V1), translated to a Drill VARDECIMAL.
- Decimal128 (V2), translated to a Drill VARDECIMAL, but limited to the supported DECIMAL range.
- Document, which is translated to a Drill MAP. The map fields must be consistent across documents: same names and types. (This is a restriction of Maps in Drill's relational data model.) Field names cannot be the same as any of the extended type names.
- Double, translated to a Drill FLOAT8.
- Int64, translated to a Drill BIGINT.
- Int32, translated to a Drill INT.
- Object ID, translated to a Drill VARCHAR.
Unsupported types:
- MaxKey
- MinKey
- Regular Expression
- Data Ref (V1)
- Timestamp. According to the MongoDB documentation: "The BSON timestamp type is for internal MongoDB use. For most cases, in application development, you will want to use the BSON date type."
- Undefined (V1), since Drill has no untyped NULL value.
The unsupported types appear mostly in commands and queries rather than in data, and they do not correspond to any Drill type. If they appear in data, they will be translated to a Drill map.
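The type mapping described above can be summarized as a lookup table. The sketch below is illustrative only and is not part of Drill: the class name `ExtendedTypeMapping` is hypothetical, and the `$`-prefixed keys are the field names from the MongoDB Extended JSON specification, assumed here to be the names Drill recognizes. Array and Document are plain JSON arrays and objects, so they have no `$type` key and are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative lookup table (hypothetical, not a Drill class). */
final class ExtendedTypeMapping {
    // Mongo extended type key -> Drill type named in the list above.
    private static final Map<String, String> DRILL_TYPE = new LinkedHashMap<>();
    static {
        DRILL_TYPE.put("$binary", "VARBINARY");
        DRILL_TYPE.put("$date", "TIMESTAMP");
        DRILL_TYPE.put("$numberDecimal", "VARDECIMAL");
        DRILL_TYPE.put("$numberDouble", "FLOAT8");
        DRILL_TYPE.put("$numberLong", "BIGINT");
        DRILL_TYPE.put("$numberInt", "INT");
        DRILL_TYPE.put("$oid", "VARCHAR");
    }

    /**
     * Returns the Drill type for an extended type key. Keys that are not
     * in the table (for example "$regex") fall back to MAP, mirroring the
     * statement that unsupported types are read as an ordinary Drill map.
     */
    static String drillTypeFor(String extendedKey) {
        return DRILL_TYPE.getOrDefault(extendedKey, "MAP");
    }
}
```

For example, `ExtendedTypeMapping.drillTypeFor("$numberLong")` yields `BIGINT`, while an unsupported key such as `$regex` yields `MAP`.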
Drill defines a few "extended extended" types:
- Date ($dateDay): a date-only field in the form YYYY-MM-DD, which maps to a Drill DATE vector.
- Time ($time): a time-only field in the form HH:MM:SS.SSS, which maps to a Drill TIME vector.
- Interval ($interval): a date/time interval in ISO format, which maps to a Drill INTERVAL vector.
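To make the three value formats concrete, the following sketch parses sample values with the standard `java.time` classes. It does not show how Drill itself parses these fields. The class name `DrillExtendedValues` is hypothetical, and the sample documents in the comments assume the value is carried as a JSON string.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalTime;

/** Illustrative parsing of the value formats (hypothetical, not a Drill class). */
final class DrillExtendedValues {

    // e.g. { "$dateDay": "2020-04-21" } : date only, YYYY-MM-DD
    static LocalDate parseDateDay(String value) {
        return LocalDate.parse(value); // ISO_LOCAL_DATE
    }

    // e.g. { "$time": "11:22:33.123" } : time only, HH:MM:SS.SSS
    static LocalTime parseTime(String value) {
        return LocalTime.parse(value); // ISO_LOCAL_TIME
    }

    // e.g. { "$interval": "PT1H30M" } : ISO-8601 duration.
    // java.time.Duration covers only the day/time portion of an ISO
    // interval; year and month components would need java.time.Period.
    static Duration parseInterval(String value) {
        return Duration.parse(value);
    }
}
```

The `Duration` limitation belongs to this sketch, not to Drill's INTERVAL type.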
Drill extends the extended types to allow null values in the usual way. Drill accepts normal "un-extended" JSON in the same file, but doing so can lead to ambiguities (see below.)
Once Drill defines a field as an extended type, parsing rules are tighter than for normal "non-extended" types. For example, an extended double will not convert from a Boolean or float value.
Provided Schema
If used with a provided schema, then:
- If the first field is in canonical format (with a type), then the extended type must agree with the provided type, or an error will occur.
- If the first field is in relaxed format, or is null, then the provided schema will force the given type as though the data were in canonical format.
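The two rules above amount to a small decision function. The sketch below models them with plain strings for type names. The class `ProvidedSchemaRule`, its method, and the use of `IllegalStateException` are all hypothetical, and Drill's actual schema and error-handling classes are not shown.

```java
/** Illustrative model of the provided-schema rules (hypothetical, not a Drill class). */
final class ProvidedSchemaRule {

    /**
     * @param providedType  Drill type declared in the provided schema
     * @param canonicalType Drill type implied by the first field when it is
     *                      in canonical format, or null when the first field
     *                      is in relaxed format or is a JSON null
     * @return the type the column will take
     */
    static String resolve(String providedType, String canonicalType) {
        if (canonicalType == null) {
            // Relaxed or null first value: the provided schema decides.
            return providedType;
        }
        if (!canonicalType.equals(providedType)) {
            // Canonical first value that disagrees with the schema: error.
            throw new IllegalStateException(
                "Extended type " + canonicalType
                + " conflicts with provided type " + providedType);
        }
        return providedType;
    }
}
```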
Ambiguities
Extended JSON is subject to the same ambiguities as normal JSON. If Drill sees a field in relaxed mode before extended mode, Drill will use its normal type inference rules. Thus, if the first field presents as a: "30", Drill will infer the type as string, even if a later field presents as a: { "$numberInt": 30 }.
To avoid ambiguities, either use only the canonical format, or use a provided schema.
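The ambiguity comes from the first occurrence of a field fixing its type. The toy model below illustrates that behavior. It is not Drill's inference code: the class `FirstValueWins` and its crude string-based `inferType` are hypothetical and recognize only the two shapes used in the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of "first value decides the column type" (hypothetical). */
final class FirstValueWins {
    private final Map<String, String> columnTypes = new LinkedHashMap<>();

    // Crude stand-in for type inference on the raw JSON text of a value.
    static String inferType(String rawValue) {
        String v = rawValue.trim();
        if (v.startsWith("{") && v.contains("\"$numberInt\"")) {
            return "INT";      // canonical extended form
        }
        if (v.startsWith("\"")) {
            return "VARCHAR";  // plain JSON string
        }
        return "UNKNOWN";
    }

    /** Records the type on first sight; later values do not change it. */
    String observe(String field, String rawValue) {
        return columnTypes.computeIfAbsent(field, k -> inferType(rawValue));
    }
}
```

Observing `"30"` first and `{ "$numberInt": 30 }` second leaves the column as `VARCHAR`. Observing them in the opposite order leaves it as `INT`.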
Implementation
Extended types are disabled by default and must be enabled using the store.json.extended_types system/session option (ExecConstants.JSON_EXTENDED_TYPES_KEY).
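As a sketch of how the option might be set from client code, the helper below issues an `ALTER SESSION` statement over a standard JDBC connection. The class `EnableExtendedTypes` is hypothetical. Obtaining the `Connection` (for example through the Drill JDBC driver) is assumed and not shown.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Illustrative helper for toggling the option (hypothetical, not a Drill class). */
final class EnableExtendedTypes {
    static final String OPTION = "store.json.extended_types";

    // Builds the session-level statement; the option name is back-quoted
    // because it contains dots.
    static String alterSessionSql(boolean enabled) {
        return "ALTER SESSION SET `" + OPTION + "` = " + enabled;
    }

    // Applies the setting on an already-open connection to Drill.
    static void apply(Connection conn, boolean enabled) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(alterSessionSql(enabled));
        }
    }
}
```

Replacing `SESSION` with `SYSTEM` in the statement would set the option system-wide instead of for the current session.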
Extended types are implemented via a field factory. The field factory builds the structure needed each time the JSON structure parser sees a new field. For extended types, the field factory looks ahead to detect an extended type, specifically for the pattern { "$type":. If the pattern is found, and the name is one of the supported type names, then the factory creates a parser to accept the enhanced type in either the canonical or relaxed forms.
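The look-ahead check can be approximated as follows. This is a simplified stand-in that matches on raw text with a regular expression, whereas the description above indicates a token-based look-ahead. The class `ExtendedTypeLookAhead` is hypothetical, and the set of names is assumed from the Mongo extended JSON field names plus the Drill-specific names listed earlier.

```java
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Regex approximation of the look-ahead (hypothetical, not a Drill class). */
final class ExtendedTypeLookAhead {
    // Matches the prefix  { "$name":  and captures the $-prefixed name.
    private static final Pattern PREFIX =
        Pattern.compile("^\\s*\\{\\s*\"(\\$[A-Za-z0-9]+)\"\\s*:");

    // Assumed set of supported names; the real list lives in Drill's code.
    private static final Set<String> SUPPORTED = Set.of(
        "$binary", "$date", "$numberDecimal", "$numberDouble",
        "$numberLong", "$numberInt", "$oid",
        "$dateDay", "$time", "$interval");

    /** Returns the extended type name if the value starts with a supported one. */
    static Optional<String> detect(String fieldValueText) {
        Matcher m = PREFIX.matcher(fieldValueText);
        if (m.lookingAt() && SUPPORTED.contains(m.group(1))) {
            return Optional.of(m.group(1));
        }
        return Optional.empty();
    }
}
```

A value that does not match, or whose name is not in the set, would be handled as ordinary JSON: a scalar, or a map in the case of an unsupported `$` name.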
Each field is represented by a Mongo-specific parser along with an associated value listener. The implementation does not reify the object structure; that structure is consumed by the field parser itself. The value listener receives value tokens as if the data were in relaxed format.
See Also:
- MapVectorOutput for an older implementation
Class Description
- Names of Mongo extended types.
- Parses a binary.
- Parses a Mongo date in the V1 format:
- Parses a Mongo extended type of the form: