JSON data types are for storing JSON (JavaScript Object Notation)
data, as specified in RFC 7159. Such data can also be stored as text, but
- both JSON data types have the advantage of enforcing that each
- stored value is a valid JSON value. There are also related support
- functions available; see .
+ the JSON data types have the advantage of enforcing that each
+ stored value is valid according to the JSON rules. There are also
+ assorted JSON-specific functions available for data stored in these
+ data types; see .
There are two JSON data types: json and jsonb.
- Both accept almost identical sets of values as
+ They accept almost identical sets of values as
input. The major practical difference is one of efficiency. The
json data type stores an exact copy of the input text,
- which processing functions must continually reparse, while
+ which processing functions must reparse on each execution; while
jsonb data is stored in a decomposed binary format that
- makes it slightly less efficient to input due to added serialization
+ makes it slightly slower to input due to added conversion
overhead, but significantly faster to process, since it never needs
- reparsing. jsonb also supports advanced
- GIN indexing, which is a further significant
- advantage.
+ reparsing. jsonb also supports indexing, which can be a
+ significant advantage.
- The other difference between the types is that the json
- type is guaranteed to contain an exact copy of the input, including
- preservation of semantically insignificant white space, and the
- order of keys within JSON objects (although jsonb will
- preserve trailing zeros within a JSON number). Also, because the
- exact text is kept, if a JSON object within the value contains the
- same key more than once, and has been stored using the json
- type, all the key/value pairs are kept. In that case, the
- processing functions consider the last value as the operative one.
- By contrast, jsonb does not preserve white space, does not
- preserve the order of object keys, and does not keep duplicate
- object keys. Only the last value for a key specified in the input
- is kept.
+ Because the json type stores an exact copy of the input text, it
+ will preserve semantically-insignificant white space between tokens, as
+ well as the order of keys within JSON objects. Also, if a JSON object
+ within the value contains the same key more than once, all the key/value
+ pairs are kept. (The processing functions consider the last value as the
+ operative one.) By contrast, jsonb does not preserve white
+ space, does not preserve the order of object keys, and does not keep
+ duplicate object keys. Only the last value for a key specified in the
+ input is kept. jsonb will preserve trailing zeros within a JSON
+ number, even though those are semantically insignificant for purposes such
+ as equality checks.
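The differences described above can be observed by casting the same literal to both types. This is a minimal sketch: the literal is made up for illustration, and the comments restate the rules given above rather than captured output.

```sql
-- json keeps the input text as is: the extra white space, the key order,
-- and both "a" pairs are all preserved.
SELECT '{"b": 1,   "a": 1, "a": 2.50}'::json;

-- jsonb normalizes the value: the white space and the input key order are
-- not kept, and only the last "a" pair survives (2.50, with its trailing
-- zero intact).
SELECT '{"b": 1,   "a": 1, "a": 2.50}'::jsonb;
```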
- In general, most applications will prefer to store JSON data as
- jsonb, unless there are quite specialized needs.
+ In general, most applications should prefer to store JSON data as
+ jsonb, unless there are quite specialized needs, such as
+ legacy assumptions about ordering of object keys.
- PostgreSQL allows only one server
+ PostgreSQL allows only one character set
encoding per database. It is therefore not possible for the JSON
- types to conform rigidly to the specification unless the server
+ types to conform rigidly to the JSON specification unless the database
encoding is UTF-8. Attempts to directly include characters which
- cannot be represented in the server encoding will fail; conversely,
- characters which can be represented in the server encoding but not
+ cannot be represented in the database encoding will fail; conversely,
+ characters which can be represented in the database encoding but not
in UTF-8 will be allowed. \uXXXX escapes are
- allowed regardless of the server encoding, and are checked only for
+ allowed regardless of the database encoding, and are checked only for
syntactic correctness.
Mapping of RFC-7159/JSON Primitive Types to PostgreSQL Types
- Mapping of type correspondence, notes
+ JSON scalar types and corresponding PostgreSQL types
|
RFC-7159/JSON primitive type
Notes
|
- text
string
- See general introductory notes on encoding and JSON
+ text
+ See introductory notes on JSON and encoding
|
- numeric
number
+ numeric
NaN and infinity values are disallowed
|
boolean
boolean
- Only lowercase true and false values are accepted
+ Only lowercase true and false spellings are accepted
|
- unknown
null
- SQL NULL is orthogonal. NULL semantics do not apply.
+ (none)
+ SQL NULL is a different concept
- Primitive types described by RFC 7159 are effectively
- internally mapped onto native
- PostgreSQL types. Therefore, there are
+ When converting textual JSON input into jsonb,
+ the primitive types described by RFC 7159 are effectively
+ mapped onto native
+ PostgreSQL types, as shown in
+ . Therefore, there are
some very minor additional constraints on what constitutes valid
jsonb that do not apply to the json
- type, or to JSON in the abstract, that pertain to limits on what
- can be represented by the underlying type system. These
+ type, nor to JSON in the abstract, corresponding to limits on what
+ can be represented by the underlying data type. Specifically,
+ jsonb will reject numbers that are outside the range of
+ the PostgreSQL numeric data type,
+ while json will not. Such
implementation-defined restrictions are permitted by
- RFC 7159. However, in practice problems are far more
- likely to occur in other implementations which internally
+ RFC 7159. However, in practice such problems are far more
+ likely to occur in other implementations, as it is common to
represent the number JSON primitive type as IEEE 754
- double precision floating point values, which RFC 7159
- explicitly anticipates and allows for. When using JSON as an
+ double precision floating point (which RFC 7159
+ explicitly anticipates and allows for). When using JSON as an
interchange format with such systems, the danger of losing numeric
- precision in respect of data originally stored by
+ precision compared to data originally stored by
PostgreSQL should be considered.
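The range restriction can be illustrated with a number that is syntactically valid JSON but far too large for the numeric type. This is a sketch under that assumption; the literal is chosen only for illustration, and the exact error text is not shown because it may vary between versions.

```sql
-- Accepted: json only checks that the text is syntactically valid JSON.
SELECT '1e1000000'::json;

-- Expected to be rejected: jsonb must convert the number to numeric,
-- and this value lies outside the range numeric can represent.
SELECT '1e1000000'::jsonb;
```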
+
- Conversely, as noted above there are some minor restrictions on
+ Conversely, as noted in the table there are some minor restrictions on
the input format of JSON primitive types that do not apply to
- corresponding PostgreSQL types.
+ the corresponding PostgreSQL types.
+
+
+
+
+
jsonb Input and Output Syntax
+ The input/output syntax for the JSON data types is as specified in RFC 7159.
The following are all valid json (or jsonb) expressions:
+-- Simple scalar/primitive value (explicitly required by RFC-7159)
+SELECT '5'::json;
+-- Array of heterogeneous, primitive-typed elements
+SELECT '[1, 2, "foo", null]'::json;
+
+-- Object of heterogeneous key/value pairs of primitive types
+-- Note that key values are always strings
+SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json;
+
+
+ Note the distinction between scalar/primitive values as array elements,
+ keys and values.
+
summarize a set of documents (datums) in a table.
- jsonb data is subject to the same concurrency control
+ json data is subject to the same concurrency control
considerations as any other datatype when stored in a table.
Although storing large documents is practicable, in order to ensure
correct behavior row-level locks are, quite naturally, acquired as
- rows are updated. Consider keeping jsonb documents at a
+ rows are updated. Consider keeping json documents at a
manageable size in order to decrease lock contention among updating
- transactions. Ideally, jsonb documents should each
+ transactions. Ideally, json documents should each
represent an atomic datum that business rules dictate cannot
reasonably be further subdivided into smaller atomic datums that
can be independently modified.
-
- jsonb Input and Output Syntax
- In effect, jsonb has an internal type system whose
- implementation is defined in terms of several particular ordinary
- PostgreSQL types. The SQL parser does
- not have direct knowledge of the internal types that constitute a
- jsonb.
-
- The following are all valid jsonb expressions:
--- Simple scalar/primitive value (explicitly required by RFC-7159)
-SELECT '5'::jsonb;
--- Array of heterogeneous, primitive-typed elements
-SELECT '[1, 2, "foo", null]'::jsonb;
-
--- Object of heterogeneous key/value pairs of primitive types
--- Note that key values are always strings
-SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb;
-
-
- Note the distinction between scalar/primitive values as elements,
- keys and values.
-
-
jsonb containment
technically, top-down, unordered subtree isomorphism
may be tested. Containment is conventionally tested using the
@> operator, which is made indexable by various
- operator classes discussed later in this section.
+ operator classes discussed below.
-- Simple scalar/primitive values may contain only each other:
The various containment operators, along with all other JSON
- operators and support functions are documented fully within
- linkend="functions-json">,
- linkend="functions-jsonb-op-table">.
+ operators and support functions are documented in
+ linkend="functions-json">.
+
- jsonb GIN Indexing
+ jsonb Indexing
indexes on
+
- jsonb GIN indexes can be used to efficiently search among
- more than one possible key/value pair within a single
- jsonb datum/document, among a large number of such
- documents within a column in a table (i.e. among many rows).
+ jsonb GIN indexes can be used to efficiently search for
+ keys or key/value pairs occurring within a large number of
+ jsonb documents (datums).
+ Two GIN operator classes are provided, offering different
+ performance and flexibility tradeoffs.
- jsonb has GIN index support for the @>,
- ?, ?& and ?| operators.
- The default GIN operator class makes all these operators
- indexable:
-
+ The default GIN operator class supports queries with the
+ @>, ?, ?& and ?|
+ operators.
+ (For details of the semantics that these operators
+ implement, see .)
+ An example of creating an index with this operator class is:
--- GIN index (default opclass)
-CREATE INDEX idxgin ON api USING GIN (jdoc);
-
--- GIN jsonb_hash_ops index
-CREATE INDEX idxginh ON api USING GIN (jdoc jsonb_hash_ops);
+CREATE INDEX idxgin ON api USING gin (jdoc);
The non-default GIN operator class jsonb_hash_ops
supports indexing the @> operator only.
+ An example of creating an index with this operator class is:
+CREATE INDEX idxginh ON api USING gin (jdoc jsonb_hash_ops);
+
+
Consider the example of a table that stores JSON documents
retrieved from a third-party web service, with a documented schema
- definition. An example of a document retrieved from this web
- service is as follows:
+ definition. A typical document is:
{
"guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a",
]
}
- If a GIN index is created on the table that stores these
- documents, api, on its jdoc
- jsonb column, we can expect that queries like the
- following may make use of the index:
+ We store these documents in a table named api,
+ in a jsonb column named jdoc.
+ If a GIN index is created on this column,
+ queries like the following can make use of the index:
-- Note that both key and value have been specified
-SELECT jdoc->'guid', jdoc-> 'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';
+SELECT jdoc->'guid', jdoc-> 'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}';
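With the default operator class, the key-existence operators can use the same index when they are applied directly to the jdoc column. The queries below are a sketch against the api table; the combination of keys is chosen only for illustration, and whether the planner actually picks the index depends on table size and statistics.

```sql
-- Rows whose top-level object has a "tags" key
SELECT jdoc->'guid' FROM api WHERE jdoc ? 'tags';

-- Rows whose top-level object has both keys
SELECT jdoc->'guid' FROM api WHERE jdoc ?& array['company', 'tags'];

-- Rows whose top-level object has at least one of the keys
SELECT jdoc->'guid' FROM api WHERE jdoc ?| array['company', 'tags'];
```

A jsonb_hash_ops index would not help with these queries, since that operator class supports only the @> operator.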
However, the index could not be used for queries like the
- following, due to the aforementioned nesting restriction:
+ following, because though the operator ? is indexable,
+ it is not applied directly to the indexed column jdoc:
-SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';
+SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui';
- Still, with judicious use of expressional indexing, the above
+ Still, with judicious use of expression indexes, the above
query can use an index scan. If there is a requirement to find
those records with a particular tag quickly, and the tags have a
high cardinality across all documents, defining an index as
follows is an effective approach to indexing:
--- Note that the "jsonb -> text" operator can only be called on an
--- object, so as a consequence of creating this index the root "jdoc"
--- datum must be an object. This is enforced during insertion.
-CREATE INDEX idxgin ON api USING GIN ((jdoc -> 'tags'));
+-- Note that the "jsonb -> text" operator can only be called on an
+-- object, so as a consequence of creating this index the root of each
+-- "jdoc" value must be an object. This is enforced during insertion.
+CREATE INDEX idxgintags ON api USING gin ((jdoc -> 'tags'));
+ Now, the WHERE clause jdoc -> 'tags' ? 'qui'
+ will be recognized as an application of the indexable
+ operator ? to the indexed
+ expression jdoc -> 'tags'.
+ (More information on expression indexes can be found in
+ linkend="indexes-expressional">.)
- Expressional indexes are discussed in
- linkend="indexes-expressional">.
-
- For the most flexible approach in terms of what may be indexed,
- sophisticated querying on nested structures is possible by
- exploiting containment. At the cost of having to create an index
- on the entire structure for each row, and not just a nested
- subset, we may exploit containment semantics to get an equivalent
- result with a non-expressional index on the entire jdoc
- column, without ever having to create additional
- expressional indexes against the document (provided only
- containment will be tested). While the index will be considerably
- larger than our expression index, it will also be much more
- flexible, allowing arbitrary structured searching. Such an index
- can generally be expected to help with a query like the following:
-
+ Another approach to querying is to exploit containment, for example:
-SELECT jdoc->'guid', jdoc-> 'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';
+SELECT jdoc->'guid', jdoc-> 'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}';
- For full details of the semantics that these indexable operators
- implement, see
,
- linkend="functions-jsonb-op-table">.
-
-
-
- jsonb non-default GIN operator class
-
- indexes on
-
- Although only the @> operator is made indexable, a
- jsonb_hash_ops operator class GIN index has
- some notable advantages over an equivalent GIN index of the
- default GIN operator class for jsonb. Search
- operations typically perform considerably better, and the on-disk
- size of a jsonb_hash_ops operator class GIN
- index can be much smaller.
+ This approach uses a single GIN index covering everything in the
+ jdoc column, whereas our expression index stored only
+ data found under the tags key. While the single-index
+ approach is certainly more flexible, targeted expression indexes
+ are likely to be smaller and faster to search than a single index.
-
-
-
jsonb B-Tree and hash indexing
+
- jsonb comparisons and related operations are
- type-wise, in that the underlying
- PostgreSQL datatype comparators are
- invoked recursively, much like a traditional composite type.
+ Although the jsonb_hash_ops operator class supports
+ only queries with the @> operator, it has notable
+ performance advantages over the default operator
+ class jsonb_ops. A jsonb_hash_ops
+ GIN index is usually much smaller than a jsonb_ops
+ index over the same data, and the specificity of searches is better,
+ particularly when queries contain tags that appear frequently in the
+ data. Therefore search operations typically perform considerably better
+ than with the default operator class.
+
- jsonb also supports btree and hash
- indexes. Ordering between jsonb datums is:
+ jsonb also supports btree and hash
+ indexes. These are usually useful only if it's important to check
+ equality of complete JSON documents.
+ The btree ordering for jsonb datums is:
Object > Array > Boolean > Number > String > Null
Array with n elements > array with n - 1 elements
- Subsequently, individual primitive type comparators are invoked.
- All comparisons of JSON primitive types occurs using the same
- comparison rules as the underlying
- PostgreSQL types. Strings are
- compared lexically, using the default database collation.
- Objects with equal numbers of pairs are compared:
+ Objects with equal numbers of pairs are compared in the order:
key-1, value-1, key-2 ...
- Note however that object keys are compared in their storage order, and in particular,
- since shorter keys are stored before longer keys, this can lead to results that might be
- unintuitive, such as:
-{ "aa": 1, "c": 1} > {"b": 1, "d": 1}
+ Note however that object keys are compared in their storage order, and
+ in particular, since shorter keys are stored before longer keys, this
+ can lead to results that might be unintuitive, such as:
+{ "aa": 1, "c": 1} > {"b": 1, "d": 1}
+
Similarly, arrays with equal numbers of elements are compared:
element-1, element-2 ...
+ Primitive JSON values are compared using the same
+ comparison rules as for the underlying
+ PostgreSQL data type. Strings are
+ compared using the default database collation.
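The ordering rules above can be tried out with direct comparisons. The literals here are made up for illustration; under the rules as stated, each query would be expected to return true, but that is an inference from the text, not captured output.

```sql
-- Values of different JSON types: Object > Array, String > Null
SELECT '{"a": 1}'::jsonb > '[1, 2]'::jsonb;
SELECT '"a"'::jsonb > 'null'::jsonb;

-- Arrays: the array with more elements sorts higher
SELECT '[1, 2]'::jsonb > '[9]'::jsonb;

-- Objects with equal numbers of pairs: keys are compared in storage
-- order, so "c" (stored before "aa") is compared against "b" first
SELECT '{"aa": 1, "c": 1}'::jsonb > '{"b": 1, "d": 1}'::jsonb;
```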
Allow
JSON values to be
- linkend="functions-json-table">converted into records
+ linkend="functions-json">converted into records
(Andrew Dunstan)
- Add -table">functions to convert
+ Add functions to convert
scalars, records, and hstore values to JSON (Andrew
Dunstan)