File-based schemas

Suppose that we’ve created several JSON schema files that we’d like to use to validate organization data. For this example, both the schemas and the data are stored in a data directory that sits alongside the Python scripts that follow.

The primary schema is an org schema, stored in data/org-schema.json:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/org-schema",
    "type": "object",
    "properties": {
        "people": {
            "type": "array",
            "items": {
                "$ref": "https://example.com/person-schema"
            }
        }
    }
}

The org schema references a person schema, stored in data/person-schema.json:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/person-schema",
    "type": "object",
    "properties": {
        "name": {
            "type": "string"
        }
    }
}

We’re going to use the org schema to validate our org data, which can be found in data/org-data.json:

{
    "people": [
        {"name": "Alice"},
        {"name": "Bob"}
    ]
}
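To follow along, the three files above can be written to disk with a short script. This is only a convenience sketch using the standard library; it is not part of jschon, and the helper name write_example_files is hypothetical.

```python
import json
import pathlib

# The three documents shown above, as Python dictionaries.
ORG_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/org-schema",
    "type": "object",
    "properties": {
        "people": {
            "type": "array",
            "items": {"$ref": "https://example.com/person-schema"},
        }
    },
}

PERSON_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/person-schema",
    "type": "object",
    "properties": {"name": {"type": "string"}},
}

ORG_DATA = {"people": [{"name": "Alice"}, {"name": "Bob"}]}


def write_example_files(data_dir):
    """Write the example schemas and data into data_dir (hypothetical helper).

    Returns the sorted list of file names that were written.
    """
    data_dir = pathlib.Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    files = {
        'org-schema.json': ORG_SCHEMA,
        'person-schema.json': PERSON_SCHEMA,
        'org-data.json': ORG_DATA,
    }
    for name, content in files.items():
        (data_dir / name).write_text(json.dumps(content, indent=4), encoding='utf-8')
    return sorted(files)
```

Calling write_example_files(pathlib.Path(__file__).parent / 'data') from a script places the files where the examples that follow expect to find them.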

There are several different ways to ensure that all our schemas are loaded and available as needed.

The first way is to load all of our schemas up front. In this case, when the "$ref" keyword is encountered in the org schema, the target (person) schema is found already cached in the catalog.

import pathlib
from jschon import create_catalog, JSON, JSONSchema

data_dir = pathlib.Path(__file__).parent / 'data'

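# Initialize the catalog with JSON Schema draft 2020-12 support.
catalog = create_catalog('2020-12')

# Load the person schema first, so that it is already cached when the
# "$ref" in the org schema is encountered.
person_schema = JSONSchema.loadf(data_dir / 'person-schema.json')
org_schema = JSONSchema.loadf(data_dir / 'org-schema.json')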
org_data = JSON.loadf(data_dir / 'org-data.json')

result = org_schema.evaluate(org_data)
print(result.output('flag'))

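Note that, when using this approach, the schemas must be loaded in "$ref" dependency order: a referenced schema (here, the person schema) must be loaded before any schema that refers to it (here, the org schema).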

Another way is to set up a base URI-to-directory mapping on the catalog. In this case, when the "$ref" keyword is encountered in the org schema, the catalog knows where to find the person schema on disk, and loads it on the fly.

import pathlib
from jschon import create_catalog, JSON, JSONSchema, URI, LocalSource

data_dir = pathlib.Path(__file__).parent / 'data'

catalog = create_catalog('2020-12')
# Map URIs that begin with https://example.com/ to .json files in data_dir.
catalog.add_uri_source(URI('https://example.com/'), LocalSource(data_dir, suffix='.json'))

org_schema = JSONSchema.loadf(data_dir / 'org-schema.json')
org_data = JSON.loadf(data_dir / 'org-data.json')

result = org_schema.evaluate(org_data)
print(result.output('flag'))
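To make the idea of a base URI-to-directory mapping concrete, here is a minimal sketch of how a schema URI might be translated into a file path. It is an illustration only, not jschon's implementation; the function resolve_to_path and its parameters are hypothetical names introduced for this example.

```python
import pathlib


def resolve_to_path(uri, base_uri, base_dir, suffix=''):
    """Translate uri into a path under base_dir (illustrative sketch only).

    Returns None when uri does not fall under base_uri.
    """
    # A fragment identifies a location inside a document, not a file.
    uri = uri.split('#', 1)[0]
    if not uri.startswith(base_uri):
        return None
    # Whatever follows the base URI becomes the path relative to base_dir.
    relative = uri[len(base_uri):]
    return pathlib.Path(base_dir) / (relative + suffix)


# The "$ref" target in the org schema, mapped with the same base URI
# and suffix as in the example above.
path = resolve_to_path(
    'https://example.com/person-schema',
    base_uri='https://example.com/',
    base_dir='data',
    suffix='.json',
)
print(path)
```

Under these assumptions, the person schema's URI maps to the person-schema.json file in the data directory, which is why the schema's "$id" and its file name need to correspond.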

A third way also sets up a base URI-to-directory mapping on the catalog, but this time we retrieve our primary schema from the catalog by its URI rather than loading it explicitly from a file.

import pathlib
from jschon import create_catalog, JSON, URI, LocalSource

data_dir = pathlib.Path(__file__).parent / 'data'

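catalog = create_catalog('2020-12')
# Map URIs that begin with https://example.com/ to .json files in data_dir.
catalog.add_uri_source(URI('https://example.com/'), LocalSource(data_dir, suffix='.json'))

# Ask the catalog for the org schema by URI; it is located via the mapping above.
org_schema = catalog.get_schema(URI('https://example.com/org-schema'))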
org_data = JSON.loadf(data_dir / 'org-data.json')

result = org_schema.evaluate(org_data)
print(result.output('flag'))

This last approach is well-suited to schema re-use, in which JSON document evaluations are done independently with knowledge only of a schema’s URI. The schema is loaded and compiled the first time it is retrieved; thereafter, it is simply read from the cache.