4.1 Support access to schema documents outside of SML-IF
4.2 Ignorable schema locations
4.3 Include definition and instance documents as-is
4.4 Lazy schema assembly
4.5 Support reference constraints
5 Interoperability Approach
6 Schema binding proposal
6.1 An Example
6.2 Solution to the Example
6.3 Default Schema
6.4 Formal Proposal
6.5 Proposal Analysis
7 Implementation Cost
8 Acknowledgement
Schema document: an <xs:schema> element; can be an XML fragment
Schema: a set of schema components; a schema is normally (but not required to be) constructed from one or more schema documents
Schema component: an element declaration or a type definition or a particle or …
Include: A schema document can include another schema document using <xs:include>. Both schema documents contribute to the same schema, and both correspond to schema components from the same target namespace (or no namespace). If the included schema document does not have a target namespace, the namespace of the including schema document is used.
Redefine: Similar to include, but uses <xs:redefine>, and the redefining schema document can replace certain included components with new components.
Import: Allows the importing schema document to refer to components from the imported namespace (or no namespace), which must be different from the importing schema document’s target namespace. If the combination of the “namespace” attribute and the “schemaLocation” attribute on <xs:import> resolves to a schema document, then the resulting schema also includes components from the imported schema document.
Schema composition: (In this document) construct a single schema from multiple schema documents, using the above include, redefine and/or import mechanisms.
Note: “a schema” is not equal to “a schema document”!
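The distinction between a schema document and a schema can be illustrated with a minimal Python sketch. It assumes two hypothetical schema documents (`DOC_A`, `DOC_B`) with a made-up namespace `urn:example:ns1`, and it models a schema simply as a set of (kind, namespace, name) triples; real schema composition involves far more than this.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Two hypothetical schema documents that share one target namespace.
DOC_A = (f'<xs:schema xmlns:xs="{XS}" targetNamespace="urn:example:ns1">'
         '<xs:element name="server"/></xs:schema>')
DOC_B = (f'<xs:schema xmlns:xs="{XS}" targetNamespace="urn:example:ns1">'
         '<xs:complexType name="ServerType"/></xs:schema>')


def components(schema_document_text):
    """Return (kind, namespace, name) for each named top-level component."""
    root = ET.fromstring(schema_document_text)
    namespace = root.get("targetNamespace")
    found = set()
    for child in root:
        name = child.get("name")
        if name is not None:
            kind = child.tag.split("}")[1]  # drop the "{namespace}" part of the tag
            found.add((kind, namespace, name))
    return found


def compose(schema_documents):
    """Illustrative composition: one schema built from several schema documents."""
    schema = set()
    for text in schema_documents:
        schema |= components(text)
    return schema


schema = compose([DOC_A, DOC_B])
```

Here `schema` is a single set of components even though it was built from two schema documents.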
In performing SML model validation over the SML model packaged in an SML-IF instance, associations between XML Schema definition documents and instance documents need to be drawn, both to completely validate XML Schema documents themselves (to make sure they produce valid schemas) and to establish schema-validity of the instance documents.
Schema documents can be connected with other schema documents using composition features provided by XML Schema. These include <xs:include>, <xs:redefine>, and <xs:import>. A schema document’s validity may depend on other schema documents it includes/redefines/imports, or even other schema documents that include/redefine/import it.
When validating a model instance document, a precise list of schema documents needs to be associated with it to form a “schema”, and the instance document is schema-assessed using this schema.
The XML Schema 1.0 specification provides more flexibility in constructing the schema used for assessment than is appropriate for the semantics defined by SML and SML-IF validation:
It allows processor latitude in terms of locating schema documents (resolving namespace and schema location attributes) and composing schema documents together to form a single schema.
Schema location attributes can be ignored in some cases (“xsi:schemaLocation” in instance documents and “schemaLocation” on <xs:import>), and are allowed to “fail to resolve” in others (“schemaLocation” attribute on <xs:include> and <xs:redefine>). Known schema and SML implementations behave differently with respect to how/whether they process schema location attributes.
Multiple imports of the same namespace allow all but the first one to be ignored.
So it is clear that we have no hope of guaranteeing general case interoperability using anything based only on XML Schema given the constraints above, and SML-IF needs to specify how to determine such associations.
NOTE: this proposal is only about SML model validation, and not SML-IF validation (against the IF schema). Unless otherwise indicated, “validation/validity” in the following sections is always about SML model validation.
3.1 Support schema composition
There are many real-life schemas that are constructed from multiple schema documents. Such schemas may span multiple namespaces (hence the need for import); components from each namespace may be further divided into multiple schema documents (hence the need for include).
Schema has a feature often referred to as “chameleon include”. This means that a schema document with a target namespace includes or redefines another schema document without a target namespace, and the result is as if the included/redefined document had a target namespace that’s the same as the including/redefining document. SML-IF needs to support this usage scenario.
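The chameleon include rule can be sketched as a small Python function. The function name and the use of `None` for "no target namespace" are assumptions made for illustration only.

```python
def effective_namespace(including_ns, included_ns):
    """Namespace that the included/redefined document's components end up in.

    None stands for "no target namespace" (an illustrative convention).
    """
    if included_ns is None:
        # Chameleon include: the included document takes on the namespace
        # of the including/redefining document.
        return including_ns
    if included_ns != including_ns:
        raise ValueError("include/redefine requires the same target namespace")
    return included_ns
```

For example, `effective_namespace("urn:example:ns1", None)` yields `"urn:example:ns1"`, while two different non-absent namespaces are rejected.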
3.2 Support schema versioning
Schema authors can’t anticipate how their schemas will be used, hence the need to evolve schemas. There are different versioning scenarios. There are cases where minor modifications of older versions suffice, and redefine can be used. Some schemas need to be rewritten to accommodate new requirements, and a new namespace may or may not be introduced (compatibility is often a good reason for not changing namespaces). There are also cases where there are generic and specific versions (as opposed to previous and next versions), which often co-exist and share the same namespace.
To support this, SML-IF needs to be able to package in the same SML-IF instance different versions of the same schema in the same namespace.
For a given SML-IF instance, there MUST be no ambiguity in determining how schema documents (that are included in this instance) are connected using <xs:include>, <xs:redefine>, and <xs:import>, and therefore there MUST be no ambiguity in determining which schema documents are used to form a schema against which a given instance document is validated.
3.4 Full schema support
Being a generic validation language, SML supports all schema features. Being a mechanism to transmit SML models, SML-IF also needs to support full schema features, especially <xs:include>, <xs:redefine>, and <xs:import>. For example, in an SML model, if an instance document I is validated against a schema formed from a schema document A, which redefines schema document B, then it MUST be possible to transmit I, A, and B in an SML-IF instance and maintain their relationship.
3.5 Schema document exchange
An SML-IF document can contain XML Schema documents within its definition documents that are attached for exchange purposes only. These documents are not intended to be used for XML Schema validity assessment of the model instance documents. SML-IF needs to support this use case and ensure that documents of this purpose do not participate in model instance document validation.
This is analogous to the case we already have for rule documents, except rule documents do not have a “bind to all” default as we are contemplating for XML Schema documents. Any new types of definition documents added in the future will have to address similar concerns, whose syntax will be influenced by the default binding (all or none).
Note that for both schema documents and Schematron rule documents that are not bound to any instances, their validity should still be checked when assessing SML model validity, as required by SML, which has:
Each XML Schema document in the model's definition documents MUST satisfy the conditions expressed in Errors in Schema Construction and Structure (§5.1). [XML Schema Structures]
Each Schematron document in the model's definition documents MUST be a valid Schematron document [ISO/IEC 19757-3]
4.1 Support access to schema documents outside of SML-IF
We do not want to force all schemas necessary to validate the model instance documents packaged by a single SML-IF instance to be included by value in every SML-IF instance. It is not clear this would even be sensible in a repository interchange scenario, let alone the more general case of usage scenarios some have mentioned for SML-IF like web services message exchanges.
4.2 Ignorable schema locations
We cannot require honoring of xsi:schemaLocation and xsi:noNamespaceSchemaLocation in instance documents or schemaLocation on <xs:import>, because
Some existing implementations ignore them
Honoring schema location in instance documents may have security consequences
The Schema specification does require that processors attempt to resolve schema locations specified on <xs:include> and <xs:redefine>. It is not an error for such an attempt to fail for <xs:include>. It is an error when the <xs:redefine> contains non-annotation content.
It’s more flexible for <xs:import>. Schema allows any strategy for processors to locate components to import, based on either or both of the namespace and the schema location.
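The three cases above can be summarized in a short Python sketch. The `Outcome` enum and the function name are hypothetical, and the function only covers the situation where a schema location fails to resolve.

```python
from enum import Enum


class Outcome(Enum):
    NOT_AN_ERROR = "not an error"
    ERROR = "error"


def unresolved_outcome(kind, has_non_annotation_content=False):
    """Outcome when the schema location on a composition element does not resolve."""
    if kind == "include":
        # A failed resolution attempt is tolerated for include.
        return Outcome.NOT_AN_ERROR
    if kind == "redefine":
        # A redefine that actually redefines something needs its target.
        if has_non_annotation_content:
            return Outcome.ERROR
        return Outcome.NOT_AN_ERROR
    if kind == "import":
        # Processors may use any strategy to locate imported components.
        return Outcome.NOT_AN_ERROR
    raise ValueError(f"unknown composition element: {kind}")
```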
4.3 Include definition and instance documents as-is
SML-IF instance producers may not have control over the content of the schemas necessary for validation of model instance documents, where “control” means what is coded in the files. That is, there will be cases where xs:import and xs:include are coded, with and without schemaLocation, and multiple files containing schema components for the same namespace will be observed.
4.4 Lazy schema assembly
The Schema specification allows schemas to be assembled lazily. A partial schema can be used to validate an instance document, and more components can be added to the schema during the validation, as long as the new components don’t change the validation result of information items that are already validated.
This is sometimes not easy to enforce, but “supporting full schema” implies that SML-IF validation cannot violate this constraint.
4.5 Support reference constraints
Reference-related constraints (targetElement, targetType, acyclic, SML identity constraints) need to be properly supported. When two documents A and B are connected by an SML reference, these constraints require the ability to determine whether a component from the schema used to assess A is identical to a component from the schema used to assess B. The schema spec doesn't define identity of components across multiple schemas. The same source declaration may produce totally different components in different schemas. So to check those reference-related constraints, related instance documents MUST be validated using the same schema.
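The identity problem can be illustrated with a deliberately simplified Python sketch. It assumes a toy `build_schema` function in which every schema build creates fresh component objects; the component names are hypothetical.

```python
def build_schema(declaration_names):
    """Toy schema build: each call creates fresh component objects,
    even when the source declarations are identical."""
    return {name: object() for name in declaration_names}


def same_component(schema_x, schema_y, name):
    """Component identity is only meaningful within a single schema."""
    return schema_x[name] is schema_y[name]


schema_for_a = build_schema(["ns1:ServerType"])
schema_for_b = build_schema(["ns1:ServerType"])
shared_schema = build_schema(["ns1:ServerType", "ns2:DiskType"])
```

Comparing `ns1:ServerType` across `schema_for_a` and `schema_for_b` gives no usable answer, whereas assessing both documents against `shared_schema` makes the comparison well defined.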
5 Interoperability Approach
We divide the universe of SML-IF documents into two disjoint subsets:
The set that has all schema documents included, by value (smlif:data) and/or by reference (smlif:locator), in the SML-IF instance; the “schema-complete set”
All other SML-IF documents; the “schema-incomplete set”
It is necessary for a producer to declaratively distinguish between these two cases, since it is not always possible to distinguish based on the content alone. For example, XML Schema allows the value of xs:include’s schema location attribute to not resolve, although the value is required. This can be done by introducing a “schemaComplete” attribute on the <definitions> element to indicate whether this SML-IF instance includes all necessary definition documents.
When this attribute is specified with an actual value “true”, then for the instance to be valid, its schema definition documents and instance documents can only refer to either built-in components or components from definition documents included in the instance. “Built-in” components include:
The four xsi: attributes (defined by XML Schema)
all schema built-in types (xs:anyType and simple types defined in XML Schema Part 2)
sml:ref attribute declaration
sml:uri element declaration
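A schema-complete check along these lines could be sketched in Python as follows. The sketch works on prefixed names as plain strings and treats every `xs:` name as a built-in type; both are simplifying assumptions, and the function names are hypothetical.

```python
# The four xsi: attributes defined by XML Schema.
XSI_ATTRIBUTES = {
    "xsi:type",
    "xsi:nil",
    "xsi:schemaLocation",
    "xsi:noNamespaceSchemaLocation",
}
SML_BUILT_INS = {"sml:ref", "sml:uri"}


def is_built_in(qname):
    """Assumption: any name with the xs: prefix denotes a built-in type."""
    return (qname.startswith("xs:")
            or qname in XSI_ATTRIBUTES
            or qname in SML_BUILT_INS)


def missing_components(referenced, defined_in_instance):
    """Components that make a schemaComplete="true" instance invalid."""
    return sorted(q for q in referenced
                  if not is_built_in(q) and q not in defined_in_instance)
```

An empty result means every referenced component is either built-in or defined by a document included in the instance.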
Remember, this is not trying to say that SML-IF document instances in the schema-incomplete set are now invalid. It does say that SML-IF cannot guarantee interoperability for the schema-incomplete set.
6 Schema binding proposal
6.1 An Example
(See the picture next page) Assume an SML model packaged in an IF document has 4 schema documents: xsd1-a and xsd1-b have target namespace ns1, and xsd2-v1 and xsd2-v2 have target namespace ns2, where xsd2-v1 and xsd2-v2 are conflicting versions of the same schema (same target namespace). There are 4 instances: doc1 uses xsd1-a and xsd1-b; doc2-v1-a and doc2-v1-b use xsd2-v1, and doc2-v2 uses xsd2-v2. All doc2-* instances have SML references to doc1, and their references have targetType constraints, pointing to a component in ns1.
To check targetType, doc2-v1-a, doc2-v1-b and doc1 must be validated using the same schema (xsd1-a + xsd1-b + xsd2-v1); similarly, doc2-v2 and doc1 must be validated using the schema from xsd1-a + xsd1-b + xsd2-v2. More concretely, in the following picture, instances in the red rectangle are validated using the schema built from schema documents in the red oval; and instances in the blue rectangle are validated using the schema built from the blue oval.
Note that doc1 is validated twice using 2 different schemas. doc1 may also be validated against only xsd1; this is up to the model author to specify.
6.2 Solution to the Example
[XML listing not recovered. Its surviving comment reads: instance documents that are assessed against this schema]
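The bindings needed for the example in section 6.1 can be modeled with a small Python sketch. This is not the proposal's XML syntax; the dictionary keys `schemas` and `instances` are hypothetical names chosen for illustration.

```python
# Each binding pairs the schema documents that form one schema with the
# instance documents that are assessed against that schema.
SCHEMA_BINDINGS = [
    {"schemas": ["xsd1-a", "xsd1-b", "xsd2-v1"],
     "instances": ["doc1", "doc2-v1-a", "doc2-v1-b"]},
    {"schemas": ["xsd1-a", "xsd1-b", "xsd2-v2"],
     "instances": ["doc1", "doc2-v2"]},
]


def schemas_for(instance):
    """All schemas (as sets of schema documents) an instance is assessed against."""
    return [frozenset(binding["schemas"])
            for binding in SCHEMA_BINDINGS
            if instance in binding["instances"]]
```

With these bindings, `schemas_for("doc1")` returns two schemas, matching the observation that doc1 is validated twice.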
6.3 Default Schema
There are cases where most instance documents use the same schema. It’s desirable to have a default schema to cover this case, instead of having to have a <schemaBinding> that lists all those instances.
For example, suppose an IF document contains 3 schema documents: ns1.xsd, ns2.xsd, and ns2-exchange.xsd, where the latter 2 documents share the same target namespace, but ns2-exchange.xsd is meant to be exchanged only and should not be considered part of the schema that governs instance documents. This can be achieved using the following syntax:
[XML listing not recovered. Its surviving comment reads: all instance documents *not* included in any "schemaBinding".]
“defaultSchema” can be used together with “schemaBinding” as a default to cover instance documents that are not included in any “schemaBinding”.
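The fallback from “schemaBinding” to “defaultSchema” can be sketched in Python. The data layout is an assumption for illustration: bindings are dictionaries with hypothetical `schemas` and `instances` keys, and the default schema is given directly as a list of schema documents.

```python
ALL_SCHEMA_DOCUMENTS = ["ns1.xsd", "ns2.xsd", "ns2-exchange.xsd"]
# The exchange-only document is left out of the default schema.
DEFAULT_SCHEMA = ["ns1.xsd", "ns2.xsd"]


def schemas_for(instance, schema_bindings, default_schema, all_documents):
    """Schemas (lists of schema documents) used to assess an instance."""
    bound = [b["schemas"] for b in schema_bindings if instance in b["instances"]]
    if bound:
        return bound                 # explicit bindings take precedence
    if default_schema is not None:
        return [default_schema]      # covered by the default schema
    return [all_documents]           # no default: use every schema document
```

For an instance that appears in no binding, `schemas_for("doc.xml", [], DEFAULT_SCHEMA, ALL_SCHEMA_DOCUMENTS)` selects only ns1.xsd and ns2.xsd.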
6.4 Formal Proposal
Change the IF document structure to add the following (new content highlighted):
The details of the preceding XML syntax, e.g. whether the data is contained in attributes or elements, are fully negotiable. The XML above simply captures enough to have the discussion that follows.
For every schema binding SB in the model, i.e. every “/model/schemaBindings/schemaBinding” element (using XPath notation):
Compose a schema using all documents specified under all of SB’s namespace binding children
Whenever there is an <xs:import> for a namespace N
If SB has a namespace binding child whose "namespace" matches N, then components from schema documents listed in the corresponding "aliases" are used. As with rule bindings, URI prefixing is used for matching schema document aliases.
Note: at most one namespace binding is allowed per namespace N within a given SB. If more than one namespace binding exists for the namespace as part of a single schema binding, the SML-IF instance is in error.
Note: if the set of aliases for namespace N is empty, the namespace has no schema documents defining it in the schema binding.
Otherwise if there are schema documents in the IF whose targetNamespace is N, then components from all those schema documents are used
If a schema-complete document (/model/definitions/@schemaComplete=true) is being processed, then no component from N (other than built-ins) is included in the schema being composed
Otherwise, it is implementation-defined whether the processor tries to retrieve components for N from outside the SML-IF instance
Whenever there is an <xs:include> or <xs:redefine>, the schemaLocation is used to match aliases of schema documents, as with base SML-IF.
If there is a schema document in the IF matching that alias, then that document is used
Otherwise, if it's a schema-complete set, then the <xs:include> or <xs:redefine> is unresolved (which is allowed by XML Schema validity assessment rules)
Otherwise, it's implementation-defined whether it tries to resolve the <xs:include> or <xs:redefine> to schema documents outside the IF
The instance documents listed under SB are assessed against this *same* schema. targetXXX and identity constraints can now be checked. As with rule bindings, each listed document alias can refer to multiple documents via URI prefixing.
If <defaultSchema> is present, then compose a schema from it following rules 2.1 to 2.3 above; otherwise compose a schema using *all* schema documents included in the IF. Then use this schema to assess those instance documents that are not included in any <schemaBinding>.
Note: in the common case where match-all namespace matching is the desired result, this is achieved by omitting <defaultSchema>, i.e. without introducing any additional complexity into the SML-IF instance.
Note: one implication of this formulation is that the Schema document exchange requirement of section 3.5 is supported. This would be done by explicitly binding /model/instances/* to a schema binding that excludes the exchange-only schemas. The model instance documents may still contain information items from namespace(s) in the exchange-only schemas, however those schema documents would not be used to assess schema validity of the model instance documents.
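The import-resolution rules above can be sketched in Python under stated assumptions. A namespace binding is modeled as a mapping from namespace to a list of alias prefixes, the packaged schema documents as a mapping from alias to target namespace, and URI prefixing as plain string-prefix matching. All names here are hypothetical.

```python
def resolve_import(namespace, namespace_bindings, documents, schema_complete):
    """Decide which packaged schema documents supply components for an import.

    namespace_bindings: dict mapping namespace -> list of alias prefixes
    documents: dict mapping schema document alias -> target namespace
    Returns (aliases, may_look_outside), where may_look_outside tells whether
    an implementation may choose to retrieve components from outside the IF.
    """
    if namespace in namespace_bindings:
        prefixes = namespace_bindings[namespace]
        # An empty alias list means the namespace has no schema documents here.
        matched = sorted(alias for alias in documents
                         if any(alias.startswith(p) for p in prefixes))
        return matched, False
    by_namespace = sorted(alias for alias, ns in documents.items()
                          if ns == namespace)
    if by_namespace:
        return by_namespace, False   # match-all namespace matching
    if schema_complete:
        return [], False             # only built-ins are available
    return [], True                  # implementation-defined
```

The sketch does not detect the error case of two namespace bindings for the same namespace, because a dictionary cannot represent it.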
6.5 Proposal Analysis
Great synergy with rule bindings
It works in a way very similar to Schematron rules. You associate a schema (built from a set of schema documents) with a set of instance documents
Handles all the requirements
Supports schema composition: chameleon-included documents are supported by omitting them from the corresponding namespace binding (whose “namespace” attribute is absent)
Supports schema versioning: multiple versions can be specified in different <schemaBinding> elements
Deterministic: the association between instances and schemas is deterministic
Full schema support: <xs:include>, <xs:redefine>, and <xs:import> are all supported
Schema document exchange: similar to chameleon-included documents, exchange-only documents can also be omitted from the corresponding namespace binding
Meets all the constraints
Supports access to schema documents outside of SML-IF: when schemaComplete=false, processors are allowed to use external schema documents
Ignorable schema locations: all xsi:schemaLocation attributes can be ignored
Lazy schema assembly: the schema is known up-front; no need to handle lazy assembly
Supports reference constraints: instances specified under the same <schemaBinding> use the same schema, so reference constraints can be checked.
Simple to understand
Note that the “trivial case” is also handled by bullet 2. That is, there is no <schemaBinding> and all instance documents are assessed against the same schema.
7 Implementation Cost
We have to assume that all existing schema processors are capable of handling the "namespace matching" approach. That is, they can compose a schema from a list of schema documents.
This approach should be straightforward to handle. All the SML processor needs to do is to compute a list of schema documents based on the schema documents mentioned in the <schemaBinding> and give that list to the schema processor.
The “Explicit Binding” approach from earlier iterations of this proposal had what we believe is equivalent function, but was eliminated because it also had greater complexity (four levels of binding to sift through, instead of the two used here, i.e. schema binding and match-all namespace matching).
8 Acknowledgement
John Arwe, Bassam Tabarra, Harm Sluiman, and Pratul Dublish all provided useful input into the formulation of this document. This does not imply their endorsement of the proposal.