The SAX2 API for XML parsers was originally developed for
Java. Please be aware that there is no standard SAX2 API for
C++, and that use of the &XercesCName; SAX2 API does not
guarantee client code compatibility with other C++ XML
parsers.
The SAX2 API presents a callback-based interface to the parser. An
application that uses SAX2 provides an instance of a handler
class to the parser. When the parser detects XML constructs,
it calls the methods of the handler class, passing them
information about the construct that was detected. The most
commonly used handler classes are ContentHandler, which is
called when XML constructs are recognized, and ErrorHandler,
which is called when an error occurs. The header files for the
various SAX2 handler classes are in the xercesc/sax2/
directory.
As a convenience, &XercesCName; provides DefaultHandler,
a single class which is publicly derived
from all the Handler classes. DefaultHandler's default
implementation of the handler callback methods is to do
nothing. A convenient way to get started with &XercesCName; is
to derive your own handler class from DefaultHandler and override
just those methods of DefaultHandler that you are interested in
customizing. This simple example shows how to create a handler
which will print element names and fatal error
messages. The source code for the sample applications shows
additional examples of how to write handler classes.
This is the header file MySAX2Handler.hpp:
This is the implementation file MySAX2Handler.cpp:
The XMLCh and Attributes types are supplied by
&XercesCName; and are documented in the API Reference.
Examples of their usage appear in the source code to
the sample applications.
In order to use &XercesCName; SAX2 to parse XML files, you will
need to create an instance of the SAX2XMLReader class. The example
below shows the code you need in order to create an instance
of SAX2XMLReader. The ContentHandler and ErrorHandler instances
required by the SAX2 API are provided using the DefaultHandler
class supplied with &XercesCName;.
The behavior of the SAX2XMLReader is dependent on the values of the following features.
All of the features below can be set using the function SAX2XMLReader::setFeature(const XMLCh* const, const bool).
They can be queried using the function bool SAX2XMLReader::getFeature(const XMLCh* const).
None of these features can be modified in the middle of a parse, or an exception will be thrown.
http://xml.org/sax/features/namespaces
true:
Perform Namespace processing.
false:
Do not perform Namespace processing.
default:
true
XMLUni Predefined Constant:
fgSAX2CoreNameSpaces
note:
If the validation feature is set to true, then the
document must contain a grammar that supports the use of namespaces.
see:
http://xml.org/sax/features/namespace-prefixes
see:
http://xml.org/sax/features/validation
http://xml.org/sax/features/namespace-prefixes
true:
Report the original prefixed names and attributes used for Namespace declarations.
false:
Do not report attributes used for Namespace declarations, and optionally do not report original prefixed names.
default:
false
XMLUni Predefined Constant:
fgSAX2CoreNameSpacePrefixes
http://xml.org/sax/features/validation
true:
Report all validation errors.
false:
Do not report validation errors.
default:
false
XMLUni Predefined Constant:
fgSAX2CoreValidation
note:
If this feature is set to true, the document must
specify a grammar. If this feature is set to false and the document specifies a grammar,
that grammar might be parsed but no validation of the document contents will be
performed.
Enable full schema constraint checking, including checking
which may be time-consuming or memory intensive. Currently, particle unique
attribution constraint checking and particle derivation restriction checking
are controlled by this option.
false:
Disable full schema constraint checking.
default:
false
XMLUni Predefined Constant:
fgXercesSchemaFullChecking
note:
This feature checks the schema grammar itself for
additional errors that are time-consuming or memory intensive. It does not affect the
level of checking performed on document instances that use schema grammars.
The behavior of the parser when this feature is set to
true is undetermined! Therefore use this feature with extreme caution because
the parser may get stuck in an infinite loop or worse.
The parser will treat validation errors as fatal and will
exit depending on the state of
http://apache.org/xml/features/continue-after-fatal-error.
false:
The parser will report the error and continue processing.
default:
false
XMLUni Predefined Constant:
fgXercesValidationErrorAsFatal
note:
Setting this to true does not mean the validation error will
be printed with the words "Fatal Error". It is still printed as "Error", but the parser
will exit if
http://apache.org/xml/features/continue-after-fatal-error
is set to false.
If http://apache.org/xml/features/validation/cache-grammarFromParse is enabled,
this feature is set to true automatically and any attempt by the user to set this feature is a no-op.
Enable generation of synthetic annotations. A synthetic annotation will be
generated when a schema component has non-schema attributes but no child annotation.
Ignore a cached DTD when an XML document contains both an
internal and external DTD, and the use cached grammar from parse option
is enabled. Currently, we do not allow using cached DTD grammar when an
internal subset is present in the document. This option will only affect
the behavior of the parser when an internal and external DTD both exist
in a document (i.e. no effect if document has no internal subset).
During schema validation allow multiple schemas with the same namespace
to be imported.
false:
Don't import multiple schemas with the same namespace.
default:
false
XMLUni Predefined Constant:
fgXercesHandleMultipleImports
The behavior of the SAX2XMLReader is dependent on the values of the following properties.
All of the properties below can be set using the function SAX2XMLReader::setProperty(const XMLCh* const, void*).
It takes a void pointer as the property value. The application is required to initialize this void
pointer to the correct type. Please check the column "Value Type" below
to learn exactly what type of property value each property expects for processing.
Passing a void pointer that was initialized with the wrong type will lead to unexpected results.
If the same property is set more than once, the last value set takes effect.
Property values can be queried using the function void* SAX2XMLReader::getProperty(const XMLCh* const).
The parser owns the returned pointer, and the memory allocated for the returned pointer will
be destroyed when the parser is deleted. To ensure accessibility of the returned information after
the parser is deleted, callers need to copy and store the returned information somewhere else.
Since the returned pointer is a generic void pointer, check the column "Value Type" below to learn
exactly what type of object each property returns for replication.
None of these properties can be modified in the middle of a parse, or an exception will be thrown.
The XML Schema Recommendation explicitly states that
the inclusion of schemaLocation/noNamespaceSchemaLocation attributes in the
instance document is only a hint; it does not mandate that these attributes
must be used to locate schemas. The same applies to the <import>
element in schema documents. This property allows the user to specify a list
of schemas to use. If the targetNamespace of a schema specified using this
method matches the targetNamespace of a schema occurring in the instance
document in a schemaLocation attribute, or
if the targetNamespace matches the namespace attribute of an <import>
element, the schema specified by the user using this property will
be used (i.e., the schemaLocation attribute in the instance document
or on the <import> element will be effectively ignored).
Value
The syntax is the same as for schemaLocation attributes
in instance documents: e.g., "http://www.example.com file_name.xsd".
The user can specify more than one XML Schema in the list.
The XML Schema Recommendation explicitly states that
the inclusion of schemaLocation/noNamespaceSchemaLocation attributes in the
instance document is only a hint; it does not mandate that these attributes
must be used to locate schemas. This property allows the user to specify the
no-target-namespace XML Schema location externally. If specified, the instance
document's noNamespaceSchemaLocation attribute will be effectively ignored.
Value
The syntax is the same as for the noNamespaceSchemaLocation
attribute that may occur in an instance document: e.g., "file_name.xsd".
Value Type
XMLCh*
XMLUni Predefined Constant:
fgXercesSchemaExternalNoNameSpaceSchemaLocation
http://apache.org/xml/properties/scannerName
Description
This property allows the user to specify the name of
the XMLScanner to use for scanning XML documents. If not specified, the default
scanner "IGXMLScanner" is used.
Value
The recognized scanner names are:
1. "WFXMLScanner" - scanner that performs well-formedness checking only.
2. "DGXMLScanner" - scanner that handles XML documents with DTD grammar information.
3. "SGXMLScanner" - scanner that handles XML documents with XML Schema grammar information.
4. "IGXMLScanner" - scanner that handles XML documents with DTD and/or XML Schema grammar information.
Users can use the predefined constants defined in XMLUni directly (fgWFXMLScanner, fgDGXMLScanner,
fgSGXMLScanner, or fgIGXMLScanner) or a string that matches the value of
one of those constants.
Value Type
XMLCh*
XMLUni Predefined Constant:
fgXercesScannerName
note:
See Use Specific Scanner
for more programming details.
http://apache.org/xml/properties/security-manager
Description
Certain valid XML and XML Schema constructs can force a
processor to consume more system resources than an
application may wish. In fact, certain features could
be exploited by malicious document writers to produce a
denial-of-service attack. This property allows
applications to impose limits on the amount of
resources the processor will consume while processing
these constructs.
Value
An instance of the SecurityManager class (see
xercesc/util/SecurityManager). This
class's documentation describes the particular limits
that may be set. Note that a newly instantiated SecurityManager
provides default limits that should be appropriate in most
settings. The default implementation is
not thread-safe; if thread-safety is required, the
application should extend this class, overriding
methods appropriately. The parser will not adopt the
SecurityManager instance; the application is
responsible for deleting it when it is finished with
it. If no SecurityManager instance has been provided to
the parser (the default) then processing strictly
conforming to the relevant specifications will be
performed.
Value Type
SecurityManager*
XMLUni Predefined Constant:
fgXercesSecurityManager
http://apache.org/xml/properties/low-water-mark
Description
If the number of available bytes in the raw buffer is less than
the low water mark, the parser will attempt to read more data before
continuing parsing. By default the value for this parameter is 100
bytes. You may want to set this parameter to 0 if you would like
the parser to parse the available data immediately without
potentially blocking while waiting for more data.
Value
New low water mark.
Value Type
XMLSize_t*
XMLUni Predefined Constant:
fgXercesLowWaterMark
setInputBufferSize(const size_t bufferSize)
Description
Set maximum input buffer size.
This method allows users to limit the size of buffers used in parsing
XML character data. The effect of setting this size is to limit the
size of a ContentHandler::characters() call.
The parser's default input buffer size is 1 megabyte.