<?xml version="1.0" standalone="no"?>
|
|
<!--
|
|
* Licensed to the Apache Software Foundation (ASF) under one or more
|
|
* contributor license agreements. See the NOTICE file distributed with
|
|
* this work for additional information regarding copyright ownership.
|
|
* The ASF licenses this file to You under the Apache License, Version 2.0
|
|
* (the "License"); you may not use this file except in compliance with
|
|
* the License. You may obtain a copy of the License at
|
|
*
|
|
* http://www.apache.org/licenses/LICENSE-2.0
|
|
*
|
|
* Unless required by applicable law or agreed to in writing, software
|
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
* See the License for the specific language governing permissions and
|
|
* limitations under the License.
|
|
-->
|
|
|
|
<!DOCTYPE s1 SYSTEM "sbk:/style/dtd/document.dtd">
|
|
|
|
<s1 title="Programming Guide">
|
|
<anchor name="Macro"/>
|
|
<s2 title="Version Macro">
|
|
<p>&XercesCName; defines a numeric preprocessor macro, _XERCES_VERSION, that
you can test in your code to conditionally compile version-specific
functionality depending on the version of &XercesCName; being used.
For example,
|
|
</p>
|
|
<source>
|
|
#if _XERCES_VERSION >= 30102
|
|
// Code specific to Xerces-C++ version 3.1.2 and later.
|
|
#else
|
|
// Old code.
|
|
#endif
|
|
</source>
|
|
<p>The minor and revision (patch level) numbers have two digits of resolution
|
|
which means that '1' becomes '01' and '2' becomes '02' in this example.
|
|
</p>
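<p>As a rough illustration of the two-digit packing described above, the sketch
below rebuilds the number 30102 from the triple 3.1.2. The helper name
<code>encodeVersion</code> is hypothetical and is not part of &XercesCName;; the
library's own definition is in <code>xercesc/util/XercesVersion.hpp</code>.</p>
<source>
```cpp
#include <cassert>

// Hypothetical helper (not part of Xerces-C++): packs a version triple the
// way the text describes, giving minor and revision two decimal digits each.
constexpr int encodeVersion(int major, int minor, int revision)
{
    return major * 10000 + minor * 100 + revision;
}

// Compile-time check of the example used in the text: 3.1.2 becomes 30102.
static_assert(encodeVersion(3, 1, 2) == 30102, "3.1.2 encodes as 30102");

// Runtime variant of the same check, for use from a test driver.
inline void checkVersionEncoding()
{
    assert(encodeVersion(3, 1, 2) == 30102);
}
```
</source>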
|
|
<p>There are also other string macros or constants to represent the Xerces-C++ version.
|
|
Please refer to the <code>xercesc/util/XercesVersion.hpp</code> header for details.
|
|
</p>
|
|
</s2>
|
|
|
|
|
|
<anchor name="Schema"/>
|
|
<s2 title="Schema Support">
|
|
<p>&XercesCName; contains an implementation of the W3C XML Schema
|
|
Language. See the <jump href="schema-&XercesC3Series;.html">XML Schema Support</jump> page for details.
|
|
</p>
|
|
</s2>
|
|
|
|
<anchor name="Progressive"/>
|
|
<s2 title="Progressive Parsing">
|
|
|
|
<p>In addition to using the <code>parse()</code> method to parse an XML file,
you can use two other parsing methods, <code>parseFirst()</code> and <code>parseNext()</code>,
to do so-called progressive parsing. This way you don't
have to depend on throwing an exception to terminate the
parsing operation.
|
|
</p>
|
|
<p>
|
|
Calling <code>parseFirst()</code> will cause the DTD (both internal and
|
|
external subsets), and any pre-content, i.e. everything up to
|
|
but not including the root element, to be parsed. Subsequent calls to
|
|
<code>parseNext()</code> will cause one more piece of markup to be parsed,
|
|
and propagated from the core scanning code to the parser (and
|
|
hence either on to you if using SAX/SAX2 or into the DOM tree if
|
|
using DOM).
|
|
</p>
|
|
<p>
|
|
You can quit the parse any time by just not
|
|
calling <code>parseNext()</code> anymore and breaking out of the loop. When
|
|
you call <code>parseNext()</code> and the end of the root element is the
|
|
next piece of markup, the parser will continue on to the end
|
|
of the file and return false, to let you know that the parse
|
|
is done. So a typical progressive parse loop will look like
|
|
this:</p>
|
|
|
|
<source>// Create a progressive scan token
XMLPScanToken token;

if (!parser.parseFirst(xmlFile, token))
{
    cerr << "parseFirst() failed" << endl;
    return 1;
}

//
// We started ok, so let's call parseNext()
// until we find what we want or hit the end.
//
bool gotMore = true;
while (gotMore && !handler.getDone())
    gotMore = parser.parseNext(token);</source>
|
|
|
|
<p>In this case, our event handler object (named 'handler')
|
|
is watching for some criteria and will
|
|
return a status from its <code>getDone()</code> method. Since
|
|
the handler
|
|
sees the SAX events coming out of the SAXParser, it can tell
|
|
when it finds what it wants. So we loop until we get no more
|
|
data or our handler indicates that it saw what it wanted to
|
|
see.</p>
|
|
|
|
<p>When doing non-progressive parses, the parser can easily
|
|
know when the parse is complete and ensure that any used
|
|
resources are cleaned up. Even in the case of a fatal parsing
|
|
error, it can clean up all per-parse resources. However, when
|
|
progressive parsing is done, the client code doing the parse
|
|
loop might choose to stop the parse before the end of the
|
|
primary file is reached. In such cases, the parser will not
|
|
know that the parse has ended, so any resources will not be
|
|
reclaimed until the parser is destroyed or another parse is started.</p>
|
|
|
|
<p>This might not seem like such a bad thing; however, in this case,
|
|
the files and sockets which were opened in order to parse the
|
|
referenced XML entities will remain open. This could cause
|
|
serious problems. Therefore, you should destroy the parser instance
|
|
in such cases, or restart another parse immediately. In a future
|
|
release, a reset method will be provided to do this more cleanly.</p>
|
|
|
|
<p>Also note that you must create a scan token and pass it
|
|
back in on each call. This ensures that things don't get done
|
|
out of sequence. When you call <code>parseFirst()</code> or
|
|
<code>parse()</code>, any
|
|
previous scan tokens are invalidated and will cause an error
|
|
if used again. This prevents incorrect mixed use of the two
|
|
different parsing schemes or incorrect calls to
|
|
<code>parseNext()</code>.</p>
|
|
|
|
</s2>
|
|
|
|
<anchor name="GrammarCache"/>
|
|
<s2 title="Pre-parsing Grammar and Grammar Caching">
|
|
<p>&XercesCName; provides a function to pre-parse the grammar so that users
|
|
can check for any syntax error before using the grammar. Users can also optionally
|
|
cache these pre-parsed grammars for later use during actual parsing.
|
|
</p>
|
|
<p>Here is an example:</p>
|
|
<source>
XercesDOMParser parser;

// Enable schema processing.
parser.setDoSchema(true);
parser.setDoNamespaces(true);

// Let's preparse the schema grammar (.xsd) and cache it.
Grammar* grammar = parser.loadGrammar(xsdFile, Grammar::SchemaGrammarType, true);
</source>
|
|
<p>Besides caching pre-parsed schema grammars, users can also cache any
|
|
grammars encountered during an xml document parse.
|
|
</p>
|
|
<p>Here is an example:</p>
|
|
<source>
|
|
SAXParser parser;
|
|
|
|
// Enable grammar caching by setting cacheGrammarFromParse to true.
// The parser will cache any grammars it encounters that do not
// already exist in the pool.
// If the grammar is a DTD, no internal subset is allowed.
|
|
parser.cacheGrammarFromParse(true);
|
|
|
|
// Let's parse our xml file (DTD grammar)
|
|
parser.parse(xmlFile);
|
|
|
|
// We can get the grammar where the root element was declared
|
|
// by calling the parser's method getRootGrammar;
|
|
// Note: The parser owns the grammar, and the user should not delete it.
|
|
Grammar* grammar = parser.getRootGrammar();
|
|
</source>
|
|
<p>We can use any previously cached grammars when parsing new xml
|
|
documents. Here are some examples on how to use those cached grammars:
|
|
</p>
|
|
<source>
|
|
/**
|
|
* Caching and reusing XML Schema (.xsd) grammar
|
|
* Parse an XML document and cache its grammar set. Then, use the cached
|
|
* grammar set in subsequent parses.
|
|
*/
|
|
|
|
XercesDOMParser parser;
|
|
|
|
// Enable schema processing
|
|
parser.setDoSchema(true);
|
|
parser.setDoNamespaces(true);
|
|
|
|
// Enable grammar caching
|
|
parser.cacheGrammarFromParse(true);
|
|
|
|
// Let's parse the XML document. The parser will cache any grammars encountered.
|
|
parser.parse(xmlFile);
|
|
|
|
// No need to enable re-use by setting useCachedGrammarInParse to true. It is
|
|
// automatically enabled with grammar caching.
|
|
for (int i=0; i< 3; i++)
|
|
parser.parse(xmlFile);
|
|
|
|
// This will flush the grammar pool
|
|
parser.resetCachedGrammarPool();
|
|
</source>
|
|
|
|
<source>
|
|
/**
|
|
* Caching and reusing DTD grammar
|
|
* Preparse a grammar and cache it in the pool. Then, we use the cached grammar
|
|
* when parsing XML documents.
|
|
*/
|
|
|
|
SAX2XMLReader* parser = XMLReaderFactory::createXMLReader();
|
|
|
|
// Load grammar and cache it
|
|
parser->loadGrammar(dtdFile, Grammar::DTDGrammarType, true);
|
|
|
|
// enable grammar reuse
|
|
parser->setFeature(XMLUni::fgXercesUseCachedGrammarInParse, true);
|
|
|
|
// Parse xml files
|
|
parser->parse(xmlFile1);
|
|
parser->parse(xmlFile2);
|
|
</source>
|
|
<p>There are some limitations on caching and using cached grammars:</p>
|
|
<ul>
|
|
<li>When caching/reusing DTD grammars, no internal subset is allowed.</li>
|
|
<li>When preparsing grammars with caching option enabled, if a grammar, in the
|
|
result set, already exists in the pool (same namespace for schema or same system
|
|
id for DTD), the entire set will not be cached. This behavior is the default but can
|
|
be overridden for XML Schema caching. See the SAX/SAX2/DOM parser features for details.</li>
|
|
<li>When parsing an XML document with the grammar caching option enabled, the
|
|
reuse option is also automatically enabled. We will only parse a grammar if it
|
|
does not exist in the pool.</li>
|
|
</ul>
|
|
</s2>
|
|
|
|
<anchor name="LoadableMessageText"/>
|
|
<s2 title="Loadable Message Text">
|
|
|
|
<p>&XercesCName; supports loadable message text. Although
the current distribution only supports English, it is capable of
supporting other languages. Anyone interested in contributing
translations should contact us. This would be an extremely useful
service.</p>
|
|
|
|
<p>In order to support the local message loading services, all the error messages
are captured in an XML file in the src/xercesc/NLS/ directory.
There is a simple program, in the tools/NLS/Xlat/ directory,
which can translate that text into various formats. It currently
supports a simple 'in memory' format (i.e. an array of
strings), the Win32 resource format, and the message catalog
format. The 'in memory' format is intended for very simple
installations or for use when porting to a new platform (since
you can use it until you have your own local message
loading support in place).</p>
|
|
|
|
<p>In the src/xercesc/util/ directory, there is an XMLMsgLoader
|
|
class. This is an abstraction from which any number of
|
|
message loading services can be derived. Your platform driver
|
|
file can create whichever type of message loader it wants to
|
|
use on that platform. &XercesCName; currently has versions for the in
|
|
memory format, the Win32 resource format, the message
|
|
catalog format, and the ICU message loader.
|
|
Some of the platforms can support multiple message
|
|
loaders, in which case a #define token is used to control
|
|
which one is used. You can set this in your build projects to
|
|
control the message loader type used.</p>
|
|
|
|
</s2>
|
|
|
|
<anchor name="PluggableTranscoders"/>
|
|
<s2 title="Pluggable Transcoders">
|
|
|
|
<p>&XercesCName; also supports pluggable transcoding services. The
|
|
XMLTransService class is an abstract API that can be derived
|
|
from, to support any desired transcoding
|
|
service. XMLTranscoder is the abstract API for a particular
|
|
instance of a transcoder for a particular encoding. The
|
|
platform driver file decides what specific type of transcoder
|
|
to use, which allows each platform to use its native
|
|
transcoding services, or the ICU service if desired.</p>
|
|
|
|
<p>Implementations are provided for Win32 native services, ICU
|
|
services, and the <ref>iconv</ref> services available on many
|
|
Unix platforms. The Win32 version only provides native code
page services, so it can only handle XML data in the intrinsic
encodings: ASCII, UTF-8, UTF-16 (big/little endian), UCS4
(big/little endian), the EBCDIC code pages IBM037, IBM1047 and
IBM1140, ISO-8859-1 (aka Latin1) and Windows-1252. The ICU version
|
|
provides all of the encodings that ICU supports. The
|
|
<ref>iconv</ref> version will support the encodings supported
|
|
by the local system. You can use transcoders we provide or
|
|
create your own if you feel ours are insufficient in some way,
|
|
or if your platform requires an implementation that &XercesCName; does not
|
|
provide.</p>
|
|
|
|
</s2>
|
|
|
|
<anchor name="PortingGuidelines"/>
|
|
<s2 title="Porting Guidelines">
|
|
|
|
<p>All platform dependent code in &XercesCName; has been
|
|
isolated to a couple of files, which should ease the porting
|
|
effort. The <code>src/xercesc/util</code> directory
|
|
contains all such files. In particular:</p>
|
|
|
|
<ul>
|
|
<li>The <code>src/xercesc/util/FileManagers</code> directory
|
|
contains implementations of file managers for various
|
|
platforms.</li>
|
|
|
|
<li>The <code>src/xercesc/util/MutexManagers</code> directory
|
|
contains implementations of mutex managers for various
|
|
platforms.</li>
|
|
|
|
<li>The <code>src/xercesc/util/Xerces_autoconf_const*</code> files
|
|
provide base definitions for various platforms.</li>
|
|
</ul>
|
|
|
|
<p>Other concerns are:</p>
|
|
|
|
<ul>
|
|
<li>Does ICU compile on your platform? If not, then you'll need to
|
|
create a transcoder implementation that uses your local transcoding
|
|
services. The iconv transcoder should work for you, though perhaps
|
|
with some modifications.</li>
|
|
<li>What message loader will you use? To get started, you can use the
|
|
"in memory" one, which is very simple and easy. Then, once you get
|
|
going, you may want to adapt the message catalog message loader, or
|
|
write one of your own that uses local services.</li>
|
|
<li>What should I define XMLCh to be? Please refer to <jump
|
|
href="build-misc-&XercesC3Series;.html#XMLChInfo">What should I define XMLCh to be?</jump> for
|
|
further details.</li>
|
|
</ul>
|
|
|
|
<p>Finally, you need to decide how to define XMLCh. Generally,
|
|
XMLCh should be defined to be a type suitable for holding a
|
|
utf-16 encoded (16 bit) value, usually an <code>unsigned short</code>. </p>
|
|
|
|
<p>All XML data is handled within &XercesCName; as strings of
|
|
XMLCh characters. Regardless of the size of the
|
|
type chosen, the data stored in variables of type XMLCh
|
|
will always be utf-16 encoded values. </p>
|
|
|
|
|
|
|
|
<p>Unlike XMLCh, the encoding
of wchar_t is platform dependent. Sometimes it is utf-16
(AIX, Windows), sometimes ucs-4 (Solaris,
Linux), and sometimes it is not based on Unicode at all
(HP-UX, AS/400, System/390). </p>
|
|
|
|
<p>Some earlier releases of &XercesCName; defined XMLCh to be the
|
|
same type as wchar_t on most platforms, with the goal of making
|
|
it possible to pass XMLCh strings to library or system functions
|
|
that were expecting wchar_t parameters. This approach has
|
|
been abandoned because of</p>
|
|
|
|
<ul>
|
|
<li>
|
|
Portability problems with any code that assumes that
|
|
the types of XMLCh and wchar_t are compatible
|
|
</li>
|
|
|
|
<li>Excessive memory usage, especially in the DOM, on
|
|
platforms with 32 bit wchar_t.
|
|
</li>
|
|
|
|
<li>utf-16 encoded XMLCh is not always compatible with
ucs-4 encoded wchar_t on Solaris and Linux. The
problem occurs with Unicode characters whose values are
greater than 0xFFFF: in ucs-4 the value is stored as
a single 32 bit quantity, whereas in utf-16 it is
stored as a "surrogate pair" of two 16 bit
values. Even with XMLCh equated to wchar_t, &XercesCName; will
still create the utf-16 encoded surrogate pairs, which
are illegal in ucs-4 encoded wchar_t strings.
|
|
</li>
|
|
</ul>
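<p>To make the surrogate-pair point concrete, here is a minimal sketch of how a
code point above 0xFFFF turns into two 16 bit values in utf-16. The function
<code>toUtf16</code> and its types are illustrative assumptions and not part of
&XercesCName;; the input is assumed to be a valid Unicode scalar value.</p>
<source>
```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative only (not Xerces-C++ API): encodes one Unicode code point as
// utf-16 code units. Assumes the input is a valid scalar value, i.e. at most
// 0x10FFFF and not itself in the surrogate range 0xD800..0xDFFF.
std::vector<std::uint16_t> toUtf16(std::uint32_t codePoint)
{
    std::vector<std::uint16_t> units;
    if (codePoint < 0x10000)
    {
        // Fits in a single 16 bit unit.
        units.push_back(static_cast<std::uint16_t>(codePoint));
    }
    else
    {
        // Subtract 0x10000, leaving 20 significant bits to split in half.
        const std::uint32_t v = codePoint - 0x10000;
        units.push_back(static_cast<std::uint16_t>(0xD800 + (v >> 10)));   // high surrogate
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
    return units;
}
```
</source>
<p>A ucs-4 string would hold the same character as the single 32 bit value, which
is why copying utf-16 units one for one into a 32 bit wchar_t string does not
produce valid ucs-4.</p>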
|
|
|
|
|
|
|
|
</s2>
|
|
|
|
<anchor name="CPPNamespace"/>
|
|
<s2 title="Using C++ Namespace">
|
|
|
|
<p>&XercesCName; uses a C++ namespace to make sure its
definitions do not conflict with other libraries and
applications. As a result, applications must
namespace-qualify all &XercesCName; classes, data and
variables using the <code>xercesc</code> name. Alternatively,
applications can use <code>using xercesc::<Name>;</code>
declarations to make individual &XercesCName; names visible in the
current scope, or a <code>using namespace xercesc;</code>
directive to make all &XercesCName; names visible in the
current scope.</p>
|
|
|
|
<p>While the above information should be sufficient for the majority
|
|
of applications, for cases where several versions of the &XercesCName;
|
|
library must be used in the same application, namespace versioning is
|
|
provided. The following convenience macros can be used to access the
|
|
&XercesCName; namespace names with versions embedded
|
|
(see <code>src/xercesc/util/XercesDefs.hpp</code>):</p>
|
|
|
|
<source>
|
|
#define XERCES_CPP_NAMESPACE_BEGIN namespace &XercesC3NSVersion; {
|
|
#define XERCES_CPP_NAMESPACE_END }
|
|
#define XERCES_CPP_NAMESPACE_USE using namespace &XercesC3NSVersion;;
|
|
#define XERCES_CPP_NAMESPACE_QUALIFIER &XercesC3NSVersion;::
|
|
|
|
namespace &XercesC3NSVersion; { }
|
|
namespace &XercesC3Namespace; = &XercesC3NSVersion;;
|
|
</source>
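<p>The versioned-namespace-plus-alias pattern behind these macros can be sketched
in plain C++ as follows. The names <code>mylib_3_2</code>, <code>mylib</code> and
<code>apiLevel</code> are hypothetical stand-ins chosen for illustration; the real
namespace names are generated from the entities shown above.</p>
<source>
```cpp
#include <cassert>

// Hypothetical library, not Xerces-C++. The version is embedded in the
// real namespace name, so two versions can coexist in one program.
namespace mylib_3_2
{
    inline int apiLevel() { return 32; }
}

// A short alias gives applications a stable, unversioned name to use.
namespace mylib = mylib_3_2;

// Both spellings refer to the same function.
inline int viaAlias()     { return mylib::apiLevel(); }
inline int viaVersioned() { return mylib_3_2::apiLevel(); }
```
</source>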
|
|
</s2>
|
|
|
|
|
|
<anchor name="SpecifyLocaleForMessageLoader"/>
|
|
<s2 title="Specify Locale for Message Loader">
|
|
|
|
<p>&XercesCName; provides mechanisms for Native Language Support (NLS).
Even though the current distribution has only an English message file,
it is capable of supporting other languages once a translated
message file for the target language is available.</p>
|
|
|
|
<p>An application can specify the locale for the message loader in its
very first call to XMLPlatformUtils::Initialize() by supplying
a parameter for the intended target locale. The default locale is "en_US".
|
|
</p>
|
|
<source>
// Initialize the parser system
try
{
    XMLPlatformUtils::Initialize("fr_FR");
}
catch (const XMLException& e)
{
    // Initialization failed; report e.getMessage() and stop.
}
</source>
|
|
</s2>
|
|
|
|
|
|
<anchor name="SpecifyLocationForMessageLoader"/>
|
|
<s2 title="Specify Location for Message Loader">
|
|
|
|
<p>&XercesCName; searches for message files at the location
|
|
specified in the <code>XERCESC_NLS_HOME</code> environment
|
|
variable and, if that is not set, at the default
|
|
message directory, <code>$XERCESCROOT/msg</code>.
|
|
</p>
|
|
|
|
<p>An application can specify an alternative location for the message files in its
very first call to XMLPlatformUtils::Initialize() by supplying
a parameter for the alternative location.
|
|
</p>
|
|
|
|
<source>
// Initialize the parser system
try
{
    XMLPlatformUtils::Initialize("en_US", "/usr/nls");
}
catch (const XMLException& e)
{
    // Initialization failed; report e.getMessage() and stop.
}
</source>
|
|
</s2>
|
|
|
|
<anchor name="PluggablePanicHandler"/>
|
|
<s2 title="Pluggable Panic Handler">
|
|
|
|
<p>&XercesCName; reports any panic condition it encounters to the installed
panic handler. The panic handler can take whatever action is
appropriate to handle the panic condition.
</p>
<p>&XercesCName; allows an application to provide a customized panic handler
(a class implementing the PanicHandler interface) in its very first call to
XMLPlatformUtils::Initialize().
</p>
<p>In the absence of an application-specific panic handler, the &XercesCName; default
panic handler is installed and used, which aborts the program whenever a panic
condition is encountered.
</p>
|
|
|
|
<source>
// Initialize the parser system
try
{
    PanicHandler* ph = new MyPanicHandler();

    XMLPlatformUtils::Initialize("en_US",
                                 "/usr/nls",
                                 ph);
}
catch (const XMLException& e)
{
    // Initialization failed; report e.getMessage() and stop.
}
</source>
|
|
</s2>
|
|
|
|
<anchor name="PluggableMemoryManager"/>
|
|
<s2 title="Pluggable Memory Manager">
|
|
<p>Certain applications wish to maintain precise control over
|
|
memory allocation. This enables them to recover more easily
|
|
from crashes of individual components, as well as to allocate
|
|
memory more efficiently than a general-purpose OS-level
|
|
procedure with no knowledge of the characteristics of the
|
|
program making the requests for memory. In &XercesCName; this
|
|
is supported via the Pluggable Memory Manager.
|
|
</p>
|
|
|
|
<p>Users who wish to implement their own MemoryManager,
|
|
an interface found in <code>xercesc/framework/MemoryManager.hpp</code>,
|
|
need to implement only two methods:</p>
|
|
<source>
|
|
// This method allocates requested memory.
|
|
// the parameter is the requested memory size
|
|
// A pointer to the allocated memory is returned.
|
|
virtual void* allocate(XMLSize_t size) = 0;
|
|
|
|
// This method deallocates memory
|
|
// The parameter is a pointer to the allocated memory to be deleted
|
|
virtual void deallocate(void* p) = 0;
|
|
</source>
|
|
<p>To maximize the amount of flexibility that applications
|
|
have in terms of controlling memory allocation, a
|
|
MemoryManager instance may be set as part of the call to
|
|
XMLPlatformUtils::Initialize() to allow for static
|
|
initialization to be done with the given MemoryManager; a
|
|
(possibly different) MemoryManager may be passed in to the
|
|
constructors of all Xerces parser objects as well, and all
|
|
dynamic allocations within the parsers will make use of this
|
|
object. Assuming that MyMemoryHandler is a class that
|
|
implements the MemoryManager interface, here is a bit of
|
|
pseudocode which illustrates these ideas:
|
|
</p>
|
|
<source>
MyMemoryHandler *mm_for_statics = new MyMemoryHandler();
MyMemoryHandler *mm_for_particular_parser = new MyMemoryHandler();

// initialize the parser information; try/catch
// removed for brevity
XMLPlatformUtils::Initialize(XMLUni::fgXercescDefaultLocale, 0, 0,
                             mm_for_statics);

// create a parser object; the memory manager is the
// second constructor argument (the first is an optional validator)
XercesDOMParser *parser = new
    XercesDOMParser(0, mm_for_particular_parser);

// ...
delete parser;
XMLPlatformUtils::Terminate();
</source>
|
|
<p>
|
|
If a user provides a MemoryManager object to the parser, then
the user owns that object. It is also important to note that
the &XercesCName; default implementation simply uses the global
new and delete operators.
|
|
</p>
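<p>The following is a minimal sketch of what a custom manager might look like. To
keep it compilable on its own, it uses a stand-in interface,
<code>MemoryManagerSketch</code>, that mirrors only the two methods listed above and
substitutes <code>std::size_t</code> for <code>XMLSize_t</code>; both the stand-in and
<code>CountingMemoryManager</code> are assumptions made for illustration. A real
implementation would derive from the class in
<code>xercesc/framework/MemoryManager.hpp</code> and should be checked against that header.</p>
<source>
```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in for the interface described above, reduced to the two methods
// the text lists. Not the real Xerces-C++ class.
class MemoryManagerSketch
{
public:
    virtual ~MemoryManagerSketch() {}
    virtual void* allocate(std::size_t size) = 0;
    virtual void deallocate(void* p) = 0;
};

// Hypothetical manager that forwards to malloc/free and counts live blocks,
// which is enough to detect leaks when the application shuts down.
class CountingMemoryManager : public MemoryManagerSketch
{
public:
    CountingMemoryManager() : fLive(0) {}

    void* allocate(std::size_t size) override
    {
        // malloc(0) may return a null pointer, so request at least one byte.
        void* p = std::malloc(size ? size : 1);
        if (!p)
            throw std::bad_alloc();
        ++fLive;
        return p;
    }

    void deallocate(void* p) override
    {
        if (p)
        {
            std::free(p);
            --fLive;
        }
    }

    // Number of blocks allocated and not yet released.
    long liveBlocks() const { return fLive; }

private:
    long fLive;
};
```
</source>
<p>Because the application owns the manager, it must outlive every parser that
was constructed with it.</p>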
|
|
</s2>
|
|
|
|
<anchor name="SecurityManager"/>
|
|
<s2 title="Managing Security Vulnerabilities">
|
|
<p>
|
|
The purpose of the SecurityManager class is to give applications a
means to have the parser reject documents whose processing would
otherwise consume large amounts of system resources. Such documents
could be used maliciously to launch a denial-of-service
attack against a system running the parser. Initially, the
|
|
SecurityManager only knows about attacks that can result from
|
|
exponential entity expansion; this is the only known attack that
|
|
involves processing a single XML document. Other, similar attacks
|
|
can be launched if arbitrary schemas may be parsed; there already
|
|
exist means (via use of the EntityResolver interface) by which
|
|
applications can deny processing of untrusted schemas. In future,
|
|
the SecurityManager will be expanded to take these other exploits
|
|
into account.
|
|
</p>
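<p>To give a sense of why exponential entity expansion is costly, the sketch below
works through the arithmetic only. It assumes a document in which each of several
nested entities references the previous one a fixed number of times; the function
<code>expandedCopies</code> is hypothetical and is not part of &XercesCName;.</p>
<source>
```cpp
#include <cassert>
#include <cstdint>

// Illustrative arithmetic only (not Xerces-C++ API): if each of `levels`
// nested entities references the previous one `fanOut` times, expanding the
// outermost entity yields fanOut^levels copies of the innermost text.
// No overflow checking is done, so keep the inputs small.
std::uint64_t expandedCopies(unsigned fanOut, unsigned levels)
{
    std::uint64_t copies = 1;
    for (unsigned i = 0; i < levels; ++i)
        copies *= fanOut;
    return copies;
}
```
</source>
<p>With a fan-out of 10 and 9 levels, a document only a few lines long expands to
one billion copies of the innermost text, which is the kind of growth an entity
expansion limit is meant to cut off.</p>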
|
|
<p>
|
|
The SecurityManager class is very simple: It will contain
|
|
getters and setters corresponding to each known variety of
|
|
exploit. These will reflect limits that the application may
|
|
impose on the parser with respect to the processing of various
|
|
XML constructs. When an instance of SecurityManager is
|
|
instantiated, default values for these limits will be provided
|
|
that should suit most applications.
|
|
</p>
|
|
<p>
|
|
By default, &XercesCName; is a wholly conformant XML parser; that
|
|
is, no security-related considerations will be observed by
|
|
default. An application must provide an instance of the
|
|
SecurityManager class to a parser in order to make that
|
|
parser behave in a security-conscious manner. For example:
|
|
</p>
|
|
<source>
|
|
SAXParser *myParser = new SAXParser();
|
|
SecurityManager *myManager = new SecurityManager();
|
|
myManager->setEntityExpansionLimit(100000); // larger than default
|
|
myParser->setSecurityManager(myManager);
|
|
// ... use the parser
|
|
</source>
|
|
<p>
|
|
Note that SecurityManager instances may be set on all kinds of
|
|
&XercesCName; parsers; please see the documentation for the
|
|
individual parsers for details.
|
|
</p>
|
|
<p>
|
|
Note also that the application always owns the SecurityManager
|
|
instance. The default SecurityManager that &XercesCName; provides
|
|
is not thread-safe; although it only uses primitive operations at
|
|
the moment, users may need to extend the class with a
|
|
thread-safe implementation on some platforms.
|
|
</p>
|
|
</s2>
|
|
<anchor name="UseSpecificScanner"/>
|
|
<s2 title="Use Specific Scanner">
|
|
|
|
<p>For performance and modularity, &XercesCName; provides a mechanism
for specifying the scanner to be used when scanning an XML document.
This mechanism enables the creation of special-purpose scanners
that can be easily plugged in.</p>
|
|
|
|
<p>&XercesCName; supports the following scanners:</p>
|
|
|
|
<s3 title="WFXMLScanner">
|
|
|
|
<p>
|
|
The WFXMLScanner is a non-validating scanner which performs only well-formedness checks.
It does not do any DTD/XMLSchema processing. If the XML document contains a DOCTYPE, it
will be silently ignored (i.e. no warning message is issued). Similarly, any schema-specific
attributes (e.g. schemaLocation) will be treated as normal element attributes.
Setting grammar-specific features/properties will have no effect on its behavior
(e.g. setLoadExternalDTD(true) is ignored).
|
|
</p>
|
|
|
|
<source>
|
|
// Create a DOM parser
|
|
XercesDOMParser parser;
|
|
|
|
// Specify scanner name
|
|
parser.useScanner(XMLUni::fgWFXMLScanner);
|
|
|
|
// Specify other parser features, e.g.
|
|
parser.setDoNamespaces(true);
|
|
</source>
|
|
|
|
|
|
</s3>
|
|
|
|
<s3 title="DGXMLScanner">
|
|
|
|
<p>
|
|
The DGXMLScanner handles XML documents with DOCTYPE information. It does not do any
|
|
XMLSchema processing, which means that any schema specific attributes (e.g. schemaLocation),
|
|
will be treated as normal element attributes. Setting schema grammar specific features/properties
|
|
will have no effect on its behavior (e.g. setDoSchema(true) and setLoadSchema(true) are ignored).
|
|
</p>
|
|
|
|
<source>
|
|
// Create a SAX parser
|
|
SAXParser parser;
|
|
|
|
// Specify scanner name
|
|
parser.useScanner(XMLUni::fgDGXMLScanner);
|
|
|
|
// Specify other parser features, e.g.
|
|
parser.setLoadExternalDTD(true);
|
|
</source>
|
|
|
|
</s3>
|
|
|
|
<s3 title="SGXMLScanner">
|
|
|
|
<p>
|
|
The SGXMLScanner handles XML documents with XML schema grammar information.
|
|
If the XML document contains a DOCTYPE, it will be ignored. Namespace and
|
|
schema processing features are on by default, and setting them to off has
no effect.
|
|
</p>
|
|
|
|
<source>
|
|
// Create a SAX2 parser
|
|
SAX2XMLReader* parser = XMLReaderFactory::createXMLReader();
|
|
|
|
// Specify scanner name
|
|
parser->setProperty(XMLUni::fgXercesScannerName, (void *)XMLUni::fgSGXMLScanner);
|
|
|
|
// Specify other parser features, e.g.
|
|
parser->setFeature(XMLUni::fgXercesSchemaFullChecking, false);
|
|
</source>
|
|
|
|
</s3>
|
|
|
|
<s3 title="IGXMLScanner">
|
|
|
|
<p>
|
|
The IGXMLScanner is an integrated scanner and handles XML documents with DTD and/or
|
|
XML schema grammar. This is the default scanner used by the various parsers if no
|
|
scanner is specified.
|
|
</p>
|
|
|
|
<source>
|
|
// Create a DOMLSParser parser
|
|
DOMLSParser *parser = ((DOMImplementationLS*)impl)->createLSParser(
|
|
DOMImplementationLS::MODE_SYNCHRONOUS, 0);
|
|
|
|
// Specify scanner name - This is optional as IGXMLScanner is the default
|
|
parser->getDomConfig()->setParameter(
|
|
XMLUni::fgXercesScannerName, (void *)XMLUni::fgIGXMLScanner);
|
|
|
|
// Specify other parser features, e.g.
|
|
parser->getDomConfig()->setParameter(XMLUni::fgDOMNamespaces, doNamespaces);
|
|
parser->getDomConfig()->setParameter(XMLUni::fgXercesSchema, doSchema);
|
|
</source>
|
|
|
|
</s3>
|
|
|
|
</s2>
|
|
|
|
</s1>
|