You are viewing a plain text version of this content. The canonical link for it is here.
Posted to cvs@httpd.apache.org by gs...@hyperreal.org on 1999/05/25 12:03:40 UTC
cvs commit: apache-1.3/src/lib/expat MPL-1_0.html Makefile.tmpl asciitab.h expat.html gplelect.html hashtable.c hashtable.h iasciitab.h latin1tab.h nametab.h utf8tab.h xmlparse.c xmlparse.h xmlrole.c xmlrole.h xmltok.c xmltok.h xmltok_impl.c xmltok_impl.h
gstein 99/05/25 03:03:39
Added: src/lib/expat MPL-1_0.html Makefile.tmpl asciitab.h
expat.html gplelect.html hashtable.c hashtable.h
iasciitab.h latin1tab.h nametab.h utf8tab.h
xmlparse.c xmlparse.h xmlrole.c xmlrole.h xmltok.c
xmltok.h xmltok_impl.c xmltok_impl.h
Log:
Adding Expat 1.0.2 to the Apache repository. Apache elects to use the MPL
license, per the instructions in expat.html. This version of Expat 1.0.2
has been modified to contain only the tokenizer and parser, and the
Makefile has been adjusted to use Apache conventions (Makefile.tmpl).
See http://www.jclark.com/xml/expat.html for more information on Expat.
Revision Changes Path
1.1 apache-1.3/src/lib/expat/MPL-1_0.html
Index: MPL-1_0.html
===================================================================
<TITLE>Mozilla Public License version 1.0</TITLE>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000"
LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000">
<P ALIGN=CENTER>
<FONT SIZE="+2"><B>MOZILLA PUBLIC LICENSE</B></FONT><BR>
<B>Version 1.0</B>
</P>
<P><HR WIDTH="20%"><P>
<P><B>1. Definitions.</B>
<UL>
<B>1.1. ``Contributor''</B> means each entity that creates or contributes
to the creation of Modifications.
<P><B>1.2. ``Contributor Version''</B> means the combination of the
Original Code, prior Modifications used by a Contributor, and the
Modifications made by that particular Contributor.
<P><B>1.3. ``Covered Code''</B> means the Original Code or Modifications
or the combination of the Original Code and Modifications, in each case
including portions thereof<B>.</B>
<P><B>1.4. ``Electronic Distribution Mechanism''</B> means a mechanism
generally accepted in the software development community for the
electronic transfer of data.
<P><B>1.5. ``Executable''</B> means Covered Code in any form other than
Source Code.
<P><B>1.6. ``Initial Developer''</B> means the individual or entity
identified as the Initial Developer in the Source Code notice required by
<B>Exhibit A</B>.
<P><B>1.7. ``Larger Work''</B> means a work which combines Covered Code
or portions thereof with code not governed by the terms of this License.
<P><B>1.8. ``License''</B> means this document.
<P><B>1.9. ``Modifications''</B> means any addition to or deletion from
the substance or structure of either the Original Code or any previous
Modifications. When Covered Code is released as a series of files, a
Modification is:
<UL>
<P><B>A.</B> Any addition to or deletion from the contents of a file
containing Original Code or previous Modifications.
<P><B>B.</B> Any new file that contains any part of the Original
Code or previous Modifications.
</UL>
<P><B>1.10. ``Original Code''</B> means Source Code of computer software
code which is described in the Source Code notice required by <B>Exhibit
A</B> as Original Code, and which, at the time of its release under this
License is not already Covered Code governed by this License.
<P><B>1.11. ``Source Code''</B> means the preferred form of the Covered
Code for making modifications to it, including all modules it contains,
plus any associated interface definition files, scripts used to control
compilation and installation of an Executable, or a list of source code
differential comparisons against either the Original Code or another well
known, available Covered Code of the Contributor's choice. The Source
Code can be in a compressed or archival form, provided the appropriate
decompression or de-archiving software is widely available for no charge.
<P><B>1.12. ``You''</B> means an individual or a legal entity exercising
rights under, and complying with all of the terms of, this License or a
future version of this License issued under Section 6.1. For legal
entities, ``You'' includes any entity which controls, is controlled by,
or is under common control with You. For purposes of this definition,
``control'' means (a) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or otherwise,
or (b) ownership of fifty percent (50%) or more of the outstanding shares
or beneficial ownership of such entity.
</UL>
<B>2. Source Code License.</B>
<UL>
<B>2.1. The Initial Developer Grant.</B>
<BR>The Initial Developer hereby grants You a world-wide, royalty-free,
non-exclusive license, subject to third party intellectual property
claims:
<UL>
<P><B>(a)</B> to use, reproduce, modify, display, perform, sublicense
and distribute the Original Code (or portions thereof) with or
without Modifications, or as part of a Larger Work; and
<P><B>(b)</B> under patents now or hereafter owned or controlled by
Initial Developer, to make, have made, use and sell (``Utilize'') the
Original Code (or portions thereof), but solely to the extent that
any such patent is reasonably necessary to enable You to Utilize the
Original Code (or portions thereof) and not to any greater extent
that may be necessary to Utilize further Modifications or
combinations.
</UL>
<P><B>2.2. Contributor Grant.</B>
<BR>Each Contributor hereby grants You a world-wide, royalty-free,
non-exclusive license, subject to third party intellectual property
claims:
<UL>
<P><B>(a)</B> to use, reproduce, modify, display, perform, sublicense and
distribute the Modifications created by such Contributor (or portions
thereof) either on an unmodified basis, with other Modifications, as
Covered Code or as part of a Larger Work; and
<P><B>(b)</B> under patents now or hereafter owned or controlled by
Contributor, to Utilize the Contributor Version (or portions thereof),
but solely to the extent that any such patent is reasonably necessary to
enable You to Utilize the Contributor Version (or portions thereof), and
not to any greater extent that may be necessary to Utilize further
Modifications or combinations.
</UL>
</UL>
<B>3. Distribution Obligations.</B>
<UL>
<B>3.1. Application of License.</B>
<BR>The Modifications which You create or to which You contribute are
governed by the terms of this License, including without limitation
Section <B>2.2</B>. The Source Code version of Covered Code may be
distributed only under the terms of this License or a future version of
this License released under Section <B>6.1</B>, and You must include a
copy of this License with every copy of the Source Code You
distribute. You may not offer or impose any terms on any Source Code
version that alters or restricts the applicable version of this License
or the recipients' rights hereunder. However, You may include an
additional document offering the additional rights described in Section
<B>3.5</B>.
<P><B>3.2. Availability of Source Code.</B>
<BR>Any Modification which You create or to which You contribute must be
made available in Source Code form under the terms of this License either
on the same media as an Executable version or via an accepted Electronic
Distribution Mechanism to anyone to whom you made an Executable version
available; and if made available via Electronic Distribution Mechanism,
must remain available for at least twelve (12) months after the date it
initially became available, or at least six (6) months after a subsequent
version of that particular Modification has been made available to such
recipients. You are responsible for ensuring that the Source Code version
remains available even if the Electronic Distribution Mechanism is
maintained by a third party.
<P><B>3.3. Description of Modifications.</B>
<BR>You must cause all Covered Code to which you contribute to contain a
file documenting the changes You made to create that Covered Code and the
date of any change. You must include a prominent statement that the
Modification is derived, directly or indirectly, from Original Code
provided by the Initial Developer and including the name of the Initial
Developer in (a) the Source Code, and (b) in any notice in an Executable
version or related documentation in which You describe the origin or
ownership of the Covered Code.
<P><B>3.4. Intellectual Property Matters</B>
<UL>
<P><B>(a) Third Party Claims</B>.
<BR>If You have knowledge that a party claims an intellectual
property right in particular functionality or code (or its
utilization under this License), you must include a text file with
the source code distribution titled ``LEGAL'' which describes the
claim and the party making the claim in sufficient detail that a
recipient will know whom to contact. If you obtain such knowledge
after You make Your Modification available as described in Section
<B>3.2</B>, You shall promptly modify the LEGAL file in all copies
You make available thereafter and shall take other steps (such as
notifying appropriate mailing lists or newsgroups) reasonably
calculated to inform those who received the Covered Code that new
knowledge has been obtained.
<P><B>(b) Contributor APIs</B>.
<BR>If Your Modification is an application programming interface and
You own or control patents which are reasonably necessary to
implement that API, you must also include this information in the
LEGAL file.
</UL>
<P><B>3.5. Required Notices.</B>
<BR>You must duplicate the notice in <B>Exhibit A</B> in each file of the
Source Code, and this License in any documentation for the Source Code,
where You describe recipients' rights relating to Covered Code. If You
created one or more Modification(s), You may add your name as a
Contributor to the notice described in <B>Exhibit A</B>. If it is not
possible to put such notice in a particular Source Code file due to its
structure, then you must include such notice in a location (such as a
relevant directory file) where a user would be likely to look for such a
notice. You may choose to offer, and to charge a fee for, warranty,
support, indemnity or liability obligations to one or more recipients of
Covered Code. However, You may do so only on Your own behalf, and not on
behalf of the Initial Developer or any Contributor. You must make it
absolutely clear than any such warranty, support, indemnity or liability
obligation is offered by You alone, and You hereby agree to indemnify the
Initial Developer and every Contributor for any liability incurred by the
Initial Developer or such Contributor as a result of warranty, support,
indemnity or liability terms You offer.
<P><B>3.6. Distribution of Executable Versions.</B>
<BR>You may distribute Covered Code in Executable form only if the
requirements of Section <B>3.1-3.5</B> have been met for that Covered
Code, and if You include a notice stating that the Source Code version of
the Covered Code is available under the terms of this License, including
a description of how and where You have fulfilled the obligations of
Section <B>3.2</B>. The notice must be conspicuously included in any
notice in an Executable version, related documentation or collateral in
which You describe recipients' rights relating to the Covered Code. You
may distribute the Executable version of Covered Code under a license of
Your choice, which may contain terms different from this License,
provided that You are in compliance with the terms of this License and
that the license for the Executable version does not attempt to limit or
alter the recipient's rights in the Source Code version from the rights
set forth in this License. If You distribute the Executable version under
a different license You must make it absolutely clear that any terms
which differ from this License are offered by You alone, not by the
Initial Developer or any Contributor. You hereby agree to indemnify the
Initial Developer and every Contributor for any liability incurred by the
Initial Developer or such Contributor as a result of any such terms You
offer.
<P><B>3.7. Larger Works.</B>
<BR>You may create a Larger Work by combining Covered Code with other
code not governed by the terms of this License and distribute the Larger
Work as a single product. In such a case, You must make sure the
requirements of this License are fulfilled for the Covered Code.
</UL>
<B>4. Inability to Comply Due to Statute or Regulation.</B>
<UL>
<P>If it is impossible for You to comply with any of the terms of this
License with respect to some or all of the Covered Code due to statute or
regulation then You must: (a) comply with the terms of this License to
the maximum extent possible; and (b) describe the limitations and the
code they affect. Such description must be included in the LEGAL file
described in Section <B>3.4</B> and must be included with all
distributions of the Source Code. Except to the extent prohibited by
statute or regulation, such description must be sufficiently detailed for
a recipient of ordinary skill
to be able to understand it.
</UL>
<B>5. Application of this License.</B>
<UL>
This License applies to code to which the Initial Developer has attached
the notice in <B>Exhibit A</B>, and to related Covered Code.
</UL>
<B>6. Versions of the License.</B>
<UL>
<B>6.1. New Versions</B>.
<BR>Netscape Communications Corporation (``Netscape'') may publish
revised and/or new versions of the License from time to time. Each
version will be given a distinguishing version number.
<P><B>6.2. Effect of New Versions</B>.
<BR>Once Covered Code has been published under a particular version of
the License, You may always continue to use it under the terms of that
version. You may also choose to use such Covered Code under the terms of
any subsequent version of the License published by Netscape. No one other
than Netscape has the right to modify the terms applicable to Covered
Code created under this License.
<P><B>6.3. Derivative Works</B>.
<BR>If you create or use a modified version of this License (which you
may only do in order to apply it to code which is not already Covered
Code governed by this License), you must (a) rename Your license so that
the phrases ``Mozilla'', ``MOZILLAPL'', ``MOZPL'', ``Netscape'', ``NPL''
or any confusingly similar phrase do not appear anywhere in your license
and (b) otherwise make it clear that your version of the license contains
terms which differ from the Mozilla Public License and Netscape Public
License. (Filling in the name of the Initial Developer, Original Code or
Contributor in the notice described in <B>Exhibit A</B> shall not of
themselves be deemed to be modifications of this License.)
</UL>
<B>7. DISCLAIMER OF WARRANTY.</B>
<UL>
COVERED CODE IS PROVIDED UNDER THIS LICENSE ON AN ``AS IS'' BASIS,
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
WITHOUT LIMITATION, WARRANTIES THAT THE COVERED CODE IS FREE OF DEFECTS,
MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE
RISK AS TO THE QUALITY AND PERFORMANCE OF THE COVERED CODE IS WITH
YOU. SHOULD ANY COVERED CODE PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE
INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY
NECESSARY SERVICING, REPAIR OR CORRECTION. THIS DISCLAIMER OF WARRANTY
CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF ANY COVERED CODE
IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER.
</UL>
<B>8. TERMINATION.</B>
<UL>
This License and the rights granted hereunder will terminate
automatically if You fail to comply with terms herein and fail to cure
such breach within 30 days of becoming aware of the breach. All
sublicenses to the Covered Code which are properly granted shall survive
any termination of this License. Provisions which, by their nature, must
remain in effect beyond the termination of this License shall survive.
</UL>
<B>9. LIMITATION OF LIABILITY.</B>
<UL>
UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER TORT (INCLUDING
NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL THE INITIAL DEVELOPER, ANY
OTHER CONTRIBUTOR, OR ANY DISTRIBUTOR OF COVERED CODE, OR ANY SUPPLIER OF
ANY OF SUCH PARTIES, BE LIABLE TO YOU OR ANY OTHER PERSON FOR ANY
INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER
INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF GOODWILL, WORK
STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER
COMMERCIAL DAMAGES OR LOSSES, EVEN IF SUCH PARTY SHALL HAVE BEEN INFORMED
OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION OF LIABILITY SHALL
NOT APPLY TO LIABILITY FOR DEATH OR PERSONAL INJURY RESULTING FROM SUCH
PARTY'S NEGLIGENCE TO THE EXTENT APPLICABLE LAW PROHIBITS SUCH
LIMITATION. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION
OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THAT EXCLUSION AND LIMITATION
MAY NOT APPLY TO YOU.
</UL>
<B>10. U.S. GOVERNMENT END USERS.</B>
<UL>
The Covered Code is a ``commercial item,'' as that term is defined in 48
C.F.R. 2.101 (Oct. 1995), consisting of ``commercial computer software''
and ``commercial computer software documentation,'' as such terms are
used in 48 C.F.R. 12.212 (Sept. 1995). Consistent with 48 C.F.R. 12.212
and 48 C.F.R. 227.7202-1 through 227.7202-4 (June 1995), all
U.S. Government End Users acquire Covered Code with only those rights set
forth herein.
</UL>
<B>11. MISCELLANEOUS.</B>
<UL>
This License represents the complete agreement concerning subject matter
hereof. If any provision of this License is held to be unenforceable,
such provision shall be reformed only to the extent necessary to make it
enforceable. This License shall be governed by California law provisions
(except to the extent applicable law, if any, provides otherwise),
excluding its conflict-of-law provisions. With respect to disputes in
which at least one party is a citizen of, or an entity chartered or
registered to do business in, the United States of America: (a) unless
otherwise agreed in writing, all disputes relating to this License
(excepting any dispute relating to intellectual property rights) shall be
subject to final and binding arbitration, with the losing party paying
all costs of arbitration; (b) any arbitration relating to this Agreement
shall be held in Santa Clara County, California, under the auspices of
JAMS/EndDispute; and (c) any litigation relating to this Agreement shall
be subject to the jurisdiction of the Federal Courts of the Northern
District of California, with venue lying in Santa Clara County,
California, with the losing party responsible for costs, including
without limitation, court costs and reasonable attorneys fees and
expenses. The application of the United Nations Convention on Contracts
for the International Sale of Goods is expressly excluded. Any law or
regulation which provides that the language of a contract shall be
construed against the drafter shall not apply to this License.
</UL>
<B>12. RESPONSIBILITY FOR CLAIMS.</B>
<UL>
Except in cases where another Contributor has failed to comply with
Section <B>3.4</B>, You are responsible for damages arising, directly or
indirectly, out of Your utilization of rights under this License, based
on the number of copies of Covered Code you made available, the revenues
you received from utilizing such rights, and other relevant factors. You
agree to work with affected parties to distribute responsibility on an
equitable basis.
</UL>
<B>EXHIBIT A.</B>
<UL>
``The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
<P>Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations under
the License.
<P>The Original Code is ______________________________________.
<P>The Initial Developer of the Original Code is
________________________. Portions created by ______________________ are
Copyright (C) ______ _______________________. All Rights Reserved.
<P>Contributor(s): ______________________________________.''
</UL>
1.1 apache-1.3/src/lib/expat/Makefile.tmpl
Index: Makefile.tmpl
===================================================================
#
# default definition of these two. dunno how to get it prepended when the
# Makefile is built, so we do it manually
#
CFLAGS=$(OPTIM) $(CFLAGS1) $(EXTRA_CFLAGS)
INCLUDES=$(INCLUDES1) $(INCLUDES0) $(EXTRA_INCLUDES)
# If you know what your system's byte order is, define BYTE_ORDER:
# use -DBYTE_ORDER=12 for little-endian byte order;
# use -DBYTE_ORDER=21 for big-endian (network) byte order.
#CFLAGS=-O2
OBJS=xmltok.o xmlrole.o xmlparse.o hashtable.o
all lib: libexpat.a
libexpat.a: $(OBJS)
rm -f libexpat.a
ar cr libexpat.a $(OBJS)
$(RANLIB) libexpat.a
clean:
rm -f $(OBJS) libexpat.a
distclean: clean
-rm -f Makefile
.SUFFIXES: .o
.c.o:
$(CC) -c $(INCLUDES) $(CFLAGS) $<
1.1 apache-1.3/src/lib/expat/asciitab.h
Index: asciitab.h
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
/* 0x00 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x04 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x08 */ BT_NONXML, BT_S, BT_LF, BT_NONXML,
/* 0x0C */ BT_NONXML, BT_CR, BT_NONXML, BT_NONXML,
/* 0x10 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x14 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x18 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x1C */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x20 */ BT_S, BT_EXCL, BT_QUOT, BT_NUM,
/* 0x24 */ BT_OTHER, BT_PERCNT, BT_AMP, BT_APOS,
/* 0x28 */ BT_LPAR, BT_RPAR, BT_AST, BT_PLUS,
/* 0x2C */ BT_COMMA, BT_MINUS, BT_NAME, BT_SOL,
/* 0x30 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT,
/* 0x34 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT,
/* 0x38 */ BT_DIGIT, BT_DIGIT, BT_NMSTRT, BT_SEMI,
/* 0x3C */ BT_LT, BT_EQUALS, BT_GT, BT_QUEST,
/* 0x40 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX,
/* 0x44 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT,
/* 0x48 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x4C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x50 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x54 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x58 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_LSQB,
/* 0x5C */ BT_OTHER, BT_RSQB, BT_OTHER, BT_NMSTRT,
/* 0x60 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX,
/* 0x64 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT,
/* 0x68 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x6C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x70 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x74 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x78 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER,
/* 0x7C */ BT_VERBAR, BT_OTHER, BT_OTHER, BT_OTHER,
1.1 apache-1.3/src/lib/expat/expat.html
Index: expat.html
===================================================================
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<TITLE>expat</TITLE>
<BODY>
<H1>expat - XML Parser Toolkit</H1>
<H3>Version 1.0.2</H3>
<P>Copyright (c) 1998, 1999 James Clark. Expat is subject to the <A
HREF="MPL-1_0.html">Mozilla Public License Version 1.0</A>. You may
as a special exception <A href="gplelect.html">elect to use the GNU
General Public License</A>. Please contact me if you wish to
negotiate an alternative license.</P>
<P>Expat is an <A
HREF="http://www.w3.org/TR/1998/REC-xml-19980210">XML 1.0</A> parser
written in C. It aims to be fully conforming. It is currently not a
validating XML processor. Expat can be downloaded from <A href =
"ftp://ftp.jclark.com/pub/xml/expat.zip"
>ftp://ftp.jclark.com/pub/xml/expat.zip</A>. This is a production
version.</P>
<P>The directory <SAMP>xmltok</SAMP> contains a low-level library for
tokenizing XML. The interface is documented in
<SAMP>xmltok/xmltok.h</SAMP>.</P>
<P>The directory <SAMP>xmlparse</SAMP> contains an XML parser library
which is built on top of the <SAMP>xmltok</SAMP> library. The
interface is documented in <SAMP>xmlparse/xmlparse.h</SAMP>. The
directory <SAMP>sample</SAMP> contains a simple example program using
this interface; <SAMP>sample/build.bat</SAMP> is a batch file to build
the example using Visual C++.</P>
<P>The directory <SAMP>xmlwf</SAMP> contains the <SAMP>xmlwf</SAMP>
application, which uses the <SAMP>xmlparse</SAMP> library. The
arguments to <SAMP>xmlwf</SAMP> are one or more files which are each
to be checked for well-formedness. An option <SAMP>-d
<VAR>dir</VAR></SAMP> can be specified; for each well-formed input
file the corresponding <A
href="http://www.jclark.com/xml/canonxml.html">canonical XML</A> will
be written to <SAMP>dir/<VAR>f</VAR></SAMP>, where
<SAMP><VAR>f</VAR></SAMP> is the filename (without any path) of the
input file. A <CODE>-x</CODE> option will cause references to
external general entities to be processed.</P>
<P>The <SAMP>bin</SAMP> directory contains Win32 executables. The
<SAMP>lib</SAMP> directory contains Win32 import libraries.</P>
<P></P>
<ADDRESS>
<A HREF="mailto:jjc@jclark.com">James Clark</A>
</ADDRESS>
</BODY>
</HTML>
1.1 apache-1.3/src/lib/expat/gplelect.html
Index: gplelect.html
===================================================================
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<TITLE>Using the GPL for expat</TITLE>
<BODY>
<H1>Using the GPL for expat</H1>
<P>Instead of using the Mozilla Public License, you may instead elect
to use the <A href="http://www.gnu.org/copyleft/gpl.html">GNU General
Public License</A> (GPL) for a particular copy of expat. The election
to use the GPL is irrevocable for that particular copy. If you do
elect to use the GPL, then you must replace the copyright notice on
each file with the following:</P>
<PRE>
expat XML parser
Copyright (C) 1998 James Clark
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
</PRE>
<P>and you must change readme.html to say that an election has been
made to use this copy under the GPL. If you redistribute expat, you
should also include a copy of the GPL.</P>
<P>Note that I will probably not accept patches under terms that allow
me to distribute them only under the GPL.</P>
<P></P>
<ADDRESS>
<A HREF="mailto:jjc@jclark.com">James Clark</A>
</ADDRESS>
</BODY>
</HTML>
1.1 apache-1.3/src/lib/expat/hashtable.c
Index: hashtable.c
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
csompliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
#include <stdlib.h>
#include <string.h>
#include "hashtable.h"
#ifdef XML_UNICODE
#define keycmp wcscmp
#else
#define keycmp strcmp
#endif
#define INIT_SIZE 64
static
unsigned long hash(KEY s)
{
unsigned long h = 0;
while (*s)
h = (h << 5) + h + (unsigned char)*s++;
return h;
}
NAMED *lookup(HASH_TABLE *table, KEY name, size_t createSize)
{
size_t i;
if (table->size == 0) {
if (!createSize)
return 0;
table->v = calloc(INIT_SIZE, sizeof(NAMED *));
if (!table->v)
return 0;
table->size = INIT_SIZE;
table->usedLim = INIT_SIZE / 2;
i = hash(name) & (table->size - 1);
}
else {
unsigned long h = hash(name);
for (i = h & (table->size - 1);
table->v[i];
i == 0 ? i = table->size - 1 : --i) {
if (keycmp(name, table->v[i]->name) == 0)
return table->v[i];
}
if (!createSize)
return 0;
if (table->used == table->usedLim) {
/* check for overflow */
size_t newSize = table->size * 2;
NAMED **newV = calloc(newSize, sizeof(NAMED *));
if (!newV)
return 0;
for (i = 0; i < table->size; i++)
if (table->v[i]) {
size_t j;
for (j = hash(table->v[i]->name) & (newSize - 1);
newV[j];
j == 0 ? j = newSize - 1 : --j)
;
newV[j] = table->v[i];
}
free(table->v);
table->v = newV;
table->size = newSize;
table->usedLim = newSize/2;
for (i = h & (table->size - 1);
table->v[i];
i == 0 ? i = table->size - 1 : --i)
;
}
}
table->v[i] = calloc(1, createSize);
if (!table->v[i])
return 0;
table->v[i]->name = name;
(table->used)++;
return table->v[i];
}
void hashTableDestroy(HASH_TABLE *table)
{
size_t i;
for (i = 0; i < table->size; i++) {
NAMED *p = table->v[i];
if (p)
free(p);
}
free(table->v);
}
void hashTableInit(HASH_TABLE *p)
{
p->size = 0;
p->usedLim = 0;
p->used = 0;
p->v = 0;
}
void hashTableIterInit(HASH_TABLE_ITER *iter, const HASH_TABLE *table)
{
iter->p = table->v;
iter->end = iter->p + table->size;
}
NAMED *hashTableIterNext(HASH_TABLE_ITER *iter)
{
while (iter->p != iter->end) {
NAMED *tem = *(iter->p)++;
if (tem)
return tem;
}
return 0;
}
1.1 apache-1.3/src/lib/expat/hashtable.h
Index: hashtable.h
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
#include <stddef.h>
#ifdef XML_UNICODE
typedef const wchar_t *KEY;
#else
typedef const char *KEY;
#endif
typedef struct {
KEY name;
} NAMED;
typedef struct {
NAMED **v;
size_t size;
size_t used;
size_t usedLim;
} HASH_TABLE;
NAMED *lookup(HASH_TABLE *table, KEY name, size_t createSize);
void hashTableInit(HASH_TABLE *);
void hashTableDestroy(HASH_TABLE *);
typedef struct {
NAMED **p;
NAMED **end;
} HASH_TABLE_ITER;
void hashTableIterInit(HASH_TABLE_ITER *, const HASH_TABLE *);
NAMED *hashTableIterNext(HASH_TABLE_ITER *);
1.1 apache-1.3/src/lib/expat/iasciitab.h
Index: iasciitab.h
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
/* Like asciitab.h, except that 0xD has code BT_S rather than BT_CR */
/* 0x00 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x04 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x08 */ BT_NONXML, BT_S, BT_LF, BT_NONXML,
/* 0x0C */ BT_NONXML, BT_S, BT_NONXML, BT_NONXML,
/* 0x10 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x14 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x18 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x1C */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x20 */ BT_S, BT_EXCL, BT_QUOT, BT_NUM,
/* 0x24 */ BT_OTHER, BT_PERCNT, BT_AMP, BT_APOS,
/* 0x28 */ BT_LPAR, BT_RPAR, BT_AST, BT_PLUS,
/* 0x2C */ BT_COMMA, BT_MINUS, BT_NAME, BT_SOL,
/* 0x30 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT,
/* 0x34 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT,
/* 0x38 */ BT_DIGIT, BT_DIGIT, BT_NMSTRT, BT_SEMI,
/* 0x3C */ BT_LT, BT_EQUALS, BT_GT, BT_QUEST,
/* 0x40 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX,
/* 0x44 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT,
/* 0x48 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x4C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x50 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x54 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x58 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_LSQB,
/* 0x5C */ BT_OTHER, BT_RSQB, BT_OTHER, BT_NMSTRT,
/* 0x60 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX,
/* 0x64 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT,
/* 0x68 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x6C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x70 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x74 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x78 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER,
/* 0x7C */ BT_VERBAR, BT_OTHER, BT_OTHER, BT_OTHER,
1.1 apache-1.3/src/lib/expat/latin1tab.h
Index: latin1tab.h
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
/* 0x80 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x84 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x88 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x8C */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x90 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x94 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x98 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x9C */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xA0 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xA4 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xA8 */ BT_OTHER, BT_OTHER, BT_NMSTRT, BT_OTHER,
/* 0xAC */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xB0 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xB4 */ BT_OTHER, BT_NMSTRT, BT_OTHER, BT_NAME,
/* 0xB8 */ BT_OTHER, BT_OTHER, BT_NMSTRT, BT_OTHER,
/* 0xBC */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xC0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xC4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xC8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xCC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xD0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xD4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER,
/* 0xD8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xDC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xE0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xE4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xE8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xEC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xF0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xF4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER,
/* 0xF8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xFC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
1.1 apache-1.3/src/lib/expat/nametab.h
Index: nametab.h
===================================================================
static const unsigned namingBitmap[] = {
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0x00000000, 0x04000000, 0x87FFFFFE, 0x07FFFFFE,
0x00000000, 0x00000000, 0xFF7FFFFF, 0xFF7FFFFF,
0xFFFFFFFF, 0x7FF3FFFF, 0xFFFFFDFE, 0x7FFFFFFF,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFE00F, 0xFC31FFFF,
0x00FFFFFF, 0x00000000, 0xFFFF0000, 0xFFFFFFFF,
0xFFFFFFFF, 0xF80001FF, 0x00000003, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0xFFFFD740, 0xFFFFFFFB, 0x547F7FFF, 0x000FFFFD,
0xFFFFDFFE, 0xFFFFFFFF, 0xDFFEFFFF, 0xFFFFFFFF,
0xFFFF0003, 0xFFFFFFFF, 0xFFFF199F, 0x033FCFFF,
0x00000000, 0xFFFE0000, 0x027FFFFF, 0xFFFFFFFE,
0x0000007F, 0x00000000, 0xFFFF0000, 0x000707FF,
0x00000000, 0x07FFFFFE, 0x000007FE, 0xFFFE0000,
0xFFFFFFFF, 0x7CFFFFFF, 0x002F7FFF, 0x00000060,
0xFFFFFFE0, 0x23FFFFFF, 0xFF000000, 0x00000003,
0xFFF99FE0, 0x03C5FDFF, 0xB0000000, 0x00030003,
0xFFF987E0, 0x036DFDFF, 0x5E000000, 0x001C0000,
0xFFFBAFE0, 0x23EDFDFF, 0x00000000, 0x00000001,
0xFFF99FE0, 0x23CDFDFF, 0xB0000000, 0x00000003,
0xD63DC7E0, 0x03BFC718, 0x00000000, 0x00000000,
0xFFFDDFE0, 0x03EFFDFF, 0x00000000, 0x00000003,
0xFFFDDFE0, 0x03EFFDFF, 0x40000000, 0x00000003,
0xFFFDDFE0, 0x03FFFDFF, 0x00000000, 0x00000003,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0xFFFFFFFE, 0x000D7FFF, 0x0000003F, 0x00000000,
0xFEF02596, 0x200D6CAE, 0x0000001F, 0x00000000,
0x00000000, 0x00000000, 0xFFFFFEFF, 0x000003FF,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0xFFFFFFFF, 0xFFFF003F, 0x007FFFFF,
0x0007DAED, 0x50000000, 0x82315001, 0x002C62AB,
0x40000000, 0xF580C900, 0x00000007, 0x02010800,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0x0FFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x03FFFFFF,
0x3F3FFFFF, 0xFFFFFFFF, 0xAAFF3F3F, 0x3FFFFFFF,
0xFFFFFFFF, 0x5FDFFFFF, 0x0FCF1FDC, 0x1FDC1FFF,
0x00000000, 0x00004C40, 0x00000000, 0x00000000,
0x00000007, 0x00000000, 0x00000000, 0x00000000,
0x00000080, 0x000003FE, 0xFFFFFFFE, 0xFFFFFFFF,
0x001FFFFF, 0xFFFFFFFE, 0xFFFFFFFF, 0x07FFFFFF,
0xFFFFFFE0, 0x00001FFF, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0x0000003F, 0x00000000, 0x00000000,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0x0000000F, 0x00000000, 0x00000000,
0x00000000, 0x07FF6000, 0x87FFFFFE, 0x07FFFFFE,
0x00000000, 0x00800000, 0xFF7FFFFF, 0xFF7FFFFF,
0x00FFFFFF, 0x00000000, 0xFFFF0000, 0xFFFFFFFF,
0xFFFFFFFF, 0xF80001FF, 0x00030003, 0x00000000,
0xFFFFFFFF, 0xFFFFFFFF, 0x0000003F, 0x00000003,
0xFFFFD7C0, 0xFFFFFFFB, 0x547F7FFF, 0x000FFFFD,
0xFFFFDFFE, 0xFFFFFFFF, 0xDFFEFFFF, 0xFFFFFFFF,
0xFFFF007B, 0xFFFFFFFF, 0xFFFF199F, 0x033FCFFF,
0x00000000, 0xFFFE0000, 0x027FFFFF, 0xFFFFFFFE,
0xFFFE007F, 0xBBFFFFFB, 0xFFFF0016, 0x000707FF,
0x00000000, 0x07FFFFFE, 0x0007FFFF, 0xFFFF03FF,
0xFFFFFFFF, 0x7CFFFFFF, 0xFFEF7FFF, 0x03FF3DFF,
0xFFFFFFEE, 0xF3FFFFFF, 0xFF1E3FFF, 0x0000FFCF,
0xFFF99FEE, 0xD3C5FDFF, 0xB080399F, 0x0003FFCF,
0xFFF987E4, 0xD36DFDFF, 0x5E003987, 0x001FFFC0,
0xFFFBAFEE, 0xF3EDFDFF, 0x00003BBF, 0x0000FFC1,
0xFFF99FEE, 0xF3CDFDFF, 0xB0C0398F, 0x0000FFC3,
0xD63DC7EC, 0xC3BFC718, 0x00803DC7, 0x0000FF80,
0xFFFDDFEE, 0xC3EFFDFF, 0x00603DDF, 0x0000FFC3,
0xFFFDDFEC, 0xC3EFFDFF, 0x40603DDF, 0x0000FFC3,
0xFFFDDFEC, 0xC3FFFDFF, 0x00803DCF, 0x0000FFC3,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0xFFFFFFFE, 0x07FF7FFF, 0x03FF7FFF, 0x00000000,
0xFEF02596, 0x3BFF6CAE, 0x03FF3F5F, 0x00000000,
0x03000000, 0xC2A003FF, 0xFFFFFEFF, 0xFFFE03FF,
0xFEBF0FDF, 0x02FE3FFF, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x1FFF0000, 0x00000002,
0x000000A0, 0x003EFFFE, 0xFFFFFFFE, 0xFFFFFFFF,
0x661FFFFF, 0xFFFFFFFE, 0xFFFFFFFF, 0x77FFFFFF,
};
static const unsigned char nmstrtPages[] = {
0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x00,
0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
0x10, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x12, 0x13,
0x00, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x15, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x17,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x18,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
static const unsigned char namePages[] = {
0x19, 0x03, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x00,
0x00, 0x1F, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25,
0x10, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x12, 0x13,
0x26, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x27, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x17,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x18,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
1.1 apache-1.3/src/lib/expat/utf8tab.h
Index: utf8tab.h
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
/* 0x80 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x84 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x88 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x8C */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x90 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x94 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x98 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x9C */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xA0 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xA4 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xA8 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xAC */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xB0 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xB4 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xB8 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xBC */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xC0 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xC4 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xC8 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xCC */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xD0 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xD4 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xD8 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xDC */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xE0 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3,
/* 0xE4 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3,
/* 0xE8 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3,
/* 0xEC */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3,
/* 0xF0 */ BT_LEAD4, BT_LEAD4, BT_LEAD4, BT_LEAD4,
/* 0xF4 */ BT_LEAD4, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0xF8 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0xFC */ BT_NONXML, BT_NONXML, BT_MALFORM, BT_MALFORM,
1.1 apache-1.3/src/lib/expat/xmlparse.c
Index: xmlparse.c
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#ifdef XML_UNICODE
#define XML_ENCODE_MAX XML_UTF16_ENCODE_MAX
#define XmlConvert XmlUtf16Convert
#define XmlGetInternalEncoding XmlGetUtf16InternalEncoding
#define XmlEncode XmlUtf16Encode
#define MUST_CONVERT(enc, s) (!(enc)->isUtf16 || (((unsigned long)s) & 1))
typedef unsigned short ICHAR;
#else
#define XML_ENCODE_MAX XML_UTF8_ENCODE_MAX
#define XmlConvert XmlUtf8Convert
#define XmlGetInternalEncoding XmlGetUtf8InternalEncoding
#define XmlEncode XmlUtf8Encode
#define MUST_CONVERT(enc, s) (!(enc)->isUtf8)
typedef char ICHAR;
#endif
#ifdef XML_UNICODE_WCHAR_T
#define XML_T(x) L ## x
#else
#define XML_T(x) x
#endif
/* Round up n to be a multiple of sz, where sz is a power of 2. */
#define ROUND_UP(n, sz) (((n) + ((sz) - 1)) & ~((sz) - 1))
#include "xmlparse.h"
#include "xmltok.h"
#include "xmlrole.h"
#include "hashtable.h"
#define INIT_TAG_BUF_SIZE 32 /* must be a multiple of sizeof(XML_Char) */
#define INIT_DATA_BUF_SIZE 1024
#define INIT_ATTS_SIZE 16
#define INIT_BLOCK_SIZE 1024
#define INIT_BUFFER_SIZE 1024
typedef struct tag {
struct tag *parent;
const char *rawName;
int rawNameLength;
const XML_Char *name;
char *buf;
char *bufEnd;
} TAG;
typedef struct {
const XML_Char *name;
const XML_Char *textPtr;
int textLen;
const XML_Char *systemId;
const XML_Char *base;
const XML_Char *publicId;
const XML_Char *notation;
char open;
} ENTITY;
typedef struct block {
struct block *next;
int size;
XML_Char s[1];
} BLOCK;
typedef struct {
BLOCK *blocks;
BLOCK *freeBlocks;
const XML_Char *end;
XML_Char *ptr;
XML_Char *start;
} STRING_POOL;
/* The XML_Char before the name is used to determine whether
an attribute has been specified. */
typedef struct {
XML_Char *name;
char maybeTokenized;
} ATTRIBUTE_ID;
typedef struct {
const ATTRIBUTE_ID *id;
char isCdata;
const XML_Char *value;
} DEFAULT_ATTRIBUTE;
typedef struct {
const XML_Char *name;
int nDefaultAtts;
int allocDefaultAtts;
DEFAULT_ATTRIBUTE *defaultAtts;
} ELEMENT_TYPE;
typedef struct {
HASH_TABLE generalEntities;
HASH_TABLE elementTypes;
HASH_TABLE attributeIds;
STRING_POOL pool;
int complete;
int standalone;
const XML_Char *base;
} DTD;
typedef enum XML_Error Processor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr);
static Processor prologProcessor;
static Processor prologInitProcessor;
static Processor contentProcessor;
static Processor cdataSectionProcessor;
static Processor epilogProcessor;
static Processor errorProcessor;
static Processor externalEntityInitProcessor;
static Processor externalEntityInitProcessor2;
static Processor externalEntityInitProcessor3;
static Processor externalEntityContentProcessor;
static enum XML_Error
handleUnknownEncoding(XML_Parser parser, const XML_Char *encodingName);
static enum XML_Error
processXmlDecl(XML_Parser parser, int isGeneralTextEntity, const char *, const char *);
static enum XML_Error
initializeEncoding(XML_Parser parser);
static enum XML_Error
doContent(XML_Parser parser, int startTagLevel, const ENCODING *enc,
const char *start, const char *end, const char **endPtr);
static enum XML_Error
doCdataSection(XML_Parser parser, const ENCODING *, const char **startPtr, const char *end, const char **nextPtr);
static enum XML_Error storeAtts(XML_Parser parser, const ENCODING *, const XML_Char *tagName, const char *s);
static int
defineAttribute(ELEMENT_TYPE *type, ATTRIBUTE_ID *, int isCdata, const XML_Char *dfltValue);
static enum XML_Error
storeAttributeValue(XML_Parser parser, const ENCODING *, int isCdata, const char *, const char *,
STRING_POOL *);
static enum XML_Error
appendAttributeValue(XML_Parser parser, const ENCODING *, int isCdata, const char *, const char *,
STRING_POOL *);
static ATTRIBUTE_ID *
getAttributeId(XML_Parser parser, const ENCODING *enc, const char *start, const char *end);
static enum XML_Error
storeEntityValue(XML_Parser parser, const char *start, const char *end);
static int
reportProcessingInstruction(XML_Parser parser, const ENCODING *enc, const char *start, const char *end);
static void
reportDefault(XML_Parser parser, const ENCODING *enc, const char *start, const char *end);
static const XML_Char *getOpenEntityNames(XML_Parser parser);
static int setOpenEntityNames(XML_Parser parser, const XML_Char *openEntityNames);
static void normalizePublicId(XML_Char *s);
static int dtdInit(DTD *);
static void dtdDestroy(DTD *);
static int dtdCopy(DTD *newDtd, const DTD *oldDtd);
static void poolInit(STRING_POOL *);
static void poolClear(STRING_POOL *);
static void poolDestroy(STRING_POOL *);
static XML_Char *poolAppend(STRING_POOL *pool, const ENCODING *enc,
const char *ptr, const char *end);
static XML_Char *poolStoreString(STRING_POOL *pool, const ENCODING *enc,
const char *ptr, const char *end);
static int poolGrow(STRING_POOL *pool);
static const XML_Char *poolCopyString(STRING_POOL *pool, const XML_Char *s);
static const XML_Char *poolCopyStringN(STRING_POOL *pool, const XML_Char *s, int n);
#define poolStart(pool) ((pool)->start)
#define poolEnd(pool) ((pool)->ptr)
#define poolLength(pool) ((pool)->ptr - (pool)->start)
#define poolChop(pool) ((void)--(pool->ptr))
#define poolLastChar(pool) (((pool)->ptr)[-1])
#define poolDiscard(pool) ((pool)->ptr = (pool)->start)
#define poolFinish(pool) ((pool)->start = (pool)->ptr)
#define poolAppendChar(pool, c) \
(((pool)->ptr == (pool)->end && !poolGrow(pool)) \
? 0 \
: ((*((pool)->ptr)++ = c), 1))
typedef struct {
/* The first member must be userData so that the XML_GetUserData macro works. */
void *userData;
void *handlerArg;
char *buffer;
/* first character to be parsed */
const char *bufferPtr;
/* past last character to be parsed */
char *bufferEnd;
/* allocated end of buffer */
const char *bufferLim;
long parseEndByteIndex;
const char *parseEndPtr;
XML_Char *dataBuf;
XML_Char *dataBufEnd;
XML_StartElementHandler startElementHandler;
XML_EndElementHandler endElementHandler;
XML_CharacterDataHandler characterDataHandler;
XML_ProcessingInstructionHandler processingInstructionHandler;
XML_DefaultHandler defaultHandler;
XML_UnparsedEntityDeclHandler unparsedEntityDeclHandler;
XML_NotationDeclHandler notationDeclHandler;
XML_ExternalEntityRefHandler externalEntityRefHandler;
XML_UnknownEncodingHandler unknownEncodingHandler;
const ENCODING *encoding;
INIT_ENCODING initEncoding;
const XML_Char *protocolEncodingName;
void *unknownEncodingMem;
void *unknownEncodingData;
void *unknownEncodingHandlerData;
void (*unknownEncodingRelease)(void *);
PROLOG_STATE prologState;
Processor *processor;
enum XML_Error errorCode;
const char *eventPtr;
const char *eventEndPtr;
const char *positionPtr;
int tagLevel;
ENTITY *declEntity;
const XML_Char *declNotationName;
const XML_Char *declNotationPublicId;
ELEMENT_TYPE *declElementType;
ATTRIBUTE_ID *declAttributeId;
char declAttributeIsCdata;
DTD dtd;
TAG *tagStack;
TAG *freeTagList;
int attsSize;
ATTRIBUTE *atts;
POSITION position;
STRING_POOL tempPool;
STRING_POOL temp2Pool;
char *groupConnector;
unsigned groupSize;
int hadExternalDoctype;
} Parser;
#define userData (((Parser *)parser)->userData)
#define handlerArg (((Parser *)parser)->handlerArg)
#define startElementHandler (((Parser *)parser)->startElementHandler)
#define endElementHandler (((Parser *)parser)->endElementHandler)
#define characterDataHandler (((Parser *)parser)->characterDataHandler)
#define processingInstructionHandler (((Parser *)parser)->processingInstructionHandler)
#define defaultHandler (((Parser *)parser)->defaultHandler)
#define unparsedEntityDeclHandler (((Parser *)parser)->unparsedEntityDeclHandler)
#define notationDeclHandler (((Parser *)parser)->notationDeclHandler)
#define externalEntityRefHandler (((Parser *)parser)->externalEntityRefHandler)
#define unknownEncodingHandler (((Parser *)parser)->unknownEncodingHandler)
#define encoding (((Parser *)parser)->encoding)
#define initEncoding (((Parser *)parser)->initEncoding)
#define unknownEncodingMem (((Parser *)parser)->unknownEncodingMem)
#define unknownEncodingData (((Parser *)parser)->unknownEncodingData)
#define unknownEncodingHandlerData \
(((Parser *)parser)->unknownEncodingHandlerData)
#define unknownEncodingRelease (((Parser *)parser)->unknownEncodingRelease)
#define protocolEncodingName (((Parser *)parser)->protocolEncodingName)
#define prologState (((Parser *)parser)->prologState)
#define processor (((Parser *)parser)->processor)
#define errorCode (((Parser *)parser)->errorCode)
#define eventPtr (((Parser *)parser)->eventPtr)
#define eventEndPtr (((Parser *)parser)->eventEndPtr)
#define positionPtr (((Parser *)parser)->positionPtr)
#define position (((Parser *)parser)->position)
#define tagLevel (((Parser *)parser)->tagLevel)
#define buffer (((Parser *)parser)->buffer)
#define bufferPtr (((Parser *)parser)->bufferPtr)
#define bufferEnd (((Parser *)parser)->bufferEnd)
#define parseEndByteIndex (((Parser *)parser)->parseEndByteIndex)
#define parseEndPtr (((Parser *)parser)->parseEndPtr)
#define bufferLim (((Parser *)parser)->bufferLim)
#define dataBuf (((Parser *)parser)->dataBuf)
#define dataBufEnd (((Parser *)parser)->dataBufEnd)
#define dtd (((Parser *)parser)->dtd)
#define declEntity (((Parser *)parser)->declEntity)
#define declNotationName (((Parser *)parser)->declNotationName)
#define declNotationPublicId (((Parser *)parser)->declNotationPublicId)
#define declElementType (((Parser *)parser)->declElementType)
#define declAttributeId (((Parser *)parser)->declAttributeId)
#define declAttributeIsCdata (((Parser *)parser)->declAttributeIsCdata)
#define freeTagList (((Parser *)parser)->freeTagList)
#define tagStack (((Parser *)parser)->tagStack)
#define atts (((Parser *)parser)->atts)
#define attsSize (((Parser *)parser)->attsSize)
#define tempPool (((Parser *)parser)->tempPool)
#define temp2Pool (((Parser *)parser)->temp2Pool)
#define groupConnector (((Parser *)parser)->groupConnector)
#define groupSize (((Parser *)parser)->groupSize)
#define hadExternalDoctype (((Parser *)parser)->hadExternalDoctype)
XML_Parser XML_ParserCreate(const XML_Char *encodingName)
{
XML_Parser parser = malloc(sizeof(Parser));
if (!parser)
return parser;
processor = prologInitProcessor;
XmlPrologStateInit(&prologState);
userData = 0;
handlerArg = 0;
startElementHandler = 0;
endElementHandler = 0;
characterDataHandler = 0;
processingInstructionHandler = 0;
defaultHandler = 0;
unparsedEntityDeclHandler = 0;
notationDeclHandler = 0;
externalEntityRefHandler = 0;
unknownEncodingHandler = 0;
buffer = 0;
bufferPtr = 0;
bufferEnd = 0;
parseEndByteIndex = 0;
parseEndPtr = 0;
bufferLim = 0;
declElementType = 0;
declAttributeId = 0;
declEntity = 0;
declNotationName = 0;
declNotationPublicId = 0;
memset(&position, 0, sizeof(POSITION));
errorCode = XML_ERROR_NONE;
eventPtr = 0;
eventEndPtr = 0;
positionPtr = 0;
tagLevel = 0;
tagStack = 0;
freeTagList = 0;
attsSize = INIT_ATTS_SIZE;
atts = malloc(attsSize * sizeof(ATTRIBUTE));
dataBuf = malloc(INIT_DATA_BUF_SIZE * sizeof(XML_Char));
groupSize = 0;
groupConnector = 0;
hadExternalDoctype = 0;
unknownEncodingMem = 0;
unknownEncodingRelease = 0;
unknownEncodingData = 0;
unknownEncodingHandlerData = 0;
poolInit(&tempPool);
poolInit(&temp2Pool);
protocolEncodingName = encodingName ? poolCopyString(&tempPool, encodingName) : 0;
if (!dtdInit(&dtd) || !atts || !dataBuf
|| (encodingName && !protocolEncodingName)) {
XML_ParserFree(parser);
return 0;
}
dataBufEnd = dataBuf + INIT_DATA_BUF_SIZE;
XmlInitEncoding(&initEncoding, &encoding, 0);
return parser;
}
XML_Parser XML_ExternalEntityParserCreate(XML_Parser oldParser,
const XML_Char *openEntityNames,
const XML_Char *encodingName)
{
XML_Parser parser = oldParser;
DTD *oldDtd = &dtd;
XML_StartElementHandler oldStartElementHandler = startElementHandler;
XML_EndElementHandler oldEndElementHandler = endElementHandler;
XML_CharacterDataHandler oldCharacterDataHandler = characterDataHandler;
XML_ProcessingInstructionHandler oldProcessingInstructionHandler = processingInstructionHandler;
XML_DefaultHandler oldDefaultHandler = defaultHandler;
XML_ExternalEntityRefHandler oldExternalEntityRefHandler = externalEntityRefHandler;
XML_UnknownEncodingHandler oldUnknownEncodingHandler = unknownEncodingHandler;
void *oldUserData = userData;
void *oldHandlerArg = handlerArg;
parser = XML_ParserCreate(encodingName);
if (!parser)
return 0;
startElementHandler = oldStartElementHandler;
endElementHandler = oldEndElementHandler;
characterDataHandler = oldCharacterDataHandler;
processingInstructionHandler = oldProcessingInstructionHandler;
defaultHandler = oldDefaultHandler;
externalEntityRefHandler = oldExternalEntityRefHandler;
unknownEncodingHandler = oldUnknownEncodingHandler;
userData = oldUserData;
if (oldUserData == oldHandlerArg)
handlerArg = userData;
else
handlerArg = parser;
if (!dtdCopy(&dtd, oldDtd) || !setOpenEntityNames(parser, openEntityNames)) {
XML_ParserFree(parser);
return 0;
}
processor = externalEntityInitProcessor;
return parser;
}
void XML_ParserFree(XML_Parser parser)
{
for (;;) {
TAG *p;
if (tagStack == 0) {
if (freeTagList == 0)
break;
tagStack = freeTagList;
freeTagList = 0;
}
p = tagStack;
tagStack = tagStack->parent;
free(p->buf);
free(p);
}
poolDestroy(&tempPool);
poolDestroy(&temp2Pool);
dtdDestroy(&dtd);
free((void *)atts);
free(groupConnector);
free(buffer);
free(dataBuf);
free(unknownEncodingMem);
if (unknownEncodingRelease)
unknownEncodingRelease(unknownEncodingData);
free(parser);
}
void XML_UseParserAsHandlerArg(XML_Parser parser)
{
handlerArg = parser;
}
void XML_SetUserData(XML_Parser parser, void *p)
{
if (handlerArg == userData)
handlerArg = userData = p;
else
userData = p;
}
int XML_SetBase(XML_Parser parser, const XML_Char *p)
{
if (p) {
p = poolCopyString(&dtd.pool, p);
if (!p)
return 0;
dtd.base = p;
}
else
dtd.base = 0;
return 1;
}
const XML_Char *XML_GetBase(XML_Parser parser)
{
return dtd.base;
}
void XML_SetElementHandler(XML_Parser parser,
XML_StartElementHandler start,
XML_EndElementHandler end)
{
startElementHandler = start;
endElementHandler = end;
}
void XML_SetCharacterDataHandler(XML_Parser parser,
XML_CharacterDataHandler handler)
{
characterDataHandler = handler;
}
void XML_SetProcessingInstructionHandler(XML_Parser parser,
XML_ProcessingInstructionHandler handler)
{
processingInstructionHandler = handler;
}
void XML_SetDefaultHandler(XML_Parser parser,
XML_DefaultHandler handler)
{
defaultHandler = handler;
}
void XML_SetUnparsedEntityDeclHandler(XML_Parser parser,
XML_UnparsedEntityDeclHandler handler)
{
unparsedEntityDeclHandler = handler;
}
void XML_SetNotationDeclHandler(XML_Parser parser,
XML_NotationDeclHandler handler)
{
notationDeclHandler = handler;
}
void XML_SetExternalEntityRefHandler(XML_Parser parser,
XML_ExternalEntityRefHandler handler)
{
externalEntityRefHandler = handler;
}
void XML_SetUnknownEncodingHandler(XML_Parser parser,
XML_UnknownEncodingHandler handler,
void *data)
{
unknownEncodingHandler = handler;
unknownEncodingHandlerData = data;
}
int XML_Parse(XML_Parser parser, const char *s, int len, int isFinal)
{
if (len == 0) {
if (!isFinal)
return 1;
errorCode = processor(parser, bufferPtr, parseEndPtr = bufferEnd, 0);
if (errorCode == XML_ERROR_NONE)
return 1;
eventEndPtr = eventPtr;
return 0;
}
else if (bufferPtr == bufferEnd) {
const char *end;
int nLeftOver;
parseEndByteIndex += len;
positionPtr = s;
if (isFinal) {
errorCode = processor(parser, s, parseEndPtr = s + len, 0);
if (errorCode == XML_ERROR_NONE)
return 1;
eventEndPtr = eventPtr;
return 0;
}
errorCode = processor(parser, s, parseEndPtr = s + len, &end);
if (errorCode != XML_ERROR_NONE) {
eventEndPtr = eventPtr;
return 0;
}
XmlUpdatePosition(encoding, positionPtr, end, &position);
nLeftOver = s + len - end;
if (nLeftOver) {
if (buffer == 0 || nLeftOver > bufferLim - buffer) {
/* FIXME avoid integer overflow */
buffer = buffer == 0 ? malloc(len * 2) : realloc(buffer, len * 2);
if (!buffer) {
errorCode = XML_ERROR_NO_MEMORY;
eventPtr = eventEndPtr = 0;
return 0;
}
bufferLim = buffer + len * 2;
}
memcpy(buffer, end, nLeftOver);
bufferPtr = buffer;
bufferEnd = buffer + nLeftOver;
}
return 1;
}
else {
memcpy(XML_GetBuffer(parser, len), s, len);
return XML_ParseBuffer(parser, len, isFinal);
}
}
int XML_ParseBuffer(XML_Parser parser, int len, int isFinal)
{
const char *start = bufferPtr;
positionPtr = start;
bufferEnd += len;
parseEndByteIndex += len;
errorCode = processor(parser, start, parseEndPtr = bufferEnd,
isFinal ? (const char **)0 : &bufferPtr);
if (errorCode == XML_ERROR_NONE) {
if (!isFinal)
XmlUpdatePosition(encoding, positionPtr, bufferPtr, &position);
return 1;
}
else {
eventEndPtr = eventPtr;
return 0;
}
}
void *XML_GetBuffer(XML_Parser parser, int len)
{
if (len > bufferLim - bufferEnd) {
/* FIXME avoid integer overflow */
int neededSize = len + (bufferEnd - bufferPtr);
if (neededSize <= bufferLim - buffer) {
memmove(buffer, bufferPtr, bufferEnd - bufferPtr);
bufferEnd = buffer + (bufferEnd - bufferPtr);
bufferPtr = buffer;
}
else {
char *newBuf;
int bufferSize = bufferLim - bufferPtr;
if (bufferSize == 0)
bufferSize = INIT_BUFFER_SIZE;
do {
bufferSize *= 2;
} while (bufferSize < neededSize);
newBuf = malloc(bufferSize);
if (newBuf == 0) {
errorCode = XML_ERROR_NO_MEMORY;
return 0;
}
bufferLim = newBuf + bufferSize;
if (bufferPtr) {
memcpy(newBuf, bufferPtr, bufferEnd - bufferPtr);
free(buffer);
}
bufferEnd = newBuf + (bufferEnd - bufferPtr);
bufferPtr = buffer = newBuf;
}
}
return bufferEnd;
}
enum XML_Error XML_GetErrorCode(XML_Parser parser)
{
return errorCode;
}
long XML_GetCurrentByteIndex(XML_Parser parser)
{
if (eventPtr)
return parseEndByteIndex - (parseEndPtr - eventPtr);
return -1;
}
int XML_GetCurrentLineNumber(XML_Parser parser)
{
if (eventPtr) {
XmlUpdatePosition(encoding, positionPtr, eventPtr, &position);
positionPtr = eventPtr;
}
return position.lineNumber + 1;
}
int XML_GetCurrentColumnNumber(XML_Parser parser)
{
if (eventPtr) {
XmlUpdatePosition(encoding, positionPtr, eventPtr, &position);
positionPtr = eventPtr;
}
return position.columnNumber;
}
void XML_DefaultCurrent(XML_Parser parser)
{
if (defaultHandler)
reportDefault(parser, encoding, eventPtr, eventEndPtr);
}
const XML_LChar *XML_ErrorString(int code)
{
static const XML_LChar *message[] = {
0,
XML_T("out of memory"),
XML_T("syntax error"),
XML_T("no element found"),
XML_T("not well-formed"),
XML_T("unclosed token"),
XML_T("unclosed token"),
XML_T("mismatched tag"),
XML_T("duplicate attribute"),
XML_T("junk after document element"),
XML_T("illegal parameter entity reference"),
XML_T("undefined entity"),
XML_T("recursive entity reference"),
XML_T("asynchronous entity"),
XML_T("reference to invalid character number"),
XML_T("reference to binary entity"),
XML_T("reference to external entity in attribute"),
XML_T("xml processing instruction not at start of external entity"),
XML_T("unknown encoding"),
XML_T("encoding specified in XML declaration is incorrect"),
XML_T("unclosed CDATA section"),
XML_T("error in processing external entity reference")
};
if (code > 0 && code < sizeof(message)/sizeof(message[0]))
return message[code];
return 0;
}
static
enum XML_Error contentProcessor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
return doContent(parser, 0, encoding, start, end, endPtr);
}
static
enum XML_Error externalEntityInitProcessor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
enum XML_Error result = initializeEncoding(parser);
if (result != XML_ERROR_NONE)
return result;
processor = externalEntityInitProcessor2;
return externalEntityInitProcessor2(parser, start, end, endPtr);
}
static
enum XML_Error externalEntityInitProcessor2(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
const char *next;
int tok = XmlContentTok(encoding, start, end, &next);
switch (tok) {
case XML_TOK_BOM:
start = next;
break;
case XML_TOK_PARTIAL:
if (endPtr) {
*endPtr = start;
return XML_ERROR_NONE;
}
eventPtr = start;
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (endPtr) {
*endPtr = start;
return XML_ERROR_NONE;
}
eventPtr = start;
return XML_ERROR_PARTIAL_CHAR;
}
processor = externalEntityInitProcessor3;
return externalEntityInitProcessor3(parser, start, end, endPtr);
}
static
enum XML_Error externalEntityInitProcessor3(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
const char *next;
int tok = XmlContentTok(encoding, start, end, &next);
switch (tok) {
case XML_TOK_XML_DECL:
{
enum XML_Error result = processXmlDecl(parser, 1, start, next);
if (result != XML_ERROR_NONE)
return result;
start = next;
}
break;
case XML_TOK_PARTIAL:
if (endPtr) {
*endPtr = start;
return XML_ERROR_NONE;
}
eventPtr = start;
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (endPtr) {
*endPtr = start;
return XML_ERROR_NONE;
}
eventPtr = start;
return XML_ERROR_PARTIAL_CHAR;
}
processor = externalEntityContentProcessor;
tagLevel = 1;
return doContent(parser, 1, encoding, start, end, endPtr);
}
static
enum XML_Error externalEntityContentProcessor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
return doContent(parser, 1, encoding, start, end, endPtr);
}
static enum XML_Error
doContent(XML_Parser parser,
int startTagLevel,
const ENCODING *enc,
const char *s,
const char *end,
const char **nextPtr)
{
const ENCODING *internalEnc = XmlGetInternalEncoding();
const char *dummy;
const char **eventPP;
const char **eventEndPP;
if (enc == encoding) {
eventPP = &eventPtr;
*eventPP = s;
eventEndPP = &eventEndPtr;
}
else
eventPP = eventEndPP = &dummy;
for (;;) {
const char *next;
int tok = XmlContentTok(enc, s, end, &next);
*eventEndPP = next;
switch (tok) {
case XML_TOK_TRAILING_CR:
if (nextPtr) {
*nextPtr = s;
return XML_ERROR_NONE;
}
*eventEndPP = end;
if (characterDataHandler) {
XML_Char c = XML_T('\n');
characterDataHandler(handlerArg, &c, 1);
}
else if (defaultHandler)
reportDefault(parser, enc, s, end);
if (startTagLevel == 0)
return XML_ERROR_NO_ELEMENTS;
if (tagLevel != startTagLevel)
return XML_ERROR_ASYNC_ENTITY;
return XML_ERROR_NONE;
case XML_TOK_NONE:
if (nextPtr) {
*nextPtr = s;
return XML_ERROR_NONE;
}
if (startTagLevel > 0) {
if (tagLevel != startTagLevel)
return XML_ERROR_ASYNC_ENTITY;
return XML_ERROR_NONE;
}
return XML_ERROR_NO_ELEMENTS;
case XML_TOK_INVALID:
*eventPP = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL:
if (nextPtr) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (nextPtr) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_PARTIAL_CHAR;
case XML_TOK_ENTITY_REF:
{
const XML_Char *name;
ENTITY *entity;
XML_Char ch = XmlPredefinedEntityName(enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (ch) {
if (characterDataHandler)
characterDataHandler(handlerArg, &ch, 1);
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
}
name = poolStoreString(&dtd.pool, enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!name)
return XML_ERROR_NO_MEMORY;
entity = (ENTITY *)lookup(&dtd.generalEntities, name, 0);
poolDiscard(&dtd.pool);
if (!entity) {
if (dtd.complete || dtd.standalone)
return XML_ERROR_UNDEFINED_ENTITY;
if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
}
if (entity->open)
return XML_ERROR_RECURSIVE_ENTITY_REF;
if (entity->notation)
return XML_ERROR_BINARY_ENTITY_REF;
if (entity) {
if (entity->textPtr) {
enum XML_Error result;
if (defaultHandler) {
reportDefault(parser, enc, s, next);
break;
}
/* Protect against the possibility that somebody sets
the defaultHandler from inside another handler. */
*eventEndPP = *eventPP;
entity->open = 1;
result = doContent(parser,
tagLevel,
internalEnc,
(char *)entity->textPtr,
(char *)(entity->textPtr + entity->textLen),
0);
entity->open = 0;
if (result)
return result;
}
else if (externalEntityRefHandler) {
const XML_Char *openEntityNames;
entity->open = 1;
openEntityNames = getOpenEntityNames(parser);
entity->open = 0;
if (!openEntityNames)
return XML_ERROR_NO_MEMORY;
if (!externalEntityRefHandler(parser, openEntityNames, dtd.base, entity->systemId, entity->publicId))
return XML_ERROR_EXTERNAL_ENTITY_HANDLING;
poolDiscard(&tempPool);
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
}
break;
}
case XML_TOK_START_TAG_WITH_ATTS:
if (!startElementHandler) {
enum XML_Error result = storeAtts(parser, enc, 0, s);
if (result)
return result;
}
/* fall through */
case XML_TOK_START_TAG_NO_ATTS:
{
TAG *tag;
if (freeTagList) {
tag = freeTagList;
freeTagList = freeTagList->parent;
}
else {
tag = malloc(sizeof(TAG));
if (!tag)
return XML_ERROR_NO_MEMORY;
tag->buf = malloc(INIT_TAG_BUF_SIZE);
if (!tag->buf)
return XML_ERROR_NO_MEMORY;
tag->bufEnd = tag->buf + INIT_TAG_BUF_SIZE;
}
tag->parent = tagStack;
tagStack = tag;
tag->rawName = s + enc->minBytesPerChar;
tag->rawNameLength = XmlNameLength(enc, tag->rawName);
if (nextPtr) {
if (tag->rawNameLength > tag->bufEnd - tag->buf) {
int bufSize = tag->rawNameLength * 4;
bufSize = ROUND_UP(bufSize, sizeof(XML_Char));
tag->buf = realloc(tag->buf, bufSize);
if (!tag->buf)
return XML_ERROR_NO_MEMORY;
tag->bufEnd = tag->buf + bufSize;
}
memcpy(tag->buf, tag->rawName, tag->rawNameLength);
tag->rawName = tag->buf;
}
++tagLevel;
if (startElementHandler) {
enum XML_Error result;
XML_Char *toPtr;
for (;;) {
const char *rawNameEnd = tag->rawName + tag->rawNameLength;
const char *fromPtr = tag->rawName;
int bufSize;
if (nextPtr)
toPtr = (XML_Char *)(tag->buf + ROUND_UP(tag->rawNameLength, sizeof(XML_Char)));
else
toPtr = (XML_Char *)tag->buf;
tag->name = toPtr;
XmlConvert(enc,
&fromPtr, rawNameEnd,
(ICHAR **)&toPtr, (ICHAR *)tag->bufEnd - 1);
if (fromPtr == rawNameEnd)
break;
bufSize = (tag->bufEnd - tag->buf) << 1;
tag->buf = realloc(tag->buf, bufSize);
if (!tag->buf)
return XML_ERROR_NO_MEMORY;
tag->bufEnd = tag->buf + bufSize;
if (nextPtr)
tag->rawName = tag->buf;
}
*toPtr = XML_T('\0');
result = storeAtts(parser, enc, tag->name, s);
if (result)
return result;
startElementHandler(handlerArg, tag->name, (const XML_Char **)atts);
poolClear(&tempPool);
}
else {
tag->name = 0;
if (defaultHandler)
reportDefault(parser, enc, s, next);
}
break;
}
case XML_TOK_EMPTY_ELEMENT_WITH_ATTS:
if (!startElementHandler) {
enum XML_Error result = storeAtts(parser, enc, 0, s);
if (result)
return result;
}
/* fall through */
case XML_TOK_EMPTY_ELEMENT_NO_ATTS:
if (startElementHandler || endElementHandler) {
const char *rawName = s + enc->minBytesPerChar;
const XML_Char *name = poolStoreString(&tempPool, enc, rawName,
rawName
+ XmlNameLength(enc, rawName));
if (!name)
return XML_ERROR_NO_MEMORY;
poolFinish(&tempPool);
if (startElementHandler) {
enum XML_Error result = storeAtts(parser, enc, name, s);
if (result)
return result;
startElementHandler(handlerArg, name, (const XML_Char **)atts);
}
if (endElementHandler) {
if (startElementHandler)
*eventPP = *eventEndPP;
endElementHandler(handlerArg, name);
}
poolClear(&tempPool);
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
if (tagLevel == 0)
return epilogProcessor(parser, next, end, nextPtr);
break;
case XML_TOK_END_TAG:
if (tagLevel == startTagLevel)
return XML_ERROR_ASYNC_ENTITY;
else {
int len;
const char *rawName;
TAG *tag = tagStack;
tagStack = tag->parent;
tag->parent = freeTagList;
freeTagList = tag;
rawName = s + enc->minBytesPerChar*2;
len = XmlNameLength(enc, rawName);
if (len != tag->rawNameLength
|| memcmp(tag->rawName, rawName, len) != 0) {
*eventPP = rawName;
return XML_ERROR_TAG_MISMATCH;
}
--tagLevel;
if (endElementHandler) {
if (tag->name)
endElementHandler(handlerArg, tag->name);
else {
const XML_Char *name = poolStoreString(&tempPool, enc, rawName,
rawName + len);
if (!name)
return XML_ERROR_NO_MEMORY;
endElementHandler(handlerArg, name);
poolClear(&tempPool);
}
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
if (tagLevel == 0)
return epilogProcessor(parser, next, end, nextPtr);
}
break;
case XML_TOK_CHAR_REF:
{
int n = XmlCharRefNumber(enc, s);
if (n < 0)
return XML_ERROR_BAD_CHAR_REF;
if (characterDataHandler) {
XML_Char buf[XML_ENCODE_MAX];
characterDataHandler(handlerArg, buf, XmlEncode(n, (ICHAR *)buf));
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
}
break;
case XML_TOK_XML_DECL:
return XML_ERROR_MISPLACED_XML_PI;
case XML_TOK_DATA_NEWLINE:
if (characterDataHandler) {
XML_Char c = XML_T('\n');
characterDataHandler(handlerArg, &c, 1);
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
case XML_TOK_CDATA_SECT_OPEN:
{
enum XML_Error result;
if (characterDataHandler)
characterDataHandler(handlerArg, dataBuf, 0);
else if (defaultHandler)
reportDefault(parser, enc, s, next);
result = doCdataSection(parser, enc, &next, end, nextPtr);
if (!next) {
processor = cdataSectionProcessor;
return result;
}
}
break;
case XML_TOK_TRAILING_RSQB:
if (nextPtr) {
*nextPtr = s;
return XML_ERROR_NONE;
}
if (characterDataHandler) {
if (MUST_CONVERT(enc, s)) {
ICHAR *dataPtr = (ICHAR *)dataBuf;
XmlConvert(enc, &s, end, &dataPtr, (ICHAR *)dataBufEnd);
characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf);
}
else
characterDataHandler(handlerArg,
(XML_Char *)s,
(XML_Char *)end - (XML_Char *)s);
}
else if (defaultHandler)
reportDefault(parser, enc, s, end);
if (startTagLevel == 0) {
*eventPP = end;
return XML_ERROR_NO_ELEMENTS;
}
if (tagLevel != startTagLevel) {
*eventPP = end;
return XML_ERROR_ASYNC_ENTITY;
}
return XML_ERROR_NONE;
case XML_TOK_DATA_CHARS:
if (characterDataHandler) {
if (MUST_CONVERT(enc, s)) {
for (;;) {
ICHAR *dataPtr = (ICHAR *)dataBuf;
XmlConvert(enc, &s, next, &dataPtr, (ICHAR *)dataBufEnd);
*eventEndPP = s;
characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf);
if (s == next)
break;
*eventPP = s;
}
}
else
characterDataHandler(handlerArg,
(XML_Char *)s,
(XML_Char *)next - (XML_Char *)s);
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
case XML_TOK_PI:
if (!reportProcessingInstruction(parser, enc, s, next))
return XML_ERROR_NO_MEMORY;
break;
default:
if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
}
*eventPP = s = next;
}
/* not reached */
}
/* If tagName is non-null, build a real list of attributes,
otherwise just check the attributes for well-formedness. */
static enum XML_Error storeAtts(XML_Parser parser, const ENCODING *enc,
const XML_Char *tagName, const char *s)
{
ELEMENT_TYPE *elementType = 0;
int nDefaultAtts = 0;
const XML_Char **appAtts;
int i;
int n;
if (tagName) {
elementType = (ELEMENT_TYPE *)lookup(&dtd.elementTypes, tagName, 0);
if (elementType)
nDefaultAtts = elementType->nDefaultAtts;
}
n = XmlGetAttributes(enc, s, attsSize, atts);
if (n + nDefaultAtts > attsSize) {
int oldAttsSize = attsSize;
attsSize = n + nDefaultAtts + INIT_ATTS_SIZE;
atts = realloc((void *)atts, attsSize * sizeof(ATTRIBUTE));
if (!atts)
return XML_ERROR_NO_MEMORY;
if (n > oldAttsSize)
XmlGetAttributes(enc, s, n, atts);
}
appAtts = (const XML_Char **)atts;
for (i = 0; i < n; i++) {
ATTRIBUTE_ID *attId = getAttributeId(parser, enc, atts[i].name,
atts[i].name
+ XmlNameLength(enc, atts[i].name));
if (!attId)
return XML_ERROR_NO_MEMORY;
if ((attId->name)[-1]) {
if (enc == encoding)
eventPtr = atts[i].name;
return XML_ERROR_DUPLICATE_ATTRIBUTE;
}
(attId->name)[-1] = 1;
appAtts[i << 1] = attId->name;
if (!atts[i].normalized) {
enum XML_Error result;
int isCdata = 1;
if (attId->maybeTokenized) {
int j;
for (j = 0; j < nDefaultAtts; j++) {
if (attId == elementType->defaultAtts[j].id) {
isCdata = elementType->defaultAtts[j].isCdata;
break;
}
}
}
result = storeAttributeValue(parser, enc, isCdata,
atts[i].valuePtr, atts[i].valueEnd,
&tempPool);
if (result)
return result;
if (tagName) {
appAtts[(i << 1) + 1] = poolStart(&tempPool);
poolFinish(&tempPool);
}
else
poolDiscard(&tempPool);
}
else if (tagName) {
appAtts[(i << 1) + 1] = poolStoreString(&tempPool, enc, atts[i].valuePtr, atts[i].valueEnd);
if (appAtts[(i << 1) + 1] == 0)
return XML_ERROR_NO_MEMORY;
poolFinish(&tempPool);
}
}
if (tagName) {
int j;
for (j = 0; j < nDefaultAtts; j++) {
const DEFAULT_ATTRIBUTE *da = elementType->defaultAtts + j;
if (!(da->id->name)[-1] && da->value) {
(da->id->name)[-1] = 1;
appAtts[i << 1] = da->id->name;
appAtts[(i << 1) + 1] = da->value;
i++;
}
}
appAtts[i << 1] = 0;
}
while (i-- > 0)
((XML_Char *)appAtts[i << 1])[-1] = 0;
return XML_ERROR_NONE;
}
/* The idea here is to avoid using stack for each CDATA section when
the whole file is parsed with one call. */
static
enum XML_Error cdataSectionProcessor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
enum XML_Error result = doCdataSection(parser, encoding, &start, end, endPtr);
if (start) {
processor = contentProcessor;
return contentProcessor(parser, start, end, endPtr);
}
return result;
}
/* startPtr gets set to non-null is the section is closed, and to null if
the section is not yet closed. */
static
enum XML_Error doCdataSection(XML_Parser parser,
const ENCODING *enc,
const char **startPtr,
const char *end,
const char **nextPtr)
{
const char *s = *startPtr;
const char *dummy;
const char **eventPP;
const char **eventEndPP;
if (enc == encoding) {
eventPP = &eventPtr;
*eventPP = s;
eventEndPP = &eventEndPtr;
}
else
eventPP = eventEndPP = &dummy;
*startPtr = 0;
for (;;) {
const char *next;
int tok = XmlCdataSectionTok(enc, s, end, &next);
*eventEndPP = next;
switch (tok) {
case XML_TOK_CDATA_SECT_CLOSE:
if (characterDataHandler)
characterDataHandler(handlerArg, dataBuf, 0);
else if (defaultHandler)
reportDefault(parser, enc, s, next);
*startPtr = next;
return XML_ERROR_NONE;
case XML_TOK_DATA_NEWLINE:
if (characterDataHandler) {
XML_Char c = XML_T('\n');
characterDataHandler(handlerArg, &c, 1);
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
case XML_TOK_DATA_CHARS:
if (characterDataHandler) {
if (MUST_CONVERT(enc, s)) {
for (;;) {
ICHAR *dataPtr = (ICHAR *)dataBuf;
XmlConvert(enc, &s, next, &dataPtr, (ICHAR *)dataBufEnd);
*eventEndPP = next;
characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf);
if (s == next)
break;
*eventPP = s;
}
}
else
characterDataHandler(handlerArg,
(XML_Char *)s,
(XML_Char *)next - (XML_Char *)s);
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
case XML_TOK_INVALID:
*eventPP = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (nextPtr) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_PARTIAL_CHAR;
case XML_TOK_PARTIAL:
case XML_TOK_NONE:
if (nextPtr) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_UNCLOSED_CDATA_SECTION;
default:
abort();
}
*eventPP = s = next;
}
/* not reached */
}
static enum XML_Error
initializeEncoding(XML_Parser parser)
{
const char *s;
#ifdef XML_UNICODE
char encodingBuf[128];
if (!protocolEncodingName)
s = 0;
else {
int i;
for (i = 0; protocolEncodingName[i]; i++) {
if (i == sizeof(encodingBuf) - 1
|| protocolEncodingName[i] >= 0x80
|| protocolEncodingName[i] < 0) {
encodingBuf[0] = '\0';
break;
}
encodingBuf[i] = (char)protocolEncodingName[i];
}
encodingBuf[i] = '\0';
s = encodingBuf;
}
#else
s = protocolEncodingName;
#endif
if (XmlInitEncoding(&initEncoding, &encoding, s))
return XML_ERROR_NONE;
return handleUnknownEncoding(parser, protocolEncodingName);
}
static enum XML_Error
processXmlDecl(XML_Parser parser, int isGeneralTextEntity,
const char *s, const char *next)
{
const char *encodingName = 0;
const ENCODING *newEncoding = 0;
const char *version;
int standalone = -1;
if (!XmlParseXmlDecl(isGeneralTextEntity,
encoding,
s,
next,
&eventPtr,
&version,
&encodingName,
&newEncoding,
&standalone))
return XML_ERROR_SYNTAX;
if (!isGeneralTextEntity && standalone == 1)
dtd.standalone = 1;
if (defaultHandler)
reportDefault(parser, encoding, s, next);
if (!protocolEncodingName) {
if (newEncoding) {
if (newEncoding->minBytesPerChar != encoding->minBytesPerChar) {
eventPtr = encodingName;
return XML_ERROR_INCORRECT_ENCODING;
}
encoding = newEncoding;
}
else if (encodingName) {
enum XML_Error result;
const XML_Char *s = poolStoreString(&tempPool,
encoding,
encodingName,
encodingName
+ XmlNameLength(encoding, encodingName));
if (!s)
return XML_ERROR_NO_MEMORY;
result = handleUnknownEncoding(parser, s);
poolDiscard(&tempPool);
if (result == XML_ERROR_UNKNOWN_ENCODING)
eventPtr = encodingName;
return result;
}
}
return XML_ERROR_NONE;
}
static enum XML_Error
handleUnknownEncoding(XML_Parser parser, const XML_Char *encodingName)
{
if (unknownEncodingHandler) {
XML_Encoding info;
int i;
for (i = 0; i < 256; i++)
info.map[i] = -1;
info.convert = 0;
info.data = 0;
info.release = 0;
if (unknownEncodingHandler(unknownEncodingHandlerData, encodingName, &info)) {
ENCODING *enc;
unknownEncodingMem = malloc(XmlSizeOfUnknownEncoding());
if (!unknownEncodingMem) {
if (info.release)
info.release(info.data);
return XML_ERROR_NO_MEMORY;
}
enc = XmlInitUnknownEncoding(unknownEncodingMem,
info.map,
info.convert,
info.data);
if (enc) {
unknownEncodingData = info.data;
unknownEncodingRelease = info.release;
encoding = enc;
return XML_ERROR_NONE;
}
}
if (info.release)
info.release(info.data);
}
return XML_ERROR_UNKNOWN_ENCODING;
}
static enum XML_Error
prologInitProcessor(XML_Parser parser,
const char *s,
const char *end,
const char **nextPtr)
{
enum XML_Error result = initializeEncoding(parser);
if (result != XML_ERROR_NONE)
return result;
processor = prologProcessor;
return prologProcessor(parser, s, end, nextPtr);
}
static enum XML_Error
prologProcessor(XML_Parser parser,
const char *s,
const char *end,
const char **nextPtr)
{
for (;;) {
const char *next;
int tok = XmlPrologTok(encoding, s, end, &next);
if (tok <= 0) {
if (nextPtr != 0 && tok != XML_TOK_INVALID) {
*nextPtr = s;
return XML_ERROR_NONE;
}
switch (tok) {
case XML_TOK_INVALID:
eventPtr = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_NONE:
return XML_ERROR_NO_ELEMENTS;
case XML_TOK_PARTIAL:
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
return XML_ERROR_PARTIAL_CHAR;
case XML_TOK_TRAILING_CR:
eventPtr = s + encoding->minBytesPerChar;
return XML_ERROR_NO_ELEMENTS;
default:
abort();
}
}
switch (XmlTokenRole(&prologState, tok, s, next, encoding)) {
case XML_ROLE_XML_DECL:
{
enum XML_Error result = processXmlDecl(parser, 0, s, next);
if (result != XML_ERROR_NONE)
return result;
}
break;
case XML_ROLE_DOCTYPE_SYSTEM_ID:
hadExternalDoctype = 1;
break;
case XML_ROLE_DOCTYPE_PUBLIC_ID:
case XML_ROLE_ENTITY_PUBLIC_ID:
if (!XmlIsPublicId(encoding, s, next, &eventPtr))
return XML_ERROR_SYNTAX;
if (declEntity) {
XML_Char *tem = poolStoreString(&dtd.pool,
encoding,
s + encoding->minBytesPerChar,
next - encoding->minBytesPerChar);
if (!tem)
return XML_ERROR_NO_MEMORY;
normalizePublicId(tem);
declEntity->publicId = tem;
poolFinish(&dtd.pool);
}
break;
case XML_ROLE_INSTANCE_START:
processor = contentProcessor;
if (hadExternalDoctype)
dtd.complete = 0;
return contentProcessor(parser, s, end, nextPtr);
case XML_ROLE_ATTLIST_ELEMENT_NAME:
{
const XML_Char *name = poolStoreString(&dtd.pool, encoding, s, next);
if (!name)
return XML_ERROR_NO_MEMORY;
declElementType = (ELEMENT_TYPE *)lookup(&dtd.elementTypes, name, sizeof(ELEMENT_TYPE));
if (!declElementType)
return XML_ERROR_NO_MEMORY;
if (declElementType->name != name)
poolDiscard(&dtd.pool);
else
poolFinish(&dtd.pool);
break;
}
case XML_ROLE_ATTRIBUTE_NAME:
declAttributeId = getAttributeId(parser, encoding, s, next);
if (!declAttributeId)
return XML_ERROR_NO_MEMORY;
declAttributeIsCdata = 0;
break;
case XML_ROLE_ATTRIBUTE_TYPE_CDATA:
declAttributeIsCdata = 1;
break;
case XML_ROLE_IMPLIED_ATTRIBUTE_VALUE:
case XML_ROLE_REQUIRED_ATTRIBUTE_VALUE:
if (dtd.complete
&& !defineAttribute(declElementType, declAttributeId, declAttributeIsCdata, 0))
return XML_ERROR_NO_MEMORY;
break;
case XML_ROLE_DEFAULT_ATTRIBUTE_VALUE:
case XML_ROLE_FIXED_ATTRIBUTE_VALUE:
{
const XML_Char *attVal;
enum XML_Error result
= storeAttributeValue(parser, encoding, declAttributeIsCdata,
s + encoding->minBytesPerChar,
next - encoding->minBytesPerChar,
&dtd.pool);
if (result)
return result;
attVal = poolStart(&dtd.pool);
poolFinish(&dtd.pool);
if (dtd.complete
&& !defineAttribute(declElementType, declAttributeId, declAttributeIsCdata, attVal))
return XML_ERROR_NO_MEMORY;
break;
}
case XML_ROLE_ENTITY_VALUE:
{
enum XML_Error result = storeEntityValue(parser, s, next);
if (result != XML_ERROR_NONE)
return result;
}
break;
case XML_ROLE_ENTITY_SYSTEM_ID:
if (declEntity) {
declEntity->systemId = poolStoreString(&dtd.pool, encoding,
s + encoding->minBytesPerChar,
next - encoding->minBytesPerChar);
if (!declEntity->systemId)
return XML_ERROR_NO_MEMORY;
declEntity->base = dtd.base;
poolFinish(&dtd.pool);
}
break;
case XML_ROLE_ENTITY_NOTATION_NAME:
if (declEntity) {
declEntity->notation = poolStoreString(&dtd.pool, encoding, s, next);
if (!declEntity->notation)
return XML_ERROR_NO_MEMORY;
poolFinish(&dtd.pool);
if (unparsedEntityDeclHandler) {
eventPtr = eventEndPtr = s;
unparsedEntityDeclHandler(handlerArg,
declEntity->name,
declEntity->base,
declEntity->systemId,
declEntity->publicId,
declEntity->notation);
}
}
break;
case XML_ROLE_GENERAL_ENTITY_NAME:
{
const XML_Char *name;
if (XmlPredefinedEntityName(encoding, s, next)) {
declEntity = 0;
break;
}
name = poolStoreString(&dtd.pool, encoding, s, next);
if (!name)
return XML_ERROR_NO_MEMORY;
if (dtd.complete) {
declEntity = (ENTITY *)lookup(&dtd.generalEntities, name, sizeof(ENTITY));
if (!declEntity)
return XML_ERROR_NO_MEMORY;
if (declEntity->name != name) {
poolDiscard(&dtd.pool);
declEntity = 0;
}
else
poolFinish(&dtd.pool);
}
else {
poolDiscard(&dtd.pool);
declEntity = 0;
}
}
break;
case XML_ROLE_PARAM_ENTITY_NAME:
declEntity = 0;
break;
case XML_ROLE_NOTATION_NAME:
declNotationPublicId = 0;
declNotationName = 0;
if (notationDeclHandler) {
declNotationName = poolStoreString(&tempPool, encoding, s, next);
if (!declNotationName)
return XML_ERROR_NO_MEMORY;
poolFinish(&tempPool);
}
break;
case XML_ROLE_NOTATION_PUBLIC_ID:
if (!XmlIsPublicId(encoding, s, next, &eventPtr))
return XML_ERROR_SYNTAX;
if (declNotationName) {
XML_Char *tem = poolStoreString(&tempPool,
encoding,
s + encoding->minBytesPerChar,
next - encoding->minBytesPerChar);
if (!tem)
return XML_ERROR_NO_MEMORY;
normalizePublicId(tem);
declNotationPublicId = tem;
poolFinish(&tempPool);
}
break;
case XML_ROLE_NOTATION_SYSTEM_ID:
if (declNotationName && notationDeclHandler) {
const XML_Char *systemId
= poolStoreString(&tempPool, encoding,
s + encoding->minBytesPerChar,
next - encoding->minBytesPerChar);
if (!systemId)
return XML_ERROR_NO_MEMORY;
eventPtr = eventEndPtr = s;
notationDeclHandler(handlerArg,
declNotationName,
dtd.base,
systemId,
declNotationPublicId);
}
poolClear(&tempPool);
break;
case XML_ROLE_NOTATION_NO_SYSTEM_ID:
if (declNotationPublicId && notationDeclHandler) {
eventPtr = eventEndPtr = s;
notationDeclHandler(handlerArg,
declNotationName,
dtd.base,
0,
declNotationPublicId);
}
poolClear(&tempPool);
break;
case XML_ROLE_ERROR:
eventPtr = s;
switch (tok) {
case XML_TOK_PARAM_ENTITY_REF:
return XML_ERROR_PARAM_ENTITY_REF;
case XML_TOK_XML_DECL:
return XML_ERROR_MISPLACED_XML_PI;
default:
return XML_ERROR_SYNTAX;
}
case XML_ROLE_GROUP_OPEN:
if (prologState.level >= groupSize) {
if (groupSize)
groupConnector = realloc(groupConnector, groupSize *= 2);
else
groupConnector = malloc(groupSize = 32);
if (!groupConnector)
return XML_ERROR_NO_MEMORY;
}
groupConnector[prologState.level] = 0;
break;
case XML_ROLE_GROUP_SEQUENCE:
if (groupConnector[prologState.level] == '|') {
eventPtr = s;
return XML_ERROR_SYNTAX;
}
groupConnector[prologState.level] = ',';
break;
case XML_ROLE_GROUP_CHOICE:
if (groupConnector[prologState.level] == ',') {
eventPtr = s;
return XML_ERROR_SYNTAX;
}
groupConnector[prologState.level] = '|';
break;
case XML_ROLE_PARAM_ENTITY_REF:
dtd.complete = 0;
break;
case XML_ROLE_NONE:
switch (tok) {
case XML_TOK_PI:
eventPtr = s;
eventEndPtr = next;
if (!reportProcessingInstruction(parser, encoding, s, next))
return XML_ERROR_NO_MEMORY;
break;
}
break;
}
if (defaultHandler) {
switch (tok) {
case XML_TOK_PI:
case XML_TOK_BOM:
case XML_TOK_XML_DECL:
break;
default:
eventPtr = s;
eventEndPtr = next;
reportDefault(parser, encoding, s, next);
}
}
s = next;
}
/* not reached */
}
static
enum XML_Error epilogProcessor(XML_Parser parser,
const char *s,
const char *end,
const char **nextPtr)
{
processor = epilogProcessor;
eventPtr = s;
for (;;) {
const char *next;
int tok = XmlPrologTok(encoding, s, end, &next);
eventEndPtr = next;
switch (tok) {
case XML_TOK_TRAILING_CR:
if (defaultHandler) {
eventEndPtr = end;
reportDefault(parser, encoding, s, end);
}
/* fall through */
case XML_TOK_NONE:
if (nextPtr)
*nextPtr = end;
return XML_ERROR_NONE;
case XML_TOK_PROLOG_S:
case XML_TOK_COMMENT:
if (defaultHandler)
reportDefault(parser, encoding, s, next);
break;
case XML_TOK_PI:
if (!reportProcessingInstruction(parser, encoding, s, next))
return XML_ERROR_NO_MEMORY;
break;
case XML_TOK_INVALID:
eventPtr = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL:
if (nextPtr) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (nextPtr) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_PARTIAL_CHAR;
default:
return XML_ERROR_JUNK_AFTER_DOC_ELEMENT;
}
eventPtr = s = next;
}
}
static
enum XML_Error errorProcessor(XML_Parser parser,
const char *s,
const char *end,
const char **nextPtr)
{
return errorCode;
}
static enum XML_Error
storeAttributeValue(XML_Parser parser, const ENCODING *enc, int isCdata,
const char *ptr, const char *end,
STRING_POOL *pool)
{
enum XML_Error result = appendAttributeValue(parser, enc, isCdata, ptr, end, pool);
if (result)
return result;
if (!isCdata && poolLength(pool) && poolLastChar(pool) == XML_T(' '))
poolChop(pool);
if (!poolAppendChar(pool, XML_T('\0')))
return XML_ERROR_NO_MEMORY;
return XML_ERROR_NONE;
}
static enum XML_Error
appendAttributeValue(XML_Parser parser, const ENCODING *enc, int isCdata,
const char *ptr, const char *end,
STRING_POOL *pool)
{
const ENCODING *internalEnc = XmlGetInternalEncoding();
for (;;) {
const char *next;
int tok = XmlAttributeValueTok(enc, ptr, end, &next);
switch (tok) {
case XML_TOK_NONE:
return XML_ERROR_NONE;
case XML_TOK_INVALID:
if (enc == encoding)
eventPtr = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL:
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_CHAR_REF:
{
XML_Char buf[XML_ENCODE_MAX];
int i;
int n = XmlCharRefNumber(enc, ptr);
if (n < 0) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_BAD_CHAR_REF;
}
if (!isCdata
&& n == 0x20 /* space */
&& (poolLength(pool) == 0 || poolLastChar(pool) == XML_T(' ')))
break;
n = XmlEncode(n, (ICHAR *)buf);
if (!n) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_BAD_CHAR_REF;
}
for (i = 0; i < n; i++) {
if (!poolAppendChar(pool, buf[i]))
return XML_ERROR_NO_MEMORY;
}
}
break;
case XML_TOK_DATA_CHARS:
if (!poolAppend(pool, enc, ptr, next))
return XML_ERROR_NO_MEMORY;
break;
break;
case XML_TOK_TRAILING_CR:
next = ptr + enc->minBytesPerChar;
/* fall through */
case XML_TOK_ATTRIBUTE_VALUE_S:
case XML_TOK_DATA_NEWLINE:
if (!isCdata && (poolLength(pool) == 0 || poolLastChar(pool) == XML_T(' ')))
break;
if (!poolAppendChar(pool, XML_T(' ')))
return XML_ERROR_NO_MEMORY;
break;
case XML_TOK_ENTITY_REF:
{
const XML_Char *name;
ENTITY *entity;
XML_Char ch = XmlPredefinedEntityName(enc,
ptr + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (ch) {
if (!poolAppendChar(pool, ch))
return XML_ERROR_NO_MEMORY;
break;
}
name = poolStoreString(&temp2Pool, enc,
ptr + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!name)
return XML_ERROR_NO_MEMORY;
entity = (ENTITY *)lookup(&dtd.generalEntities, name, 0);
poolDiscard(&temp2Pool);
if (!entity) {
if (dtd.complete) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_UNDEFINED_ENTITY;
}
}
else if (entity->open) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_RECURSIVE_ENTITY_REF;
}
else if (entity->notation) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_BINARY_ENTITY_REF;
}
else if (!entity->textPtr) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF;
}
else {
enum XML_Error result;
const XML_Char *textEnd = entity->textPtr + entity->textLen;
entity->open = 1;
result = appendAttributeValue(parser, internalEnc, isCdata, (char *)entity->textPtr, (char *)textEnd, pool);
entity->open = 0;
if (result)
return result;
}
}
break;
default:
abort();
}
ptr = next;
}
/* not reached */
}
static
enum XML_Error storeEntityValue(XML_Parser parser,
const char *entityTextPtr,
const char *entityTextEnd)
{
const ENCODING *internalEnc = XmlGetInternalEncoding();
STRING_POOL *pool = &(dtd.pool);
entityTextPtr += encoding->minBytesPerChar;
entityTextEnd -= encoding->minBytesPerChar;
for (;;) {
const char *next;
int tok = XmlEntityValueTok(encoding, entityTextPtr, entityTextEnd, &next);
switch (tok) {
case XML_TOK_PARAM_ENTITY_REF:
eventPtr = entityTextPtr;
return XML_ERROR_SYNTAX;
case XML_TOK_NONE:
if (declEntity) {
declEntity->textPtr = pool->start;
declEntity->textLen = pool->ptr - pool->start;
poolFinish(pool);
}
else
poolDiscard(pool);
return XML_ERROR_NONE;
case XML_TOK_ENTITY_REF:
case XML_TOK_DATA_CHARS:
if (!poolAppend(pool, encoding, entityTextPtr, next))
return XML_ERROR_NO_MEMORY;
break;
case XML_TOK_TRAILING_CR:
next = entityTextPtr + encoding->minBytesPerChar;
/* fall through */
case XML_TOK_DATA_NEWLINE:
if (pool->end == pool->ptr && !poolGrow(pool))
return XML_ERROR_NO_MEMORY;
*(pool->ptr)++ = XML_T('\n');
break;
case XML_TOK_CHAR_REF:
{
XML_Char buf[XML_ENCODE_MAX];
int i;
int n = XmlCharRefNumber(encoding, entityTextPtr);
if (n < 0) {
eventPtr = entityTextPtr;
return XML_ERROR_BAD_CHAR_REF;
}
n = XmlEncode(n, (ICHAR *)buf);
if (!n) {
eventPtr = entityTextPtr;
return XML_ERROR_BAD_CHAR_REF;
}
for (i = 0; i < n; i++) {
if (pool->end == pool->ptr && !poolGrow(pool))
return XML_ERROR_NO_MEMORY;
*(pool->ptr)++ = buf[i];
}
}
break;
case XML_TOK_PARTIAL:
eventPtr = entityTextPtr;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_INVALID:
eventPtr = next;
return XML_ERROR_INVALID_TOKEN;
default:
abort();
}
entityTextPtr = next;
}
/* not reached */
}
static void
normalizeLines(XML_Char *s)
{
XML_Char *p;
for (;; s++) {
if (*s == XML_T('\0'))
return;
if (*s == XML_T('\r'))
break;
}
p = s;
do {
if (*s == XML_T('\r')) {
*p++ = XML_T('\n');
if (*++s == XML_T('\n'))
s++;
}
else
*p++ = *s++;
} while (*s);
*p = XML_T('\0');
}
static int
reportProcessingInstruction(XML_Parser parser, const ENCODING *enc, const char *start, const char *end)
{
const XML_Char *target;
XML_Char *data;
const char *tem;
if (!processingInstructionHandler) {
if (defaultHandler)
reportDefault(parser, enc, start, end);
return 1;
}
start += enc->minBytesPerChar * 2;
tem = start + XmlNameLength(enc, start);
target = poolStoreString(&tempPool, enc, start, tem);
if (!target)
return 0;
poolFinish(&tempPool);
data = poolStoreString(&tempPool, enc,
XmlSkipS(enc, tem),
end - enc->minBytesPerChar*2);
if (!data)
return 0;
normalizeLines(data);
processingInstructionHandler(handlerArg, target, data);
poolClear(&tempPool);
return 1;
}
static void
reportDefault(XML_Parser parser, const ENCODING *enc, const char *s, const char *end)
{
if (MUST_CONVERT(enc, s)) {
for (;;) {
ICHAR *dataPtr = (ICHAR *)dataBuf;
XmlConvert(enc, &s, end, &dataPtr, (ICHAR *)dataBufEnd);
if (s == end) {
defaultHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf);
break;
}
if (enc == encoding) {
eventEndPtr = s;
defaultHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf);
eventPtr = s;
}
else
defaultHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf);
}
}
else
defaultHandler(handlerArg, (XML_Char *)s, (XML_Char *)end - (XML_Char *)s);
}
static int
defineAttribute(ELEMENT_TYPE *type, ATTRIBUTE_ID *attId, int isCdata, const XML_Char *value)
{
DEFAULT_ATTRIBUTE *att;
if (type->nDefaultAtts == type->allocDefaultAtts) {
if (type->allocDefaultAtts == 0) {
type->allocDefaultAtts = 8;
type->defaultAtts = malloc(type->allocDefaultAtts*sizeof(DEFAULT_ATTRIBUTE));
}
else {
type->allocDefaultAtts *= 2;
type->defaultAtts = realloc(type->defaultAtts,
type->allocDefaultAtts*sizeof(DEFAULT_ATTRIBUTE));
}
if (!type->defaultAtts)
return 0;
}
att = type->defaultAtts + type->nDefaultAtts;
att->id = attId;
att->value = value;
att->isCdata = isCdata;
if (!isCdata)
attId->maybeTokenized = 1;
type->nDefaultAtts += 1;
return 1;
}
static ATTRIBUTE_ID *
getAttributeId(XML_Parser parser, const ENCODING *enc, const char *start, const char *end)
{
ATTRIBUTE_ID *id;
const XML_Char *name;
if (!poolAppendChar(&dtd.pool, XML_T('\0')))
return 0;
name = poolStoreString(&dtd.pool, enc, start, end);
if (!name)
return 0;
++name;
id = (ATTRIBUTE_ID *)lookup(&dtd.attributeIds, name, sizeof(ATTRIBUTE_ID));
if (!id)
return 0;
if (id->name != name)
poolDiscard(&dtd.pool);
else
poolFinish(&dtd.pool);
return id;
}
static
const XML_Char *getOpenEntityNames(XML_Parser parser)
{
HASH_TABLE_ITER iter;
hashTableIterInit(&iter, &(dtd.generalEntities));
for (;;) {
const XML_Char *s;
ENTITY *e = (ENTITY *)hashTableIterNext(&iter);
if (!e)
break;
if (!e->open)
continue;
if (poolLength(&tempPool) > 0 && !poolAppendChar(&tempPool, XML_T(' ')))
return 0;
for (s = e->name; *s; s++)
if (!poolAppendChar(&tempPool, *s))
return 0;
}
if (!poolAppendChar(&tempPool, XML_T('\0')))
return 0;
return tempPool.start;
}
static
int setOpenEntityNames(XML_Parser parser, const XML_Char *openEntityNames)
{
const XML_Char *s = openEntityNames;
while (*openEntityNames != XML_T('\0')) {
if (*s == XML_T(' ') || *s == XML_T('\0')) {
ENTITY *e;
if (!poolAppendChar(&tempPool, XML_T('\0')))
return 0;
e = (ENTITY *)lookup(&dtd.generalEntities, poolStart(&tempPool), 0);
if (e)
e->open = 1;
if (*s == XML_T(' '))
s++;
openEntityNames = s;
poolDiscard(&tempPool);
}
else {
if (!poolAppendChar(&tempPool, *s))
return 0;
s++;
}
}
return 1;
}
static
void normalizePublicId(XML_Char *publicId)
{
XML_Char *p = publicId;
XML_Char *s;
for (s = publicId; *s; s++) {
switch (*s) {
case XML_T(' '):
case XML_T('\r'):
case XML_T('\n'):
if (p != publicId && p[-1] != XML_T(' '))
*p++ = XML_T(' ');
break;
default:
*p++ = *s;
}
}
if (p != publicId && p[-1] == XML_T(' '))
--p;
*p = XML_T('\0');
}
static int dtdInit(DTD *p)
{
poolInit(&(p->pool));
hashTableInit(&(p->generalEntities));
hashTableInit(&(p->elementTypes));
hashTableInit(&(p->attributeIds));
p->complete = 1;
p->standalone = 0;
p->base = 0;
return 1;
}
static void dtdDestroy(DTD *p)
{
HASH_TABLE_ITER iter;
hashTableIterInit(&iter, &(p->elementTypes));
for (;;) {
ELEMENT_TYPE *e = (ELEMENT_TYPE *)hashTableIterNext(&iter);
if (!e)
break;
if (e->allocDefaultAtts != 0)
free(e->defaultAtts);
}
hashTableDestroy(&(p->generalEntities));
hashTableDestroy(&(p->elementTypes));
hashTableDestroy(&(p->attributeIds));
poolDestroy(&(p->pool));
}
/* Do a deep copy of the DTD. Return 0 for out of memory; non-zero otherwise.
The new DTD has already been initialized. */
static int dtdCopy(DTD *newDtd, const DTD *oldDtd)
{
HASH_TABLE_ITER iter;
if (oldDtd->base) {
const XML_Char *tem = poolCopyString(&(newDtd->pool), oldDtd->base);
if (!tem)
return 0;
newDtd->base = tem;
}
hashTableIterInit(&iter, &(oldDtd->attributeIds));
/* Copy the attribute id table. */
for (;;) {
ATTRIBUTE_ID *newA;
const XML_Char *name;
const ATTRIBUTE_ID *oldA = (ATTRIBUTE_ID *)hashTableIterNext(&iter);
if (!oldA)
break;
/* Remember to allocate the scratch byte before the name. */
if (!poolAppendChar(&(newDtd->pool), XML_T('\0')))
return 0;
name = poolCopyString(&(newDtd->pool), oldA->name);
if (!name)
return 0;
++name;
newA = (ATTRIBUTE_ID *)lookup(&(newDtd->attributeIds), name, sizeof(ATTRIBUTE_ID));
if (!newA)
return 0;
newA->maybeTokenized = oldA->maybeTokenized;
}
/* Copy the element type table. */
hashTableIterInit(&iter, &(oldDtd->elementTypes));
for (;;) {
int i;
ELEMENT_TYPE *newE;
const XML_Char *name;
const ELEMENT_TYPE *oldE = (ELEMENT_TYPE *)hashTableIterNext(&iter);
if (!oldE)
break;
name = poolCopyString(&(newDtd->pool), oldE->name);
if (!name)
return 0;
newE = (ELEMENT_TYPE *)lookup(&(newDtd->elementTypes), name, sizeof(ELEMENT_TYPE));
if (!newE)
return 0;
newE->defaultAtts = (DEFAULT_ATTRIBUTE *)malloc(oldE->nDefaultAtts * sizeof(DEFAULT_ATTRIBUTE));
if (!newE->defaultAtts)
return 0;
newE->allocDefaultAtts = newE->nDefaultAtts = oldE->nDefaultAtts;
for (i = 0; i < newE->nDefaultAtts; i++) {
newE->defaultAtts[i].id = (ATTRIBUTE_ID *)lookup(&(newDtd->attributeIds), oldE->defaultAtts[i].id->name, 0);
newE->defaultAtts[i].isCdata = oldE->defaultAtts[i].isCdata;
if (oldE->defaultAtts[i].value) {
newE->defaultAtts[i].value = poolCopyString(&(newDtd->pool), oldE->defaultAtts[i].value);
if (!newE->defaultAtts[i].value)
return 0;
}
else
newE->defaultAtts[i].value = 0;
}
}
/* Copy the entity table. */
hashTableIterInit(&iter, &(oldDtd->generalEntities));
for (;;) {
ENTITY *newE;
const XML_Char *name;
const ENTITY *oldE = (ENTITY *)hashTableIterNext(&iter);
if (!oldE)
break;
name = poolCopyString(&(newDtd->pool), oldE->name);
if (!name)
return 0;
newE = (ENTITY *)lookup(&(newDtd->generalEntities), name, sizeof(ENTITY));
if (!newE)
return 0;
if (oldE->systemId) {
const XML_Char *tem = poolCopyString(&(newDtd->pool), oldE->systemId);
if (!tem)
return 0;
newE->systemId = tem;
if (oldE->base) {
if (oldE->base == oldDtd->base)
newE->base = newDtd->base;
tem = poolCopyString(&(newDtd->pool), oldE->base);
if (!tem)
return 0;
newE->base = tem;
}
}
else {
const XML_Char *tem = poolCopyStringN(&(newDtd->pool), oldE->textPtr, oldE->textLen);
if (!tem)
return 0;
newE->textPtr = tem;
newE->textLen = oldE->textLen;
}
if (oldE->notation) {
const XML_Char *tem = poolCopyString(&(newDtd->pool), oldE->notation);
if (!tem)
return 0;
newE->notation = tem;
}
}
newDtd->complete = oldDtd->complete;
newDtd->standalone = oldDtd->standalone;
return 1;
}
static
void poolInit(STRING_POOL *pool)
{
pool->blocks = 0;
pool->freeBlocks = 0;
pool->start = 0;
pool->ptr = 0;
pool->end = 0;
}
static
void poolClear(STRING_POOL *pool)
{
if (!pool->freeBlocks)
pool->freeBlocks = pool->blocks;
else {
BLOCK *p = pool->blocks;
while (p) {
BLOCK *tem = p->next;
p->next = pool->freeBlocks;
pool->freeBlocks = p;
p = tem;
}
}
pool->blocks = 0;
pool->start = 0;
pool->ptr = 0;
pool->end = 0;
}
static
void poolDestroy(STRING_POOL *pool)
{
BLOCK *p = pool->blocks;
while (p) {
BLOCK *tem = p->next;
free(p);
p = tem;
}
pool->blocks = 0;
p = pool->freeBlocks;
while (p) {
BLOCK *tem = p->next;
free(p);
p = tem;
}
pool->freeBlocks = 0;
pool->ptr = 0;
pool->start = 0;
pool->end = 0;
}
static
XML_Char *poolAppend(STRING_POOL *pool, const ENCODING *enc,
const char *ptr, const char *end)
{
if (!pool->ptr && !poolGrow(pool))
return 0;
for (;;) {
XmlConvert(enc, &ptr, end, (ICHAR **)&(pool->ptr), (ICHAR *)pool->end);
if (ptr == end)
break;
if (!poolGrow(pool))
return 0;
}
return pool->start;
}
static const XML_Char *poolCopyString(STRING_POOL *pool, const XML_Char *s)
{
do {
if (!poolAppendChar(pool, *s))
return 0;
} while (*s++);
s = pool->start;
poolFinish(pool);
return s;
}
static const XML_Char *poolCopyStringN(STRING_POOL *pool, const XML_Char *s, int n)
{
if (!pool->ptr && !poolGrow(pool))
return 0;
for (; n > 0; --n, s++) {
if (!poolAppendChar(pool, *s))
return 0;
}
s = pool->start;
poolFinish(pool);
return s;
}
static
XML_Char *poolStoreString(STRING_POOL *pool, const ENCODING *enc,
const char *ptr, const char *end)
{
if (!poolAppend(pool, enc, ptr, end))
return 0;
if (pool->ptr == pool->end && !poolGrow(pool))
return 0;
*(pool->ptr)++ = 0;
return pool->start;
}
static
int poolGrow(STRING_POOL *pool)
{
if (pool->freeBlocks) {
if (pool->start == 0) {
pool->blocks = pool->freeBlocks;
pool->freeBlocks = pool->freeBlocks->next;
pool->blocks->next = 0;
pool->start = pool->blocks->s;
pool->end = pool->start + pool->blocks->size;
pool->ptr = pool->start;
return 1;
}
if (pool->end - pool->start < pool->freeBlocks->size) {
BLOCK *tem = pool->freeBlocks->next;
pool->freeBlocks->next = pool->blocks;
pool->blocks = pool->freeBlocks;
pool->freeBlocks = tem;
memcpy(pool->blocks->s, pool->start, (pool->end - pool->start) * sizeof(XML_Char));
pool->ptr = pool->blocks->s + (pool->ptr - pool->start);
pool->start = pool->blocks->s;
pool->end = pool->start + pool->blocks->size;
return 1;
}
}
if (pool->blocks && pool->start == pool->blocks->s) {
int blockSize = (pool->end - pool->start)*2;
pool->blocks = realloc(pool->blocks, offsetof(BLOCK, s) + blockSize * sizeof(XML_Char));
if (!pool->blocks)
return 0;
pool->blocks->size = blockSize;
pool->ptr = pool->blocks->s + (pool->ptr - pool->start);
pool->start = pool->blocks->s;
pool->end = pool->start + blockSize;
}
else {
BLOCK *tem;
int blockSize = pool->end - pool->start;
if (blockSize < INIT_BLOCK_SIZE)
blockSize = INIT_BLOCK_SIZE;
else
blockSize *= 2;
tem = malloc(offsetof(BLOCK, s) + blockSize * sizeof(XML_Char));
if (!tem)
return 0;
tem->size = blockSize;
tem->next = pool->blocks;
pool->blocks = tem;
memcpy(tem->s, pool->start, (pool->ptr - pool->start) * sizeof(XML_Char));
pool->ptr = tem->s + (pool->ptr - pool->start);
pool->start = tem->s;
pool->end = tem->s + blockSize;
}
return 1;
}
1.1 apache-1.3/src/lib/expat/xmlparse.h
Index: xmlparse.h
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
#ifndef XmlParse_INCLUDED
#define XmlParse_INCLUDED 1
#ifdef __cplusplus
extern "C" {
#endif
#ifndef XMLPARSEAPI
#define XMLPARSEAPI /* as nothing */
#endif
typedef void *XML_Parser;
#ifdef XML_UNICODE_WCHAR_T
/* XML_UNICODE_WCHAR_T will work only if sizeof(wchar_t) == 2 and wchar_t
uses Unicode. */
/* Information is UTF-16 encoded as wchar_ts */
#ifndef XML_UNICODE
#define XML_UNICODE
#endif
#include <stddef.h>
typedef wchar_t XML_Char;
typedef wchar_t XML_LChar;
#else /* not XML_UNICODE_WCHAR_T */
#ifdef XML_UNICODE
/* Information is UTF-16 encoded as unsigned shorts */
typedef unsigned short XML_Char;
typedef char XML_LChar;
#else /* not XML_UNICODE */
/* Information is UTF-8 encoded. */
typedef char XML_Char;
typedef char XML_LChar;
#endif /* not XML_UNICODE */
#endif /* not XML_UNICODE_WCHAR_T */
/* Constructs a new parser; encoding is the encoding specified by the external
protocol or null if there is none specified. */
XML_Parser XMLPARSEAPI
XML_ParserCreate(const XML_Char *encoding);
/* atts is array of name/value pairs, terminated by 0;
names and values are 0 terminated. */
typedef void (*XML_StartElementHandler)(void *userData,
const XML_Char *name,
const XML_Char **atts);
typedef void (*XML_EndElementHandler)(void *userData,
const XML_Char *name);
/* s is not 0 terminated. */
typedef void (*XML_CharacterDataHandler)(void *userData,
const XML_Char *s,
int len);
/* target and data are 0 terminated */
typedef void (*XML_ProcessingInstructionHandler)(void *userData,
const XML_Char *target,
const XML_Char *data);
/* This is called for any characters in the XML document for
which there is no applicable handler. This includes both
characters that are part of markup which is of a kind that is
not reported (comments, markup declarations), or characters
that are part of a construct which could be reported but
for which no handler has been supplied. The characters are passed
exactly as they were in the XML document except that
they will be encoded in UTF-8. Line boundaries are not normalized.
Note that a byte order mark character is not passed to the default handler.
If a default handler is set, internal entity references
are not expanded. There are no guarantees about
how characters are divided between calls to the default handler:
for example, a comment might be split between multiple calls. */
typedef void (*XML_DefaultHandler)(void *userData,
const XML_Char *s,
int len);
/* This is called for a declaration of an unparsed (NDATA)
entity. The base argument is whatever was set by XML_SetBase.
The entityName, systemId and notationName arguments will never be null.
The other arguments may be. */
typedef void (*XML_UnparsedEntityDeclHandler)(void *userData,
const XML_Char *entityName,
const XML_Char *base,
const XML_Char *systemId,
const XML_Char *publicId,
const XML_Char *notationName);
/* This is called for a declaration of notation.
The base argument is whatever was set by XML_SetBase.
The notationName will never be null. The other arguments can be. */
typedef void (*XML_NotationDeclHandler)(void *userData,
const XML_Char *notationName,
const XML_Char *base,
const XML_Char *systemId,
const XML_Char *publicId);
/* This is called for a reference to an external parsed general entity.
The referenced entity is not automatically parsed.
The application can parse it immediately or later using
XML_ExternalEntityParserCreate.
The parser argument is the parser parsing the entity containing the reference;
it can be passed as the parser argument to XML_ExternalEntityParserCreate.
The systemId argument is the system identifier as specified in the entity declaration;
it will not be null.
The base argument is the system identifier that should be used as the base for
resolving systemId if systemId was relative; this is set by XML_SetBase;
it may be null.
The publicId argument is the public identifier as specified in the entity declaration,
or null if none was specified; the whitespace in the public identifier
will have been normalized as required by the XML spec.
The openEntityNames argument is a space-separated list of the names of the entities
that are open for the parse of this entity (including the name of the referenced
entity); this can be passed as the openEntityNames argument to
XML_ExternalEntityParserCreate; openEntityNames is valid only until the handler
returns, so if the referenced entity is to be parsed later, it must be copied.
The handler should return 0 if processing should not continue because of
a fatal error in the handling of the external entity.
In this case the calling parser will return an XML_ERROR_EXTERNAL_ENTITY_HANDLING
error.
Note that unlike other handlers the first argument is the parser, not userData. */
typedef int (*XML_ExternalEntityRefHandler)(XML_Parser parser,
const XML_Char *openEntityNames,
const XML_Char *base,
const XML_Char *systemId,
const XML_Char *publicId);
/* This structure is filled in by the XML_UnknownEncodingHandler
to provide information to the parser about encodings that are unknown
to the parser.
The map[b] member gives information about byte sequences
whose first byte is b.
If map[b] is c where c is >= 0, then b by itself encodes the Unicode scalar value c.
If map[b] is -1, then the byte sequence is malformed.
If map[b] is -n, where n >= 2, then b is the first byte of an n-byte
sequence that encodes a single Unicode scalar value.
The data member will be passed as the first argument to the convert function.
The convert function is used to convert multibyte sequences;
s will point to a n-byte sequence where map[(unsigned char)*s] == -n.
The convert function must return the Unicode scalar value
represented by this byte sequence or -1 if the byte sequence is malformed.
The convert function may be null if the encoding is a single-byte encoding,
that is if map[b] >= -1 for all bytes b.
When the parser is finished with the encoding, then if release is not null,
it will call release passing it the data member;
once release has been called, the convert function will not be called again.
Expat places certain restrictions on the encodings that are supported
using this mechanism.
1. Every ASCII character that can appear in a well-formed XML document,
other than the characters
$@\^`{}~
must be represented by a single byte, and that byte must be the
same byte that represents that character in ASCII.
2. No character may require more than 4 bytes to encode.
3. All characters encoded must have Unicode scalar values <= 0xFFFF,
(ie characters that would be encoded by surrogates in UTF-16
are not allowed). Note that this restriction doesn't apply to
the built-in support for UTF-8 and UTF-16.
4. No Unicode character may be encoded by more than one distinct sequence
of bytes. */
typedef struct {
int map[256];
void *data;
int (*convert)(void *data, const char *s);
void (*release)(void *data);
} XML_Encoding;
/* This is called for an encoding that is unknown to the parser.
The encodingHandlerData argument is that which was passed as the
second argument to XML_SetUnknownEncodingHandler.
The name argument gives the name of the encoding as specified in
the encoding declaration.
If the callback can provide information about the encoding,
it must fill in the XML_Encoding structure, and return 1.
Otherwise it must return 0.
If info does not describe a suitable encoding,
then the parser will return an XML_UNKNOWN_ENCODING error. */
typedef int (*XML_UnknownEncodingHandler)(void *encodingHandlerData,
const XML_Char *name,
XML_Encoding *info);
void XMLPARSEAPI
XML_SetElementHandler(XML_Parser parser,
XML_StartElementHandler start,
XML_EndElementHandler end);
void XMLPARSEAPI
XML_SetCharacterDataHandler(XML_Parser parser,
XML_CharacterDataHandler handler);
void XMLPARSEAPI
XML_SetProcessingInstructionHandler(XML_Parser parser,
XML_ProcessingInstructionHandler handler);
void XMLPARSEAPI
XML_SetDefaultHandler(XML_Parser parser,
XML_DefaultHandler handler);
void XMLPARSEAPI
XML_SetUnparsedEntityDeclHandler(XML_Parser parser,
XML_UnparsedEntityDeclHandler handler);
void XMLPARSEAPI
XML_SetNotationDeclHandler(XML_Parser parser,
XML_NotationDeclHandler handler);
void XMLPARSEAPI
XML_SetExternalEntityRefHandler(XML_Parser parser,
XML_ExternalEntityRefHandler handler);
void XMLPARSEAPI
XML_SetUnknownEncodingHandler(XML_Parser parser,
XML_UnknownEncodingHandler handler,
void *encodingHandlerData);
/* This can be called within a handler for a start element, end element,
processing instruction or character data. It causes the corresponding
markup to be passed to the default handler.
Within the expansion of an internal entity, nothing will be passed
to the default handler, although this usually will not happen since
setting a default handler inhibits expansion of internal entities. */
void XMLPARSEAPI XML_DefaultCurrent(XML_Parser parser);
/* This value is passed as the userData argument to callbacks. */
void XMLPARSEAPI
XML_SetUserData(XML_Parser parser, void *userData);
/* Returns the last value set by XML_SetUserData or null. */
#define XML_GetUserData(parser) (*(void **)(parser))
/* If this function is called, then the parser will be passed
as the first argument to callbacks instead of userData.
The userData will still be accessible using XML_GetUserData. */
void XMLPARSEAPI
XML_UseParserAsHandlerArg(XML_Parser parser);
/* Sets the base to be used for resolving relative URIs in system identifiers in
declarations. Resolving relative identifiers is left to the application:
this value will be passed through as the base argument to the
XML_ExternalEntityRefHandler, XML_NotationDeclHandler
and XML_UnparsedEntityDeclHandler. The base argument will be copied.
Returns zero if out of memory, non-zero otherwise. */
int XMLPARSEAPI
XML_SetBase(XML_Parser parser, const XML_Char *base);
const XML_Char XMLPARSEAPI *
XML_GetBase(XML_Parser parser);
/* Parses some input. Returns 0 if a fatal error is detected.
The last call to XML_Parse must have isFinal true;
len may be zero for this call (or any other). */
int XMLPARSEAPI
XML_Parse(XML_Parser parser, const char *s, int len, int isFinal);
void XMLPARSEAPI *
XML_GetBuffer(XML_Parser parser, int len);
int XMLPARSEAPI
XML_ParseBuffer(XML_Parser parser, int len, int isFinal);
/* Creates an XML_Parser object that can parse an external general entity;
openEntityNames is a space-separated list of the names of the entities that are open
for the parse of this entity (including the name of this one);
encoding is the externally specified encoding,
or null if there is no externally specified encoding.
This can be called at any point after the first call to an ExternalEntityRefHandler
so longer as the parser has not yet been freed.
The new parser is completely independent and may safely be used in a separate thread.
The handlers and userData are initialized from the parser argument.
Returns 0 if out of memory. Otherwise returns a new XML_Parser object. */
XML_Parser XMLPARSEAPI
XML_ExternalEntityParserCreate(XML_Parser parser,
const XML_Char *openEntityNames,
const XML_Char *encoding);
enum XML_Error {
XML_ERROR_NONE,
XML_ERROR_NO_MEMORY,
XML_ERROR_SYNTAX,
XML_ERROR_NO_ELEMENTS,
XML_ERROR_INVALID_TOKEN,
XML_ERROR_UNCLOSED_TOKEN,
XML_ERROR_PARTIAL_CHAR,
XML_ERROR_TAG_MISMATCH,
XML_ERROR_DUPLICATE_ATTRIBUTE,
XML_ERROR_JUNK_AFTER_DOC_ELEMENT,
XML_ERROR_PARAM_ENTITY_REF,
XML_ERROR_UNDEFINED_ENTITY,
XML_ERROR_RECURSIVE_ENTITY_REF,
XML_ERROR_ASYNC_ENTITY,
XML_ERROR_BAD_CHAR_REF,
XML_ERROR_BINARY_ENTITY_REF,
XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF,
XML_ERROR_MISPLACED_XML_PI,
XML_ERROR_UNKNOWN_ENCODING,
XML_ERROR_INCORRECT_ENCODING,
XML_ERROR_UNCLOSED_CDATA_SECTION,
XML_ERROR_EXTERNAL_ENTITY_HANDLING
};
/* If XML_Parse or XML_ParseBuffer have returned 0, then XML_GetErrorCode
returns information about the error. */
enum XML_Error XMLPARSEAPI XML_GetErrorCode(XML_Parser parser);
/* These functions return information about the current parse location.
They may be called when XML_Parse or XML_ParseBuffer return 0;
in this case the location is the location of the character at which
the error was detected.
They may also be called from any other callback called to report
some parse event; in this the location is the location of the first
of the sequence of characters that generated the event. */
int XMLPARSEAPI XML_GetCurrentLineNumber(XML_Parser parser);
int XMLPARSEAPI XML_GetCurrentColumnNumber(XML_Parser parser);
long XMLPARSEAPI XML_GetCurrentByteIndex(XML_Parser parser);
/* For backwards compatibility with previous versions. */
#define XML_GetErrorLineNumber XML_GetCurrentLineNumber
#define XML_GetErrorColumnNumber XML_GetCurrentColumnNumber
#define XML_GetErrorByteIndex XML_GetCurrentByteIndex
/* Frees memory used by the parser. */
void XMLPARSEAPI
XML_ParserFree(XML_Parser parser);
/* Returns a string describing the error. */
const XML_LChar XMLPARSEAPI *XML_ErrorString(int code);
#ifdef __cplusplus
}
#endif
#endif /* not XmlParse_INCLUDED */
1.1 apache-1.3/src/lib/expat/xmlrole.c
Index: xmlrole.c
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
#include "xmlrole.h"
/* Doesn't check:
that ,| are not mixed in a model group
content of literals
*/
#ifndef MIN_BYTES_PER_CHAR
#define MIN_BYTES_PER_CHAR(enc) ((enc)->minBytesPerChar)
#endif
typedef int PROLOG_HANDLER(struct prolog_state *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc);
static PROLOG_HANDLER
prolog0, prolog1, prolog2,
doctype0, doctype1, doctype2, doctype3, doctype4, doctype5,
internalSubset,
entity0, entity1, entity2, entity3, entity4, entity5, entity6,
entity7, entity8, entity9,
notation0, notation1, notation2, notation3, notation4,
attlist0, attlist1, attlist2, attlist3, attlist4, attlist5, attlist6,
attlist7, attlist8, attlist9,
element0, element1, element2, element3, element4, element5, element6,
element7,
declClose,
error;
static
int syntaxError(PROLOG_STATE *);
static
int prolog0(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
state->handler = prolog1;
return XML_ROLE_NONE;
case XML_TOK_XML_DECL:
state->handler = prolog1;
return XML_ROLE_XML_DECL;
case XML_TOK_PI:
state->handler = prolog1;
return XML_ROLE_NONE;
case XML_TOK_COMMENT:
state->handler = prolog1;
case XML_TOK_BOM:
return XML_ROLE_NONE;
case XML_TOK_DECL_OPEN:
if (!XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
"DOCTYPE"))
break;
state->handler = doctype0;
return XML_ROLE_NONE;
case XML_TOK_INSTANCE_START:
state->handler = error;
return XML_ROLE_INSTANCE_START;
}
return syntaxError(state);
}
static
int prolog1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_PI:
case XML_TOK_COMMENT:
case XML_TOK_BOM:
return XML_ROLE_NONE;
case XML_TOK_DECL_OPEN:
if (!XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
"DOCTYPE"))
break;
state->handler = doctype0;
return XML_ROLE_NONE;
case XML_TOK_INSTANCE_START:
state->handler = error;
return XML_ROLE_INSTANCE_START;
}
return syntaxError(state);
}
static
int prolog2(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_PI:
case XML_TOK_COMMENT:
return XML_ROLE_NONE;
case XML_TOK_INSTANCE_START:
state->handler = error;
return XML_ROLE_INSTANCE_START;
}
return syntaxError(state);
}
static
int doctype0(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
state->handler = doctype1;
return XML_ROLE_DOCTYPE_NAME;
}
return syntaxError(state);
}
static
int doctype1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_OPEN_BRACKET:
state->handler = internalSubset;
return XML_ROLE_NONE;
case XML_TOK_DECL_CLOSE:
state->handler = prolog2;
return XML_ROLE_DOCTYPE_CLOSE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, "SYSTEM")) {
state->handler = doctype3;
return XML_ROLE_NONE;
}
if (XmlNameMatchesAscii(enc, ptr, "PUBLIC")) {
state->handler = doctype2;
return XML_ROLE_NONE;
}
break;
}
return syntaxError(state);
}
static
int doctype2(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_LITERAL:
state->handler = doctype3;
return XML_ROLE_DOCTYPE_PUBLIC_ID;
}
return syntaxError(state);
}
static
int doctype3(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_LITERAL:
state->handler = doctype4;
return XML_ROLE_DOCTYPE_SYSTEM_ID;
}
return syntaxError(state);
}
static
int doctype4(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_OPEN_BRACKET:
state->handler = internalSubset;
return XML_ROLE_NONE;
case XML_TOK_DECL_CLOSE:
state->handler = prolog2;
return XML_ROLE_DOCTYPE_CLOSE;
}
return syntaxError(state);
}
static
int doctype5(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_DECL_CLOSE:
state->handler = prolog2;
return XML_ROLE_DOCTYPE_CLOSE;
}
return syntaxError(state);
}
static
int internalSubset(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_DECL_OPEN:
if (XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
"ENTITY")) {
state->handler = entity0;
return XML_ROLE_NONE;
}
if (XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
"ATTLIST")) {
state->handler = attlist0;
return XML_ROLE_NONE;
}
if (XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
"ELEMENT")) {
state->handler = element0;
return XML_ROLE_NONE;
}
if (XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
"NOTATION")) {
state->handler = notation0;
return XML_ROLE_NONE;
}
break;
case XML_TOK_PI:
case XML_TOK_COMMENT:
return XML_ROLE_NONE;
case XML_TOK_PARAM_ENTITY_REF:
return XML_ROLE_PARAM_ENTITY_REF;
case XML_TOK_CLOSE_BRACKET:
state->handler = doctype5;
return XML_ROLE_NONE;
}
return syntaxError(state);
}
static
int entity0(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_PERCENT:
state->handler = entity1;
return XML_ROLE_NONE;
case XML_TOK_NAME:
state->handler = entity2;
return XML_ROLE_GENERAL_ENTITY_NAME;
}
return syntaxError(state);
}
static
int entity1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
state->handler = entity7;
return XML_ROLE_PARAM_ENTITY_NAME;
}
return syntaxError(state);
}
static
int entity2(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, "SYSTEM")) {
state->handler = entity4;
return XML_ROLE_NONE;
}
if (XmlNameMatchesAscii(enc, ptr, "PUBLIC")) {
state->handler = entity3;
return XML_ROLE_NONE;
}
break;
case XML_TOK_LITERAL:
state->handler = declClose;
return XML_ROLE_ENTITY_VALUE;
}
return syntaxError(state);
}
static
int entity3(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_LITERAL:
state->handler = entity4;
return XML_ROLE_ENTITY_PUBLIC_ID;
}
return syntaxError(state);
}
static
int entity4(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_LITERAL:
state->handler = entity5;
return XML_ROLE_ENTITY_SYSTEM_ID;
}
return syntaxError(state);
}
static
int entity5(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_DECL_CLOSE:
state->handler = internalSubset;
return XML_ROLE_NONE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, "NDATA")) {
state->handler = entity6;
return XML_ROLE_NONE;
}
break;
}
return syntaxError(state);
}
static
int entity6(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
state->handler = declClose;
return XML_ROLE_ENTITY_NOTATION_NAME;
}
return syntaxError(state);
}
static
int entity7(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, "SYSTEM")) {
state->handler = entity9;
return XML_ROLE_NONE;
}
if (XmlNameMatchesAscii(enc, ptr, "PUBLIC")) {
state->handler = entity8;
return XML_ROLE_NONE;
}
break;
case XML_TOK_LITERAL:
state->handler = declClose;
return XML_ROLE_ENTITY_VALUE;
}
return syntaxError(state);
}
static
int entity8(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_LITERAL:
state->handler = entity9;
return XML_ROLE_ENTITY_PUBLIC_ID;
}
return syntaxError(state);
}
static
int entity9(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_LITERAL:
state->handler = declClose;
return XML_ROLE_ENTITY_SYSTEM_ID;
}
return syntaxError(state);
}
static
int notation0(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
state->handler = notation1;
return XML_ROLE_NOTATION_NAME;
}
return syntaxError(state);
}
static
int notation1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, "SYSTEM")) {
state->handler = notation3;
return XML_ROLE_NONE;
}
if (XmlNameMatchesAscii(enc, ptr, "PUBLIC")) {
state->handler = notation2;
return XML_ROLE_NONE;
}
break;
}
return syntaxError(state);
}
static
int notation2(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_LITERAL:
state->handler = notation4;
return XML_ROLE_NOTATION_PUBLIC_ID;
}
return syntaxError(state);
}
static
int notation3(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_LITERAL:
state->handler = declClose;
return XML_ROLE_NOTATION_SYSTEM_ID;
}
return syntaxError(state);
}
static
int notation4(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_LITERAL:
state->handler = declClose;
return XML_ROLE_NOTATION_SYSTEM_ID;
case XML_TOK_DECL_CLOSE:
state->handler = internalSubset;
return XML_ROLE_NOTATION_NO_SYSTEM_ID;
}
return syntaxError(state);
}
static
int attlist0(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
state->handler = attlist1;
return XML_ROLE_ATTLIST_ELEMENT_NAME;
}
return syntaxError(state);
}
static
int attlist1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_DECL_CLOSE:
state->handler = internalSubset;
return XML_ROLE_NONE;
case XML_TOK_NAME:
state->handler = attlist2;
return XML_ROLE_ATTRIBUTE_NAME;
}
return syntaxError(state);
}
static
int attlist2(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
{
static const char *types[] = {
"CDATA",
"ID",
"IDREF",
"IDREFS",
"ENTITY",
"ENTITIES",
"NMTOKEN",
"NMTOKENS",
};
int i;
for (i = 0; i < (int)(sizeof(types)/sizeof(types[0])); i++)
if (XmlNameMatchesAscii(enc, ptr, types[i])) {
state->handler = attlist8;
return XML_ROLE_ATTRIBUTE_TYPE_CDATA + i;
}
}
if (XmlNameMatchesAscii(enc, ptr, "NOTATION")) {
state->handler = attlist5;
return XML_ROLE_NONE;
}
break;
case XML_TOK_OPEN_PAREN:
state->handler = attlist3;
return XML_ROLE_NONE;
}
return syntaxError(state);
}
static
int attlist3(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NMTOKEN:
case XML_TOK_NAME:
state->handler = attlist4;
return XML_ROLE_ATTRIBUTE_ENUM_VALUE;
}
return syntaxError(state);
}
static
int attlist4(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_CLOSE_PAREN:
state->handler = attlist8;
return XML_ROLE_NONE;
case XML_TOK_OR:
state->handler = attlist3;
return XML_ROLE_NONE;
}
return syntaxError(state);
}
static
int attlist5(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_OPEN_PAREN:
state->handler = attlist6;
return XML_ROLE_NONE;
}
return syntaxError(state);
}
static
int attlist6(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
state->handler = attlist7;
return XML_ROLE_ATTRIBUTE_NOTATION_VALUE;
}
return syntaxError(state);
}
static
int attlist7(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_CLOSE_PAREN:
state->handler = attlist8;
return XML_ROLE_NONE;
case XML_TOK_OR:
state->handler = attlist6;
return XML_ROLE_NONE;
}
return syntaxError(state);
}
/* default value */
static
int attlist8(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_POUND_NAME:
if (XmlNameMatchesAscii(enc,
ptr + MIN_BYTES_PER_CHAR(enc),
"IMPLIED")) {
state->handler = attlist1;
return XML_ROLE_IMPLIED_ATTRIBUTE_VALUE;
}
if (XmlNameMatchesAscii(enc,
ptr + MIN_BYTES_PER_CHAR(enc),
"REQUIRED")) {
state->handler = attlist1;
return XML_ROLE_REQUIRED_ATTRIBUTE_VALUE;
}
if (XmlNameMatchesAscii(enc,
ptr + MIN_BYTES_PER_CHAR(enc),
"FIXED")) {
state->handler = attlist9;
return XML_ROLE_NONE;
}
break;
case XML_TOK_LITERAL:
state->handler = attlist1;
return XML_ROLE_DEFAULT_ATTRIBUTE_VALUE;
}
return syntaxError(state);
}
static
int attlist9(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_LITERAL:
state->handler = attlist1;
return XML_ROLE_FIXED_ATTRIBUTE_VALUE;
}
return syntaxError(state);
}
static
int element0(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
state->handler = element1;
return XML_ROLE_ELEMENT_NAME;
}
return syntaxError(state);
}
static
int element1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, "EMPTY")) {
state->handler = declClose;
return XML_ROLE_CONTENT_EMPTY;
}
if (XmlNameMatchesAscii(enc, ptr, "ANY")) {
state->handler = declClose;
return XML_ROLE_CONTENT_ANY;
}
break;
case XML_TOK_OPEN_PAREN:
state->handler = element2;
state->level = 1;
return XML_ROLE_GROUP_OPEN;
}
return syntaxError(state);
}
static
int element2(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_POUND_NAME:
if (XmlNameMatchesAscii(enc,
ptr + MIN_BYTES_PER_CHAR(enc),
"PCDATA")) {
state->handler = element3;
return XML_ROLE_CONTENT_PCDATA;
}
break;
case XML_TOK_OPEN_PAREN:
state->level = 2;
state->handler = element6;
return XML_ROLE_GROUP_OPEN;
case XML_TOK_NAME:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT;
case XML_TOK_NAME_QUESTION:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_OPT;
case XML_TOK_NAME_ASTERISK:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_REP;
case XML_TOK_NAME_PLUS:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_PLUS;
}
return syntaxError(state);
}
static
int element3(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_CLOSE_PAREN:
case XML_TOK_CLOSE_PAREN_ASTERISK:
state->handler = declClose;
return XML_ROLE_GROUP_CLOSE_REP;
case XML_TOK_OR:
state->handler = element4;
return XML_ROLE_NONE;
}
return syntaxError(state);
}
static
int element4(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
state->handler = element5;
return XML_ROLE_CONTENT_ELEMENT;
}
return syntaxError(state);
}
static
int element5(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_CLOSE_PAREN_ASTERISK:
state->handler = declClose;
return XML_ROLE_GROUP_CLOSE_REP;
case XML_TOK_OR:
state->handler = element4;
return XML_ROLE_NONE;
}
return syntaxError(state);
}
static
int element6(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_OPEN_PAREN:
state->level += 1;
return XML_ROLE_GROUP_OPEN;
case XML_TOK_NAME:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT;
case XML_TOK_NAME_QUESTION:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_OPT;
case XML_TOK_NAME_ASTERISK:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_REP;
case XML_TOK_NAME_PLUS:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_PLUS;
}
return syntaxError(state);
}
static
int element7(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_CLOSE_PAREN:
state->level -= 1;
if (state->level == 0)
state->handler = declClose;
return XML_ROLE_GROUP_CLOSE;
case XML_TOK_CLOSE_PAREN_ASTERISK:
state->level -= 1;
if (state->level == 0)
state->handler = declClose;
return XML_ROLE_GROUP_CLOSE_REP;
case XML_TOK_CLOSE_PAREN_QUESTION:
state->level -= 1;
if (state->level == 0)
state->handler = declClose;
return XML_ROLE_GROUP_CLOSE_OPT;
case XML_TOK_CLOSE_PAREN_PLUS:
state->level -= 1;
if (state->level == 0)
state->handler = declClose;
return XML_ROLE_GROUP_CLOSE_PLUS;
case XML_TOK_COMMA:
state->handler = element6;
return XML_ROLE_GROUP_SEQUENCE;
case XML_TOK_OR:
state->handler = element6;
return XML_ROLE_GROUP_CHOICE;
}
return syntaxError(state);
}
static
int declClose(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_DECL_CLOSE:
state->handler = internalSubset;
return XML_ROLE_NONE;
}
return syntaxError(state);
}
#if 0
static
int ignore(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_DECL_CLOSE:
state->handler = internalSubset;
return 0;
default:
return XML_ROLE_NONE;
}
return syntaxError(state);
}
#endif
static
int error(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
return XML_ROLE_NONE;
}
static
int syntaxError(PROLOG_STATE *state)
{
state->handler = error;
return XML_ROLE_ERROR;
}
void XmlPrologStateInit(PROLOG_STATE *state)
{
state->handler = prolog0;
}
1.1 apache-1.3/src/lib/expat/xmlrole.h
Index: xmlrole.h
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
#ifndef XmlRole_INCLUDED
#define XmlRole_INCLUDED 1
#include "xmltok.h"
#ifdef __cplusplus
extern "C" {
#endif
enum {
XML_ROLE_ERROR = -1,
XML_ROLE_NONE = 0,
XML_ROLE_XML_DECL,
XML_ROLE_INSTANCE_START,
XML_ROLE_DOCTYPE_NAME,
XML_ROLE_DOCTYPE_SYSTEM_ID,
XML_ROLE_DOCTYPE_PUBLIC_ID,
XML_ROLE_DOCTYPE_CLOSE,
XML_ROLE_GENERAL_ENTITY_NAME,
XML_ROLE_PARAM_ENTITY_NAME,
XML_ROLE_ENTITY_VALUE,
XML_ROLE_ENTITY_SYSTEM_ID,
XML_ROLE_ENTITY_PUBLIC_ID,
XML_ROLE_ENTITY_NOTATION_NAME,
XML_ROLE_NOTATION_NAME,
XML_ROLE_NOTATION_SYSTEM_ID,
XML_ROLE_NOTATION_NO_SYSTEM_ID,
XML_ROLE_NOTATION_PUBLIC_ID,
XML_ROLE_ATTRIBUTE_NAME,
XML_ROLE_ATTRIBUTE_TYPE_CDATA,
XML_ROLE_ATTRIBUTE_TYPE_ID,
XML_ROLE_ATTRIBUTE_TYPE_IDREF,
XML_ROLE_ATTRIBUTE_TYPE_IDREFS,
XML_ROLE_ATTRIBUTE_TYPE_ENTITY,
XML_ROLE_ATTRIBUTE_TYPE_ENTITIES,
XML_ROLE_ATTRIBUTE_TYPE_NMTOKEN,
XML_ROLE_ATTRIBUTE_TYPE_NMTOKENS,
XML_ROLE_ATTRIBUTE_ENUM_VALUE,
XML_ROLE_ATTRIBUTE_NOTATION_VALUE,
XML_ROLE_ATTLIST_ELEMENT_NAME,
XML_ROLE_IMPLIED_ATTRIBUTE_VALUE,
XML_ROLE_REQUIRED_ATTRIBUTE_VALUE,
XML_ROLE_DEFAULT_ATTRIBUTE_VALUE,
XML_ROLE_FIXED_ATTRIBUTE_VALUE,
XML_ROLE_ELEMENT_NAME,
XML_ROLE_CONTENT_ANY,
XML_ROLE_CONTENT_EMPTY,
XML_ROLE_CONTENT_PCDATA,
XML_ROLE_GROUP_OPEN,
XML_ROLE_GROUP_CLOSE,
XML_ROLE_GROUP_CLOSE_REP,
XML_ROLE_GROUP_CLOSE_OPT,
XML_ROLE_GROUP_CLOSE_PLUS,
XML_ROLE_GROUP_CHOICE,
XML_ROLE_GROUP_SEQUENCE,
XML_ROLE_CONTENT_ELEMENT,
XML_ROLE_CONTENT_ELEMENT_REP,
XML_ROLE_CONTENT_ELEMENT_OPT,
XML_ROLE_CONTENT_ELEMENT_PLUS,
XML_ROLE_PARAM_ENTITY_REF
};
typedef struct prolog_state {
int (*handler)(struct prolog_state *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc);
unsigned level;
} PROLOG_STATE;
void XMLTOKAPI XmlPrologStateInit(PROLOG_STATE *);
#define XmlTokenRole(state, tok, ptr, end, enc) \
(((state)->handler)(state, tok, ptr, end, enc))
#ifdef __cplusplus
}
#endif
#endif /* not XmlRole_INCLUDED */
1.1 apache-1.3/src/lib/expat/xmltok.c
Index: xmltok.c
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
#include "xmltok.h"
#include "nametab.h"
#define VTABLE1 \
{ PREFIX(prologTok), PREFIX(contentTok), PREFIX(cdataSectionTok) }, \
{ PREFIX(attributeValueTok), PREFIX(entityValueTok) }, \
PREFIX(sameName), \
PREFIX(nameMatchesAscii), \
PREFIX(nameLength), \
PREFIX(skipS), \
PREFIX(getAtts), \
PREFIX(charRefNumber), \
PREFIX(predefinedEntityName), \
PREFIX(updatePosition), \
PREFIX(isPublicId)
#define VTABLE VTABLE1, PREFIX(toUtf8), PREFIX(toUtf16)
#define UCS2_GET_NAMING(pages, hi, lo) \
(namingBitmap[(pages[hi] << 3) + ((lo) >> 5)] & (1 << ((lo) & 0x1F)))
/* A 2 byte UTF-8 representation splits the characters 11 bits
between the bottom 5 and 6 bits of the bytes.
We need 8 bits to index into pages, 3 bits to add to that index and
5 bits to generate the mask. */
#define UTF8_GET_NAMING2(pages, byte) \
(namingBitmap[((pages)[(((byte)[0]) >> 2) & 7] << 3) \
+ ((((byte)[0]) & 3) << 1) \
+ ((((byte)[1]) >> 5) & 1)] \
& (1 << (((byte)[1]) & 0x1F)))
/* A 3 byte UTF-8 representation splits the characters 16 bits
between the bottom 4, 6 and 6 bits of the bytes.
We need 8 bits to index into pages, 3 bits to add to that index and
5 bits to generate the mask. */
#define UTF8_GET_NAMING3(pages, byte) \
(namingBitmap[((pages)[((((byte)[0]) & 0xF) << 4) \
+ ((((byte)[1]) >> 2) & 0xF)] \
<< 3) \
+ ((((byte)[1]) & 3) << 1) \
+ ((((byte)[2]) >> 5) & 1)] \
& (1 << (((byte)[2]) & 0x1F)))
#define UTF8_GET_NAMING(pages, p, n) \
((n) == 2 \
? UTF8_GET_NAMING2(pages, (const unsigned char *)(p)) \
: ((n) == 3 \
? UTF8_GET_NAMING3(pages, (const unsigned char *)(p)) \
: 0))
#define UTF8_INVALID3(p) \
((*p) == 0xED \
? (((p)[1] & 0x20) != 0) \
: ((*p) == 0xEF \
? ((p)[1] == 0xBF && ((p)[2] == 0xBF || (p)[2] == 0xBE)) \
: 0))
#define UTF8_INVALID4(p) ((*p) == 0xF4 && ((p)[1] & 0x30) != 0)
static
int isNever(const ENCODING *enc, const char *p)
{
return 0;
}
static
int utf8_isName2(const ENCODING *enc, const char *p)
{
return UTF8_GET_NAMING2(namePages, (const unsigned char *)p);
}
static
int utf8_isName3(const ENCODING *enc, const char *p)
{
return UTF8_GET_NAMING3(namePages, (const unsigned char *)p);
}
#define utf8_isName4 isNever
static
int utf8_isNmstrt2(const ENCODING *enc, const char *p)
{
return UTF8_GET_NAMING2(nmstrtPages, (const unsigned char *)p);
}
static
int utf8_isNmstrt3(const ENCODING *enc, const char *p)
{
return UTF8_GET_NAMING3(nmstrtPages, (const unsigned char *)p);
}
#define utf8_isNmstrt4 isNever
#define utf8_isInvalid2 isNever
static
int utf8_isInvalid3(const ENCODING *enc, const char *p)
{
return UTF8_INVALID3((const unsigned char *)p);
}
static
int utf8_isInvalid4(const ENCODING *enc, const char *p)
{
return UTF8_INVALID4((const unsigned char *)p);
}
struct normal_encoding {
ENCODING enc;
unsigned char type[256];
int (*isName2)(const ENCODING *, const char *);
int (*isName3)(const ENCODING *, const char *);
int (*isName4)(const ENCODING *, const char *);
int (*isNmstrt2)(const ENCODING *, const char *);
int (*isNmstrt3)(const ENCODING *, const char *);
int (*isNmstrt4)(const ENCODING *, const char *);
int (*isInvalid2)(const ENCODING *, const char *);
int (*isInvalid3)(const ENCODING *, const char *);
int (*isInvalid4)(const ENCODING *, const char *);
};
#define NORMAL_VTABLE(E) \
E ## isName2, \
E ## isName3, \
E ## isName4, \
E ## isNmstrt2, \
E ## isNmstrt3, \
E ## isNmstrt4, \
E ## isInvalid2, \
E ## isInvalid3, \
E ## isInvalid4
static int checkCharRefNumber(int);
#include "xmltok_impl.h"
/* minimum bytes per character */
#define MINBPC 1
#define BYTE_TYPE(enc, p) \
(((struct normal_encoding *)(enc))->type[(unsigned char)*(p)])
#define BYTE_TO_ASCII(enc, p) (*p)
#define IS_NAME_CHAR(enc, p, n) \
(((const struct normal_encoding *)(enc))->isName ## n(enc, p))
#define IS_NMSTRT_CHAR(enc, p, n) \
(((const struct normal_encoding *)(enc))->isNmstrt ## n(enc, p))
#define IS_INVALID_CHAR(enc, p, n) \
(((const struct normal_encoding *)(enc))->isInvalid ## n(enc, p))
#define IS_NAME_CHAR_MINBPC(enc, p) (0)
#define IS_NMSTRT_CHAR_MINBPC(enc, p) (0)
/* c is an ASCII character */
#define CHAR_MATCHES(enc, p, c) (*(p) == c)
#define PREFIX(ident) normal_ ## ident
#include "xmltok_impl.c"
#undef MINBPC
#undef BYTE_TYPE
#undef BYTE_TO_ASCII
#undef CHAR_MATCHES
#undef IS_NAME_CHAR
#undef IS_NAME_CHAR_MINBPC
#undef IS_NMSTRT_CHAR
#undef IS_NMSTRT_CHAR_MINBPC
#undef IS_INVALID_CHAR
enum { /* UTF8_cvalN is value of masked first byte of N byte sequence */
UTF8_cval1 = 0x00,
UTF8_cval2 = 0xc0,
UTF8_cval3 = 0xe0,
UTF8_cval4 = 0xf0
};
static
void utf8_toUtf8(const ENCODING *enc,
const char **fromP, const char *fromLim,
char **toP, const char *toLim)
{
char *to;
const char *from;
if (fromLim - *fromP > toLim - *toP) {
/* Avoid copying partial characters. */
for (fromLim = *fromP + (toLim - *toP); fromLim > *fromP; fromLim--)
if (((unsigned char)fromLim[-1] & 0xc0) != 0x80)
break;
}
for (to = *toP, from = *fromP; from != fromLim; from++, to++)
*to = *from;
*fromP = from;
*toP = to;
}
static
void utf8_toUtf16(const ENCODING *enc,
const char **fromP, const char *fromLim,
unsigned short **toP, const unsigned short *toLim)
{
unsigned short *to = *toP;
const char *from = *fromP;
while (from != fromLim && to != toLim) {
switch (((struct normal_encoding *)enc)->type[(unsigned char)*from]) {
case BT_LEAD2:
*to++ = ((from[0] & 0x1f) << 6) | (from[1] & 0x3f);
from += 2;
break;
case BT_LEAD3:
*to++ = ((from[0] & 0xf) << 12) | ((from[1] & 0x3f) << 6) | (from[2] & 0x3f);
from += 3;
break;
case BT_LEAD4:
{
unsigned long n;
if (to + 1 == toLim)
break;
n = ((from[0] & 0x7) << 18) | ((from[1] & 0x3f) << 12) | ((from[2] & 0x3f) << 6) | (from[3] & 0x3f);
n -= 0x10000;
to[0] = (unsigned short)((n >> 10) | 0xD800);
to[1] = (unsigned short)((n & 0x3FF) | 0xDC00);
to += 2;
from += 4;
}
break;
default:
*to++ = *from++;
break;
}
}
*fromP = from;
*toP = to;
}
static const struct normal_encoding utf8_encoding = {
{ VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 },
{
#include "asciitab.h"
#include "utf8tab.h"
},
NORMAL_VTABLE(utf8_)
};
static const struct normal_encoding internal_utf8_encoding = {
{ VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 },
{
#include "iasciitab.h"
#include "utf8tab.h"
},
NORMAL_VTABLE(utf8_)
};
static
void latin1_toUtf8(const ENCODING *enc,
const char **fromP, const char *fromLim,
char **toP, const char *toLim)
{
for (;;) {
unsigned char c;
if (*fromP == fromLim)
break;
c = (unsigned char)**fromP;
if (c & 0x80) {
if (toLim - *toP < 2)
break;
*(*toP)++ = ((c >> 6) | UTF8_cval2);
*(*toP)++ = ((c & 0x3f) | 0x80);
(*fromP)++;
}
else {
if (*toP == toLim)
break;
*(*toP)++ = *(*fromP)++;
}
}
}
static
void latin1_toUtf16(const ENCODING *enc,
const char **fromP, const char *fromLim,
unsigned short **toP, const unsigned short *toLim)
{
while (*fromP != fromLim && *toP != toLim)
*(*toP)++ = (unsigned char)*(*fromP)++;
}
static const struct normal_encoding latin1_encoding = {
{ VTABLE1, latin1_toUtf8, latin1_toUtf16, 1, 0, 0 },
{
#include "asciitab.h"
#include "latin1tab.h"
}
};
static
void ascii_toUtf8(const ENCODING *enc,
const char **fromP, const char *fromLim,
char **toP, const char *toLim)
{
while (*fromP != fromLim && *toP != toLim)
*(*toP)++ = *(*fromP)++;
}
static const struct normal_encoding ascii_encoding = {
{ VTABLE1, ascii_toUtf8, latin1_toUtf16, 1, 1, 0 },
{
#include "asciitab.h"
/* BT_NONXML == 0 */
}
};
#undef PREFIX
static int unicode_byte_type(char hi, char lo)
{
switch ((unsigned char)hi) {
case 0xD8: case 0xD9: case 0xDA: case 0xDB:
return BT_LEAD4;
case 0xDC: case 0xDD: case 0xDE: case 0xDF:
return BT_TRAIL;
case 0xFF:
switch ((unsigned char)lo) {
case 0xFF:
case 0xFE:
return BT_NONXML;
}
break;
}
return BT_NONASCII;
}
#define DEFINE_UTF16_TO_UTF8 \
static \
void PREFIX(toUtf8)(const ENCODING *enc, \
const char **fromP, const char *fromLim, \
char **toP, const char *toLim) \
{ \
const char *from; \
for (from = *fromP; from != fromLim; from += 2) { \
int plane; \
unsigned char lo2; \
unsigned char lo = GET_LO(from); \
unsigned char hi = GET_HI(from); \
switch (hi) { \
case 0: \
if (lo < 0x80) { \
if (*toP == toLim) { \
*fromP = from; \
return; \
} \
*(*toP)++ = lo; \
break; \
} \
/* fall through */ \
case 0x1: case 0x2: case 0x3: \
case 0x4: case 0x5: case 0x6: case 0x7: \
if (toLim - *toP < 2) { \
*fromP = from; \
return; \
} \
*(*toP)++ = ((lo >> 6) | (hi << 2) | UTF8_cval2); \
*(*toP)++ = ((lo & 0x3f) | 0x80); \
break; \
default: \
if (toLim - *toP < 3) { \
*fromP = from; \
return; \
} \
/* 16 bits divided 4, 6, 6 amongst 3 bytes */ \
*(*toP)++ = ((hi >> 4) | UTF8_cval3); \
*(*toP)++ = (((hi & 0xf) << 2) | (lo >> 6) | 0x80); \
*(*toP)++ = ((lo & 0x3f) | 0x80); \
break; \
case 0xD8: case 0xD9: case 0xDA: case 0xDB: \
if (toLim - *toP < 4) { \
*fromP = from; \
return; \
} \
plane = (((hi & 0x3) << 2) | ((lo >> 6) & 0x3)) + 1; \
*(*toP)++ = ((plane >> 2) | UTF8_cval4); \
*(*toP)++ = (((lo >> 2) & 0xF) | ((plane & 0x3) << 4) | 0x80); \
from += 2; \
lo2 = GET_LO(from); \
*(*toP)++ = (((lo & 0x3) << 4) \
| ((GET_HI(from) & 0x3) << 2) \
| (lo2 >> 6) \
| 0x80); \
*(*toP)++ = ((lo2 & 0x3f) | 0x80); \
break; \
} \
} \
*fromP = from; \
}
#define DEFINE_UTF16_TO_UTF16 \
static \
void PREFIX(toUtf16)(const ENCODING *enc, \
const char **fromP, const char *fromLim, \
unsigned short **toP, const unsigned short *toLim) \
{ \
/* Avoid copying first half only of surrogate */ \
if (fromLim - *fromP > ((toLim - *toP) << 1) \
&& (GET_HI(fromLim - 2) & 0xF8) == 0xD8) \
fromLim -= 2; \
for (; *fromP != fromLim && *toP != toLim; *fromP += 2) \
*(*toP)++ = (GET_HI(*fromP) << 8) | GET_LO(*fromP); \
}
#define PREFIX(ident) little2_ ## ident
#define MINBPC 2
#define BYTE_TYPE(enc, p) \
((p)[1] == 0 \
? ((struct normal_encoding *)(enc))->type[(unsigned char)*(p)] \
: unicode_byte_type((p)[1], (p)[0]))
#define BYTE_TO_ASCII(enc, p) ((p)[1] == 0 ? (p)[0] : -1)
#define CHAR_MATCHES(enc, p, c) ((p)[1] == 0 && (p)[0] == c)
#define IS_NAME_CHAR(enc, p, n) (0)
#define IS_NAME_CHAR_MINBPC(enc, p) \
UCS2_GET_NAMING(namePages, (unsigned char)p[1], (unsigned char)p[0])
#define IS_NMSTRT_CHAR(enc, p, n) (0)
#define IS_NMSTRT_CHAR_MINBPC(enc, p) \
UCS2_GET_NAMING(nmstrtPages, (unsigned char)p[1], (unsigned char)p[0])
#include "xmltok_impl.c"
#define SET2(ptr, ch) \
(((ptr)[0] = ((ch) & 0xff)), ((ptr)[1] = ((ch) >> 8)))
#define GET_LO(ptr) ((unsigned char)(ptr)[0])
#define GET_HI(ptr) ((unsigned char)(ptr)[1])
DEFINE_UTF16_TO_UTF8
DEFINE_UTF16_TO_UTF16
#undef SET2
#undef GET_LO
#undef GET_HI
#undef MINBPC
#undef BYTE_TYPE
#undef BYTE_TO_ASCII
#undef CHAR_MATCHES
#undef IS_NAME_CHAR
#undef IS_NAME_CHAR_MINBPC
#undef IS_NMSTRT_CHAR
#undef IS_NMSTRT_CHAR_MINBPC
#undef IS_INVALID_CHAR
static const struct normal_encoding little2_encoding = {
{ VTABLE, 2, 0,
#if BYTE_ORDER == 12
1
#else
0
#endif
},
#include "asciitab.h"
#include "latin1tab.h"
};
#if BYTE_ORDER != 21
static const struct normal_encoding internal_little2_encoding = {
{ VTABLE, 2, 0, 1 },
#include "iasciitab.h"
#include "latin1tab.h"
};
#endif
#undef PREFIX
#define PREFIX(ident) big2_ ## ident
#define MINBPC 2
/* CHAR_MATCHES is guaranteed to have MINBPC bytes available. */
#define BYTE_TYPE(enc, p) \
((p)[0] == 0 \
? ((struct normal_encoding *)(enc))->type[(unsigned char)(p)[1]] \
: unicode_byte_type((p)[0], (p)[1]))
#define BYTE_TO_ASCII(enc, p) ((p)[0] == 0 ? (p)[1] : -1)
#define CHAR_MATCHES(enc, p, c) ((p)[0] == 0 && (p)[1] == c)
#define IS_NAME_CHAR(enc, p, n) 0
#define IS_NAME_CHAR_MINBPC(enc, p) \
UCS2_GET_NAMING(namePages, (unsigned char)p[0], (unsigned char)p[1])
#define IS_NMSTRT_CHAR(enc, p, n) (0)
#define IS_NMSTRT_CHAR_MINBPC(enc, p) \
UCS2_GET_NAMING(nmstrtPages, (unsigned char)p[0], (unsigned char)p[1])
#include "xmltok_impl.c"
#define SET2(ptr, ch) \
(((ptr)[0] = ((ch) >> 8)), ((ptr)[1] = ((ch) & 0xFF)))
#define GET_LO(ptr) ((unsigned char)(ptr)[1])
#define GET_HI(ptr) ((unsigned char)(ptr)[0])
DEFINE_UTF16_TO_UTF8
DEFINE_UTF16_TO_UTF16
#undef SET2
#undef GET_LO
#undef GET_HI
#undef MINBPC
#undef BYTE_TYPE
#undef BYTE_TO_ASCII
#undef CHAR_MATCHES
#undef IS_NAME_CHAR
#undef IS_NAME_CHAR_MINBPC
#undef IS_NMSTRT_CHAR
#undef IS_NMSTRT_CHAR_MINBPC
#undef IS_INVALID_CHAR
static const struct normal_encoding big2_encoding = {
{ VTABLE, 2, 0,
#if BYTE_ORDER == 21
1
#else
0
#endif
},
#include "asciitab.h"
#include "latin1tab.h"
};
#if BYTE_ORDER != 12
static const struct normal_encoding internal_big2_encoding = {
{ VTABLE, 2, 0, 1 },
#include "iasciitab.h"
#include "latin1tab.h"
};
#endif
#undef PREFIX
static
int streqci(const char *s1, const char *s2)
{
for (;;) {
char c1 = *s1++;
char c2 = *s2++;
if ('a' <= c1 && c1 <= 'z')
c1 += 'A' - 'a';
if ('a' <= c2 && c2 <= 'z')
c2 += 'A' - 'a';
if (c1 != c2)
return 0;
if (!c1)
break;
}
return 1;
}
static
int initScan(const ENCODING *enc, int state, const char *ptr, const char *end,
const char **nextTokPtr)
{
const ENCODING **encPtr;
if (ptr == end)
return XML_TOK_NONE;
encPtr = ((const INIT_ENCODING *)enc)->encPtr;
if (ptr + 1 == end) {
switch ((unsigned char)*ptr) {
case 0xFE:
case 0xFF:
case 0x00:
case 0x3C:
return XML_TOK_PARTIAL;
}
}
else {
switch (((unsigned char)ptr[0] << 8) | (unsigned char)ptr[1]) {
case 0x003C:
*encPtr = &big2_encoding.enc;
return XmlTok(*encPtr, state, ptr, end, nextTokPtr);
case 0xFEFF:
*nextTokPtr = ptr + 2;
*encPtr = &big2_encoding.enc;
return XML_TOK_BOM;
case 0x3C00:
*encPtr = &little2_encoding.enc;
return XmlTok(*encPtr, state, ptr, end, nextTokPtr);
case 0xFFFE:
*nextTokPtr = ptr + 2;
*encPtr = &little2_encoding.enc;
return XML_TOK_BOM;
}
}
*encPtr = &utf8_encoding.enc;
return XmlTok(*encPtr, state, ptr, end, nextTokPtr);
}
static
int initScanProlog(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
return initScan(enc, XML_PROLOG_STATE, ptr, end, nextTokPtr);
}
static
int initScanContent(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
return initScan(enc, XML_CONTENT_STATE, ptr, end, nextTokPtr);
}
static
void initUpdatePosition(const ENCODING *enc, const char *ptr,
const char *end, POSITION *pos)
{
normal_updatePosition(&utf8_encoding.enc, ptr, end, pos);
}
const ENCODING *XmlGetUtf8InternalEncoding()
{
return &internal_utf8_encoding.enc;
}
const ENCODING *XmlGetUtf16InternalEncoding()
{
#if BYTE_ORDER == 12
return &internal_little2_encoding.enc;
#elif BYTE_ORDER == 21
return &internal_big2_encoding.enc;
#else
const short n = 1;
return *(const char *)&n ? &internal_little2_encoding.enc : &internal_big2_encoding.enc;
#endif
}
int XmlInitEncoding(INIT_ENCODING *p, const ENCODING **encPtr, const char *name)
{
if (name) {
if (streqci(name, "ISO-8859-1")) {
*encPtr = &latin1_encoding.enc;
return 1;
}
if (streqci(name, "UTF-8")) {
*encPtr = &utf8_encoding.enc;
return 1;
}
if (streqci(name, "US-ASCII")) {
*encPtr = &ascii_encoding.enc;
return 1;
}
if (!streqci(name, "UTF-16"))
return 0;
}
p->initEnc.scanners[XML_PROLOG_STATE] = initScanProlog;
p->initEnc.scanners[XML_CONTENT_STATE] = initScanContent;
p->initEnc.updatePosition = initUpdatePosition;
p->initEnc.minBytesPerChar = 1;
p->encPtr = encPtr;
*encPtr = &(p->initEnc);
return 1;
}
static
int toAscii(const ENCODING *enc, const char *ptr, const char *end)
{
char buf[1];
char *p = buf;
XmlUtf8Convert(enc, &ptr, end, &p, p + 1);
if (p == buf)
return -1;
else
return buf[0];
}
static
int isSpace(int c)
{
switch (c) {
case ' ':
case '\r':
case '\n':
case '\t':
return 1;
}
return 0;
}
/* Return 1 if there's just optional white space
or there's an S followed by name=val. */
static
int parsePseudoAttribute(const ENCODING *enc,
const char *ptr,
const char *end,
const char **namePtr,
const char **valPtr,
const char **nextTokPtr)
{
int c;
char open;
if (ptr == end) {
*namePtr = 0;
return 1;
}
if (!isSpace(toAscii(enc, ptr, end))) {
*nextTokPtr = ptr;
return 0;
}
do {
ptr += enc->minBytesPerChar;
} while (isSpace(toAscii(enc, ptr, end)));
if (ptr == end) {
*namePtr = 0;
return 1;
}
*namePtr = ptr;
for (;;) {
c = toAscii(enc, ptr, end);
if (c == -1) {
*nextTokPtr = ptr;
return 0;
}
if (c == '=')
break;
if (isSpace(c)) {
do {
ptr += enc->minBytesPerChar;
} while (isSpace(c = toAscii(enc, ptr, end)));
if (c != '=') {
*nextTokPtr = ptr;
return 0;
}
break;
}
ptr += enc->minBytesPerChar;
}
if (ptr == *namePtr) {
*nextTokPtr = ptr;
return 0;
}
ptr += enc->minBytesPerChar;
c = toAscii(enc, ptr, end);
while (isSpace(c)) {
ptr += enc->minBytesPerChar;
c = toAscii(enc, ptr, end);
}
if (c != '"' && c != '\'') {
*nextTokPtr = ptr;
return 0;
}
open = c;
ptr += enc->minBytesPerChar;
*valPtr = ptr;
for (;; ptr += enc->minBytesPerChar) {
c = toAscii(enc, ptr, end);
if (c == open)
break;
if (!('a' <= c && c <= 'z')
&& !('A' <= c && c <= 'Z')
&& !('0' <= c && c <= '9')
&& c != '.'
&& c != '-'
&& c != '_') {
*nextTokPtr = ptr;
return 0;
}
}
*nextTokPtr = ptr + enc->minBytesPerChar;
return 1;
}
static
const ENCODING *findEncoding(const ENCODING *enc, const char *ptr, const char *end)
{
#define ENCODING_MAX 128
char buf[ENCODING_MAX];
char *p = buf;
int i;
XmlUtf8Convert(enc, &ptr, end, &p, p + ENCODING_MAX - 1);
if (ptr != end)
return 0;
*p = 0;
for (i = 0; buf[i]; i++) {
if ('a' <= buf[i] && buf[i] <= 'z')
buf[i] += 'A' - 'a';
}
if (streqci(buf, "UTF-8"))
return &utf8_encoding.enc;
if (streqci(buf, "ISO-8859-1"))
return &latin1_encoding.enc;
if (streqci(buf, "US-ASCII"))
return &ascii_encoding.enc;
if (streqci(buf, "UTF-16")) {
static const unsigned short n = 1;
if (enc->minBytesPerChar == 2)
return enc;
return &big2_encoding.enc;
}
return 0;
}
int XmlParseXmlDecl(int isGeneralTextEntity,
const ENCODING *enc,
const char *ptr,
const char *end,
const char **badPtr,
const char **versionPtr,
const char **encodingName,
const ENCODING **encoding,
int *standalone)
{
const char *val = 0;
const char *name = 0;
ptr += 5 * enc->minBytesPerChar;
end -= 2 * enc->minBytesPerChar;
if (!parsePseudoAttribute(enc, ptr, end, &name, &val, &ptr) || !name) {
*badPtr = ptr;
return 0;
}
if (!XmlNameMatchesAscii(enc, name, "version")) {
if (!isGeneralTextEntity) {
*badPtr = name;
return 0;
}
}
else {
if (versionPtr)
*versionPtr = val;
if (!parsePseudoAttribute(enc, ptr, end, &name, &val, &ptr)) {
*badPtr = ptr;
return 0;
}
if (!name) {
if (isGeneralTextEntity) {
/* a TextDecl must have an EncodingDecl */
*badPtr = ptr;
return 0;
}
return 1;
}
}
if (XmlNameMatchesAscii(enc, name, "encoding")) {
int c = toAscii(enc, val, end);
if (!('a' <= c && c <= 'z') && !('A' <= c && c <= 'Z')) {
*badPtr = val;
return 0;
}
if (encodingName)
*encodingName = val;
if (encoding)
*encoding = findEncoding(enc, val, ptr - enc->minBytesPerChar);
if (!parsePseudoAttribute(enc, ptr, end, &name, &val, &ptr)) {
*badPtr = ptr;
return 0;
}
if (!name)
return 1;
}
if (!XmlNameMatchesAscii(enc, name, "standalone") || isGeneralTextEntity) {
*badPtr = name;
return 0;
}
if (XmlNameMatchesAscii(enc, val, "yes")) {
if (standalone)
*standalone = 1;
}
else if (XmlNameMatchesAscii(enc, val, "no")) {
if (standalone)
*standalone = 0;
}
else {
*badPtr = val;
return 0;
}
while (isSpace(toAscii(enc, ptr, end)))
ptr += enc->minBytesPerChar;
if (ptr != end) {
*badPtr = ptr;
return 0;
}
return 1;
}
static
int checkCharRefNumber(int result)
{
switch (result >> 8) {
case 0xD8: case 0xD9: case 0xDA: case 0xDB:
case 0xDC: case 0xDD: case 0xDE: case 0xDF:
return -1;
case 0:
if (latin1_encoding.type[result] == BT_NONXML)
return -1;
break;
case 0xFF:
if (result == 0xFFFE || result == 0xFFFF)
return -1;
break;
}
return result;
}
int XmlUtf8Encode(int c, char *buf)
{
enum {
/* minN is minimum legal resulting value for N byte sequence */
min2 = 0x80,
min3 = 0x800,
min4 = 0x10000
};
if (c < 0)
return 0;
if (c < min2) {
buf[0] = (c | UTF8_cval1);
return 1;
}
if (c < min3) {
buf[0] = ((c >> 6) | UTF8_cval2);
buf[1] = ((c & 0x3f) | 0x80);
return 2;
}
if (c < min4) {
buf[0] = ((c >> 12) | UTF8_cval3);
buf[1] = (((c >> 6) & 0x3f) | 0x80);
buf[2] = ((c & 0x3f) | 0x80);
return 3;
}
if (c < 0x110000) {
buf[0] = ((c >> 18) | UTF8_cval4);
buf[1] = (((c >> 12) & 0x3f) | 0x80);
buf[2] = (((c >> 6) & 0x3f) | 0x80);
buf[3] = ((c & 0x3f) | 0x80);
return 4;
}
return 0;
}
int XmlUtf16Encode(int charNum, unsigned short *buf)
{
if (charNum < 0)
return 0;
if (charNum < 0x10000) {
buf[0] = charNum;
return 1;
}
if (charNum < 0x110000) {
charNum -= 0x10000;
buf[0] = (charNum >> 10) + 0xD800;
buf[1] = (charNum & 0x3FF) + 0xDC00;
return 2;
}
return 0;
}
struct unknown_encoding {
struct normal_encoding normal;
int (*convert)(void *userData, const char *p);
void *userData;
unsigned short utf16[256];
char utf8[256][4];
};
int XmlSizeOfUnknownEncoding()
{
return sizeof(struct unknown_encoding);
}
static
int unknown_isName(const ENCODING *enc, const char *p)
{
int c = ((const struct unknown_encoding *)enc)
->convert(((const struct unknown_encoding *)enc)->userData, p);
if (c & ~0xFFFF)
return 0;
return UCS2_GET_NAMING(namePages, c >> 8, c & 0xFF);
}
static
int unknown_isNmstrt(const ENCODING *enc, const char *p)
{
int c = ((const struct unknown_encoding *)enc)
->convert(((const struct unknown_encoding *)enc)->userData, p);
if (c & ~0xFFFF)
return 0;
return UCS2_GET_NAMING(nmstrtPages, c >> 8, c & 0xFF);
}
static
int unknown_isInvalid(const ENCODING *enc, const char *p)
{
int c = ((const struct unknown_encoding *)enc)
->convert(((const struct unknown_encoding *)enc)->userData, p);
return (c & ~0xFFFF) || checkCharRefNumber(c) < 0;
}
static
void unknown_toUtf8(const ENCODING *enc,
const char **fromP, const char *fromLim,
char **toP, const char *toLim)
{
char buf[XML_UTF8_ENCODE_MAX];
for (;;) {
const char *utf8;
int n;
if (*fromP == fromLim)
break;
utf8 = ((const struct unknown_encoding *)enc)->utf8[(unsigned char)**fromP];
n = *utf8++;
if (n == 0) {
int c = ((const struct unknown_encoding *)enc)
->convert(((const struct unknown_encoding *)enc)->userData, *fromP);
n = XmlUtf8Encode(c, buf);
if (n > toLim - *toP)
break;
utf8 = buf;
*fromP += ((const struct normal_encoding *)enc)->type[(unsigned char)**fromP]
- (BT_LEAD2 - 2);
}
else {
if (n > toLim - *toP)
break;
(*fromP)++;
}
do {
*(*toP)++ = *utf8++;
} while (--n != 0);
}
}
static
void unknown_toUtf16(const ENCODING *enc,
const char **fromP, const char *fromLim,
unsigned short **toP, const unsigned short *toLim)
{
while (*fromP != fromLim && *toP != toLim) {
unsigned short c
= ((const struct unknown_encoding *)enc)->utf16[(unsigned char)**fromP];
if (c == 0) {
c = (unsigned short)((const struct unknown_encoding *)enc)
->convert(((const struct unknown_encoding *)enc)->userData, *fromP);
*fromP += ((const struct normal_encoding *)enc)->type[(unsigned char)**fromP]
- (BT_LEAD2 - 2);
}
else
(*fromP)++;
*(*toP)++ = c;
}
}
ENCODING *
XmlInitUnknownEncoding(void *mem,
int *table,
int (*convert)(void *userData, const char *p),
void *userData)
{
int i;
struct unknown_encoding *e = mem;
for (i = 0; i < sizeof(struct normal_encoding); i++)
((char *)mem)[i] = ((char *)&latin1_encoding)[i];
for (i = 0; i < 128; i++)
if (latin1_encoding.type[i] != BT_OTHER
&& latin1_encoding.type[i] != BT_NONXML
&& table[i] != i)
return 0;
for (i = 0; i < 256; i++) {
int c = table[i];
if (c == -1) {
e->normal.type[i] = BT_MALFORM;
/* This shouldn't really get used. */
e->utf16[i] = 0xFFFF;
e->utf8[i][0] = 1;
e->utf8[i][1] = 0;
}
else if (c < 0) {
if (c < -4)
return 0;
e->normal.type[i] = BT_LEAD2 - (c + 2);
e->utf8[i][0] = 0;
e->utf16[i] = 0;
}
else if (c < 0x80) {
if (latin1_encoding.type[c] != BT_OTHER
&& latin1_encoding.type[c] != BT_NONXML
&& c != i)
return 0;
e->normal.type[i] = latin1_encoding.type[c];
e->utf8[i][0] = 1;
e->utf8[i][1] = (char)c;
e->utf16[i] = c == 0 ? 0xFFFF : c;
}
else if (checkCharRefNumber(c) < 0) {
e->normal.type[i] = BT_NONXML;
/* This shouldn't really get used. */
e->utf16[i] = 0xFFFF;
e->utf8[i][0] = 1;
e->utf8[i][1] = 0;
}
else {
if (c > 0xFFFF)
return 0;
if (UCS2_GET_NAMING(nmstrtPages, c >> 8, c & 0xff))
e->normal.type[i] = BT_NMSTRT;
else if (UCS2_GET_NAMING(namePages, c >> 8, c & 0xff))
e->normal.type[i] = BT_NAME;
else
e->normal.type[i] = BT_OTHER;
e->utf8[i][0] = (char)XmlUtf8Encode(c, e->utf8[i] + 1);
e->utf16[i] = c;
}
}
e->userData = userData;
e->convert = convert;
if (convert) {
e->normal.isName2 = unknown_isName;
e->normal.isName3 = unknown_isName;
e->normal.isName4 = unknown_isName;
e->normal.isNmstrt2 = unknown_isNmstrt;
e->normal.isNmstrt3 = unknown_isNmstrt;
e->normal.isNmstrt4 = unknown_isNmstrt;
e->normal.isInvalid2 = unknown_isInvalid;
e->normal.isInvalid3 = unknown_isInvalid;
e->normal.isInvalid4 = unknown_isInvalid;
}
e->normal.enc.utf8Convert = unknown_toUtf8;
e->normal.enc.utf16Convert = unknown_toUtf16;
return &(e->normal.enc);
}
1.1 apache-1.3/src/lib/expat/xmltok.h
Index: xmltok.h
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
#ifndef XmlTok_INCLUDED
#define XmlTok_INCLUDED 1
#ifdef __cplusplus
extern "C" {
#endif
#ifndef XMLTOKAPI
#define XMLTOKAPI /* as nothing */
#endif
/* The following token may be returned by XmlContentTok */
#define XML_TOK_TRAILING_RSQB -5 /* ] or ]] at the end of the scan; might be start of
illegal ]]> sequence */
/* The following tokens may be returned by both XmlPrologTok and XmlContentTok */
#define XML_TOK_NONE -4 /* The string to be scanned is empty */
#define XML_TOK_TRAILING_CR -3 /* A CR at the end of the scan;
might be part of CRLF sequence */
#define XML_TOK_PARTIAL_CHAR -2 /* only part of a multibyte sequence */
#define XML_TOK_PARTIAL -1 /* only part of a token */
#define XML_TOK_INVALID 0
/* The following tokens are returned by XmlContentTok; some are also
returned by XmlAttributeValueTok, XmlEntityTok, XmlCdataSectionTok */
#define XML_TOK_START_TAG_WITH_ATTS 1
#define XML_TOK_START_TAG_NO_ATTS 2
#define XML_TOK_EMPTY_ELEMENT_WITH_ATTS 3 /* empty element tag <e/> */
#define XML_TOK_EMPTY_ELEMENT_NO_ATTS 4
#define XML_TOK_END_TAG 5
#define XML_TOK_DATA_CHARS 6
#define XML_TOK_DATA_NEWLINE 7
#define XML_TOK_CDATA_SECT_OPEN 8
#define XML_TOK_ENTITY_REF 9
#define XML_TOK_CHAR_REF 10 /* numeric character reference */
/* The following tokens may be returned by both XmlPrologTok and XmlContentTok */
#define XML_TOK_PI 11 /* processing instruction */
#define XML_TOK_XML_DECL 12 /* XML decl or text decl */
#define XML_TOK_COMMENT 13
#define XML_TOK_BOM 14 /* Byte order mark */
/* The following tokens are returned only by XmlPrologTok */
#define XML_TOK_PROLOG_S 15
#define XML_TOK_DECL_OPEN 16 /* <!foo */
#define XML_TOK_DECL_CLOSE 17 /* > */
#define XML_TOK_NAME 18
#define XML_TOK_NMTOKEN 19
#define XML_TOK_POUND_NAME 20 /* #name */
#define XML_TOK_OR 21 /* | */
#define XML_TOK_PERCENT 22
#define XML_TOK_OPEN_PAREN 23
#define XML_TOK_CLOSE_PAREN 24
#define XML_TOK_OPEN_BRACKET 25
#define XML_TOK_CLOSE_BRACKET 26
#define XML_TOK_LITERAL 27
#define XML_TOK_PARAM_ENTITY_REF 28
#define XML_TOK_INSTANCE_START 29
/* The following occur only in element type declarations */
#define XML_TOK_NAME_QUESTION 30 /* name? */
#define XML_TOK_NAME_ASTERISK 31 /* name* */
#define XML_TOK_NAME_PLUS 32 /* name+ */
#define XML_TOK_COND_SECT_OPEN 33 /* <![ */
#define XML_TOK_COND_SECT_CLOSE 34 /* ]]> */
#define XML_TOK_CLOSE_PAREN_QUESTION 35 /* )? */
#define XML_TOK_CLOSE_PAREN_ASTERISK 36 /* )* */
#define XML_TOK_CLOSE_PAREN_PLUS 37 /* )+ */
#define XML_TOK_COMMA 38
/* The following token is returned only by XmlAttributeValueTok */
#define XML_TOK_ATTRIBUTE_VALUE_S 39
/* The following token is returned only by XmlCdataSectionTok */
#define XML_TOK_CDATA_SECT_CLOSE 40
#define XML_N_STATES 3
#define XML_PROLOG_STATE 0
#define XML_CONTENT_STATE 1
#define XML_CDATA_SECTION_STATE 2
#define XML_N_LITERAL_TYPES 2
#define XML_ATTRIBUTE_VALUE_LITERAL 0
#define XML_ENTITY_VALUE_LITERAL 1
/* The size of the buffer passed to XmlUtf8Encode must be at least this. */
#define XML_UTF8_ENCODE_MAX 4
/* The size of the buffer passed to XmlUtf16Encode must be at least this. */
#define XML_UTF16_ENCODE_MAX 2
typedef struct position {
/* first line and first column are 0 not 1 */
unsigned long lineNumber;
unsigned long columnNumber;
} POSITION;
typedef struct {
const char *name;
const char *valuePtr;
const char *valueEnd;
char normalized;
} ATTRIBUTE;
struct encoding;
typedef struct encoding ENCODING;
struct encoding {
int (*scanners[XML_N_STATES])(const ENCODING *,
const char *,
const char *,
const char **);
int (*literalScanners[XML_N_LITERAL_TYPES])(const ENCODING *,
const char *,
const char *,
const char **);
int (*sameName)(const ENCODING *,
const char *, const char *);
int (*nameMatchesAscii)(const ENCODING *,
const char *, const char *);
int (*nameLength)(const ENCODING *, const char *);
const char *(*skipS)(const ENCODING *, const char *);
int (*getAtts)(const ENCODING *enc, const char *ptr,
int attsMax, ATTRIBUTE *atts);
int (*charRefNumber)(const ENCODING *enc, const char *ptr);
int (*predefinedEntityName)(const ENCODING *, const char *, const char *);
void (*updatePosition)(const ENCODING *,
const char *ptr,
const char *end,
POSITION *);
int (*isPublicId)(const ENCODING *enc, const char *ptr, const char *end,
const char **badPtr);
void (*utf8Convert)(const ENCODING *enc,
const char **fromP,
const char *fromLim,
char **toP,
const char *toLim);
void (*utf16Convert)(const ENCODING *enc,
const char **fromP,
const char *fromLim,
unsigned short **toP,
const unsigned short *toLim);
int minBytesPerChar;
char isUtf8;
char isUtf16;
};
/*
Scan the string starting at ptr until the end of the next complete token,
but do not scan past eptr. Return an integer giving the type of token.
Return XML_TOK_NONE when ptr == eptr; nextTokPtr will not be set.
Return XML_TOK_PARTIAL when the string does not contain a complete token;
nextTokPtr will not be set.
Return XML_TOK_INVALID when the string does not start a valid token; nextTokPtr
will be set to point to the character which made the token invalid.
Otherwise the string starts with a valid token; nextTokPtr will be set to point
to the character following the end of that token.
Each data character counts as a single token, but adjacent data characters
may be returned together. Similarly for characters in the prolog outside
literals, comments and processing instructions.
*/
#define XmlTok(enc, state, ptr, end, nextTokPtr) \
(((enc)->scanners[state])(enc, ptr, end, nextTokPtr))
#define XmlPrologTok(enc, ptr, end, nextTokPtr) \
XmlTok(enc, XML_PROLOG_STATE, ptr, end, nextTokPtr)
#define XmlContentTok(enc, ptr, end, nextTokPtr) \
XmlTok(enc, XML_CONTENT_STATE, ptr, end, nextTokPtr)
#define XmlCdataSectionTok(enc, ptr, end, nextTokPtr) \
XmlTok(enc, XML_CDATA_SECTION_STATE, ptr, end, nextTokPtr)
/* This is used for performing a 2nd-level tokenization on
the content of a literal that has already been returned by XmlTok. */
#define XmlLiteralTok(enc, literalType, ptr, end, nextTokPtr) \
(((enc)->literalScanners[literalType])(enc, ptr, end, nextTokPtr))
#define XmlAttributeValueTok(enc, ptr, end, nextTokPtr) \
XmlLiteralTok(enc, XML_ATTRIBUTE_VALUE_LITERAL, ptr, end, nextTokPtr)
#define XmlEntityValueTok(enc, ptr, end, nextTokPtr) \
XmlLiteralTok(enc, XML_ENTITY_VALUE_LITERAL, ptr, end, nextTokPtr)
#define XmlSameName(enc, ptr1, ptr2) (((enc)->sameName)(enc, ptr1, ptr2))
#define XmlNameMatchesAscii(enc, ptr1, ptr2) \
(((enc)->nameMatchesAscii)(enc, ptr1, ptr2))
#define XmlNameLength(enc, ptr) \
(((enc)->nameLength)(enc, ptr))
#define XmlSkipS(enc, ptr) \
(((enc)->skipS)(enc, ptr))
#define XmlGetAttributes(enc, ptr, attsMax, atts) \
(((enc)->getAtts)(enc, ptr, attsMax, atts))
#define XmlCharRefNumber(enc, ptr) \
(((enc)->charRefNumber)(enc, ptr))
#define XmlPredefinedEntityName(enc, ptr, end) \
(((enc)->predefinedEntityName)(enc, ptr, end))
#define XmlUpdatePosition(enc, ptr, end, pos) \
(((enc)->updatePosition)(enc, ptr, end, pos))
#define XmlIsPublicId(enc, ptr, end, badPtr) \
(((enc)->isPublicId)(enc, ptr, end, badPtr))
#define XmlUtf8Convert(enc, fromP, fromLim, toP, toLim) \
(((enc)->utf8Convert)(enc, fromP, fromLim, toP, toLim))
#define XmlUtf16Convert(enc, fromP, fromLim, toP, toLim) \
(((enc)->utf16Convert)(enc, fromP, fromLim, toP, toLim))
typedef struct {
ENCODING initEnc;
const ENCODING **encPtr;
} INIT_ENCODING;
int XMLTOKAPI XmlParseXmlDecl(int isGeneralTextEntity,
const ENCODING *enc,
const char *ptr,
const char *end,
const char **badPtr,
const char **versionPtr,
const char **encodingNamePtr,
const ENCODING **namedEncodingPtr,
int *standalonePtr);
int XMLTOKAPI XmlInitEncoding(INIT_ENCODING *, const ENCODING **, const char *name);
const ENCODING XMLTOKAPI *XmlGetUtf8InternalEncoding();
const ENCODING XMLTOKAPI *XmlGetUtf16InternalEncoding();
int XMLTOKAPI XmlUtf8Encode(int charNumber, char *buf);
int XMLTOKAPI XmlUtf16Encode(int charNumber, unsigned short *buf);
int XMLTOKAPI XmlSizeOfUnknownEncoding();
ENCODING XMLTOKAPI *
XmlInitUnknownEncoding(void *mem,
int *table,
int (*convert)(void *userData, const char *p),
void *userData);
#ifdef __cplusplus
}
#endif
#endif /* not XmlTok_INCLUDED */
1.1 apache-1.3/src/lib/expat/xmltok_impl.c
Index: xmltok_impl.c
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
#ifndef IS_INVALID_CHAR
#define IS_INVALID_CHAR(enc, ptr, n) (0)
#endif
#define INVALID_LEAD_CASE(n, ptr, nextTokPtr) \
case BT_LEAD ## n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
if (IS_INVALID_CHAR(enc, ptr, n)) { \
*(nextTokPtr) = (ptr); \
return XML_TOK_INVALID; \
} \
ptr += n; \
break;
#define INVALID_CASES(ptr, nextTokPtr) \
INVALID_LEAD_CASE(2, ptr, nextTokPtr) \
INVALID_LEAD_CASE(3, ptr, nextTokPtr) \
INVALID_LEAD_CASE(4, ptr, nextTokPtr) \
case BT_NONXML: \
case BT_MALFORM: \
case BT_TRAIL: \
*(nextTokPtr) = (ptr); \
return XML_TOK_INVALID;
#define CHECK_NAME_CASE(n, enc, ptr, end, nextTokPtr) \
case BT_LEAD ## n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
if (!IS_NAME_CHAR(enc, ptr, n)) { \
*nextTokPtr = ptr; \
return XML_TOK_INVALID; \
} \
ptr += n; \
break;
#define CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) \
case BT_NONASCII: \
if (!IS_NAME_CHAR_MINBPC(enc, ptr)) { \
*nextTokPtr = ptr; \
return XML_TOK_INVALID; \
} \
case BT_NMSTRT: \
case BT_HEX: \
case BT_DIGIT: \
case BT_NAME: \
case BT_MINUS: \
ptr += MINBPC; \
break; \
CHECK_NAME_CASE(2, enc, ptr, end, nextTokPtr) \
CHECK_NAME_CASE(3, enc, ptr, end, nextTokPtr) \
CHECK_NAME_CASE(4, enc, ptr, end, nextTokPtr)
#define CHECK_NMSTRT_CASE(n, enc, ptr, end, nextTokPtr) \
case BT_LEAD ## n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
if (!IS_NMSTRT_CHAR(enc, ptr, n)) { \
*nextTokPtr = ptr; \
return XML_TOK_INVALID; \
} \
ptr += n; \
break;
#define CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) \
case BT_NONASCII: \
if (!IS_NMSTRT_CHAR_MINBPC(enc, ptr)) { \
*nextTokPtr = ptr; \
return XML_TOK_INVALID; \
} \
case BT_NMSTRT: \
case BT_HEX: \
ptr += MINBPC; \
break; \
CHECK_NMSTRT_CASE(2, enc, ptr, end, nextTokPtr) \
CHECK_NMSTRT_CASE(3, enc, ptr, end, nextTokPtr) \
CHECK_NMSTRT_CASE(4, enc, ptr, end, nextTokPtr)
#ifndef PREFIX
#define PREFIX(ident) ident
#endif
/* ptr points to character following "<!-" */
static
int PREFIX(scanComment)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr != end) {
if (!CHAR_MATCHES(enc, ptr, '-')) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
ptr += MINBPC;
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
INVALID_CASES(ptr, nextTokPtr)
case BT_MINUS:
if ((ptr += MINBPC) == end)
return XML_TOK_PARTIAL;
if (CHAR_MATCHES(enc, ptr, '-')) {
if ((ptr += MINBPC) == end)
return XML_TOK_PARTIAL;
if (!CHAR_MATCHES(enc, ptr, '>')) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC;
return XML_TOK_COMMENT;
}
break;
default:
ptr += MINBPC;
break;
}
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following "<!" */
static
int PREFIX(scanDecl)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
case BT_MINUS:
return PREFIX(scanComment)(enc, ptr + MINBPC, end, nextTokPtr);
case BT_LSQB:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_COND_SECT_OPEN;
case BT_NMSTRT:
case BT_HEX:
ptr += MINBPC;
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_PERCNT:
if (ptr + MINBPC == end)
return XML_TOK_PARTIAL;
/* don't allow <!ENTITY% foo "whatever"> */
switch (BYTE_TYPE(enc, ptr + MINBPC)) {
case BT_S: case BT_CR: case BT_LF: case BT_PERCNT:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
/* fall through */
case BT_S: case BT_CR: case BT_LF:
*nextTokPtr = ptr;
return XML_TOK_DECL_OPEN;
case BT_NMSTRT:
case BT_HEX:
ptr += MINBPC;
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
static
int PREFIX(checkPiTarget)(const ENCODING *enc, const char *ptr, const char *end, int *tokPtr)
{
int upper = 0;
*tokPtr = XML_TOK_PI;
if (end - ptr != MINBPC*3)
return 1;
switch (BYTE_TO_ASCII(enc, ptr)) {
case 'x':
break;
case 'X':
upper = 1;
break;
default:
return 1;
}
ptr += MINBPC;
switch (BYTE_TO_ASCII(enc, ptr)) {
case 'm':
break;
case 'M':
upper = 1;
break;
default:
return 1;
}
ptr += MINBPC;
switch (BYTE_TO_ASCII(enc, ptr)) {
case 'l':
break;
case 'L':
upper = 1;
break;
default:
return 1;
}
if (upper)
return 0;
*tokPtr = XML_TOK_XML_DECL;
return 1;
}
/* ptr points to character following "<?" */
static
int PREFIX(scanPi)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
int tok;
const char *target = ptr;
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_S: case BT_CR: case BT_LF:
if (!PREFIX(checkPiTarget)(enc, target, ptr, &tok)) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
ptr += MINBPC;
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
INVALID_CASES(ptr, nextTokPtr)
case BT_QUEST:
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
if (CHAR_MATCHES(enc, ptr, '>')) {
*nextTokPtr = ptr + MINBPC;
return tok;
}
break;
default:
ptr += MINBPC;
break;
}
}
return XML_TOK_PARTIAL;
case BT_QUEST:
if (!PREFIX(checkPiTarget)(enc, target, ptr, &tok)) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
if (CHAR_MATCHES(enc, ptr, '>')) {
*nextTokPtr = ptr + MINBPC;
return tok;
}
/* fall through */
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
static
int PREFIX(scanCdataSection)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
int i;
/* CDATA[ */
if (end - ptr < 6 * MINBPC)
return XML_TOK_PARTIAL;
for (i = 0; i < 6; i++, ptr += MINBPC) {
if (!CHAR_MATCHES(enc, ptr, "CDATA["[i])) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
*nextTokPtr = ptr;
return XML_TOK_CDATA_SECT_OPEN;
}
static
int PREFIX(cdataSectionTok)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr == end)
return XML_TOK_NONE;
#if MINBPC > 1
{
size_t n = end - ptr;
if (n & (MINBPC - 1)) {
n &= ~(MINBPC - 1);
if (n == 0)
return XML_TOK_PARTIAL;
end = ptr + n;
}
}
#endif
switch (BYTE_TYPE(enc, ptr)) {
case BT_RSQB:
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
if (!CHAR_MATCHES(enc, ptr, ']'))
break;
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
if (!CHAR_MATCHES(enc, ptr, '>')) {
ptr -= MINBPC;
break;
}
*nextTokPtr = ptr + MINBPC;
return XML_TOK_CDATA_SECT_CLOSE;
case BT_CR:
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
if (BYTE_TYPE(enc, ptr) == BT_LF)
ptr += MINBPC;
*nextTokPtr = ptr;
return XML_TOK_DATA_NEWLINE;
case BT_LF:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_DATA_NEWLINE;
INVALID_CASES(ptr, nextTokPtr)
default:
ptr += MINBPC;
break;
}
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: \
if (end - ptr < n || IS_INVALID_CHAR(enc, ptr, n)) { \
*nextTokPtr = ptr; \
return XML_TOK_DATA_CHARS; \
} \
ptr += n; \
break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_NONXML:
case BT_MALFORM:
case BT_TRAIL:
case BT_CR:
case BT_LF:
case BT_RSQB:
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
default:
ptr += MINBPC;
break;
}
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
}
/* ptr points to character following "</" */
static
int PREFIX(scanEndTag)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_S: case BT_CR: case BT_LF:
for (ptr += MINBPC; ptr != end; ptr += MINBPC) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_S: case BT_CR: case BT_LF:
break;
case BT_GT:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_END_TAG;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
case BT_GT:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_END_TAG;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following "&#X" */
static
int PREFIX(scanHexCharRef)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_DIGIT:
case BT_HEX:
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
for (ptr += MINBPC; ptr != end; ptr += MINBPC) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_DIGIT:
case BT_HEX:
break;
case BT_SEMI:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_CHAR_REF;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following "&#" */
static
int PREFIX(scanCharRef)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr != end) {
if (CHAR_MATCHES(enc, ptr, 'x'))
return PREFIX(scanHexCharRef)(enc, ptr + MINBPC, end, nextTokPtr);
switch (BYTE_TYPE(enc, ptr)) {
case BT_DIGIT:
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
for (ptr += MINBPC; ptr != end; ptr += MINBPC) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_DIGIT:
break;
case BT_SEMI:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_CHAR_REF;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following "&" */
static
int PREFIX(scanRef)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
case BT_NUM:
return PREFIX(scanCharRef)(enc, ptr + MINBPC, end, nextTokPtr);
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_SEMI:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_ENTITY_REF;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following first character of attribute name */
static
int PREFIX(scanAtts)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_S: case BT_CR: case BT_LF:
for (;;) {
int t;
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
t = BYTE_TYPE(enc, ptr);
if (t == BT_EQUALS)
break;
switch (t) {
case BT_S:
case BT_LF:
case BT_CR:
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
/* fall through */
case BT_EQUALS:
{
int open;
for (;;) {
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
open = BYTE_TYPE(enc, ptr);
if (open == BT_QUOT || open == BT_APOS)
break;
switch (open) {
case BT_S:
case BT_LF:
case BT_CR:
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
ptr += MINBPC;
/* in attribute value */
for (;;) {
int t;
if (ptr == end)
return XML_TOK_PARTIAL;
t = BYTE_TYPE(enc, ptr);
if (t == open)
break;
switch (t) {
INVALID_CASES(ptr, nextTokPtr)
case BT_AMP:
{
int tok = PREFIX(scanRef)(enc, ptr + MINBPC, end, &ptr);
if (tok <= 0) {
if (tok == XML_TOK_INVALID)
*nextTokPtr = ptr;
return tok;
}
break;
}
case BT_LT:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
default:
ptr += MINBPC;
break;
}
}
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
case BT_S:
case BT_CR:
case BT_LF:
break;
case BT_SOL:
goto sol;
case BT_GT:
goto gt;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
/* ptr points to closing quote */
for (;;) {
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
case BT_S: case BT_CR: case BT_LF:
continue;
case BT_GT:
gt:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_START_TAG_WITH_ATTS;
case BT_SOL:
sol:
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
if (!CHAR_MATCHES(enc, ptr, '>')) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC;
return XML_TOK_EMPTY_ELEMENT_WITH_ATTS;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
break;
}
break;
}
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following "<" */
static
int PREFIX(scanLt)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
case BT_EXCL:
if ((ptr += MINBPC) == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
case BT_MINUS:
return PREFIX(scanComment)(enc, ptr + MINBPC, end, nextTokPtr);
case BT_LSQB:
return PREFIX(scanCdataSection)(enc, ptr + MINBPC, end, nextTokPtr);
}
*nextTokPtr = ptr;
return XML_TOK_INVALID;
case BT_QUEST:
return PREFIX(scanPi)(enc, ptr + MINBPC, end, nextTokPtr);
case BT_SOL:
return PREFIX(scanEndTag)(enc, ptr + MINBPC, end, nextTokPtr);
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
/* we have a start-tag */
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_S: case BT_CR: case BT_LF:
{
ptr += MINBPC;
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
case BT_GT:
goto gt;
case BT_SOL:
goto sol;
case BT_S: case BT_CR: case BT_LF:
ptr += MINBPC;
continue;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
return PREFIX(scanAtts)(enc, ptr, end, nextTokPtr);
}
return XML_TOK_PARTIAL;
}
case BT_GT:
gt:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_START_TAG_NO_ATTS;
case BT_SOL:
sol:
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
if (!CHAR_MATCHES(enc, ptr, '>')) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC;
return XML_TOK_EMPTY_ELEMENT_NO_ATTS;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
static
int PREFIX(contentTok)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr == end)
return XML_TOK_NONE;
#if MINBPC > 1
{
size_t n = end - ptr;
if (n & (MINBPC - 1)) {
n &= ~(MINBPC - 1);
if (n == 0)
return XML_TOK_PARTIAL;
end = ptr + n;
}
}
#endif
switch (BYTE_TYPE(enc, ptr)) {
case BT_LT:
return PREFIX(scanLt)(enc, ptr + MINBPC, end, nextTokPtr);
case BT_AMP:
return PREFIX(scanRef)(enc, ptr + MINBPC, end, nextTokPtr);
case BT_CR:
ptr += MINBPC;
if (ptr == end)
return XML_TOK_TRAILING_CR;
if (BYTE_TYPE(enc, ptr) == BT_LF)
ptr += MINBPC;
*nextTokPtr = ptr;
return XML_TOK_DATA_NEWLINE;
case BT_LF:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_DATA_NEWLINE;
case BT_RSQB:
ptr += MINBPC;
if (ptr == end)
return XML_TOK_TRAILING_RSQB;
if (!CHAR_MATCHES(enc, ptr, ']'))
break;
ptr += MINBPC;
if (ptr == end)
return XML_TOK_TRAILING_RSQB;
if (!CHAR_MATCHES(enc, ptr, '>')) {
ptr -= MINBPC;
break;
}
*nextTokPtr = ptr;
return XML_TOK_INVALID;
INVALID_CASES(ptr, nextTokPtr)
default:
ptr += MINBPC;
break;
}
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: \
if (end - ptr < n || IS_INVALID_CHAR(enc, ptr, n)) { \
*nextTokPtr = ptr; \
return XML_TOK_DATA_CHARS; \
} \
ptr += n; \
break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_RSQB:
if (ptr + MINBPC != end) {
if (!CHAR_MATCHES(enc, ptr + MINBPC, ']')) {
ptr += MINBPC;
break;
}
if (ptr + 2*MINBPC != end) {
if (!CHAR_MATCHES(enc, ptr + 2*MINBPC, '>')) {
ptr += MINBPC;
break;
}
*nextTokPtr = ptr + 2*MINBPC;
return XML_TOK_INVALID;
}
}
/* fall through */
case BT_AMP:
case BT_LT:
case BT_NONXML:
case BT_MALFORM:
case BT_TRAIL:
case BT_CR:
case BT_LF:
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
default:
ptr += MINBPC;
break;
}
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
}
/* ptr points to character following "%" */
static
int PREFIX(scanPercent)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
case BT_S: case BT_LF: case BT_CR: case BT_PERCNT:
*nextTokPtr = ptr;
return XML_TOK_PERCENT;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_SEMI:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_PARAM_ENTITY_REF;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
static
int PREFIX(scanPoundName)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_CR: case BT_LF: case BT_S:
case BT_RPAR: case BT_GT: case BT_PERCNT: case BT_VERBAR:
*nextTokPtr = ptr;
return XML_TOK_POUND_NAME;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
static
int PREFIX(scanLit)(int open, const ENCODING *enc,
const char *ptr, const char *end,
const char **nextTokPtr)
{
while (ptr != end) {
int t = BYTE_TYPE(enc, ptr);
switch (t) {
INVALID_CASES(ptr, nextTokPtr)
case BT_QUOT:
case BT_APOS:
ptr += MINBPC;
if (t != open)
break;
if (ptr == end)
return XML_TOK_PARTIAL;
*nextTokPtr = ptr;
switch (BYTE_TYPE(enc, ptr)) {
case BT_S: case BT_CR: case BT_LF:
case BT_GT: case BT_PERCNT: case BT_LSQB:
return XML_TOK_LITERAL;
default:
return XML_TOK_INVALID;
}
default:
ptr += MINBPC;
break;
}
}
return XML_TOK_PARTIAL;
}
static
int PREFIX(prologTok)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
int tok;
if (ptr == end)
return XML_TOK_NONE;
#if MINBPC > 1
{
size_t n = end - ptr;
if (n & (MINBPC - 1)) {
n &= ~(MINBPC - 1);
if (n == 0)
return XML_TOK_PARTIAL;
end = ptr + n;
}
}
#endif
switch (BYTE_TYPE(enc, ptr)) {
case BT_QUOT:
return PREFIX(scanLit)(BT_QUOT, enc, ptr + MINBPC, end, nextTokPtr);
case BT_APOS:
return PREFIX(scanLit)(BT_APOS, enc, ptr + MINBPC, end, nextTokPtr);
case BT_LT:
{
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
case BT_EXCL:
return PREFIX(scanDecl)(enc, ptr + MINBPC, end, nextTokPtr);
case BT_QUEST:
return PREFIX(scanPi)(enc, ptr + MINBPC, end, nextTokPtr);
case BT_NMSTRT:
case BT_HEX:
case BT_NONASCII:
case BT_LEAD2:
case BT_LEAD3:
case BT_LEAD4:
*nextTokPtr = ptr - MINBPC;
return XML_TOK_INSTANCE_START;
}
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
case BT_CR:
if (ptr + MINBPC == end)
return XML_TOK_TRAILING_CR;
/* fall through */
case BT_S: case BT_LF:
for (;;) {
ptr += MINBPC;
if (ptr == end)
break;
switch (BYTE_TYPE(enc, ptr)) {
case BT_S: case BT_LF:
break;
case BT_CR:
/* don't split CR/LF pair */
if (ptr + MINBPC != end)
break;
/* fall through */
default:
*nextTokPtr = ptr;
return XML_TOK_PROLOG_S;
}
}
*nextTokPtr = ptr;
return XML_TOK_PROLOG_S;
case BT_PERCNT:
return PREFIX(scanPercent)(enc, ptr + MINBPC, end, nextTokPtr);
case BT_COMMA:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_COMMA;
case BT_LSQB:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_OPEN_BRACKET;
case BT_RSQB:
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
if (CHAR_MATCHES(enc, ptr, ']')) {
if (ptr + MINBPC == end)
return XML_TOK_PARTIAL;
if (CHAR_MATCHES(enc, ptr + MINBPC, '>')) {
*nextTokPtr = ptr + 2*MINBPC;
return XML_TOK_COND_SECT_CLOSE;
}
}
*nextTokPtr = ptr;
return XML_TOK_CLOSE_BRACKET;
case BT_LPAR:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_OPEN_PAREN;
case BT_RPAR:
ptr += MINBPC;
if (ptr == end)
return XML_TOK_PARTIAL;
switch (BYTE_TYPE(enc, ptr)) {
case BT_AST:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_CLOSE_PAREN_ASTERISK;
case BT_QUEST:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_CLOSE_PAREN_QUESTION;
case BT_PLUS:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_CLOSE_PAREN_PLUS;
case BT_CR: case BT_LF: case BT_S:
case BT_GT: case BT_COMMA: case BT_VERBAR:
case BT_RPAR:
*nextTokPtr = ptr;
return XML_TOK_CLOSE_PAREN;
}
*nextTokPtr = ptr;
return XML_TOK_INVALID;
case BT_VERBAR:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_OR;
case BT_GT:
*nextTokPtr = ptr + MINBPC;
return XML_TOK_DECL_CLOSE;
case BT_NUM:
return PREFIX(scanPoundName)(enc, ptr + MINBPC, end, nextTokPtr);
#define LEAD_CASE(n) \
case BT_LEAD ## n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
if (IS_NMSTRT_CHAR(enc, ptr, n)) { \
ptr += n; \
tok = XML_TOK_NAME; \
break; \
} \
if (IS_NAME_CHAR(enc, ptr, n)) { \
ptr += n; \
tok = XML_TOK_NMTOKEN; \
break; \
} \
*nextTokPtr = ptr; \
return XML_TOK_INVALID;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_NMSTRT:
case BT_HEX:
tok = XML_TOK_NAME;
ptr += MINBPC;
break;
case BT_DIGIT:
case BT_NAME:
case BT_MINUS:
tok = XML_TOK_NMTOKEN;
ptr += MINBPC;
break;
case BT_NONASCII:
if (IS_NMSTRT_CHAR_MINBPC(enc, ptr)) {
ptr += MINBPC;
tok = XML_TOK_NAME;
break;
}
if (IS_NAME_CHAR_MINBPC(enc, ptr)) {
ptr += MINBPC;
tok = XML_TOK_NMTOKEN;
break;
}
/* fall through */
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_GT: case BT_RPAR: case BT_COMMA:
case BT_VERBAR: case BT_LSQB: case BT_PERCNT:
case BT_S: case BT_CR: case BT_LF:
*nextTokPtr = ptr;
return tok;
case BT_PLUS:
if (tok != XML_TOK_NAME) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC;
return XML_TOK_NAME_PLUS;
case BT_AST:
if (tok != XML_TOK_NAME) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC;
return XML_TOK_NAME_ASTERISK;
case BT_QUEST:
if (tok != XML_TOK_NAME) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC;
return XML_TOK_NAME_QUESTION;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
static
int PREFIX(attributeValueTok)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
const char *start;
if (ptr == end)
return XML_TOK_NONE;
start = ptr;
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: ptr += n; break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_AMP:
if (ptr == start)
return PREFIX(scanRef)(enc, ptr + MINBPC, end, nextTokPtr);
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_LT:
/* this is for inside entity references */
*nextTokPtr = ptr;
return XML_TOK_INVALID;
case BT_LF:
if (ptr == start) {
*nextTokPtr = ptr + MINBPC;
return XML_TOK_DATA_NEWLINE;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_CR:
if (ptr == start) {
ptr += MINBPC;
if (ptr == end)
return XML_TOK_TRAILING_CR;
if (BYTE_TYPE(enc, ptr) == BT_LF)
ptr += MINBPC;
*nextTokPtr = ptr;
return XML_TOK_DATA_NEWLINE;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_S:
if (ptr == start) {
*nextTokPtr = ptr + MINBPC;
return XML_TOK_ATTRIBUTE_VALUE_S;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
default:
ptr += MINBPC;
break;
}
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
}
static
int PREFIX(entityValueTok)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
const char *start;
if (ptr == end)
return XML_TOK_NONE;
start = ptr;
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: ptr += n; break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_AMP:
if (ptr == start)
return PREFIX(scanRef)(enc, ptr + MINBPC, end, nextTokPtr);
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_PERCNT:
if (ptr == start)
return PREFIX(scanPercent)(enc, ptr + MINBPC, end, nextTokPtr);
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_LF:
if (ptr == start) {
*nextTokPtr = ptr + MINBPC;
return XML_TOK_DATA_NEWLINE;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_CR:
if (ptr == start) {
ptr += MINBPC;
if (ptr == end)
return XML_TOK_TRAILING_CR;
if (BYTE_TYPE(enc, ptr) == BT_LF)
ptr += MINBPC;
*nextTokPtr = ptr;
return XML_TOK_DATA_NEWLINE;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
default:
ptr += MINBPC;
break;
}
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
}
static
int PREFIX(isPublicId)(const ENCODING *enc, const char *ptr, const char *end,
const char **badPtr)
{
ptr += MINBPC;
end -= MINBPC;
for (; ptr != end; ptr += MINBPC) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_DIGIT:
case BT_HEX:
case BT_MINUS:
case BT_APOS:
case BT_LPAR:
case BT_RPAR:
case BT_PLUS:
case BT_COMMA:
case BT_SOL:
case BT_EQUALS:
case BT_QUEST:
case BT_CR:
case BT_LF:
case BT_SEMI:
case BT_EXCL:
case BT_AST:
case BT_PERCNT:
case BT_NUM:
break;
case BT_S:
if (CHAR_MATCHES(enc, ptr, '\t')) {
*badPtr = ptr;
return 0;
}
break;
case BT_NAME:
case BT_NMSTRT:
if (!(BYTE_TO_ASCII(enc, ptr) & ~0x7f))
break;
default:
switch (BYTE_TO_ASCII(enc, ptr)) {
case 0x24: /* $ */
case 0x40: /* @ */
break;
default:
*badPtr = ptr;
return 0;
}
break;
}
}
return 1;
}
/* This must only be called for a well-formed start-tag or empty element tag.
Returns the number of attributes. Pointers to the first attsMax attributes
are stored in atts. */
static
int PREFIX(getAtts)(const ENCODING *enc, const char *ptr,
int attsMax, ATTRIBUTE *atts)
{
enum { other, inName, inValue } state = inName;
int nAtts = 0;
int open;
for (ptr += MINBPC;; ptr += MINBPC) {
switch (BYTE_TYPE(enc, ptr)) {
#define START_NAME \
if (state == other) { \
if (nAtts < attsMax) { \
atts[nAtts].name = ptr; \
atts[nAtts].normalized = 1; \
} \
state = inName; \
}
#define LEAD_CASE(n) \
case BT_LEAD ## n: START_NAME ptr += (n - MINBPC); break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_NONASCII:
case BT_NMSTRT:
case BT_HEX:
START_NAME
break;
#undef START_NAME
case BT_QUOT:
if (state != inValue) {
if (nAtts < attsMax)
atts[nAtts].valuePtr = ptr + MINBPC;
state = inValue;
open = BT_QUOT;
}
else if (open == BT_QUOT) {
state = other;
if (nAtts < attsMax)
atts[nAtts].valueEnd = ptr;
nAtts++;
}
break;
case BT_APOS:
if (state != inValue) {
if (nAtts < attsMax)
atts[nAtts].valuePtr = ptr + MINBPC;
state = inValue;
open = BT_APOS;
}
else if (open == BT_APOS) {
state = other;
if (nAtts < attsMax)
atts[nAtts].valueEnd = ptr;
nAtts++;
}
break;
case BT_AMP:
if (nAtts < attsMax)
atts[nAtts].normalized = 0;
break;
case BT_S:
if (state == inName)
state = other;
else if (state == inValue
&& nAtts < attsMax
&& atts[nAtts].normalized
&& (ptr == atts[nAtts].valuePtr
|| BYTE_TO_ASCII(enc, ptr) != ' '
|| BYTE_TO_ASCII(enc, ptr + MINBPC) == ' '
|| BYTE_TYPE(enc, ptr + MINBPC) == open))
atts[nAtts].normalized = 0;
break;
case BT_CR: case BT_LF:
/* This case ensures that the first attribute name is counted
Apart from that we could just change state on the quote. */
if (state == inName)
state = other;
else if (state == inValue && nAtts < attsMax)
atts[nAtts].normalized = 0;
break;
case BT_GT:
case BT_SOL:
if (state != inValue)
return nAtts;
break;
default:
break;
}
}
/* not reached */
}
static
int PREFIX(charRefNumber)(const ENCODING *enc, const char *ptr)
{
int result = 0;
/* skip &# */
ptr += 2*MINBPC;
if (CHAR_MATCHES(enc, ptr, 'x')) {
for (ptr += MINBPC; !CHAR_MATCHES(enc, ptr, ';'); ptr += MINBPC) {
int c = BYTE_TO_ASCII(enc, ptr);
switch (c) {
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
result <<= 4;
result |= (c - '0');
break;
case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
result <<= 4;
result += 10 + (c - 'A');
break;
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
result <<= 4;
result += 10 + (c - 'a');
break;
}
if (result >= 0x110000)
return -1;
}
}
else {
for (; !CHAR_MATCHES(enc, ptr, ';'); ptr += MINBPC) {
int c = BYTE_TO_ASCII(enc, ptr);
result *= 10;
result += (c - '0');
if (result >= 0x110000)
return -1;
}
}
return checkCharRefNumber(result);
}
static
int PREFIX(predefinedEntityName)(const ENCODING *enc, const char *ptr, const char *end)
{
switch (end - ptr) {
case 2 * MINBPC:
if (CHAR_MATCHES(enc, ptr + MINBPC, 't')) {
switch (BYTE_TO_ASCII(enc, ptr)) {
case 'l':
return '<';
case 'g':
return '>';
}
}
break;
case 3 * MINBPC:
if (CHAR_MATCHES(enc, ptr, 'a')) {
ptr += MINBPC;
if (CHAR_MATCHES(enc, ptr, 'm')) {
ptr += MINBPC;
if (CHAR_MATCHES(enc, ptr, 'p'))
return '&';
}
}
break;
case 4 * MINBPC:
switch (BYTE_TO_ASCII(enc, ptr)) {
case 'q':
ptr += MINBPC;
if (CHAR_MATCHES(enc, ptr, 'u')) {
ptr += MINBPC;
if (CHAR_MATCHES(enc, ptr, 'o')) {
ptr += MINBPC;
if (CHAR_MATCHES(enc, ptr, 't'))
return '"';
}
}
break;
case 'a':
ptr += MINBPC;
if (CHAR_MATCHES(enc, ptr, 'p')) {
ptr += MINBPC;
if (CHAR_MATCHES(enc, ptr, 'o')) {
ptr += MINBPC;
if (CHAR_MATCHES(enc, ptr, 's'))
return '\'';
}
}
break;
}
}
return 0;
}
static
int PREFIX(sameName)(const ENCODING *enc, const char *ptr1, const char *ptr2)
{
for (;;) {
switch (BYTE_TYPE(enc, ptr1)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: \
if (*ptr1++ != *ptr2++) \
return 0;
LEAD_CASE(4) LEAD_CASE(3) LEAD_CASE(2)
#undef LEAD_CASE
/* fall through */
if (*ptr1++ != *ptr2++)
return 0;
break;
case BT_NONASCII:
case BT_NMSTRT:
case BT_HEX:
case BT_DIGIT:
case BT_NAME:
case BT_MINUS:
if (*ptr2++ != *ptr1++)
return 0;
#if MINBPC > 1
if (*ptr2++ != *ptr1++)
return 0;
#if MINBPC > 2
if (*ptr2++ != *ptr1++)
return 0;
#if MINBPC > 3
if (*ptr2++ != *ptr1++)
return 0;
#endif
#endif
#endif
break;
default:
#if MINBPC == 1
if (*ptr1 == *ptr2)
return 1;
#endif
switch (BYTE_TYPE(enc, ptr2)) {
case BT_LEAD2:
case BT_LEAD3:
case BT_LEAD4:
case BT_NONASCII:
case BT_NMSTRT:
case BT_HEX:
case BT_DIGIT:
case BT_NAME:
case BT_MINUS:
return 0;
default:
return 1;
}
}
}
/* not reached */
}
static
int PREFIX(nameMatchesAscii)(const ENCODING *enc, const char *ptr1, const char *ptr2)
{
for (; *ptr2; ptr1 += MINBPC, ptr2++) {
if (!CHAR_MATCHES(end, ptr1, *ptr2))
return 0;
}
switch (BYTE_TYPE(enc, ptr1)) {
case BT_LEAD2:
case BT_LEAD3:
case BT_LEAD4:
case BT_NONASCII:
case BT_NMSTRT:
case BT_HEX:
case BT_DIGIT:
case BT_NAME:
case BT_MINUS:
return 0;
default:
return 1;
}
}
static
int PREFIX(nameLength)(const ENCODING *enc, const char *ptr)
{
const char *start = ptr;
for (;;) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: ptr += n; break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_NONASCII:
case BT_NMSTRT:
case BT_HEX:
case BT_DIGIT:
case BT_NAME:
case BT_MINUS:
ptr += MINBPC;
break;
default:
return ptr - start;
}
}
}
static
const char *PREFIX(skipS)(const ENCODING *enc, const char *ptr)
{
for (;;) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_LF:
case BT_CR:
case BT_S:
ptr += MINBPC;
break;
default:
return ptr;
}
}
}
static
void PREFIX(updatePosition)(const ENCODING *enc,
const char *ptr,
const char *end,
POSITION *pos)
{
while (ptr != end) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: \
ptr += n; \
break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_LF:
pos->columnNumber = (unsigned)-1;
pos->lineNumber++;
ptr += MINBPC;
break;
case BT_CR:
pos->lineNumber++;
ptr += MINBPC;
if (ptr != end && BYTE_TYPE(enc, ptr) == BT_LF)
ptr += MINBPC;
pos->columnNumber = (unsigned)-1;
break;
default:
ptr += MINBPC;
break;
}
pos->columnNumber++;
}
}
#undef DO_LEAD_CASE
#undef MULTIBYTE_CASES
#undef INVALID_CASES
#undef CHECK_NAME_CASE
#undef CHECK_NAME_CASES
#undef CHECK_NMSTRT_CASE
#undef CHECK_NMSTRT_CASES
1.1 apache-1.3/src/lib/expat/xmltok_impl.h
Index: xmltok_impl.h
===================================================================
/*
The contents of this file are subject to the Mozilla Public License
Version 1.0 (the "License"); you may not use this file except in
compliance with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific language governing rights and limitations
under the License.
The Original Code is expat.
The Initial Developer of the Original Code is James Clark.
Portions created by James Clark are Copyright (C) 1998
James Clark. All Rights Reserved.
Contributor(s):
*/
enum {
BT_NONXML,
BT_MALFORM,
BT_LT,
BT_AMP,
BT_RSQB,
BT_LEAD2,
BT_LEAD3,
BT_LEAD4,
BT_TRAIL,
BT_CR,
BT_LF,
BT_GT,
BT_QUOT,
BT_APOS,
BT_EQUALS,
BT_QUEST,
BT_EXCL,
BT_SOL,
BT_SEMI,
BT_NUM,
BT_LSQB,
BT_S,
BT_NMSTRT,
BT_HEX,
BT_DIGIT,
BT_NAME,
BT_MINUS,
BT_OTHER, /* known not to be a name or name start character */
BT_NONASCII, /* might be a name or name start character */
BT_PERCNT,
BT_LPAR,
BT_RPAR,
BT_AST,
BT_PLUS,
BT_COMMA,
BT_VERBAR
};
#include <stddef.h>