Posted to j-users@xerces.apache.org by Bruno Sanz Marino <br...@inicia.es> on 2003/05/28 20:10:38 UTC

xni pull parser

Hi

Do you know of any articles on the web about XNI pull-parsing techniques?

The documentation doesn't provide much information.

Thank you

Bruno Sanz





---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: xni pull parser

Posted by "K. Venugopal" <k....@sun.com>.
Hi Bruno,

See if these help:

http://xml.apache.org/xerces2-j/faq-write.html#faq-5
http://www.extreme.indiana.edu/xgws/xsoap/xpp/
http://www.apache.org/~andyc/neko/doc/pull/
http://www.google.com.sg/search?q=cache:eHyibQm8eGIJ:www.extreme.indiana.edu/xgws/papers/xml_push_pull.pdf+XNI+Pull+parser&hl=en&ie=UTF-8
http://sourceforge.net/projects/xni2xmlpull/

Regards
venu






Re: Parsing external entities in multifile documents

Posted by Paul Kinnucan <pa...@mathworks.com>.
The following are examples of a book and a chapter file
that I concocted to illustrate my problem. Both files
are exactly as Epic generated them.

Book file:

<?xml version="1.0" encoding="utf-8"?>
<!--ArborText, Inc., 1988-2002, v.4002-->
<!DOCTYPE book PUBLIC "-//The MathWorks//DTD axdocbook variant//"
 "http://www-internal.mathworks.com/devel/Adoc/nightly/matlab/doc/tools/epiccustom/doctypes/tmwbook/tmwbook.dtd" [
<!ENTITY fake.txt SYSTEM "fake.txt">
<!ENTITY fake_chapter1.xml SYSTEM "fake_chapter1.xml">
]>
<?Pub UDT instructions _comment FontColor="red"?>
<?Pub UDT template _font?>
<?Pub Inc?>
<book id="a1054221435">
<title>Fake Book</title>
<bookinfo><?Pub Dtl?>
<productname>==Name of this product, for top of printed title page==</productname>
<titleabbrev>==Short title, for bottom of printed title page==</titleabbrev>
<subtitle>==Title for banner on each HTML page==</subtitle>
<releaseinfo>==Version info, for bottom of printed title page==</releaseinfo>
<copyright><year>==Range of copyright years, e.g., 2000&#x2013;2002.==</year>
<holder>by The MathWorks, Inc.</holder></copyright>
<revhistory>
<revision>
<revnumber>==Number of this printing batch, e.g., &#x201c;Second printing&#x201d;==</revnumber>
<date>==Month and year of printing batch, e.g., September 2000==</date>
<revremark>==Comment, e.g., &#x201c;Revised for version 1.0.2 (Release 12)&#x201d;==</revremark>
</revision>
</revhistory>
<abbrev>==Unique book code for this book==</abbrev>
<subjectset>
<subject><subjectterm>MATLAB</subjectterm></subject>
</subjectset>
</bookinfo>&Table_of_Contents;
<?Pub Caret?>&fake_chapter1.xml;
&Index;
</book>
<?Pub *0000001494 0?>

Chapter file that makes a reference to an external text file:

<?xml version="1.0" encoding="utf-8"?>
<!-- Fragment document type declaration subset:
ArborText, Inc., 1988-2002, v.4002
<!DOCTYPE book PUBLIC "-//The MathWorks//DTD axdocbook variant//"
 "http://www-internal.mathworks.com/devel/Adoc/nightly/matlab/doc/tools/epiccustom/doctypes/tmwbook/tmwbook.dtd" [
<!ENTITY fake.txt SYSTEM "fake.txt">
]>
-->
<?Pub UDT instructions _comment FontColor="red"?>
<?Pub UDT template _font?>
<chapter id="a1054221617">
<title>Fake Chapter</title>
<para>Here is an inserted fake text file:</para>
<para>&fake.txt;<?Pub Caret?></para>
</chapter>
<?Pub *0000000598 0?>
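When the chapter is reached through the book, the `fake.txt` declaration in the book's internal subset is in scope inside the chapter entity, so the reference resolves. A minimal sketch of that path, assuming the JDK's bundled JAXP DOM parser rather than a particular Xerces release; `BookParser` and the in-memory file map are hypothetical stand-ins for the files on disk, and the book is trimmed (no external DTD, no `&Table_of_Contents;` or `&Index;`):

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class BookParser {
    /** Parses a book whose external entities are served from a map keyed by file name (demo only). */
    static String textOf(String bookXml, Map<String, String> files) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId == null) {
                return null;
            }
            // The parser passes an expanded URI; keep only the last path segment.
            String name = systemId.substring(systemId.lastIndexOf('/') + 1);
            String content = files.get(name);
            return content == null ? null : new InputSource(new StringReader(content));
        });
        Document doc = builder.parse(new InputSource(new StringReader(bookXml)));
        return doc.getDocumentElement().getTextContent();
    }
}
```

The chapter's own `<?xml ... encoding="utf-8"?>` line is accepted here because, at the start of an external parsed entity, it acts as a text declaration.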


Please note that Epic includes the declaration for &fake.txt; in both the
book doctype and the chapter doctype, as you suggested. However, I need
to parse the chapter file as a standalone document, and Epic comments out
the chapter's doctype declaration, so the parser ignores it.

The result is that when my Java application tries to parse the chapter file
as a standalone document, Xerces signals an unrecoverable error
(undefined external entity) when it encounters the external entity
reference (&fake.txt;).
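The failure can be reproduced in a few lines. This sketch uses the JDK's built-in JAXP SAX parser (an assumption; the original setup used Xerces directly) and a trimmed-down chapter whose only DOCTYPE sits inside a comment, so the parser never sees the ENTITY declaration and reports the reference as a fatal well-formedness error. `UndeclaredEntityDemo` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class UndeclaredEntityDemo {
    // Trimmed-down chapter: the only DOCTYPE is inside a comment, so it is never parsed.
    static final String CHAPTER =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
        + "<!-- <!DOCTYPE book [ <!ENTITY fake.txt SYSTEM \"fake.txt\"> ]> -->\n"
        + "<chapter><para>&fake.txt;</para></chapter>\n";

    /** Parses the text and returns the fatal error message, or null if parsing succeeded. */
    static String tryParse(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage(); // names the entity that was referenced but not declared
        }
    }
}
```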

Is there a way I can successfully parse chapter files without modifying
them, e.g., by supplying a custom entity resolver that extracts entity
declarations from the commented-out doctype? (Modifying the chapter
file's XML is not an option, as I explained in my previous post.)

- Paul
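One possible workaround, sketched under assumptions: an EntityResolver is consulted only for entities that are already declared, so on its own it cannot supply a missing declaration. Instead, the chapter text can be read into memory, the commented-out DOCTYPE turned back into a real one (the file on disk is untouched), and the result parsed with the chapter's location as base URI. The resolver then applies two hypothetical policies: the book DTD is replaced by an empty stand-in (point it at a local copy if you need its declarations), and `.txt` listings are escaped so that `<` and `&` in program text do not break parsing. `StandaloneChapterParser`, the regex and both policies are illustrative, not part of Xerces or Epic.

```java
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class StandaloneChapterParser {
    // A comment wrapping a complete <!DOCTYPE ... [ ... ]> declaration, as Epic writes it.
    private static final Pattern COMMENTED_DOCTYPE = Pattern.compile(
        "<!--(?:(?!-->).)*?(<!DOCTYPE(?:(?!-->).)*?\\]>)\\s*-->", Pattern.DOTALL);

    /** Returns the text with the first commented-out doctype made live again (in memory only). */
    static String uncommentDoctype(String xml) {
        Matcher m = COMMENTED_DOCTYPE.matcher(xml);
        if (!m.find()) {
            return xml; // no commented-out doctype: leave the text alone
        }
        return xml.substring(0, m.start()) + m.group(1) + xml.substring(m.end());
    }

    /** Escapes markup characters so a plain-text listing parses as character data. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;");
    }

    /** Parses a chapter file on disk (assumed UTF-8) and returns all of its character data. */
    static String parse(Path chapter) throws Exception {
        String xml = new String(Files.readAllBytes(chapter), StandardCharsets.UTF_8);
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver((publicId, systemId) -> {
            if (systemId == null) {
                return null;
            }
            if (systemId.endsWith(".dtd")) {
                // Hypothetical policy: skip the book DTD instead of fetching it.
                return new InputSource(new StringReader(""));
            }
            if (systemId.endsWith(".txt") && systemId.startsWith("file:")) {
                // Hypothetical policy: read the listing ourselves and escape it.
                byte[] raw = Files.readAllBytes(Paths.get(URI.create(systemId)));
                InputSource listing = new InputSource(
                    new StringReader(escape(new String(raw, StandardCharsets.UTF_8))));
                listing.setSystemId(systemId);
                return listing;
            }
            return null; // default resolution for everything else
        });
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        InputSource in = new InputSource(new StringReader(uncommentDoctype(xml)));
        in.setSystemId(chapter.toUri().toString()); // base URI for relative ids like "fake.txt"
        reader.parse(in);
        return text.toString();
    }
}
```

Because the revived DOCTYPE names `book` while the root element is `chapter`, this relies on a non-validating parse; a validating parser would flag the root-element mismatch.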



Re: Parsing external entities in multifile documents

Posted by Paul Kinnucan <pa...@mathworks.com>.
K. Venugopal writes:
 > 
 > Hi Paul ,
 > 
 > Paul Kinnucan wrote:
 > 
 > >Hi,
 > >
 > >I need some advice on how to deal with a problem that I have 
 > >encountered trying to use xerces to parse external entities
 > >in multifile documents created by Arbortext's Epic editor.
 > >
 > >The documents in question are technical manuals consisting of a book
 > >file that references a set of chapter files as external entities
 > >defined by the book's doctype declaration.  The chapter files are
 > >themselves XML files, i.e., XML "fragments" of the book. At the head
 > >of each chapter file is an XML comment that encloses a doctype
 > >declaration that specifies the same doctype as that defined by the
 > >book file's doctype declaration, i.e., the book's document type.
 > >
 > >The problem occurs when writers use Epic to "include" external text
 > >files (usually non-XML program listings) in a chapter file.  Epic
 > >implements this by inserting an entity definition for the inserted
 > >file in the commented out doctype declaration at the head of the
 > >chapter file and a reference to the entity where the inserted 
 > >text is to appear. When displaying the chapter, Epic knows to
 > >look for the definition of the entity in the commented out
 > >doctype declaration. However, xerces does not. It regards the
 > >external entity as undefined and errors out, preventing me from
 > >parsing the file.
 > >  
 > >
 > If I have understood your problem right 

I'm afraid I didn't state the problem completely enough.  The Epic
editor is a WYSIWYG editor that happens to use XML as its internal
markup language. Authors see a formatted rendering of the document on
the screen. They never see or manipulate XML. The XML is all generated
automatically by Epic behind the scenes. For example, to include a
text file by reference in a chapter, the author positions the cursor
at the insertion point in a formatted rendering of the document on the
screen and selects Insert File Reference from the Epic menu. Epic
displays a file selection dialog. The user selects the file to be
inserted. The contents of the file then appear on the screen at the
insertion point. Behind the scenes Epic generates the necessary
external entity declaration in the commented-out doctype at the head
of the internal XML representation of the document and the entity
instance itself at the insertion point.
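Since Epic always writes these declarations into the same commented-out DOCTYPE, they can be read back with plain string processing. A minimal sketch, assuming only simple `<!ENTITY name SYSTEM "uri">` declarations (no PUBLIC identifiers, parameter entities or internal entities); `EpicEntityDecls` is a hypothetical helper, not an Epic or Xerces API:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EpicEntityDecls {
    // Any XML comment; its body is checked for a <!DOCTYPE below.
    private static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);
    // A simple general external entity: <!ENTITY name SYSTEM "uri"> (parameter entities excluded).
    private static final Pattern ENTITY = Pattern.compile(
        "<!ENTITY\\s+([^\\s%]+)\\s+SYSTEM\\s+[\"']([^\"']*)[\"']\\s*>");

    /** Maps entity name to system literal for declarations found in commented-out doctypes. */
    static Map<String, String> fromCommentedDoctype(String xml) {
        Map<String, String> decls = new LinkedHashMap<>();
        Matcher comment = COMMENT.matcher(xml);
        while (comment.find()) {
            String body = comment.group(1);
            if (!body.contains("<!DOCTYPE")) {
                continue; // an ordinary comment, not Epic's fragment doctype
            }
            Matcher entity = ENTITY.matcher(body);
            while (entity.find()) {
                decls.put(entity.group(1), entity.group(2));
            }
        }
        return decls;
    }
}
```

A map like this could, for example, feed a resolver or a generated wrapper document; the regular expressions are deliberately simple and would need hardening for arbitrary DTD syntax.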

 > you can declare your entities 
 > for text files in your book xml where you have declared entities for 


This would require authors to edit the XML generated by Epic with some
other editor, copying the entity declarations generated by Epic from
the chapter file to the book file. This would not only be labor-intensive
but also error-prone and a maintenance nightmare. It could also corrupt
the document, making it impossible to edit in Epic. I have to find a way
of parsing the XML generated by Epic that does not entail manually editing
that XML after the fact. I was hoping there would be some way, e.g., via a
custom entity resolver, to persuade Xerces to use the entity declarations
in the commented-out doctypes that Epic generates at the head of chapter
files to resolve external references.

 > your chapter xml files.  
 > 
 > In addition to this you need to set
 > parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", 
 > "http://www.w3.org/2001/XMLSchema");
 > 
 > It should have worked when you set schema validation to true; the above
 > property is needed only when using JAXP. I will look into this.
 > 

I don't see the relevance of this suggestion to my problem.
We use the DocBook DTD, not schemas.

- Paul



Re: Parsing external entities in multifile documents

Posted by "K. Venugopal" <k....@sun.com>.
Hi Paul,

Paul Kinnucan wrote:

>Hi,
>
>I need some advice on how to deal with a problem that I have 
>encountered trying to use xerces to parse external entities
>in multifile documents created by Arbortext's Epic editor.
>
>The documents in question are technical manuals consisting of a book
>file that references a set of chapter files as external entities
>defined by the book's doctype declaration.  The chapter files are
>themselves XML files, i.e., XML "fragments" of the book. At the head
>of each chapter file is an XML comment that encloses a doctype
>declaration that specifies the same doctype as that defined by the
>book file's doctype declaration, i.e., the book's document type.
>
>The problem occurs when writers use Epic to "include" external text
>files (usually non-XML program listings) in a chapter file.  Epic
>implements this by inserting an entity definition for the inserted
>file in the commented out doctype declaration at the head of the
>chapter file and a reference to the entity where the inserted 
>text is to appear. When displaying the chapter, Epic knows to
>look for the definition of the entity in the commented out
>doctype declaration. However, xerces does not. It regards the
>external entity as undefined and errors out, preventing me from
>parsing the file.
>  
>
If I have understood your problem correctly, you can declare the entities
for your text files in your book XML, where you have declared the entities
for your chapter XML files.

In addition, you need to set
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", 
"http://www.w3.org/2001/XMLSchema");

It should have worked when you set schema validation to true; the above
property is needed only when using JAXP. I will look into this.


Regards
venu

>How can I parse such files? I've noticed that the DOMParser class
>has setEntityResolver() and getEntityResolver() methods. This
>suggests to me that it might be possible for me to define and use my
>owe external entity resolver. This resolver would try to use the
>default resolver and if that failed would look for a definition of the
>entity in the commented-out doctype declaration at the head of the
>file. Does the setEntityResolver method actually support such 
>a solution? Is there a better way to resolve this problem? Any
>help you can give me would be deeply appreciated.
>
>- Paul
>


Parsing external entities in multifile documents

Posted by Paul Kinnucan <pa...@mathworks.com>.
Hi,

I need some advice on how to deal with a problem that I have 
encountered trying to use xerces to parse external entities
in multifile documents created by Arbortext's Epic editor.

The documents in question are technical manuals consisting of a book
file that references a set of chapter files as external entities
defined by the book's doctype declaration.  The chapter files are
themselves XML files, i.e., XML "fragments" of the book. At the head
of each chapter file is an XML comment that encloses a doctype
declaration that specifies the same doctype as that defined by the
book file's doctype declaration, i.e., the book's document type.

The problem occurs when writers use Epic to "include" external text
files (usually non-XML program listings) in a chapter file.  Epic
implements this by inserting an entity definition for the inserted
file in the commented out doctype declaration at the head of the
chapter file and a reference to the entity where the inserted 
text is to appear. When displaying the chapter, Epic knows to
look for the definition of the entity in the commented out
doctype declaration. However, xerces does not. It regards the
external entity as undefined and errors out, preventing me from
parsing the file.

How can I parse such files? I've noticed that the DOMParser class
has setEntityResolver() and getEntityResolver() methods. This
suggests to me that it might be possible for me to define and use my
own external entity resolver. This resolver would try to use the
default resolver and if that failed would look for a definition of the
entity in the commented-out doctype declaration at the head of the
file. Does the setEntityResolver method actually support such 
a solution? Is there a better way to resolve this problem? Any
help you can give me would be deeply appreciated.

- Paul
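Under the SAX `EntityResolver` contract, the parser calls `resolveEntity` only for external entities it is about to open, passing the public and system identifiers taken from their declarations; it is not consulted for a reference whose declaration is missing. The sketch below uses the JDK's JAXP `DocumentBuilder.setEntityResolver` (the JAXP counterpart of the `DOMParser` method mentioned above) and a hypothetical `ResolverDemo` class to show both sides: a declared `fake.txt` entity is served from memory by the resolver, while an undeclared one fails before the resolver is ever asked.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ResolverDemo {
    // System IDs the resolver has been asked to resolve.
    static final List<String> requested = new ArrayList<>();

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            if (systemId != null && systemId.endsWith(".txt")) {
                // Serve the listing from memory instead of the file system (demo only).
                return new InputSource(new StringReader("listing text"));
            }
            return null; // null means "use the parser's default resolution"
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```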

