Posted to j-users@xerces.apache.org by "Cole, Russ E" <Ru...@unisys.com> on 2002/01/03 21:45:22 UTC

Is there the equivalent of DeclHandler for XML Schemas?

I am currently using the DeclHandler interface to get DTD declaration
events.  I use it to determine the structure of XML documents of that type.


How do I do this when using an XML Schema instead of a DTD?

Russ Cole

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org
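
For readers unfamiliar with the DTD-side approach mentioned in the question,
here is a minimal sketch of registering a DeclHandler through the standard
SAX property.  It assumes the JDK's built-in JAXP parser; the class name, the
Collector helper and the sample document are made up for illustration.  It
shows only the DTD case and does not answer the XML Schema question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DeclHandler;
import org.xml.sax.helpers.DefaultHandler;

public class DeclHandlerSketch {

    // Hypothetical helper: records every element and attribute declaration it sees.
    static class Collector extends DefaultHandler implements DeclHandler {
        final List<String> decls = new ArrayList<>();

        public void elementDecl(String name, String model) {
            decls.add("element " + name + " " + model);
        }

        public void attributeDecl(String eName, String aName, String type,
                                  String mode, String value) {
            decls.add("attribute " + eName + "@" + aName + " " + type);
        }

        // Entity declarations are ignored in this sketch.
        public void internalEntityDecl(String name, String value) { }

        public void externalEntityDecl(String name, String publicId, String systemId) { }
    }

    // Parses the document and returns the DTD declarations in the order reported.
    static List<String> collect(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            Collector collector = new Collector();
            // Standard SAX2 extension property for DTD declaration events.
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", collector);
            reader.setContentHandler(collector);
            reader.parse(new InputSource(new StringReader(xml)));
            return collector.decls;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative document with an internal DTD subset.
        String xml = "<!DOCTYPE note ["
                + "<!ELEMENT note (to,body)>"
                + "<!ELEMENT to (#PCDATA)>"
                + "<!ELEMENT body (#PCDATA)>"
                + "<!ATTLIST note id CDATA #IMPLIED>"
                + "]><note><to>a</to><body>b</body></note>";
        for (String decl : collect(xml)) {
            System.out.println(decl);
        }
    }
}
```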


Re: Encoding Problems - DMOZ - RDF - Xerces-j

Posted by Jon Shoberg <js...@cbd.net>.
That's the point I am at now.  There will have to be a small function to read
the stream to disk and purge any offending characters.  More of a hack than
I wanted to implement but oh well ...

Thanks for the eyes :)

Jon
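
A rough sketch of what such a purge function could look like, assuming the
offending characters are ones outside the XML 1.0 Char production (for
example stray control characters).  The class and method names are made up
for illustration, and this will not help if the underlying problem is
malformed bytes rather than illegal characters.

```java
public class XmlCharFilter {

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the text with every disallowed code point dropped.
    static String purge(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
                .filter(XmlCharFilter::isXmlChar)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0001 and U+0008 are not legal in XML 1.0; the tab is.
        String dirty = "a\u0001b\u0008c\td";
        System.out.println(purge(dirty).length());
    }
}
```

Applying the filter line by line while copying the stream to disk would keep
memory use flat for a file the size of the DMOZ dump.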

----- Original Message -----
From: "Andy Clark" <an...@apache.org>
To: <xe...@xml.apache.org>
Sent: Monday, January 07, 2002 3:03 PM
Subject: Re: Encoding Problems - DMOZ - RDF - Xerces-j


> Jon Shoberg wrote:
> > I guess what I am looking for is the ability to read the document through a
> > single encoding, or change encodings on the fly as necessary, somehow ....
>
> You have to use a single encoding for the entire document. If
> you have control over the generation of that document, I would
> recommend using an encoding that is capable of representing
> *all* Unicode characters, not just the ones for that specific
> encoding. For example: UTF-8, UTF-16. If you don't have control
> over the document generation then you need to find the person
> who can fix this problem because fundamentally it's an invalid
> XML file.
>
> --
> Andy Clark * andyc@apache.org


Re: Encoding Problems - DMOZ - RDF - Xerces-j

Posted by Andy Clark <an...@apache.org>.
Jon Shoberg wrote:
> I guess what I am looking for is the ability to read the document through a
> single encoding, or change encodings on the fly as necessary, somehow ....

You have to use a single encoding for the entire document. If
you have control over the generation of that document, I would
recommend using an encoding that is capable of representing
*all* Unicode characters, not just the ones for that specific
encoding. For example: UTF-8, UTF-16. If you don't have control
over the document generation then you need to find the person
who can fix this problem because fundamentally it's an invalid
XML file.

-- 
Andy Clark * andyc@apache.org
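
One way to confirm that a file really is not valid in its claimed encoding is
to decode it strictly and see whether the decoder complains.  The following
is only a sketch using the standard java.nio.charset classes; the class and
method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {

    // Returns true only if every byte sequence in the data is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "caf\u00e9!".getBytes(StandardCharsets.UTF_8);
        // The same text encoded as ISO-8859-1: the lone 0xE9 byte is not valid UTF-8.
        byte[] mixed = "caf\u00e9!".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(good));
        System.out.println(isValidUtf8(mixed));
    }
}
```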



Re: Encoding Problems - DMOZ - RDF - Xerces-j

Posted by Jon Shoberg <js...@cbd.net>.
I think my mail client appended extraneous info to the second UTF-8 input
stream reader.  Here it is corrected.

I guess what I am looking for is the ability to read the document through a
single encoding, or change encodings on the fly as necessary, somehow ....

Thanks

Jon

> Here is the block of code I am using ....
>
> // --- BEGIN CODE ---
> String uri = "http://dmoz.org/rdf/content.rdf.u8.gz";
> URL u = new URL(uri);
> InputStream raw = u.openStream();
> InputStream decompressed = new GZIPInputStream(raw);
> InputStreamReader reader = new InputStreamReader(decompressed, "ISO8859_1");
> //InputStreamReader reader = new InputStreamReader(decompressed, "UTF-8");
> InputSource in = new InputSource(reader);
> parser.parse(in);
> // --- END CODE ---





Encoding Problems - DMOZ - RDF - Xerces-j

Posted by Jon Shoberg <js...@cbd.net>.
Here is the block of code I am using ....

// --- BEGIN CODE ---
String uri = "http://dmoz.org/rdf/content.rdf.u8.gz";
URL u = new URL(uri);
InputStream raw = u.openStream();
InputStream decompressed = new GZIPInputStream(raw);
InputStreamReader reader = new InputStreamReader(decompressed, "ISO8859_1");
//InputStreamReader reader = new InputStreamReader(decompressed, "UTF-8");
InputSource in = new InputSource(reader);
parser.parse(in);
// --- END CODE ---


Everything is working well now: the file downloads, decompresses, and is read
properly, except for some extraneous characters.  I can't seem to find an
encoding that will work with the entire document.  The document is the
content listing at dmoz.com.

At this point I'm tempted to read the document once and purge any
characters that cause a fatal error, then read it through again for archiving
purposes.

Any thoughts, comments, ideas, examples, would certainly be appreciated.

http://dmoz.org/rdf.html

Jon
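
One possible way to do the purge at decode time, sketched under the
assumption that the fatal errors come from byte sequences that are not valid
UTF-8: build the InputStreamReader from a CharsetDecoder that silently drops
malformed input.  The class and helper names are made up, and the in-memory
gzip data stands in for the real download.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class LenientGzipReader {

    // Wraps a gzipped stream in a UTF-8 reader that skips malformed bytes.
    static Reader open(InputStream gzipped) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        return new InputStreamReader(new GZIPInputStream(gzipped), decoder);
    }

    // Reads the whole stream into a string (fine for a demo, not for a huge dump).
    static String readAll(byte[] gzipped) {
        try (Reader reader = open(new ByteArrayInputStream(gzipped))) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test helper: gzip-compresses raw bytes in memory.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // 0xE9 followed by an ASCII byte is not a valid UTF-8 sequence.
        byte[] raw = {'a', 'b', (byte) 0xE9, 'c', 'd'};
        System.out.println(readAll(gzip(raw)));
    }
}
```

The Reader returned by open() could be handed to new InputSource(reader) in
place of the ISO8859_1 reader in the code above.  Dropping bytes silently
loses data, so this is a workaround rather than a fix for the source file.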





----- Original Message -----
From: "Cole, Russ E" <Ru...@unisys.com>
To: <xe...@xml.apache.org>
Sent: Thursday, January 03, 2002 3:45 PM
Subject: Is there the equivalent of DeclHandler for XML Schemas?


> I am currently using the DeclHandler interface to get DTD declaration
> events.  I use it to determine the structure of XML documents of that type.
>
>
> How do I do this when using an XML Schema instead of a DTD?
>
> Russ Cole