Posted to j-users@xerces.apache.org by Will Hartung <wi...@msoft.com> on 2002/08/23 03:13:13 UTC

Problem: White space is required between the version and the encoding declaration.

I'm getting this error while I'm trying to parse a simple XML document.

SAXParseException: White space is required between the version and the
encoding declaration.

The error appears to come from the DTD: the parser doesn't like its <?xml
version="1.0"?> line, and I don't understand what the issue is. Can anyone
provide any hints? I get this error on both 1.4.3 and 1.4.4.

XML I'm trying to parse:
<?xml version="1.0"?>

<!DOCTYPE Basic SYSTEM "file:basic.dtd">

<Basic>
    <anInt>1</anInt>
    <aDouble>2.4</aDouble>
    <aString>This is a string</aString>
    <anotherString>ANother string</anotherString>
</Basic>

The DTD (in basic.dtd):
<?xml version="1.0"?>
<!ELEMENT Basic (anInt?, aDouble?, aString?, anotherString?)>

<!ELEMENT anInt (#PCDATA)>
<!ELEMENT aDouble (#PCDATA)>
<!ELEMENT aString (#PCDATA)>
<!ELEMENT anotherString (#PCDATA)>

Thanks!

Best Regards,

Will Hartung
(willh@msoft.com)




---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: Default decoding in Neko

Posted by Andy Clark <an...@apache.org>.
Takumi Fujiwara wrote:
> I know I can change it if I want; I just want to understand why
> "Windows-1252" is chosen instead of UTF-8?

If the default were UTF-8, the reader would throw
an exception on many pages. Any page that contains
an ISO Latin 1 character above the ASCII range
would make the UTF-8 reader die.

ISO Latin 1 (or Windows-1252) is a safe default
because every possible byte is acceptable. But if
the high bit is set on a byte read by the UTF-8
reader, the reader assumes the byte is part of a
proper UTF-8 sequence, and when it isn't, it throws
an exception.

As long as the page specifies its encoding using
the http-equiv meta tag, then NekoHTML will change
to the correct reader and everything will be fine.
But, if it does *not*, then we need to use a "safe"
encoding. Therefore, I chose Windows-1252.

-- 
Andy Clark * andyc@apache.org
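
The failure mode described above can be reproduced outside NekoHTML with the JDK's own charset decoders. This is only a sketch: the class name StrictDecodeDemo and its helpers are made up for illustration, ISO-8859-1 stands in for the "safe" single-byte default, and the decoders are configured to report bad input rather than silently replace it, which is an assumption about how a strict reader behaves and not NekoHTML's actual code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecodeDemo {

    /** Decodes strictly: malformed input raises an exception instead of being replaced. */
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    /** Returns true if the bytes are acceptable input for the given charset. */
    static boolean decodes(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "cafe!" with an e-acute encoded as the single ISO Latin 1 byte 0xE9.
        byte[] latin1Page = {'c', 'a', 'f', (byte) 0xE9, '!'};

        // Every byte value is a valid ISO-8859-1 character.
        System.out.println(decodes(latin1Page, StandardCharsets.ISO_8859_1)); // true

        // In UTF-8, 0xE9 must be followed by two continuation bytes; '!' is not one.
        System.out.println(decodes(latin1Page, StandardCharsets.UTF_8));      // false
    }
}
```

A page that really is UTF-8 would decode under both charsets, but under the single-byte default its multi-byte characters come out as the wrong text, which is why declaring the encoding in the page still matters.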




Default decoding in Neko

Posted by Takumi Fujiwara <tr...@yahoo.com>.
Hi, 
Could someone please tell me why the default decoding in Neko is Windows-1252 instead of UTF-8? I want to parse pages like yahoo.jp, yahoo.co.jp, dk.yahoo.com, it.yahoo.com, hk.yahoo.com.
I know I can change it if I want; I just want to understand why "Windows-1252" is chosen instead of UTF-8?
Thank you.
Sam




RE: Problem: White space is required between the version and the encoding declaration.

Posted by Tom Wang <to...@panscopic.com>.
Will,

Encoding is optional in the XML declaration. I am as confused as you are on this.
Does anyone else have input on this?

Tom Wang
Panscopic Corporation
Web Reporting, Just Add Data
http://www.panscopic.com/


> -----Original Message-----
> From: Will Hartung [mailto:willh@msoft.com]
> Sent: Friday, August 23, 2002 2:41 PM
> To: xerces-j-user@xml.apache.org; tomw@panscopic.com
> Subject: Re: Problem: White space is required between the version and
> the encoding declaration.
>
>
> So, is the 'encoding' portion required? I thought it was optional. Or is it
> only required for DTDs? If I simply add the 'encoding' clause to the DTD,
> then it works, even though the actual XML file has the same, simple <?xml
> version="1.0"?> header.
>
> Confused and perplexed.
>
> Regards,
>
> Will Hartung
> (willh@msoft.com)




Re: Problem: White space is required between the version and the encoding declaration.

Posted by Will Hartung <wi...@msoft.com>.
So, is the 'encoding' portion required? I thought it was optional. Or is it
only required for DTDs? If I simply add the 'encoding' clause to the DTD,
then it works, even though the actual XML file has the same, simple <?xml
version="1.0"?> header.

Confused and perplexed.

Regards,

Will Hartung
(willh@msoft.com)

----- Original Message -----
From: "Tom Wang" <to...@panscopic.com>
To: <xe...@xml.apache.org>
Sent: Friday, August 23, 2002 12:40 PM
Subject: RE: Problem: White space is required between the version and the
encoding declaration.


> Will,
>
> You can either remove the XML declaration or add the encoding info explicitly:
>
> <?xml version="1.0" encoding="UTF-8" ?>
>
> Tom Wang
> Panscopic Corporation
> Web Reporting, Just Add Data
> http://www.panscopic.com/





RE: Problem: White space is required between the version and the encoding declaration.

Posted by Tom Wang <to...@panscopic.com>.
Will,

You can either remove the XML declaration or add the encoding info explicitly:

<?xml version="1.0" encoding="UTF-8" ?>

Tom Wang
Panscopic Corporation
Web Reporting, Just Add Data
http://www.panscopic.com/
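
For context on the workaround above: in XML 1.0, the declaration at the top of an external DTD is a text declaration rather than an XML declaration, and its grammar makes the version optional but the encoding required. That would explain why <?xml version="1.0"?> is accepted in the document but rejected in basic.dtd. The following is a minimal sketch for trying the three DTD variants. It assumes the JAXP parser bundled with a current JDK rather than Xerces 1.4.x, so the exact error message may differ from the one quoted in this thread. The class name TextDeclDemo, the helper dtdParses, and the in-memory entity resolver are illustrative and do not come from the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class TextDeclDemo {

    // Shortened version of the document from the original post.
    static final String DOCUMENT =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE Basic SYSTEM \"file:basic.dtd\">\n"
        + "<Basic><anInt>1</anInt><aDouble>2.4</aDouble></Basic>\n";

    // Element declarations from basic.dtd, without any leading declaration.
    static final String ELEMENTS =
        "<!ELEMENT Basic (anInt?, aDouble?, aString?, anotherString?)>\n"
        + "<!ELEMENT anInt (#PCDATA)>\n"
        + "<!ELEMENT aDouble (#PCDATA)>\n"
        + "<!ELEMENT aString (#PCDATA)>\n"
        + "<!ELEMENT anotherString (#PCDATA)>\n";

    /** Parses DOCUMENT with the given text standing in for basic.dtd. */
    static boolean dtdParses(String dtdText) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Serve the DTD from memory, whatever system ID the document names.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader(dtdText)));
            // Turn parse errors into exceptions instead of printing them.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCUMENT)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document or the DTD
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("version only:         "
            + dtdParses("<?xml version=\"1.0\"?>\n" + ELEMENTS));
        System.out.println("version and encoding: "
            + dtdParses("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + ELEMENTS));
        System.out.println("no declaration:       "
            + dtdParses(ELEMENTS));
    }
}
```

Both suggested fixes correspond to the last two calls: adding the encoding makes the text declaration well-formed, and dropping the declaration avoids the rule altogether.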



