Posted to j-dev@xerces.apache.org by Nikhil Khedkar <ni...@yahoo.co.uk> on 2002/11/13 07:41:27 UTC

how to specify Encoding

Hi,
   If I pass an InputStream to the parser's parse method, do I need to set the encoding of the InputStream to match the encoding="xyz" attribute in the XML file? Or does the parser take care of the encoding, so that the InputStream need not bother about it?
   Basically, my requirement is to parse an XML file whose encoding I do not know, using the parse method that takes an InputStream.
   Must the file's actual encoding (i.e. the one in which the file is saved) and the encoding="xyz" attribute be the same? And if the parser takes care of the encoding, how does it make sense of the InputStream and fetch the encoding? Even to parse the first line, which holds the encoding attribute, it has to know the encoding of the InputStream.

Thanks,
    Nikhil

Nillable KeyRef element

Posted by Evert Hoff <ev...@pixie.co.za>.
Hi,

I am trying to specify a KeyRef that may either refer to a value or be
empty. I have found different results with attributes and simple
content. I'm not sure what the intention of the spec is in this regard,
so I don't know if this behavior is a bug in Xerces or in accordance
with what the spec intended.

ATTRIBUTES

Please refer to the attached test.xsd. A person has an attribute
"manager-id" which refers to another person's "id" attribute. Use of
this attribute is specified as optional. In test.xml the first person
does not have this attribute. This behaves correctly and the parser does
not give an error.

SIMPLE CONTENT

This same xsd specifies that the root element may contain "person"
elements as well as "dog" elements. A person can have a child element
called "dog-name" which refers to a "dog" element. Nillable has been set
to true so that "dog-name" can be empty (nil). I specifically don't want
to remove the "dog-name" element, but just create an empty "dog-name",
as with the second person in test.xml.

I get the following parser error, which means that it doesn't allow the
nil value for "dog-name" even though dog-name has been set to nillable:

Line 13: Parser error: Key 'person.dog-name' with value 'ID Value:  '
not found for identity constraint of element 'personnel'.

Please let me know if I am misinterpreting the spec or whether this is a
bug.

Thanks in advance,

Evert


Re: how to specify Encoding

Posted by Nikhil Khedkar <ni...@yahoo.co.uk>.
Hi,
	I am using a parser (not Xerces) that does not detect
the encoding on its own, and I want to add encoding
detection to it. I downloaded the source for
xerces-1_4_4 and went through how it handles this. I
found that Xerces does it with the help of three
utility classes, viz. EBCDICRecognizer.java,
UCSRecognizer.java and UTF8Recognizer.java. My
question is: do these three classes detect all the
encodings that Java supports? If so, can you tell me
how they do that? And what is the UCS encoding?

Thanks,
	Nikhil

Jan Dvorak wrote:
| Hi,
|
| No, you don't specify any encoding for the InputStream.
| The parser takes care of this.
|
| It works approximately like this: The parser detects whether it's an
| 8-bit or a 16-bit encoding from the very first bytes of the stream and
| then it reads up to the end of the <?xml .... ?> thing. There shouldn't
| be any other characters than the usual US-ASCII ones, so no risk of
| confusion. As it encounters the encoding="..." part, the parser switches
| the reader for the stream to the specified encoding. Then it reads
| through the rest of the stream.
|
| Jan


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: how to specify Encoding

Posted by Jan Dvorak <ja...@mathan.cz>.
Hi,

No, you don't specify any encoding for the InputStream.
The parser takes care of this. 
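
To illustrate, here is a minimal sketch that hands raw bytes to the parser
without naming a charset anywhere in the Java code. It uses the standard JAXP
API rather than any Xerces-specific class, and the class name
ParseStreamExample and the helper rootText are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ParseStreamExample {

    // Hypothetical helper: parses raw bytes and returns the root element's text.
    // No charset is passed in; the parser works it out from the bytes themselves.
    static String rootText(byte[] xmlBytes) {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // The same text saved in two different encodings; each declaration
        // matches the way the bytes were actually written.
        String latin1Doc =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        String utf16Doc =
            "<?xml version=\"1.0\" encoding=\"UTF-16\"?><name>caf\u00e9</name>";

        byte[] latin1Bytes = latin1Doc.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf16Bytes = utf16Doc.getBytes(StandardCharsets.UTF_16); // writes a BOM

        System.out.println(rootText(latin1Bytes));
        System.out.println(rootText(utf16Bytes));
    }
}
```

Both calls should give back the same text, as long as the declared encoding
matches the encoding the bytes were really saved in. If the two disagree, the
parser will typically report an error or misread the non-ASCII characters.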

It works approximately like this: the parser detects whether it's an 8-bit or 
a 16-bit encoding from the very first bytes of the stream, and then it reads 
up to the end of the XML declaration (<?xml ... ?>). The declaration should 
contain only the usual US-ASCII characters, so there is no risk of confusion. 
When it encounters the encoding="..." part, the parser switches the reader for 
the stream to the specified encoding. Then it reads through the rest of the 
stream.
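
The two steps above can be sketched roughly as follows. This is not the Xerces
code, only an illustration along the lines of the autodetection appendix of
the XML 1.0 recommendation; the class XmlEncodingSniffer and its methods are
invented for the example. It deliberately leaves out the UCS-4 and EBCDIC
families, so a real implementation would need more cases.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingSniffer {

    // Matches encoding="..." or encoding='...' inside the XML declaration.
    private static final Pattern ENCODING_ATTR =
        Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Step 1: guess the encoding family from a byte order mark, or from the
    // byte pattern of "<?" when there is no BOM. UCS-4 and EBCDIC are omitted.
    static Charset guessFamily(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return StandardCharsets.UTF_16LE;
        return StandardCharsets.UTF_8; // ASCII-compatible default
    }

    // Step 2: read the declaration with the provisional charset and return the
    // declared encoding name, or the provisional one if nothing is declared.
    static String detect(byte[] head) {
        Charset provisional = guessFamily(head);
        String text = new String(head, provisional);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1); // drop the decoded byte order mark
        }
        int end = text.indexOf("?>");
        if (!text.startsWith("<?xml") || end < 0) {
            return provisional.name();
        }
        Matcher m = ENCODING_ATTR.matcher(text.substring(0, end));
        return m.find() ? m.group(1) : provisional.name();
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(doc)); // prints ISO-8859-1
    }
}
```

The caller would pass in the first few hundred bytes of the stream, then
reopen or rewind the stream and wrap it in a reader for the returned encoding
name.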

Jan

On Wednesday 13 November 2002 07:41, Nikhil Khedkar wrote:
| Hi,
|    if I want to pass InputStream to the parser's parse method, do I need to
| set the encoding of InputStream, equals to the attribute encoding="xyz",
| the xml file has? Or the parser takes care of this encoding and the
| InputStream need not bother about it? Basically my requirement is, I want
| to parse xml file, whose encoding I do not know, using parse method, which
| takes InputStream. Is the file's encoding (i.e. the one in which the file
| is saved) and the attribute, encoding="xyz" must be same. My question is,
| if the parser takes care of the encoding, how does it make sense from
| InputStream and fetch the encoding, because even to parse the first line
| having encoding attribute, it has to know the encoding of InputStream?
|
| Thanks,
|     Nikhil
