Posted to j-users@xerces.apache.org by Vipul Veera <vv...@unwiredsoft.com> on 2001/05/24 08:45:15 UTC

extracting chinese text

Hi
I have a DOM that contains Chinese text. I want to traverse the
DOM and extract the text strings from it. But when I get a Text node and call
getNodeValue on it, it returns a String that contains only '?' characters
and no Chinese characters.
How do I extract these Chinese characters?


Vipul.


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: extracting chinese text

Posted by Andy Clark <an...@apache.org>.
Vipul Veera wrote:
> I have a DOM that contains Chinese text. I want to traverse the
> DOM and extract the text strings from it. But when I get a Text node and call
> getNodeValue on it, it returns a String that contains only '?' characters
> and no Chinese characters.
> How do I extract these Chinese characters?

If the car doesn't start, try turning the key before hauling
it into the shop for repairs... ;)

Why do you say it contains all '?' characters? Is it because
that's what you see when you print it out to the console or
display it in the application? If so, then it's probably a
font problem. Very common but definitely not a problem with
the parser or DOM implementation.
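
One way to tell a display problem from real '?' characters is to print
the code points instead of the glyphs. The following is only a minimal
sketch: the class name DumpChars and the helper escape() are made up for
illustration, and the hard-coded string stands in for whatever
getNodeValue() returns in your application.

```java
public class DumpChars {
    // Render every char as a four-digit hex escape so the output is pure
    // ASCII and does not depend on the console font or encoding.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: this literal stands in for node.getNodeValue().
        String value = "\u4e2d\u6587";
        System.out.println(escape(value));
    }
}
```

If the value holds real question marks, every character shows up as
code point 003f. Anything else means the String is intact and only the
display is at fault.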

If the characters *really* are '?', then it sounds like a
transcoding problem when you're parsing. Are you wrapping
the input stream with a Java Reader? If so, the default transcoder
may blindly convert bytes it doesn't know into '?'. But again,
this is not a problem with the parser or DOM implementation.
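
To illustrate the Reader issue, here is a rough sketch that parses the
same bytes twice: once as a byte stream, so the parser can honor the
encoding declaration, and once through a Reader created with the wrong
charset. It uses the standard JAXP and SAX InputSource APIs. The class
name ReaderVsStream, its helper methods and the sample document are
assumptions made up for this example. Note that with ISO-8859-1 as the
mismatched charset the damage shows up as unrelated Latin-1 characters
rather than literal '?', but the cause is the same: the parser never
sees the bytes, only what the Reader decoded.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReaderVsStream {
    // Hypothetical sample: a document holding two Chinese characters,
    // stored as UTF-8 bytes.
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                   + "<doc>\u4e2d\u6587</doc>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Parse the source and return the value of the root's first Text node.
    static String textOf(InputSource source) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(source);
        return doc.getDocumentElement().getFirstChild().getNodeValue();
    }

    // Byte stream: the parser detects the encoding itself.
    static String viaByteStream() throws Exception {
        return textOf(new InputSource(new ByteArrayInputStream(sampleBytes())));
    }

    // Reader with a mismatched charset: the bytes are decoded before the
    // parser sees them, and the encoding declaration is disregarded.
    static String viaMismatchedReader() throws Exception {
        Reader reader = new InputStreamReader(
            new ByteArrayInputStream(sampleBytes()), StandardCharsets.ISO_8859_1);
        return textOf(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaByteStream().equals("\u4e2d\u6587"));
        System.out.println(viaMismatchedReader().equals("\u4e2d\u6587"));
    }
}
```

If you do need a Reader, constructing it with the document's actual
encoding, for example new InputStreamReader(in, "UTF-8"), avoids relying
on the platform default.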

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org
