Posted to user@commons.apache.org by netsql <ne...@roomity.com> on 2006/04/29 20:07:51 UTC

[IOUtils] How to guess encoding of a byte?

So I have a stream I read that is supposed to be UTF-8, but it does not
display at all.

So I read ints out of the stream. How would I guess the encoding of 
bytes, etc, looking at the ints?

tia,
.V


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: [IOUtils] How to guess encoding of a byte?

Posted by Jacob Kjome <ho...@visi.com>.
For a Java implementation of encoding detection, try the Rome project...

https://rome.dev.java.net/

http://wiki.java.net/bin/view/Javawsxml/Rome

http://wiki.java.net/bin/view/Javawsxml/Rome05CharsetEncoding

https://rome.dev.java.net/source/browse/rome/src/java/com/sun/syndication/io/XmlReader.java
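As we understand it, Rome's XmlReader works the encoding out from several clues: a byte order mark (BOM), the encoding declared in the XML prolog, and the HTTP Content-Type header. The BOM is the simplest clue to check by hand. Below is a minimal sketch in plain Java. It is not Rome's code or API, and BomSniffer/detectBom are hypothetical names. It also shows what the ints from InputStream.read() are: each one is a byte value in 0..255, or -1 at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BomSniffer {

    /**
     * Hypothetical helper: peeks at up to four leading bytes and returns the
     * charset name implied by a byte order mark, or null if there is none.
     * The stream must support mark/reset (wrap it in a BufferedInputStream
     * if it does not) so the peeked bytes are still there for the caller.
     */
    public static String detectBom(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(4);
        int[] b = new int[4];
        int n = 0;
        // read() returns each byte as an int in 0..255, or -1 at end of stream
        while (n < 4) {
            int c = in.read();
            if (c == -1) {
                break;
            }
            b[n++] = c;
        }
        in.reset(); // push the peeked bytes back

        // Check the 4-byte UTF-32LE mark before the 2-byte UTF-16LE mark it begins with
        if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) return "UTF-32BE";
        if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) return "UTF-32LE";
        if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
        if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "UTF-16BE";
        if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "UTF-16LE";
        return null; // no BOM: the encoding has to be learned some other way
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        InputStream in = new ByteArrayInputStream(data);
        System.out.println(detectBom(in)); // prints UTF-8
        System.out.println(in.read());     // prints 239 (0xEF): the BOM is still in the stream
    }
}
```

Many files, including most UTF-8 ones, have no BOM at all, so a null result tells you nothing. That is where the prolog and header rules, or the statistical detectors linked elsewhere in this thread, come in.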


Jake

At 02:35 AM 4/30/2006, you wrote:
 >
 >On Apr 29, 2006, at 20:07, netsql wrote:
 >
 >> How would I guess the encoding of bytes, etc, looking at the ints?
 >
 >You don't. But you can always try:
 >
 >"Universal Encoding Detector"
 >http://chardet.feedparser.org/docs/faq.html
 >
 >"A composite approach to language/encoding detection"
 >http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
 >
 >http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/src/base/
 >
 >Cheers
 >
 >--
 >PA, Onnay Equitursay
 >http://alt.textdrive.com/




Re: [IOUtils] How to guess encoding of a byte?

Posted by petite_abeille <pe...@mac.com>.
On Apr 29, 2006, at 20:07, netsql wrote:

> How would I guess the encoding of bytes, etc, looking at the ints?

You don't. But you can always try:

"Universal Encoding Detector"
http://chardet.feedparser.org/docs/faq.html

"A composite approach to language/encoding detection"
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/src/base/
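"You don't" is the honest general answer: plain bytes usually carry no label naming their encoding. What you can test cheaply is whether a byte sequence is even well-formed UTF-8. That is a useful first check when a stream that is "supposed to be UTF-8" won't display. Below is a minimal sketch using the JDK's strict CharsetDecoder. It assumes Java 7+ for StandardCharsets, and Utf8Check/isValidUtf8 are hypothetical names, not part of Commons IO or the detectors linked above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    /**
     * Hypothetical helper: returns true if the bytes are well-formed UTF-8.
     * A strict decoder reports malformed input instead of silently
     * substituting U+FFFD, so bad sequences surface as an exception.
     */
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);        // e-acute as C3 A9
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1); // e-acute as a lone E9
        System.out.println(isValidUtf8(utf8));   // prints true
        System.out.println(isValidUtf8(latin1)); // prints false
    }
}
```

A false result is conclusive. A true result only means the bytes could be UTF-8, since pure ASCII passes as well. The statistical detectors linked above go further by weighing how plausible the text is in each candidate encoding.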

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/

