Posted to user@commons.apache.org by netsql <ne...@roomity.com> on 2006/04/29 20:07:51 UTC
[IOUtils] How to guess encoding of a byte?
So I have a stream I read, supposed to be UTF-8, but it does not display
properly.
So I read ints out of the stream. How would I guess the encoding of
the bytes by looking at the ints?
tia,
.V
---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org
Re: [IOUtils] How to guess encoding of a byte?
Posted by Jacob Kjome <ho...@visi.com>.
For a Java implementation of encoding detection, try the Rome project...
https://rome.dev.java.net/
http://wiki.java.net/bin/view/Javawsxml/Rome
http://wiki.java.net/bin/view/Javawsxml/Rome05CharsetEncoding
https://rome.dev.java.net/source/browse/rome/src/java/com/sun/syndication/io/XmlReader.java
Jake
At 02:35 AM 4/30/2006, you wrote:
>
>On Apr 29, 2006, at 20:07, netsql wrote:
>
>> How would I guess the encoding of bytes, etc, looking at the ints?
>
>You don't. But you can always try:
>
>"Universal Encoding Detector"
>http://chardet.feedparser.org/docs/faq.html
>
>"A composite approach to language/encoding detection"
>http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
>
>http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/src/base/
>
>Cheers
>
>--
>PA, Onnay Equitursay
>http://alt.textdrive.com/
Re: [IOUtils] How to guess encoding of a byte?
Posted by petite_abeille <pe...@mac.com>.
On Apr 29, 2006, at 20:07, netsql wrote:
> How would I guess the encoding of bytes, etc, looking at the ints?
You don't. But you can always try:
"Universal Encoding Detector"
http://chardet.feedparser.org/docs/faq.html
"A composite approach to language/encoding detection"
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/src/base/
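For the specific case in the question (bytes that are "supposed to be UTF-8"), a cheaper first check than a full detector is possible with the JDK alone. The following is only a rough sketch, not taken from any of the projects linked above: the class name EncodingGuess and its two helper methods are made up for illustration. It looks for a byte order mark, then asks a strict decoder whether the bytes are well-formed UTF-8. Passing the check does not prove the data is UTF-8 (plain ASCII passes too), but failing it does show the data is not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodingGuess {

    // Returns the encoding named by a leading byte order mark, or null if none.
    // Simplification: FF FE is reported as UTF-16LE, although FF FE 00 00
    // would be a UTF-32LE mark.
    static String bomEncoding(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    // True if the bytes decode as UTF-8 with no malformed sequences.
    static boolean isValidUtf8(byte[] b) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            decoder.decode(ByteBuffer.wrap(b));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

The byte array can come from IOUtils.toByteArray(inputStream) in Commons IO. If isValidUtf8 returns false, the stream is in some other encoding, and a statistical detector such as the ones linked above is the next thing to try.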
Cheers
--
PA, Onnay Equitursay
http://alt.textdrive.com/