Posted to dev@tika.apache.org by GitBox <gi...@apache.org> on 2020/08/06 06:54:40 UTC

[GitHub] [tika] PeterAlfredLee opened a new pull request #336: Add more judge between Charset Windows-1252 and ISO-8859-1(5)

PeterAlfredLee opened a new pull request #336:
URL: https://github.com/apache/tika/pull/336


   According to these web pages: [Windows-1252 Character list](https://www.fileformat.info/info/charset/windows-1252/list.htm), [ISO-8859-1 Character list](http://www.fileformat.info/info/charset/ISO-8859-1/list.htm), [ISO-8859-15 Character list](https://www.fileformat.info/info/charset/ISO-8859-15/list.htm)
   
   There are 5 byte values (0x81, 0x8d, 0x8f, 0x90, 0x9d) that Windows-1252 does not define but that ISO-8859-1 and ISO-8859-15 do.
   
   I think we can add one more check: if the content contains any of these byte values, the charset is not Windows-1252.
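
The proposed check could look roughly like the sketch below. This is only an illustration of the idea, not the code merged in this pull request: the class name `Windows1252Check` and the method `couldBeWindows1252` are hypothetical and are not part of Tika's API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Tika class: illustrates the proposed rule only.
final class Windows1252Check {
    // Lookup table marking the five byte values that Windows-1252 leaves undefined.
    private static final boolean[] UNDEFINED_IN_1252 = new boolean[256];

    static {
        for (int b : new int[] {0x81, 0x8D, 0x8F, 0x90, 0x9D}) {
            UNDEFINED_IN_1252[b] = true;
        }
    }

    /** Returns false as soon as the content contains a byte Windows-1252 does not define. */
    static boolean couldBeWindows1252(byte[] content) {
        for (byte b : content) {
            // Mask to 0..255 because Java bytes are signed.
            if (UNDEFINED_IN_1252[b & 0xFF]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plain = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] withUndefined = {0x41, (byte) 0x81, 0x42};
        System.out.println(couldBeWindows1252(plain));          // true
        System.out.println(couldBeWindows1252(withUndefined));  // false
    }
}
```

A `true` result only means Windows-1252 has not been ruled out; it does not by itself distinguish Windows-1252 from ISO-8859-1 or ISO-8859-15.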


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] tballison merged pull request #336: Add more judge between Charset Windows-1252 and ISO-8859-1(5)

Posted by GitBox <gi...@apache.org>.
tballison merged pull request #336:
URL: https://github.com/apache/tika/pull/336


   

