Posted to dev@tika.apache.org by "peter royal (Created) (JIRA)" <ji...@apache.org> on 2011/12/20 19:44:30 UTC
[jira] [Created] (TIKA-822) MediaType fails to parse charset that
has quoted value
MediaType fails to parse charset that has quoted value
------------------------------------------------------
Key: TIKA-822
URL: https://issues.apache.org/jira/browse/TIKA-822
Project: Tika
Issue Type: Bug
Components: mime
Affects Versions: 1.0
Reporter: peter royal
If a MIME type is
text/html; charset="UTF-8"
the parsed charset value is incorrectly "UTF-8" (quotes included) rather than UTF-8.
Patch available at https://github.com/osi/tika/commit/b77814874ebff8f412ebb2f2adc52c6465d603c4
I have a CLA on file.
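To make the report concrete, here is a minimal sketch of the behaviour described above. It is not Tika's code and not the linked patch: the class name QuotedCharsetSketch and the helpers unquote and parseParameters are hypothetical, and the naive split on ';' and '=' is an assumption made only to reproduce the symptom.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class QuotedCharsetSketch {

    // Hypothetical helper (not Tika's implementation): removes one pair of
    // surrounding double quotes from a parameter value, if present.
    public static String unquote(String value) {
        String v = value.trim();
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            return v.substring(1, v.length() - 1);
        }
        return v;
    }

    // Naive parameter parsing: split on ';', then on the first '='.
    // With stripQuotes == false this reproduces the reported symptom.
    public static Map<String, String> parseParameters(String mediaType, boolean stripQuotes) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = mediaType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the type/subtype
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                continue; // parameter without a value, ignored in this sketch
            }
            String name = parts[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = parts[i].substring(eq + 1).trim();
            params.put(name, stripQuotes ? unquote(value) : value);
        }
        return params;
    }

    public static void main(String[] args) {
        String input = "text/html; charset=\"UTF-8\"";
        // Quotes are kept: the value cannot be used directly as a charset name.
        System.out.println(parseParameters(input, false).get("charset"));
        // Quotes are stripped: the value is the bare charset name.
        System.out.println(parseParameters(input, true).get("charset"));
    }
}
```

The first line printed still carries the quotes; the second is the bare name that a caller could pass to something like Charset.forName.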
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TIKA-822) MediaType fails to parse charset that
has quoted value
Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173763#comment-13173763 ]
Nick Burch commented on TIKA-822:
---------------------------------
Should we handle single quotes too? I don't think they're valid for HTTP, but they could potentially crop up in other situations.
[jira] [Commented] (TIKA-822) MediaType fails to parse charset that
has quoted value
Posted by "peter royal (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173833#comment-13173833 ]
peter royal commented on TIKA-822:
----------------------------------
thanks!
[jira] [Resolved] (TIKA-822) MediaType fails to parse charset that
has quoted value
Posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Burch resolved TIKA-822.
-----------------------------
Resolution: Fixed
Fix Version/s: 1.1
[jira] [Commented] (TIKA-822) MediaType fails to parse charset that
has quoted value
Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173796#comment-13173796 ]
Nick Burch commented on TIKA-822:
---------------------------------
OK, thanks for the info and the patch. I've added it, along with single quote support and a note about the outstanding issues for quoted strings, in r1221581.
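As an illustration of what accepting both quote styles could look like, here is a small sketch. It is not the content of r1221581; the class EitherQuoteSketch and its unquote method are hypothetical. The one design choice it shows is that the quotes are removed only when the opening and closing characters match.

```java
public class EitherQuoteSketch {

    // Hypothetical helper: removes one pair of surrounding quotes, but only
    // when the value starts and ends with the same quote character
    // (either " or '). Anything else is returned trimmed and otherwise unchanged.
    public static String unquote(String value) {
        String v = value.trim();
        if (v.length() >= 2) {
            char first = v.charAt(0);
            char last = v.charAt(v.length() - 1);
            if (first == last && (first == '"' || first == '\'')) {
                return v.substring(1, v.length() - 1);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"UTF-8\"")); // double quotes removed
        System.out.println(unquote("'UTF-8'"));   // single quotes removed
        System.out.println(unquote("\"UTF-8'"));  // mismatched quotes left alone
    }
}
```

Leaving mismatched quotes untouched is a conservative choice for this sketch: it avoids silently turning a malformed value into something that looks valid.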
[jira] [Commented] (TIKA-822) MediaType fails to parse charset that
has quoted value
Posted by "peter royal (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173768#comment-13173768 ]
peter royal commented on TIKA-822:
----------------------------------
The RFC for MIME isn't clear on whether single quotes make a valid quoted string. Overall, the parser needs a bit more work to be fully RFC-compliant (quoted strings can contain equals signs, for instance).
I was just trying to fix the simple case I came across. The JavaMail API generates quoted charset fields for text attachments, which is how I found this.
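The remaining gap mentioned above (an '=' or ';' inside a quoted string) can be sketched with a small character scanner. This is only an illustration under stated assumptions, not Tika's parser: the class QuoteAwareParamsSketch and its parseParameters method are hypothetical, only double-quoted strings are recognised, and a backslash is assumed to escape the next character inside quotes.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class QuoteAwareParamsSketch {

    // Hypothetical quote-aware parser: a ';' or '=' inside a double-quoted
    // value is treated as part of the value, not as a separator.
    public static Map<String, String> parseParameters(String mediaType) {
        Map<String, String> params = new LinkedHashMap<>();
        int n = mediaType.length();
        int pos = mediaType.indexOf(';'); // first ';' ends the type/subtype
        while (pos >= 0 && pos < n) {
            pos++; // skip the ';' that introduces this parameter
            int start = pos;
            while (pos < n && mediaType.charAt(pos) != '=' && mediaType.charAt(pos) != ';') {
                pos++;
            }
            String name = mediaType.substring(start, pos).trim().toLowerCase(Locale.ROOT);
            if (pos >= n || mediaType.charAt(pos) == ';') {
                continue; // no '=' found: skip this valueless parameter
            }
            pos++; // skip '='
            while (pos < n && Character.isWhitespace(mediaType.charAt(pos))) {
                pos++;
            }
            String value;
            if (pos < n && mediaType.charAt(pos) == '"') {
                pos++; // skip opening quote
                StringBuilder sb = new StringBuilder();
                while (pos < n && mediaType.charAt(pos) != '"') {
                    char c = mediaType.charAt(pos);
                    if (c == '\\' && pos + 1 < n) {
                        pos++; // backslash escapes the next character
                        c = mediaType.charAt(pos);
                    }
                    sb.append(c);
                    pos++;
                }
                if (pos < n) {
                    pos++; // skip closing quote
                }
                while (pos < n && mediaType.charAt(pos) != ';') {
                    pos++; // ignore anything between the closing quote and the next ';'
                }
                value = sb.toString();
            } else {
                int valueStart = pos;
                while (pos < n && mediaType.charAt(pos) != ';') {
                    pos++;
                }
                value = mediaType.substring(valueStart, pos).trim();
            }
            if (!name.isEmpty()) {
                params.put(name, value);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p =
                parseParameters("text/plain; title=\"a=b; c\"; charset=\"UTF-8\"");
        System.out.println(p.get("title"));
        System.out.println(p.get("charset"));
    }
}
```

A split-based approach would cut the title value at the embedded ';' and '='; scanning character by character keeps the quoted value intact.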
[jira] [Updated] (TIKA-822) MediaType fails to parse charset that
has quoted value
Posted by "peter royal (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
peter royal updated TIKA-822:
-----------------------------
Comment: was deleted
(was: the rfc for mime isn't clear on whether single quotes make a valid quoted string. overall, the parser needs a bit more work to be fully rfc-compliant (quoted strings can have equals in them, for instance).
I was just trying to fix the simple case I came across. the java mail API generates quoted charset fields for text attachments, which is how I found this. )