Posted to dev@tika.apache.org by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/07/08 02:08:50 UTC
[jira] Created: (TIKA-459) Improve handling of incorrect charset names in HTTP response header
Improve handling of incorrect charset names in HTTP response header
-------------------------------------------------------------------
Key: TIKA-459
URL: https://issues.apache.org/jira/browse/TIKA-459
Project: Tika
Issue Type: Improvement
Reporter: Ken Krugler
Assignee: Ken Krugler
Priority: Minor
While crawling a few million pages, I collected stats for charset names that weren't valid.
The attached patch "fixes up" most of these that I encountered, and thus should improve the accuracy of parse results.
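As an illustration of the kind of fix-up described above, here is a minimal sketch in Java. It is not the code from TIKA-459.patch: the class name CharsetNameCleaner, its methods, and the alias table are all hypothetical, and the handful of aliases are examples rather than the names gathered from the crawl. It assumes the charset name comes from the charset parameter of a Content-Type header and relies only on java.nio.charset.Charset to decide what counts as valid.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the code from TIKA-459.patch. */
final class CharsetNameCleaner {

    // Illustrative aliases only (an assumption); a real table would be
    // built from the invalid names actually observed while crawling.
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("latin-1", "ISO-8859-1");
        ALIASES.put("utf_8", "UTF-8");
    }

    // Matches the charset parameter of a Content-Type value.
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("charset\\s*=\\s*([^;]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the canonical charset name, or null if it cannot be fixed up. */
    static String clean(String raw) {
        if (raw == null) {
            return null;
        }
        // Strip surrounding whitespace and stray single or double quotes.
        String name = raw.replaceAll("^[\\s\"']+|[\\s\"']+$", "");
        if (name.isEmpty()) {
            return null;
        }
        // Map known misspellings to a valid name before validating.
        String alias = ALIASES.get(name.toLowerCase(Locale.ROOT));
        if (alias != null) {
            name = alias;
        }
        try {
            return Charset.isSupported(name) ? Charset.forName(name).name() : null;
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are never legal in a charset name.
            return null;
        }
    }

    /** Extracts and cleans the charset from a Content-Type header value. */
    static String fromContentType(String header) {
        if (header == null) {
            return null;
        }
        Matcher m = CHARSET_PARAM.matcher(header);
        return m.find() ? clean(m.group(1)) : null;
    }
}
```

A caller would typically fall back to content-based detection when fromContentType returns null, rather than trusting a name the JVM cannot resolve.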
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (TIKA-459) Improve handling of incorrect charset names in HTTP response header
Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ken Krugler updated TIKA-459:
-----------------------------
Attachment: TIKA-459.patch
[jira] Resolved: (TIKA-459) Improve handling of incorrect charset names in HTTP response header
Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ken Krugler resolved TIKA-459.
------------------------------
Resolution: Fixed
Patch committed at SVN revision 961850
[jira] Commented: (TIKA-459) Improve handling of incorrect charset names in HTTP response header
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886174#action_12886174 ]
Chris A. Mattmann commented on TIKA-459:
----------------------------------------
+1. Ken, this looks good to me.
Cheers,
Chris