Posted to dev@any23.apache.org by "Lewis John McGibbney (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/04/16 18:24:17 UTC

[jira] [Issue Comment Edited] (ANY23-26) Upgrade dependency to Apache Tika 1.1

    [ https://issues.apache.org/jira/browse/ANY23-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254789#comment-13254789 ] 

Lewis John McGibbney edited comment on ANY23-26 at 4/16/12 4:23 PM:
--------------------------------------------------------------------

Initial WIP. This breaks HCardExtractorTest#testImgSrcDataUrl and #testObjectDataDataUri. 

I've attached my failing tests, along with the two HTML documents which the tests currently fail on. They both seem to be failing on either AbstractExtractorTestCase#assertExtract or HCardExtractorTest#assertDefaultVCard... 

For reference, we only use Tika core and parsers in the following two classes:

./core/src/main/java/org/apache/any23/mime/TikaMIMETypeDetector.java
./core/src/main/java/org/apache/any23/encoding/TikaEncodingDetector.java
                
      was (Author: lewismc):
    Initial WIP. This breaks HCardExtractorTest#testImgSrcDataUrl and #testObjectDataDataUri. 

I've attached my failing tests, along with the two HTML documents which the tests currently fail on. They both seem to be failing on either AbstractExtractorTestCase#assertExtract or HCardExtractorTest#assertDefaultVCard... 

For reference, we only use Tika core and parsers in the following two classes:

./core/src/main/java/org/apache/any23/mime/TikaMIMETypeDetector.java:import org.apache.tika.mime.MimeTypes;
./core/src/main/java/org/apache/any23/encoding/TikaEncodingDetector.java:import org.apache.tika.parser.txt.CharsetDetector;  
                  
> Upgrade dependency to Apache Tika 1.1
> -------------------------------------
>
>                 Key: ANY23-26
>                 URL: https://issues.apache.org/jira/browse/ANY23-26
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Lewis John McGibbney
>             Fix For: 0.8.0
>
>         Attachments: 14-img-src-data-url.html, 19-object-data-data-uri.html, ANY23-26.patch, org.apache.any23.extractor.html.HCardExtractorTest.txt
>
>
> Upgrading to Apache Tika will hopefully provide a wealth of benefits for the project. This issue should act as an umbrella issue to track these changes. It would be great to delegate as much as possible to Tika if deemed suitable to enhance functionality and to reduce our dependencies on external projects.
