Posted to dev@tika.apache.org by "Harinder (JIRA)" <ji...@apache.org> on 2018/04/06 20:44:00 UTC

[jira] [Commented] (TIKA-2091) regression: Zip bomb detected! for HTML file

    [ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428933#comment-16428933 ] 

Harinder commented on TIKA-2091:
--------------------------------

Hello [~tallison@mitre.org], you mentioned above that the zip bomb issue when extracting HTML files does not occur when Solr's custom MostlyPassthroughHtmlMapper is not used.
How would I configure Solr to use Tika's default extractor?

I have a thread open at SO with full details, [see here|https://stackoverflow.com/questions/49699256/zip-bomb-exception-while-sending-html-document-to-solr].

Thanks!

> regression: Zip bomb detected! for HTML file
> --------------------------------------------
>
>                 Key: TIKA-2091
>                 URL: https://issues.apache.org/jira/browse/TIKA-2091
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>         Environment: Debian jessie Linux, Oracle Java 8
>            Reporter: Rodrigo Rosenfeld Rosas
>            Priority: Major
>
> Hi, while I was discussing an issue on Solr's mailing list, it was suggested that I open a ticket here. Please let me know if this is not the proper place for such a ticket.
> After upgrading to the latest Solr, this document is no longer indexed properly. They told me they upgraded Tika from 1.7 to 1.13 in Solr 6.2. Before the upgrade, this document was indexed as expected:
> https://www.sec.gov/Archives/edgar/data/1472033/000119380513001310/e611133_f6ef-eutelsat.htm
> I hope a fix can make it in time for 1.14 ;)
> Cheers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)