Posted to issues@commons.apache.org by "Stefan Bodewig (JIRA)" <ji...@apache.org> on 2014/08/11 07:48:12 UTC

[jira] [Commented] (COMPRESS-285) checking of availability of XZ compression is expensive - result should be reused

    [ https://issues.apache.org/jira/browse/COMPRESS-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092467#comment-14092467 ] 

Stefan Bodewig commented on COMPRESS-285:
-----------------------------------------

Thanks Sebb, I think your two suggestions are good ideas and I will see to implementing them in the coming week. In particular, you will only pay for the failed XZ check if you are really trying to uncompress XZ streams.

The additional constructor won't help Wojciech since he's using Compress behind Tika. Tika would need to be adapted to the new constructor and in the end implement its own logic, which would also need to take OSGi contexts into account.  I think it might be a good idea to add an explicit flag for whether the result is cacheable and make that flag default to true unless BundleEvent can be loaded. Wojciech would then need to set the flag explicitly.
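
To illustrate the flag idea, here is a rough sketch and not the actual Commons Compress code. The class name CachedAvailabilityCheck, its methods, and the lookup counter are all made up for this example; the only assumption taken from the comment above is that caching defaults to true unless an OSGi class (org.osgi.framework.BundleEvent) can be loaded.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper, NOT part of Commons Compress: remembers whether a
 * class could be loaded so that repeated checks avoid the classloader.
 */
final class CachedAvailabilityCheck {
    private enum Cached { UNKNOWN, AVAILABLE, UNAVAILABLE }

    private final String className;
    // Counts real classloader lookups; only here to make the effect visible.
    private final AtomicInteger lookups = new AtomicInteger();
    private volatile boolean cacheResult;
    private volatile Cached cached = Cached.UNKNOWN;

    CachedAvailabilityCheck(String className) {
        this.className = className;
        // Assumed default: cache unless we appear to run inside OSGi,
        // where bundles (and thus classes) may come and go at runtime.
        this.cacheResult = !canLoad("org.osgi.framework.BundleEvent");
    }

    /** Explicit flag; turning caching off also drops any stored result. */
    void setCacheResult(boolean cacheResult) {
        this.cacheResult = cacheResult;
        if (!cacheResult) {
            cached = Cached.UNKNOWN;
        }
    }

    boolean isAvailable() {
        Cached current = cached;
        if (current != Cached.UNKNOWN) {
            return current == Cached.AVAILABLE;
        }
        lookups.incrementAndGet();
        boolean available = canLoad(className);
        if (cacheResult) {
            cached = available ? Cached.AVAILABLE : Cached.UNAVAILABLE;
        }
        return available;
    }

    int lookupCount() {
        return lookups.get();
    }

    private static boolean canLoad(String name) {
        try {
            // initialize=false: we only care whether the class can be found.
            Class.forName(name, false, CachedAvailabilityCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

With caching on, a missing class costs one failed lookup in total; with caching off, every call to isAvailable() goes back to the classloader, which is the behaviour the issue reports as expensive.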

> checking of availability of XZ compression is expensive - result should be reused
> ---------------------------------------------------------------------------------
>
>                 Key: COMPRESS-285
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-285
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Compressors
>    Affects Versions: 1.5, 1.6, 1.7, 1.8
>         Environment: linux 64-bit, java 7, glassfish, solr, tika
>            Reporter: Wojciech Łozowicki
>            Priority: Minor
>              Labels: performance
>
> I use Solr with Apache Tika for indexing documents. Tika uses commons-compress to handle compressed files. Using a sampler (jvisualvm) I have seen that quite a lot of time (5-7%) during my tests is spent in XZUtils.isXZCompressionAvailable because XZ compression is unavailable (I guess that on each call the classloaders spend some time looking for the unavailable classes, followed by a NoClassDefFoundError).
> I think the result of the first check should be stored and reused.
> Here is the stack trace (just to show the way Tika is using commons-compress):
> org.apache.commons.compress.compressors.xz.XZUtils.isXZCompressionAvailable(XZUtils.java:52)
> 	at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:140)
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:95)
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:81)
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)


