Posted to issues@camel.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/07/16 22:41:00 UTC

[jira] [Work logged] (CAMEL-13399) ZipAggregationStrategy become slower when size of zip grows

     [ https://issues.apache.org/jira/browse/CAMEL-13399?focusedWorklogId=277818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-277818 ]

ASF GitHub Bot logged work on CAMEL-13399:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Jul/19 22:40
            Start Date: 16/Jul/19 22:40
    Worklog Time Spent: 10m 
      Work Description: bedlaj commented on pull request #3046: CAMEL-13399: Optimized ZipAggregationStrategy to use ZipFileSystem
URL: https://github.com/apache/camel/pull/3046
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 277818)
            Time Spent: 10m
    Remaining Estimate: 0h

> ZipAggregationStrategy become slower when size of zip grows
> -----------------------------------------------------------
>
>                 Key: CAMEL-13399
>                 URL: https://issues.apache.org/jira/browse/CAMEL-13399
>             Project: Camel
>          Issue Type: Improvement
>          Components: camel-zipfile
>    Affects Versions: 2.23.1
>            Reporter: Mykhailo Kozik
>            Assignee: Jan Bednar
>            Priority: Major
>         Attachments: Screenshot 2019-04-08 18.41.10.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have a simple route that runs on demand and archives multiple files into a single zip archive.
> {code:java}
> from("file:/path/to/source")
> .aggregate(constant(1), new ZipAggregationStrategy(true, true))
> .completionFromBatchConsumer()
> .eagerCheckCompletion()
> .to("file:/path/to/target");{code}
> It works fine when the number of files in the source folder is relatively small.
> After adding trace logging to measure the total size of the input files against the time taken by the process, I plotted the following chart:
> !Screenshot 2019-04-08 18.41.10.png!
> That means creating a zip archive from 500 MB of files takes over 12 minutes!
> It looks like, in order to add a file, Camel extracts the zip archive to an input stream, puts the file inside it, and builds the zip archive again. That results in near-quadratic complexity, which is not acceptable for large folders.
> The workaround is to add completionSize or completionPredicate to flush every 100 MB, so all files get archived but are split across several archives. This works, but it is not ideal.
>  
> Is there a general solution to make ZipAggregationStrategy work in near-linear time, so that the process does not become slower with a large number of files?
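
For illustration only: the pull request title mentions ZipFileSystem, so the following is a minimal sketch of adding one file to an existing archive through the JDK zip filesystem provider, rather than reading every existing entry and writing it to a new archive in application code. The class name ZipAppendSketch and the method addToZip are hypothetical and are not taken from Camel or from the pull request; whether the actual change works this way is not shown in this message.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Camel.
public class ZipAppendSketch {

    // Adds (or replaces) a single entry in zipFile, creating the archive if it does not exist yet.
    static void addToZip(Path zipFile, Path source, String entryName) throws IOException {
        Map<String, String> env = new HashMap<>();
        env.put("create", "true"); // create the archive when the file is missing

        // The "jar:" scheme selects the JDK's built-in zip filesystem provider.
        URI uri = URI.create("jar:" + zipFile.toUri());

        try (FileSystem zipFs = FileSystems.newFileSystem(uri, env)) {
            Path entry = zipFs.getPath("/" + entryName);
            if (entry.getParent() != null) {
                // Make sure nested folders exist inside the archive.
                Files.createDirectories(entry.getParent());
            }
            Files.copy(source, entry, StandardCopyOption.REPLACE_EXISTING);
        } // closing the filesystem writes the changes back to zipFile
    }
}
```

Calling addToZip once per incoming file lets the provider handle the archive update. No timing claim is made for this sketch; any speed-up would need to be measured against the chart above.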



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)