Posted to users@camel.apache.org by "Hubertus.Willuhn" <hu...@dinsoftware.de> on 2015/10/01 13:13:08 UTC

Stream Cache and ZIP-Archives

Hi,

I am new to this forum.

I have a problem with the stream cache of Camel in conjunction with ZIP
files:

The starting point is a large XML file (> 400 MB). This file is split and
processed in a Camel route into smaller units. The result of this first
part of the route is a list of file objects that represent references to
the actual ZIP archives. The ZIPs are on a web server and are downloaded
in another Camel route (SEDA) over HTTP.

So far so good. Now the ZIPs must be unpacked so they can be processed
later. Each ZIP file contains a varying number of smaller PNG and XML
files (up to 200 files per archive).

For performance reasons, I use stream caching and multithreading. There
are more than 55,000 files in total.

The problem is that the stream cache creates more and more files (> 3,000)
until Java throws an exception:



It seems as if the Java process keeps too many file pointers open.

My question is: is there a way to clear the cache or close the streams
after processing all the split parts of one ZIP?

The route that downloads the files looks like this:



My route for unzipping looks like this:

from("seda:unzip?size=1&blockWhenFull=true")

    // unzip
    .unmarshal(zipFile)
    .split(body(Iterator.class)).stopOnException().streaming()

    // enrich
    .process(uzprocessor)

    // save
    .inOnly("seda:attachment")
    .end();
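For reference, the `zipFile` data format used in the `.unmarshal(zipFile)` step above is not shown in the snippet; a minimal sketch of how it is typically configured, assuming the camel-zipfile component (Camel 2.12+):

```java
import org.apache.camel.dataformat.zipfile.ZipFileDataFormat;

// Sketch: data format for the .unmarshal(zipFile) step.
// usingIterator(true) makes the unmarshalled body an Iterator over the
// ZIP entries, matching the .split(body(Iterator.class)).streaming()
// pattern so entries are processed one at a time.
ZipFileDataFormat zipFile = new ZipFileDataFormat();
zipFile.setUsingIterator(true);
```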

And the last route saves the files to the database (a NoSQL storage system):

from("seda:attachment?concurrentConsumers=30&size=30&blockWhenFull=true")

    .process(processor)

    .end().stop();

Thanks for helping.

Greetings from Germany!




--
View this message in context: http://camel.465427.n5.nabble.com/Stream-Cache-and-ZIP-Archives-tp5772148.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Stream Cache and ZIP-Archives

Posted by Franz Paul Forsthofer <em...@googlemail.com>.
Hi Hubertus,

the stream caches and their corresponding files are closed/deleted at
the end of the route, because some processor in the route might still
need the stream. It may help in your case to increase the threshold so
that only the first big 400 MB file is written to the file system, and
the split data are not spooled to disk. See the spoolThreshold option at
http://camel.apache.org/stream-caching.html
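A minimal sketch of this suggestion, assuming Camel 2.12+ and Java-based configuration (the 256 MB threshold is an illustrative value, not from the thread):

```java
import org.apache.camel.CamelContext;
import org.apache.camel.impl.DefaultCamelContext;

// Sketch: raise the spool threshold so the small entries produced by
// splitting a ZIP stay in memory; only payloads larger than the
// threshold are spooled to temporary files on disk.
CamelContext context = new DefaultCamelContext();
context.setStreamCaching(true);
context.getStreamCachingStrategy().setSpoolThreshold(256L * 1024 * 1024);
```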

Best Regards Franz
