Posted to users@camel.apache.org by Shashank Gupta <Sh...@talentica.com> on 2017/11/13 11:17:03 UTC

Query regarding zip consumption using apache camel

Hi,

We are trying to unzip a zip file and then read the PDF and TXT files inside it. We are using Apache Camel with Java in our project.
The problem occurs when we copy the zip over Remote Desktop on Windows; we copy the zip directly into the hot folder (which Camel is listening to).
Camel starts unzipping and reading the files before Windows has finished copying the zip.
So sometimes we get an "unexpected end of zlib input stream" error and sometimes "java.util.zip.ZipException: invalid stored block lengths".
The zip contains PDF files and an associated TXT file for each PDF, so sometimes a PDF has been unzipped while its TXT file is still being unzipped, and Camel starts reading the PDF, which also causes errors.

We tried using :-

1.        readLock - changed :- It didn't work because we are not sure about the length and modification time of the zip.

2.        readLock - rename:- It didn't work either.

Can anyone please help us with this issue?


Regards
Shashank Gupta


RE: Query regarding zip consumption using apache camel

Posted by Shashank Gupta <Sh...@talentica.com>.
Hi,

As per my earlier mail,

We tried using :-

 1.        readLock - changed :- It didn't work because we are not sure about the length and modification time of the zip.

 2.        readLock - rename:- It didn't work either.

Is there anything else we can try to fix this?

Regards
Shashank Gupta



RE: Query regarding zip consumption using apache camel

Posted by "Siano, Stephan" <st...@sap.com>.
Hi,

You likely want to set a readLock parameter on your file consumer endpoints. The value of that parameter depends a bit on your needs: readLock=changed is slow but reliable; readLock=fileLock is faster, but may not work reliably if you use a shared filesystem. There are other options, so you should probably read the documentation for the file endpoint.

Best regards
Stephan
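
For context on the readLock=changed option mentioned above: the idea is to treat a file as ready only once its size and modification time stop changing between two checks. Below is a minimal plain-JDK sketch of that idea. It is not Camel's implementation, and the names StableFileCheck, snapshot and isStable are made up for illustration. In Camel itself this behaviour is tuned through endpoint options such as readLockCheckInterval, readLockTimeout and readLockMinAge, which are described in the file component documentation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper illustrating a "changed"-style readiness check.
class StableFileCheck {

    /** Captures the two attributes that are compared: size and last-modified time. */
    static long[] snapshot(Path file) throws IOException {
        return new long[] {
            Files.size(file),
            Files.getLastModifiedTime(file).toMillis()
        };
    }

    /**
     * Returns true if size and last-modified time are identical before and
     * after waiting for one check interval.
     */
    static boolean isStable(Path file, long checkIntervalMillis)
            throws IOException, InterruptedException {
        long[] before = snapshot(file);
        Thread.sleep(checkIntervalMillis);
        return Arrays.equals(before, snapshot(file));
    }

    public static void main(String[] args) throws Exception {
        // Demo with a temporary file that nobody is writing to any more.
        Path file = Files.createTempFile("upload", ".zip");
        Files.write(file, new byte[1024]);
        System.out.println("stable: " + isStable(file, 200));
        Files.delete(file);
    }
}
```

A slow network copy can pause for longer than one check interval, so the interval (or a minimum file age) usually has to be chosen generously for a check like this to be reliable.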


RE: Query regarding zip consumption using apache camel

Posted by Shashank Gupta <Sh...@talentica.com>.
Hi,

Thanks for the reply.

The problem is that there are two separate queues listening to the same folder: one unzips the file and the other reads the PDFs and TXTs. As soon as I start copying the zip file, which is 8-10 MB, Camel starts unzipping it before the copy has completed, and the other queue starts reading the PDFs.

Queue 1 is something like this:-

// Picks up zip files and writes the extracted entries back into the same folder
from("file://{{rootOutputDirectory}}/incoming?include=.*\\.zip&move=.done&moveFailed=.error&consumer.delay={{document.intake.delay}}")
    .split(new ZipSplitter()).streaming()
    .to("file://{{rootOutputDirectory}}/incoming");

Queue 2 is :-

<route id="document_management_file_processor">
    <from uri="file://{{rootOutputDirectory}}/incoming/?include=.*\.pdf|.*\.xml&amp;sortBy=reverse:file:name&amp;move=.done&amp;moveFailed=.error&amp;consumer.delay={{document.intake.delay}}"/>
    <to uri="bean:documentIntakeFileProcessor"/>
</route>
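
Because Queue 1 writes its output into the same folder that Queue 2 polls, Queue 2 can pick up a file that is still being extracted. One common way around this is to write each extracted file under a temporary name and rename it only when it is complete. The sketch below shows that pattern with the plain JDK rather than with ZipSplitter; the class name SafeUnzip and the ".inprogress" suffix are assumptions made for illustration. Camel's file producer documents tempPrefix and tempFileName options that are based on the same idea.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: extract under a temporary name, then rename.
class SafeUnzip {

    /**
     * Extracts every file entry of zipFile into targetDir. Each entry is first
     * written as "<name>.inprogress" and renamed to its final name only after
     * it has been written completely, so a poller that matches *.pdf or *.txt
     * does not see half-written files.
     */
    static void unzip(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                Path finalPath = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would land outside targetDir.
                if (!finalPath.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                Files.createDirectories(finalPath.getParent());
                Path tempPath = finalPath.resolveSibling(finalPath.getFileName() + ".inprogress");
                // Copies the bytes of the current entry only; the zip stream stays open.
                Files.copy(zip, tempPath, StandardCopyOption.REPLACE_EXISTING);
                // Rename within the same directory; behaviour when the target
                // already exists is platform specific.
                Files.move(tempPath, finalPath, StandardCopyOption.ATOMIC_MOVE);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SafeUnzip <zip file> <target directory>");
            return;
        }
        unzip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Writing the extracted files to a different folder than the one the zip consumer polls would also keep the two queues from interfering with each other.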


Re: Query regarding zip consumption using apache camel

Posted by Onder SEZGIN <on...@gmail.com>.
I think your situation is a bit hard to guess unless you share your routing
details.

It may be that you are consuming the files and trying to unmarshal them
before your file transfer is complete.

If you use the file component, you may need to look into the exclude option
to avoid consuming incompletely transferred files.

Or you may need a custom file processing strategy before unmarshalling the
files.

This is what I would suggest off the top of my head, without knowing any
more details.

Cheers
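
As one possible shape for the custom pre-check suggested above: a zip archive keeps its central directory at the end of the file, so a copy that has been cut short can usually not be opened by java.util.zip.ZipFile. The sketch below uses that as a cheap completeness test before any unzipping starts. The class name ZipCompleteness and the method looksComplete are made up for illustration, and how such a check would be wired into a Camel route (for example through a custom filter) is not shown.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipFile;

// Hypothetical pre-check to run before handing a zip file to the unzip step.
class ZipCompleteness {

    /**
     * Returns true if the file can be opened as a zip archive, which requires
     * the end-of-archive records to be present. Returns false if opening fails,
     * for example because the file is empty or truncated.
     */
    static boolean looksComplete(Path zipPath) {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            return zip.size() >= 0;
        } catch (IOException e) {
            // ZipException is a subclass of IOException, so both land here.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: ZipCompleteness <zip file>");
            return;
        }
        System.out.println(looksComplete(Paths.get(args[0])));
    }
}
```

This is only a pre-check and not a replacement for a read lock: it confirms that the archive can be opened, not that every entry in it decompresses cleanly.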


Re: Query regarding zip consumption using apache camel

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

Either tweak the readLock=changed options to have a higher threshold -
see the docs.
Or, when you copy the file to that computer, copy it to some other
directory, or use some other kind of name.
Then afterwards move / rename the file. This is then done on the same
computer / file system, so it can be a fast atomic operation, and Camel
only starts processing the file once it is fully moved / renamed.
Or don't let Camel run that route at this time, and start the
route after you are done copying the file.
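
The second option above (copy to a staging directory first, then move into the hot folder) can be sketched with the plain JDK as follows. This is only an illustration under assumptions: the class name StagedDelivery and the method publish are made up, and the staging directory is assumed to be on the same file system as the hot folder, since an atomic move is otherwise not possible.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for the "copy elsewhere, then move" approach.
class StagedDelivery {

    /**
     * Moves a fully copied file from a staging directory into the hot folder
     * in one step. ATOMIC_MOVE fails with AtomicMoveNotSupportedException
     * (an IOException) if the two directories are on different file systems.
     */
    static Path publish(Path stagedFile, Path hotFolder) throws IOException {
        Path target = hotFolder.resolve(stagedFile.getFileName());
        return Files.move(stagedFile, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StagedDelivery <staged file> <hot folder>");
            return;
        }
        System.out.println(publish(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

With this approach the slow copy over the network lands in the staging directory, and the hot folder only ever receives complete files.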






-- 
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2