Posted to users@camel.apache.org by Harsh Sharma <ha...@globant.com> on 2021/04/14 10:14:05 UTC

Review - Large File Use Case

Hello Team,

We recently built a file processing solution using Camel and would like
review comments on the current implementation, or suggestions for a better
alternative.

I have tried to explain it in detail below; if anything is unclear, I am
happy to clarify.

The requirement we had -

1) Our input file is around 1.5 GB, arranged in message blocks, and we read
it from an AWS S3 bucket.

2) After reading the file we need to process the data according to some
business rules and create a pipe-separated output file, which we then upload
to a destination S3 bucket.

# Our Current Solution

To achieve this we use 2 routes, described below -

a) Route -1

Route Definition -

    from(timer("startTimer").repeatCount(1))
        .noStreamCaching()
        .process(e -> {
            e.getIn().setBody(createRange());
        })
        .to(direct("start"))
        .end();

   1. This route exists because the file is large: we use the S3
      getObjectRange operation to fetch selected byte ranges instead of the
      whole object.
   2. We get the file size by sending an additional HeadObjectRequest to S3.
   3. We then set the body to a list of byte ranges. For example, if the file
      size is 8192000 bytes and each range covers 2048000 bytes, the list
      contains 4 objects: (0, 2047999), (2048000, 4095999), and so on up to
      the last byte.
   4. The list of ranges is sent to the direct endpoint of route 2, which
      fetches the actual data from S3 for each range.
   5. Because this route has to start automatically, we used a timer
      component that fires once and calls route 2 directly.
   6. We tried to replace this extra init route with a ProducerTemplate, but
      when we use it outside a Camel Processor we get the exception "Caused
      by: java.util.concurrent.RejectedExecutionException: CamelContext is
      stopped".
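For reference, the range list built in step 3 could be computed roughly as in the sketch below. This is not our actual createRange() code: the class name ByteRanges, the Range type and the 2048000-byte chunk size are assumptions made for illustration, and the Range fields only mirror the getFrom/getTo/getBlockSeq accessors used in route 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits an object of known size into inclusive byte ranges. */
final class ByteRanges {

    /** One inclusive range plus its position in the sequence. */
    static final class Range {
        final long from;
        final long to;
        final int blockSeq;

        Range(long from, long to, int blockSeq) {
            this.from = from;
            this.to = to;
            this.blockSeq = blockSeq;
        }

        @Override
        public String toString() {
            return "(" + from + ", " + to + ")";
        }
    }

    static List<Range> split(long fileSize, long chunkSize) {
        if (fileSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and chunkSize > 0");
        }
        List<Range> ranges = new ArrayList<>();
        int seq = 0;
        for (long from = 0; from < fileSize; from += chunkSize) {
            // The end offset is inclusive; the last range is clipped to the final byte.
            long to = Math.min(from + chunkSize, fileSize) - 1;
            ranges.add(new Range(from, to, seq++));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // prints [(0, 2047999), (2048000, 4095999), (4096000, 6143999), (6144000, 8191999)]
        System.out.println(split(8_192_000L, 2_048_000L));
    }
}
```

Note that ranges cut at fixed byte offsets will not necessarily line up with message block boundaries, so a block that straddles two ranges needs separate handling.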


b) Route -2

   1. To read the data we use the getObjectRange operation of the aws-s3
      component, which puts a ResponseInputStream in the exchange.
   2. Inside the fileProcessor we process the data in parallel using the
      executor framework.
   3. Once the data has been processed, we marshal it.
   4. To upload the result back to S3 we use the multipart option of the
      aws-s3 component. Because it needs the whole file before the upload
      starts, we have to create the file locally first.
   5. Finally, once all the ranges have been processed, the postProcessor
      sets that file object as the exchange body and sends it to S3 using
      multipart upload.
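The parallel step in point 2 can be sketched as follows. This is only an illustration and not our real fileProcessor: the class OrderedBlockProcessor, the comma-separated input format and the trivial "business rule" are all assumptions. The point it shows is that results are collected in submission order, so the pipe-separated lines can be appended to the local file in the same order as the input blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the fileProcessor step. */
final class OrderedBlockProcessor {

    /** Placeholder business rule: trim each comma-separated field and join with pipes. */
    static String toPipeSeparated(String block) {
        String[] fields = block.split(",");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.append('|');
            }
            out.append(fields[i].trim());
        }
        return out.toString();
    }

    /** Transforms all blocks on a fixed thread pool and returns results in input order. */
    static List<String> processAll(List<String> blocks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final String block : blocks) {
                futures.add(pool.submit(() -> toPipeSeparated(block)));
            }
            // Reading the futures in submission order preserves the block order,
            // regardless of which task finishes first.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while processing blocks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Block processing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, processAll(Arrays.asList("a, b,c", "1,2 ,3"), 2) returns the list ["a|b|c", "1|2|3"].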


Route Definition -

    from(direct("start"))
        .noStreamCaching()
        // Runs once the original exchange has completed: upload the local file.
        .onCompletion()
            .process(postProcessor)
            .to("aws2-s3://test-bucket?s3Client=#client&multiPartUpload=true&partSize=10485760")
        .end()
        // One sub-exchange per byte range in the list.
        .split(body())
        .streaming()
            .process(exchange -> {
                ItemDto item = (ItemDto) exchange.getIn().getBody();
                exchange.getIn().setHeader(AWS2S3Constants.RANGE_START, item.getFrom());
                exchange.getIn().setHeader(AWS2S3Constants.RANGE_END, item.getTo());
                exchange.getIn().setHeader(BLOCK_SEQUENCE, item.getBlockSeq());
                exchange.getIn().setHeader(AWS2S3Constants.KEY, config.getFileName());
            })
            .to("aws2-s3://test-bucket?s3Client=#client&repeatCount=1&deleteAfterRead=false&fileName=testfile.dat&operation=getObjectRange")
            .process(fileProcessor)
            .marshal(bindy)
            // Append each processed range to the same local temp file.
            .to(file(tempFilePath).fileExist("Append").fileName(TEMP_FILE_NAME))
        .end();
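As a sanity check on the partSize=10485760 setting, the sketch below estimates how many parts the multipart upload would need. The class MultipartMath is hypothetical, and the 1.5 GiB figure assumes the output file is roughly the same size as the input, which may not hold. The two constants are the general S3 multipart limits (at least 5 MiB per part except the last, at most 10,000 parts per upload).

```java
/** Hypothetical helper for checking a multipart part size against S3 limits. */
final class MultipartMath {

    static final long MIN_PART_SIZE = 5L * 1024 * 1024; // every part except the last
    static final long MAX_PARTS = 10_000L;              // per multipart upload

    static long partCount(long fileSize, long partSize) {
        if (fileSize <= 0) {
            throw new IllegalArgumentException("fileSize must be positive");
        }
        if (partSize < MIN_PART_SIZE) {
            throw new IllegalArgumentException("partSize is below the 5 MiB minimum");
        }
        // Ceiling division: a final partial part still counts as one part.
        long parts = (fileSize + partSize - 1) / partSize;
        if (parts > MAX_PARTS) {
            throw new IllegalArgumentException("more than 10,000 parts; increase partSize");
        }
        return parts;
    }

    public static void main(String[] args) {
        long assumedOutputSize = 1536L * 1024 * 1024; // assumption: about 1.5 GiB
        System.out.println(partCount(assumedOutputSize, 10_485_760L)); // prints 154
    }
}
```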

The above code works as expected so far, but we would appreciate a review of
the route definitions and any suggestions or improvements we could try.

Thanks in advance.

-- 

*Thanks and Regards*,

*Harsh Sharma* | Sr. Software Engineer
Mobile : +91 *7378821400*
