Posted to user@flume.apache.org by Pradeep Srinivasa <pr...@hcl.com> on 2014/09/04 05:47:27 UTC

Digesting a CSV file as a whole file using the spooldir source

Hi All,

We are trying to read the CSV files generated in a folder with a spooldir source and write them to HDFS (HDFS sink) through a memory channel.
Currently, when we read multiple CSV files, Flume reads them line by line and stores them all in a single file in HDFS, so we cannot tell the individual CSV files apart in HDFS.

Is there any way to read a CSV file and store it as is using Flume?

Thanks & Regards,
Pradeep Srinivasa.


Re: Digesting a CSV file as a whole file using the spooldir source

Posted by Santiago Mola <sm...@stratio.com>.
Hi Pradeep,


2014-09-04 5:47 GMT+02:00 Pradeep Srinivasa <pr...@hcl.com>:

> Is there any way that we can read a CSV file and store them as is using
> Flume?

You should change the deserializer used by the spooldir source (see the
"deserializer" property in the spooldir docs). [1]
BlobDeserializer will do what you want: it reads the whole file into a
single event body. There is a maximum length per blob, which you can tune
with "deserializer.maxBlobLength". [2]

[1] https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
[2] https://flume.apache.org/FlumeUserGuide.html#blobdeserializer
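A minimal sketch of an agent config along these lines, generated from Python so it can be checked. All agent/component names (a1, src1, ch1, sink1) and paths are hypothetical. Besides the BlobDeserializer settings from the reply, it assumes the HDFS sink is told to roll after every event (hdfs.rollCount = 1, with size- and time-based rolling disabled) and to name each output file after the source file via the spooldir "basename" header, so each CSV should end up in its own HDFS file. Check the HDFS sink options and basenameHeader against the user guide for your Flume version. Note that BlobDeserializer buffers each whole file in RAM, so the memory channel and agent heap need to be sized for your largest CSV.

```python
def build_agent_config(agent="a1", spool_dir="/var/spool/csv",
                       hdfs_path="/flume/csv", max_blob_bytes=100_000_000):
    """Return a Flume agent .properties config (as text) intended to ingest
    each CSV file in spool_dir as ONE event and write it to its own HDFS file.

    Names and paths are hypothetical placeholders; adjust to your setup.
    """
    src, ch, sink = "src1", "ch1", "sink1"
    props = {
        f"{agent}.sources": src,
        f"{agent}.channels": ch,
        f"{agent}.sinks": sink,
        # Spooling directory source using BlobDeserializer instead of the
        # default line-based deserializer: one event per whole file.
        f"{agent}.sources.{src}.type": "spooldir",
        f"{agent}.sources.{src}.spoolDir": spool_dir,
        f"{agent}.sources.{src}.deserializer":
            "org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder",
        f"{agent}.sources.{src}.deserializer.maxBlobLength": str(max_blob_bytes),
        # Assumption: basenameHeader puts the original file name into the
        # "basename" event header (check that your Flume version supports it).
        f"{agent}.sources.{src}.basenameHeader": "true",
        f"{agent}.sources.{src}.channels": ch,
        # Memory channel, as in the original question.
        f"{agent}.channels.{ch}.type": "memory",
        # HDFS sink: write raw bytes, name files after the source file,
        # and roll after every event so each CSV gets its own file.
        f"{agent}.sinks.{sink}.type": "hdfs",
        f"{agent}.sinks.{sink}.channel": ch,
        f"{agent}.sinks.{sink}.hdfs.path": hdfs_path,
        f"{agent}.sinks.{sink}.hdfs.filePrefix": "%{basename}",
        f"{agent}.sinks.{sink}.hdfs.fileType": "DataStream",
        f"{agent}.sinks.{sink}.hdfs.rollCount": "1",
        f"{agent}.sinks.{sink}.hdfs.rollSize": "0",
        f"{agent}.sinks.{sink}.hdfs.rollInterval": "0",
        # Assumed key for the default TEXT serializer: don't append an extra
        # newline after each (already newline-terminated) CSV body.
        f"{agent}.sinks.{sink}.serializer.appendNewline": "false",
    }
    return "\n".join(f"{k} = {v}" for k, v in props.items()) + "\n"


if __name__ == "__main__":
    print(build_agent_config(), end="")
```

To use it, save the printed output (for example as csv-agent.conf) and start the agent with something like `flume-ng agent -n a1 -c conf -f csv-agent.conf`.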

Best,
-- 
Santiago M. Mola
smola@stratio.com