Posted to user@flume.apache.org by Sid Ray <si...@fractalsciences.com> on 2014/09/03 22:26:17 UTC

Using Flume to process data

Can you guys please let me know whether the following scenario is supported?
I have a system in which Tomcat machines produce small JSON files of about
2K each. The goal is to take those files, convert them to CSV format, and
upload them to S3. From S3 they are then loaded in parallel into Redshift.

My idea of the architecture was that:

TomcatServer1   --------------
                                       |
TomcatServer2   --------------> Flume---->S3


Is it possible to do the conversion from JSON files to CSV files in Flume?
The idea is that we need to take the contents of each JSON file, do a
database lookup to fetch an id, and then create the CSV file from the
result. Can this processing be done in Flume?

Also, what would an HA architecture for Flume look like? Any links would help.

Thanks,
Sid

Re: Using Flume to process data

Posted by Joey Echeverria <jo...@cloudera.com>.
You should be able to accomplish this with the Morphline
interceptor[1]. It lets you build a configuration file that
converts from JSON to CSV. There's a similar example in the Kite
project[2], though its target is Avro rather than CSV. The full docs
for Morphlines will also be helpful[3].
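
To make the per-record logic concrete, here is a minimal sketch of the
transformation described in the question (parse the JSON, look up an id,
emit one CSV line). It is plain Python, not morphline syntax, and only
illustrates the steps a morphline would need to express. The field names
"user" and "event" and the in-memory ID_LOOKUP table are assumptions made
up for illustration; a real setup would query the actual database.

```python
import csv
import io
import json

# Hypothetical stand-in for the database lookup mentioned in the question.
ID_LOOKUP = {"alice": 101, "bob": 102}


def json_to_csv_row(json_text, lookup=ID_LOOKUP):
    """Parse one small JSON document and return it as a single CSV line."""
    record = json.loads(json_text)
    # Resolve the id from the (assumed) "user" field; a missing id is
    # written as an empty cell because csv.writer renders None as "".
    user_id = lookup.get(record.get("user"))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([user_id, record.get("user"), record.get("event")])
    return buf.getvalue()


if __name__ == "__main__":
    print(json_to_csv_row('{"user": "alice", "event": "login"}'), end="")
```

Using the csv module rather than joining strings by hand keeps quoting
correct when a field contains a comma or a quote character.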

-Joey

[1] http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
[2] https://github.com/kite-sdk/kite-examples/tree/master/json
[3] http://kitesdk.org/docs/current/kite-morphlines/index.html

-- 
Joey Echeverria