Posted to dev@apex.apache.org by Chaitanya Chebolu <ch...@datatorrent.com> on 2017/02/13 10:44:33 UTC

Redshift Output Operator

Hi All,

  I am proposing an Amazon Redshift output module.
  For background on Redshift, please see:
https://aws.amazon.com/redshift/

  The primary function of this module is to load data into Redshift tables
from data files using the COPY command. For details on COPY, see:
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

The input type for this module is byte[].

  I am proposing the following design:
1) Write the tuples to files on EMR/S3. By default, the module writes to S3.
2) Once a file is rolled, load it into Redshift using the COPY
command.
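
As a rough illustration of step 2 only, the sketch below builds the COPY
statement for a file that has already been rolled to S3. It is not the
proposed implementation: the class name RedshiftCopySketch, the method
names, the bucket path, the IAM role ARN, and the '|' delimiter are all
hypothetical, and running the statement (for example over a JDBC
connection to the cluster) is left out.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of Apex Malhar: builds the Redshift COPY
 * statement that step 2 could issue once a file has been rolled to S3.
 */
final class RedshiftCopySketch {
  // Conservative check for an optionally schema-qualified table name.
  private static final Pattern TABLE_NAME =
      Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?");

  private RedshiftCopySketch() {}

  /** Wraps a value in single quotes, doubling any embedded single quote. */
  static String quote(String value) {
    return "'" + value.replace("'", "''") + "'";
  }

  /** Returns a COPY statement that loads one delimited S3 file into a table. */
  static String buildCopyStatement(
      String table, String s3Path, String iamRoleArn, char delimiter) {
    if (!TABLE_NAME.matcher(table).matches()) {
      throw new IllegalArgumentException("Unexpected table name: " + table);
    }
    if (!s3Path.startsWith("s3://")) {
      throw new IllegalArgumentException("Expected an s3:// path: " + s3Path);
    }
    return "COPY " + table
        + " FROM " + quote(s3Path)
        + " IAM_ROLE " + quote(iamRoleArn)
        + " DELIMITER " + quote(String.valueOf(delimiter));
  }

  public static void main(String[] args) {
    // All argument values here are placeholders for illustration.
    System.out.println(buildCopyStatement(
        "events",
        "s3://example-bucket/apex/part-0001.txt",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        '|'));
  }
}
```

The table name is validated rather than quoted because it is spliced into
the statement as an identifier; the path, role, and delimiter go in as
quoted string literals.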

Please share your thoughts on the design.

Regards,
Chaitanya

Re: Redshift Output Operator

Posted by Amol Kekre <am...@datatorrent.com>.
Chaitanya,
This is a good first cut. After this work, do take a look at loading data
before file rotation.

Thanks
Amol



On Mon, Feb 20, 2017 at 10:56 PM, Chaitanya Chebolu <
chaitanya@datatorrent.com> wrote:

> Created JIRA for this task: APEXMALHAR-2416

Re: Redshift Output Operator

Posted by Chaitanya Chebolu <ch...@datatorrent.com>.
Created JIRA for this task: APEXMALHAR-2416



