Posted to user@pig.apache.org by Chaitanya Sharma <go...@gmail.com> on 2011/04/19 21:44:55 UTC

Pig Lzo Workflow.

Hi,

I recently got Pig to work with Lzo compression, using the Pig loaders from
Elephant Bird.

But from my understanding, my workflow is turning out to be:
Step 1 :  lzo-compress the raw input file.
Step 2 :  put the compressed.lzo file to hdfs.
Step 3 :  execute pig jobs with loaders from elephant-bird.
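
In commands, what I am doing is roughly this (file names and paths below are
just placeholders):

# Step 1: lzo-compress the raw input file
lzop access_log            # produces access_log.lzo

# Step 2: put the compressed .lzo file into hdfs
hadoop fs -put access_log.lzo /data/logs/access_log.lzo

# Step 3: execute the pig job that reads it with an elephant-bird loader
pig my_job.pig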

Now, this looks to be an entirely manual workflow; it needs a lot of babysitting.

Please correct me if I'm wrong, but what I am wondering is whether EB or
Hadoop-Lzo could automate Step #1 and Step #2, so that no manual intervention
is needed?


Thanks,
Chaitanya

Re: Pig Lzo Workflow.

Posted by Gerrit Jansen van Vuuren <ge...@googlemail.com>.
No, streams does not build indexes automatically. It does have the ability to
chunk files to near block size before writing to hadoop, and doing that does
not require indexing.

Indexing is a separate process that you'll need to run.
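
For reference, the indexing step is usually just one hadoop-lzo command along
these lines (the jar path here is a placeholder, and the exact class name can
differ between hadoop-lzo builds):

# runs a MapReduce job that writes a .index file next to each .lzo file,
# which is what allows the compressed file to be split across mappers
hadoop jar /path/to/hadoop-lzo.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer \
    /data/logs/access_log.lzo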

Cheers,
 Gerrit

On Tue, Apr 19, 2011 at 11:05 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Scribe can also write lzo-compressed output.
>
> The indexing step still needs to be taken (Gerrit, does your bigstreams
> write out indexes automatically?).
>
> So our workflow is more like:
>
> 1) Scribe to hdfs with lzo compression
> 2) index
> 3) run pig queries over data with EB loaders.
>
> On Tue, Apr 19, 2011 at 12:48 PM, Gerrit Jansen van Vuuren <
> gerritjvv@googlemail.com> wrote:
>
> > Hi,
> >
> > Have a look at http://code.google.com/p/bigstreams/ and
> > http://code.google.com/p/hadoop-gpl-packing/.
> > If you configure bigstreams to use lzo, it will collect your log files
> from
> > servers and write it out plus load it to hadoop in lzo format.
> >
> > Cheers,
> >  Gerrit
> >
> > On Tue, Apr 19, 2011 at 9:44 PM, Chaitanya Sharma <gopi.daiict@gmail.com
> > >wrote:
> >
> > > Hi,
> > >
> > > I recently for Pig to work with Lzo compression, with pig loaders from
> > > Elephant Bird.
> > >
> > > But, from my understanding my work flow is turning out to be:
> > > Step 1 :  lzo-compress the raw input file.
> > > Step 2 :  put the compressed.lzo file to hdfs.
> > > Step 3 :  execute pig jobs with loaders from elephant-bird.
> > >
> > > Now, this looks to be an all manual workflow; needs a lot baby sitting.
> > >
> > > Please correct me if i'm wrong, but what I am wondering about is, if EB
> > or
> > > Hadoop-Lzo could automate Step #1, Step #2 and would not need manual
> > > intervention?
> > >
> > >
> > > Thanks,
> > > Chaitanya
> > >
> >
>

Re: Pig Lzo Workflow.

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Scribe can also write lzo-compressed output.

The indexing step still needs to be done (Gerrit, does your bigstreams
write out indexes automatically?).

So our workflow is more like:

1) Scribe to hdfs with lzo compression
2) index
3) run pig queries over data with EB loaders.
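
For step 3, a minimal pig script with an EB loader looks something like this
(the jar path, input path, and schema are placeholders; use whichever EB
loader matches your data format):

cat > count_lines.pig <<'EOF'
-- register elephant-bird (plus its dependencies: hadoop-lzo, protobuf, etc.)
REGISTER /path/to/elephant-bird.jar;

-- LzoTextLoader reads lzo-compressed, line-oriented text
logs = LOAD '/data/logs/access_log.lzo'
    USING com.twitter.elephantbird.pig.load.LzoTextLoader()
    AS (line:chararray);

-- trivial count, just to check the pipeline end to end
grouped = GROUP logs ALL;
counted = FOREACH grouped GENERATE COUNT(logs);
DUMP counted;
EOF

pig count_lines.pig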

Re: Pig Lzo Workflow.

Posted by Gerrit Jansen van Vuuren <ge...@googlemail.com>.
Hi,

Have a look at http://code.google.com/p/bigstreams/ and
http://code.google.com/p/hadoop-gpl-packing/.
If you configure bigstreams to use lzo, it will collect the log files from
your servers, write them out, and load them into hadoop in lzo format.

Cheers,
 Gerrit
