Posted to user@crunch.apache.org by Samik Raychaudhuri <sa...@gmail.com> on 2014/07/10 11:12:02 UTC

Reading multiple avro files in a single statement

Hi,

I am a Crunch newbie trying out a few things. I have a quick question
inspired by Pig syntax. The following glob-like syntax works in Pig
for loading multiple Avro files:

A = LOAD '/raw/idm/events/year=2014/month=04/day=07/*/*/*.avro' using 
LOAD_IDM;

I am wondering if there is something similar in the Crunch API that would do
this.

Regards.

Re: Reading multiple avro files in a single statement

Posted by Samik Raychaudhuri <sa...@gmail.com>.
Hi Josh,
Thanks - that worked. Did not try Som's method, but that would probably 
have worked as well.
Best.

On 10/07/2014 9:01 PM, Josh Wills wrote:
> Hey Samik,
>
> Glob syntax should work in Crunch as well:
>
> Pipeline p = …;
> PCollection<MyAvroRecords> records = p.read(From.avroFile(
>     "/raw/idm/events/year=2014/month=04/day=07/*/*/*.avro",
>     Avros.specifics(MyAvroRecords.class)));
>
> J


Re: Reading multiple avro files in a single statement

Posted by Josh Wills <jo...@gmail.com>.
Hey Samik,

Glob syntax should work in Crunch as well:

Pipeline p = …;
PCollection<MyAvroRecords> records = p.read(From.avroFile(
    "/raw/idm/events/year=2014/month=04/day=07/*/*/*.avro",
    Avros.specifics(MyAvroRecords.class)));

J
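
For readers wondering which files a pattern like the one above selects, here is a minimal sketch. It uses the JDK's java.nio glob matcher as a local stand-in, not Hadoop's own glob expansion, on the assumption that both treat each '*' as matching within a single path component. The class name GlobDemo and the directory names hour=00 and part=a are hypothetical, chosen only for illustration. It assumes a Unix-style default file system.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class GlobDemo {
    // The glob from the thread: two directory levels, then any .avro file.
    static final String GLOB =
        "/raw/idm/events/year=2014/month=04/day=07/*/*/*.avro";

    // JDK matcher used as a stand-in for Hadoop's glob handling (assumption).
    private static final PathMatcher MATCHER =
        FileSystems.getDefault().getPathMatcher("glob:" + GLOB);

    // Returns true when the given absolute path is selected by GLOB.
    static boolean matches(String path) {
        return MATCHER.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        String day = "/raw/idm/events/year=2014/month=04/day=07";
        String[] candidates = {
            day + "/hour=00/part=a/events.avro",   // two levels below the day
            day + "/hour=00/events.avro",          // only one level below
            day + "/hour=00/part=a/x/events.avro", // three levels below
            day + "/hour=00/part=a/events.json",   // wrong extension
        };
        for (String candidate : candidates) {
            System.out.println(matches(candidate) + "  " + candidate);
        }
    }
}
```

Because '*' does not cross a directory separator in this matcher, the pattern only selects files sitting exactly two directories below the day directory.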


On Thu, Jul 10, 2014 at 8:18 AM, Som Satpathy <so...@gmail.com> wrote:

> Hi Samik,
>
> You can create an AvroFileSource using the
> AvroFileSource(List<Path> paths, AvroType<T> ptype) constructor in
> org.apache.crunch.io.avro, then read that source in the pipeline.
>
> Hope this helps.
>
> Thanks,
> Som

Re: Reading multiple avro files in a single statement

Posted by Som Satpathy <so...@gmail.com>.
Hi Samik,

You can create an AvroFileSource using the
AvroFileSource(List<Path> paths, AvroType<T> ptype) constructor in
org.apache.crunch.io.avro, then read that source in the pipeline.

Hope this helps.

Thanks,
Som
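
An explicit list of paths can help when the inputs do not fit one glob, for example a range of days that crosses a month boundary. The sketch below only builds the path strings, one per day, using the JDK. The class PartitionPaths and the helper dailyGlobs are hypothetical names. Wrapping each string in org.apache.hadoop.fs.Path and passing the list to the constructor mentioned above is the assumed next step and is not shown or tested here.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PartitionPaths {
    // Hypothetical helper: one glob string per day, both endpoints included,
    // following the year=/month=/day= layout from the original question.
    static List<String> dailyGlobs(String root, LocalDate first, LocalDate last) {
        List<String> globs = new ArrayList<>();
        for (LocalDate d = first; !d.isAfter(last); d = d.plusDays(1)) {
            globs.add(String.format(Locale.ROOT,
                "%s/year=%04d/month=%02d/day=%02d/*/*/*.avro",
                root, d.getYear(), d.getMonthValue(), d.getDayOfMonth()));
        }
        return globs;
    }

    public static void main(String[] args) {
        // Four days spanning the April/May boundary.
        List<String> globs = dailyGlobs("/raw/idm/events",
            LocalDate.of(2014, 4, 29), LocalDate.of(2014, 5, 2));
        globs.forEach(System.out::println);
    }
}
```

Using java.time for the iteration keeps month lengths and year rollovers correct without hand-written date arithmetic.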
