Posted to user@crunch.apache.org by Jeremy Lewi <je...@lewi.us> on 2014/01/12 03:16:21 UTC

How to create in memory collection of iterable?

Hi Crunch Users,

Ho

Re: How to create in memory collection of iterable?

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Jeremy,

On Sun, Jan 12, 2014 at 4:26 AM, Jeremy Lewi <je...@lewi.us> wrote:
> I ended up just creating a PTable<String, BowtieMapping> and then invoking
> groupByKey on the table.

Good to hear you resolved it. Although it's a little late, I can
confirm that this is how I would create an in-memory PCollection of
Iterables (i.e. create a PTable and then group it by key). The
underlying reason that it's (currently) awkward to construct a
PCollection of Iterables is that an Iterable in Crunch isn't
something that can be serialized to disk or read from disk, so
there's typically no need to be able to construct a PType for it.

FWIW, when I'm writing unit tests for DoFns I usually don't even
create an in-memory PCollection, but instead call the process method
with a mocked Emitter. The biggest issue with this approach is usually
getting the DoFn correctly initialized if it has some custom
initialization logic.
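
A minimal sketch of that testing style, under some loud assumptions: the
Emitter and DoFn types below are simplified stand-ins defined inside the
example so that it runs without Crunch on the classpath, and CountFn,
RecordingEmitter and run are hypothetical names invented for illustration.
With the real library you would extend the Crunch DoFn and implement (or
mock) the Crunch Emitter instead. The point is only the shape of the test:
initialize the DoFn explicitly, call process directly with each input
Iterable, and inspect what the emitter recorded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EmitterTestSketch {

    // Simplified stand-in for Crunch's Emitter (assumption, not the real class).
    interface Emitter<T> {
        void emit(T value);
        void flush();
    }

    // Simplified stand-in for Crunch's DoFn (assumption, not the real class).
    abstract static class DoFn<S, T> {
        void initialize() {}
        abstract void process(S input, Emitter<T> emitter);
    }

    // Hypothetical DoFn under test: emits the number of elements in each Iterable.
    static class CountFn extends DoFn<Iterable<String>, Integer> {
        private boolean initialized = false;

        @Override
        void initialize() {
            initialized = true;
        }

        @Override
        void process(Iterable<String> input, Emitter<Integer> emitter) {
            // Fails fast if the test forgot the custom initialization step.
            if (!initialized) {
                throw new IllegalStateException("initialize() was not called");
            }
            int count = 0;
            for (String ignored : input) {
                count++;
            }
            emitter.emit(count);
        }
    }

    // Hand-written test double that records every emitted value.
    static class RecordingEmitter<T> implements Emitter<T> {
        final List<T> emitted = new ArrayList<T>();

        @Override
        public void emit(T value) {
            emitted.add(value);
        }

        @Override
        public void flush() {}
    }

    // Drives the DoFn directly, with no pipeline or PCollection involved.
    static List<Integer> run(List<List<String>> inputs) {
        CountFn fn = new CountFn();
        fn.initialize();
        RecordingEmitter<Integer> emitter = new RecordingEmitter<Integer>();
        for (List<String> input : inputs) {
            fn.process(input, emitter);
        }
        return emitter.emitted;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = new ArrayList<List<String>>();
        inputs.add(Arrays.asList("a", "b"));
        inputs.add(new ArrayList<String>());
        System.out.println(run(inputs)); // prints [2, 0]
    }
}
```

Because the inputs are plain Java lists, no PType for the Iterable is
needed at all, which sidesteps the problem from the original question.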

- Gabriel


Re: How to create in memory collection of iterable?

Posted by Jeremy Lewi <je...@lewi.us>.
I ended up just creating a PTable<String, BowtieMapping> and then invoking
groupByKey on the table.

J


On Sat, Jan 11, 2014 at 6:20 PM, Jeremy Lewi <je...@lewi.us> wrote:

> Let's try again,
>
> How do I create an in-memory collection of iterable Avro specific types? I
> can't seem to figure out how to create a PType for the iterable type.
>
> Here's what I'm trying:
>     ArrayList<BowtieMapping> mappings = new ArrayList<BowtieMapping>();
>     PCollection<Iterable<BowtieMapping>> example4 =
>         MemPipeline.typedCollectionOf(
>             Avros.collections(mappings.getClass()),
>             mappings);
>
> In this case, BowtieMapping is the class for my Avro specific type.
>
> I'm trying to write a unit test for a DoFn.
>
> Thanks
> J
>
>
>
> On Sat, Jan 11, 2014 at 6:16 PM, Jeremy Lewi <je...@lewi.us> wrote:
>
>> Hi Crunch Users,
>>
>> Ho
>>
>
>

Re: How to create in memory collection of iterable?

Posted by Jeremy Lewi <je...@lewi.us>.
Let's try again,

How do I create an in-memory collection of iterable Avro specific types? I
can't seem to figure out how to create a PType for the iterable type.

Here's what I'm trying:
    ArrayList<BowtieMapping> mappings = new ArrayList<BowtieMapping>();
    PCollection<Iterable<BowtieMapping>> example4 =
        MemPipeline.typedCollectionOf(
            Avros.collections(mappings.getClass()),
            mappings);

In this case, BowtieMapping is the class for my Avro specific type.

I'm trying to write a unit test for a DoFn.

Thanks
J



On Sat, Jan 11, 2014 at 6:16 PM, Jeremy Lewi <je...@lewi.us> wrote:

> Hi Crunch Users,
>
> Ho
>