Posted to dev@spark.apache.org by Ameet Kini <am...@gmail.com> on 2014/02/21 20:16:10 UTC

why is NextIterator private?

I'm looking to subclass HadoopRDD and was hoping to subclass NextIterator
in compute().

Thanks,
Ameet

Re: why is NextIterator private?

Posted by Ameet Kini <am...@gmail.com>.
I'm looking at NewHadoopRDD, and it looks like there is no equivalent of
NextIterator there. I'm tempted to extend that instead.

Interestingly, NewHadoopRDD has no concept of caching the configuration
when one has not been broadcast, whereas HadoopRDD does have such a cache.
I'm now leaning toward extending NewHadoopRDD, and would appreciate some
insight from the developers on the thinking behind leaving the cache out.

Thanks,
Ameet



Re: why is NextIterator private?

Posted by Ameet Kini <am...@gmail.com>.
By the same token, why is HadoopPartition private?

Yes, both are small enough that I can copy them verbatim and modify the
copies, but subclassing would have been nicer.

Ameet



Re: why is NextIterator private?

Posted by Ameet Kini <am...@gmail.com>.
Right, I'm clearly facing headwinds. getPartitions returns
Array[Partition], so sub-classing HadoopPartition wouldn't help. Maybe I'm
better off just having a custom InputFormat. I'll explore that option some
more. Thanks for your input.

Ameet



Re: why is NextIterator private?

Posted by Jey Kottalam <je...@cs.berkeley.edu>.
What's the motivation for having your own subclass of HadoopPartition?
As far as I know, that's not a supported use case either.


Re: why is NextIterator private?

Posted by Ameet Kini <am...@gmail.com>.
The use case is to control the partitions as they come out of the
HadoopRDD:
1. Have my own HadoopPartition subclass with fields specific to my
application. Those fields would then be used by other RDD operations (which
I would also override). This is why I was looking to extend HadoopPartition.
2. Have my own getPartitions with slightly different partitioning logic.
This can almost be solved by subclassing InputFormat and overriding its
getSplits method, but I would still need getPartitions to create
MyHadoopPartition instead of HadoopPartition.
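
For illustration only, here is a minimal sketch of point 1 done by wrapping
rather than subclassing, in plain Java with no Spark or Hadoop dependency.
Everything in it is a hypothetical stand-in: Partition, SplitPartition,
TaggedPartition and wrapPartitions are not Spark classes, and the "tag"
field is a made-up example of an application-specific field. It assumes the
only thing the framework needs from a partition is its index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionWrapping {

    // Stand-in for a framework partition type: only the index is required.
    interface Partition {
        int index();
    }

    // Stand-in for the partition a Hadoop-backed source would produce.
    static final class SplitPartition implements Partition {
        private final int index;
        private final String path;

        SplitPartition(int index, String path) {
            this.index = index;
            this.path = path;
        }

        @Override
        public int index() { return index; }

        String path() { return path; }
    }

    // Hypothetical application partition: keeps the original partition and
    // adds one application-specific field instead of extending a private class.
    static final class TaggedPartition implements Partition {
        private final Partition parent;
        private final String tag;

        TaggedPartition(Partition parent, String tag) {
            this.parent = parent;
            this.tag = tag;
        }

        @Override
        public int index() { return parent.index(); }

        Partition parent() { return parent; }

        String tag() { return tag; }
    }

    // Plays the role of an overridden getPartitions: the declared element
    // type stays Partition, so callers cast back to reach the extra field.
    static List<Partition> wrapPartitions(List<? extends Partition> parents) {
        List<Partition> wrapped = new ArrayList<>();
        for (Partition p : parents) {
            wrapped.add(new TaggedPartition(p, "tag-" + p.index()));
        }
        return wrapped;
    }

    public static void main(String[] args) {
        List<SplitPartition> parents = Arrays.asList(
                new SplitPartition(0, "part-00000"),
                new SplitPartition(1, "part-00001"));
        for (Partition p : wrapPartitions(parents)) {
            TaggedPartition t = (TaggedPartition) p;
            SplitPartition original = (SplitPartition) t.parent();
            System.out.println(t.index() + " " + t.tag() + " " + original.path());
        }
    }
}
```

The cast in main is the price of the wrapper: because the list is typed as
Partition, any operation that needs the extra field has to downcast.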

Ameet



Re: why is NextIterator private?

Posted by Jey Kottalam <je...@cs.berkeley.edu>.
What's the motivation for subclassing HadoopRDD? I don't believe
that's a supported use case. Is it not possible to do what you need
with a Hadoop InputFormat?
