Posted to user@spark.apache.org by VG <vl...@gmail.com> on 2016/07/22 13:10:15 UTC

Dataset , RDD zipWithIndex -- How to use as a map .

Hi All,

I am really confused about how to proceed further. Please help.

I have a Dataset created as follows:
Dataset<Row> business = sqlContext.sql("SELECT bid, name FROM business");

Now I need to map each name to a unique index, so I did the following:
JavaPairRDD<Row, Long> indexedBId = business.javaRDD()
                                            .zipWithIndex();

Later in the code I need to change a data structure and replace each name
with the index value generated above.
I am unable to figure out how to do a lookup here.

Please suggest how to approach this.

If there is a better way to do this, please suggest that as well.

Regards
VG

Re: Dataset , RDD zipWithIndex -- How to use as a map .

Posted by Pedro Rodriguez <sk...@gmail.com>.
You could either use monotonically_increasing_id or use a window function
with rank. The first is a simple Spark SQL function; Databricks has a
pretty helpful post on how to use window functions (in this case the whole
data set is the window).
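
A note on the first option: per the Spark documentation,
monotonically_increasing_id yields IDs that are unique and increasing but
not consecutive, because the partition index goes in the upper 31 bits and
the record number within the partition in the lower 33 bits. The plain-Java
sketch below needs no Spark and only imitates that documented layout to show
what the values look like; the class and method names are made up for
illustration. If a dense 0..n-1 index is needed, a window with row_number or
dense_rank, or zipWithIndex, is probably the better fit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: it mimics the documented ID layout.
class MonotonicIdSketch {

    // Partition index in the upper 31 bits, row number in the lower 33 bits.
    static long monotonicId(int partitionIndex, long rowInPartition) {
        return ((long) partitionIndex << 33) + rowInPartition;
    }

    // IDs for an imaginary dataset split into partitions of the given sizes.
    static List<Long> idsFor(int[] partitionSizes) {
        List<Long> ids = new ArrayList<>();
        for (int p = 0; p < partitionSizes.length; p++) {
            for (long r = 0; r < partitionSizes[p]; r++) {
                ids.add(monotonicId(p, r));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two partitions with three rows each.
        System.out.println(idsFor(new int[] {3, 3}));
    }
}
```

With two partitions of three rows each this prints
[0, 1, 2, 8589934592, 8589934593, 8589934594], so the values work as unique
keys but not as positions in an array.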

On Fri, Jul 22, 2016 at 12:20 PM, Marco Mistroni <mm...@gmail.com>
wrote:

> Hi,
> So you have a data frame, and you want to use zipWithIndex and create a tuple.
> I'm not sure whether the DataFrame API has anything equivalent to zipWithIndex.
> But you can:
> - get a data frame
> - convert it to an RDD (there's rdd() / toJavaRDD())
> - do a zipWithIndex
>
> That will give you an RDD of (row, index) pairs, i.e. 3 fields: bid, name and the index.
> I don't think you can update DataFrame columns in place.
> HTH



-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience

Re: Dataset , RDD zipWithIndex -- How to use as a map .

Posted by Marco Mistroni <mm...@gmail.com>.
Hi,
So you have a data frame, and you want to use zipWithIndex and create a tuple.
I'm not sure whether the DataFrame API has anything equivalent to zipWithIndex.
But you can:
- get a data frame
- convert it to an RDD (there's rdd() / toJavaRDD())
- do a zipWithIndex

That will give you an RDD of (row, index) pairs, i.e. 3 fields: bid, name and the index.
I don't think you can update DataFrame columns in place.
HTH
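
For the lookup part of the question, one possible approach is to turn the
zipped pairs into a name -> index map. On the Spark side that would
presumably mean keying indexedBId by name with mapToPair and then calling
collectAsMap(), which only makes sense if the set of names fits in driver
memory. The sketch below is a plain-Java stand-in with no Spark dependency,
just to show the shape of the lookup; the class name, method names and
sample business names are all invented for illustration, and Spark itself
does not promise the same ordering of indices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local analogue of "zipWithIndex, then collect as a map".
class NameIndexSketch {

    // Gives each distinct name the next free index, starting at 0.
    static Map<String, Long> indexNames(List<String> names) {
        Map<String, Long> index = new LinkedHashMap<>();
        for (String name : names) {
            // Only the first occurrence of a name is assigned an index.
            index.putIfAbsent(name, (long) index.size());
        }
        return index;
    }

    // Replaces every name with the index stored for it in the map.
    static List<Long> replaceWithIndex(List<String> names, Map<String, Long> index) {
        List<Long> out = new ArrayList<>();
        for (String name : names) {
            out.add(index.get(name));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the "name" column.
        List<String> names = Arrays.asList("Cafe Uno", "Book Nook", "Cafe Uno");
        Map<String, Long> index = indexNames(names);
        System.out.println(index);
        System.out.println(replaceWithIndex(names, index));
    }
}
```

Running main prints {Cafe Uno=0, Book Nook=1} followed by [0, 1, 0]. If the
map is too large to collect, joining the indexed data back onto the other
dataset by name would be the distributed alternative.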
On 22 Jul 2016 5:19 pm, "VG" <vl...@gmail.com> wrote:

> Hi All,
>
> Any suggestions for this?
>
> Regards,
> VG
>

Re: Dataset , RDD zipWithIndex -- How to use as a map .

Posted by VG <vl...@gmail.com>.
Hi All,

Any suggestions for this?

Regards,
VG

