Posted to user@spark.apache.org by swetha <sw...@gmail.com> on 2015/07/15 02:23:22 UTC

Re: creating a distributed index

Hi Ankur, 

Is IndexedRDD available in Spark 1.4.0? We would like to use it in Spark
Streaming to look up, update, and delete RDD entries by key, storing the
data as key/value pairs.

Thanks, 
Swetha
      




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/creating-a-distributed-index-tp11204p23842.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: creating a distributed index

Posted by swetha kasireddy <sw...@gmail.com>.
Hi Ankur,

I have the following questions on IndexedRDD.

1. Does IndexedRDD support String keys? According to the current
documentation, it appears to support only Long keys.

2. Is IndexedRDD efficient when joined with another RDD? My use case is that
I need to create an IndexedRDD for a certain set of data and then find the
keys that are present in the IndexedRDD but not in some other RDD. How would
IndexedRDD support such a use case efficiently?


Thanks,
Swetha
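
A minimal sketch of the second question's use case: finding entries whose keys are present in one keyed dataset but absent from another. In core Spark, this is the result that `subtractByKey` on a pair RDD computes; whether IndexedRDD offers a faster path is not answered in this thread. The plain-Python model below (no Spark required) only illustrates the idea that, when both sides are hash-partitioned the same way, each partition can be compared locally. The names `partition_by_key` and `subtract_by_key` and the sample data are hypothetical.

```python
from collections import defaultdict


def partition_by_key(pairs, num_partitions):
    """Group (key, value) pairs into per-partition dicts by hashing the key."""
    partitions = defaultdict(dict)
    for key, value in pairs:
        partitions[hash(key) % num_partitions][key] = value
    return partitions


def subtract_by_key(left, right, num_partitions=4):
    """Return the entries of `left` whose keys do not appear in `right`."""
    left_parts = partition_by_key(left, num_partitions)
    right_parts = partition_by_key(right, num_partitions)
    result = {}
    for index, left_part in left_parts.items():
        # Equal keys always hash to the same partition, so only the
        # partition with the same index has to be consulted.
        right_part = right_parts.get(index, {})
        for key, value in left_part.items():
            if key not in right_part:
                result[key] = value
    return result


# Hypothetical sample data: the "indexed" side and the other dataset.
indexed = [("user-1", 10), ("user-2", 20), ("user-3", 30)]
other = [("user-2", "x"), ("user-4", "y")]

missing = subtract_by_key(indexed, other)
print(sorted(missing.items()))  # [('user-1', 10), ('user-3', 30)]
```

In this model, the cost of the comparison is dominated by building the partitions; once both sides share a partitioning, no entry has to be compared against a partition other than its own.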







On Wed, Jul 15, 2015 at 2:46 AM, Jem Tucker <je...@gmail.com> wrote:

> This is very interesting. Do you know if this version will be backwards
> compatible with older versions of Spark (1.2.0)?
>
> Thanks,
>
> Jem
>
>
> On Wed, Jul 15, 2015 at 10:04 AM Ankur Dave <an...@gmail.com> wrote:
>
>> The latest version of IndexedRDD supports any key type with a defined
>> serializer
>> <https://github.com/amplab/spark-indexedrdd/blob/master/src/main/scala/edu/berkeley/cs/amplab/spark/indexedrdd/KeySerializer.scala>,
>> including Strings. It's not released yet, but you can use it from the
>> master branch if you're interested.
>>
>> Ankur <http://www.ankurdave.com/>
>>
>> On Wed, Jul 15, 2015 at 12:43 AM, Jem Tucker <je...@gmail.com>
>> wrote:
>>
>>> With regard to indexed structures in Spark, are there any alternatives
>>> to IndexedRDD for more general key types, including Strings?
>>>
>>> Thanks
>>>
>>> Jem
>>>
>>

Re: creating a distributed index

Posted by Jem Tucker <je...@gmail.com>.
This is very interesting. Do you know if this version will be backwards
compatible with older versions of Spark (1.2.0)?

Thanks,

Jem

On Wed, Jul 15, 2015 at 10:04 AM Ankur Dave <an...@gmail.com> wrote:

> The latest version of IndexedRDD supports any key type with a defined
> serializer
> <https://github.com/amplab/spark-indexedrdd/blob/master/src/main/scala/edu/berkeley/cs/amplab/spark/indexedrdd/KeySerializer.scala>,
> including Strings. It's not released yet, but you can use it from the
> master branch if you're interested.
>
> Ankur <http://www.ankurdave.com/>
>
> On Wed, Jul 15, 2015 at 12:43 AM, Jem Tucker <je...@gmail.com> wrote:
>
>> With regard to indexed structures in Spark, are there any alternatives to
>> IndexedRDD for more general key types, including Strings?
>>
>> Thanks
>>
>> Jem
>>
>

Re: creating a distributed index

Posted by Ankur Dave <an...@gmail.com>.
The latest version of IndexedRDD supports any key type with a defined
serializer
<https://github.com/amplab/spark-indexedrdd/blob/master/src/main/scala/edu/berkeley/cs/amplab/spark/indexedrdd/KeySerializer.scala>,
including Strings. It's not released yet, but you can use it from the
master branch if you're interested.

Ankur <http://www.ankurdave.com/>
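
A rough illustration of what "any key type with a defined serializer" can mean: if an index stores its keys as byte strings, supporting a new key type only requires a pair of functions that convert keys to bytes and back. The Python classes below are hypothetical analogues written for this sketch; they are not the `KeySerializer` interface from the linked Scala file, whose exact methods are not shown in this thread.

```python
import struct


class StringKeySerializer:
    """Encodes str keys as UTF-8 bytes."""

    def to_bytes(self, key):
        return key.encode("utf-8")

    def from_bytes(self, data):
        return data.decode("utf-8")


class LongKeySerializer:
    """Encodes 64-bit signed integers as 8 big-endian bytes."""

    def to_bytes(self, key):
        return struct.pack(">q", key)

    def from_bytes(self, data):
        return struct.unpack(">q", data)[0]


def build_index(pairs, serializer):
    """Store each value under the serialized form of its key."""
    return {serializer.to_bytes(key): value for key, value in pairs}


def lookup(index, serializer, key):
    """Return the value stored for `key`, or None if it is absent."""
    return index.get(serializer.to_bytes(key))


# The same index code works for both key types; only the serializer differs.
string_index = build_index([("alice", 1), ("bob", 2)], StringKeySerializer())
long_index = build_index([(42, "a"), (-7, "b")], LongKeySerializer())

print(lookup(string_index, StringKeySerializer(), "bob"))  # 2
print(lookup(long_index, LongKeySerializer(), -7))         # b
```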

On Wed, Jul 15, 2015 at 12:43 AM, Jem Tucker <je...@gmail.com> wrote:

> With regard to indexed structures in Spark, are there any alternatives to
> IndexedRDD for more general key types, including Strings?
>
> Thanks
>
> Jem
>

Re: creating a distributed index

Posted by Jem Tucker <je...@gmail.com>.
With regard to indexed structures in Spark, are there any alternatives to
IndexedRDD for more general key types, including Strings?

Thanks

Jem

On Wed, Jul 15, 2015 at 7:41 AM Burak Yavuz <br...@gmail.com> wrote:

> Hi Swetha,
>
> IndexedRDD is available as a package on Spark Packages
> <http://spark-packages.org/package/amplab/spark-indexedrdd>.
>
> Best,
> Burak
>
> On Tue, Jul 14, 2015 at 5:23 PM, swetha <sw...@gmail.com> wrote:
>
>> Hi Ankur,
>>
>> Is IndexedRDD available in Spark 1.4.0? We would like to use it in Spark
>> Streaming to look up, update, and delete RDD entries by key, storing the
>> data as key/value pairs.
>>
>> Thanks,
>> Swetha
>

Re: creating a distributed index

Posted by Burak Yavuz <br...@gmail.com>.
Hi Swetha,

IndexedRDD is available as a package on Spark Packages
<http://spark-packages.org/package/amplab/spark-indexedrdd>.

Best,
Burak

On Tue, Jul 14, 2015 at 5:23 PM, swetha <sw...@gmail.com> wrote:

> Hi Ankur,
>
> Is IndexedRDD available in Spark 1.4.0? We would like to use it in Spark
> Streaming to look up, update, and delete RDD entries by key, storing the
> data as key/value pairs.
>
> Thanks,
> Swetha