Posted to user@spark.apache.org by swetha <sw...@gmail.com> on 2015/07/15 02:18:34 UTC

Is IndexedRDD available in Spark 1.4.0?

Hi,

Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark
Streaming to do lookups/updates/deletes in RDDs using keys by storing them
as key/value pairs.

Thanks,
Swetha



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-IndexedRDD-available-in-Spark-1-4-0-tp23841.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Is IndexedRDD available in Spark 1.4.0?

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Or Spark on HBase :)

http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/



-- 
Ruslan Dautkhanov

On Tue, Jul 14, 2015 at 7:07 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. that is, key-value stores
>
> Please consider HBase for this purpose :-)
>
> On Tue, Jul 14, 2015 at 5:55 PM, Tathagata Das <td...@databricks.com>
> wrote:
>
>> I do not recommend using IndexedRDD for state management in Spark
>> Streaming. What it does not solve out of the box is checkpointing of
>> IndexedRDDs, which is important because long-running streaming jobs can
>> lead to an infinite chain of RDDs. Spark Streaming solves this for the
>> updateStateByKey operation, which you can use and which gives you state
>> management capabilities. For the most flexible, arbitrary lookups,
>> though, it's better to use a dedicated system that is designed and
>> optimized for long-term storage of data, that is, key-value stores,
>> databases, etc.
>>
>> On Tue, Jul 14, 2015 at 5:44 PM, Ted Yu <yu...@gmail.com> wrote:
>>
>>> Please take a look at SPARK-2365 which is in progress.
>>>
>>> On Tue, Jul 14, 2015 at 5:18 PM, swetha <sw...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is IndexedRDD available in Spark 1.4.0? We would like to use this in
>>>> Spark Streaming to do lookups/updates/deletes in RDDs using keys by
>>>> storing them as key/value pairs.
>>>>
>>>> Thanks,
>>>> Swetha
>>>
>>
>

Re: Is IndexedRDD available in Spark 1.4.0?

Posted by Ted Yu <yu...@gmail.com>.
bq. that is, key-value stores

Please consider HBase for this purpose :-)

On Tue, Jul 14, 2015 at 5:55 PM, Tathagata Das <td...@databricks.com> wrote:

> I do not recommend using IndexedRDD for state management in Spark
> Streaming. What it does not solve out of the box is checkpointing of
> IndexedRDDs, which is important because long-running streaming jobs can
> lead to an infinite chain of RDDs. Spark Streaming solves this for the
> updateStateByKey operation, which you can use and which gives you state
> management capabilities. For the most flexible, arbitrary lookups, though,
> it's better to use a dedicated system that is designed and optimized for
> long-term storage of data, that is, key-value stores, databases, etc.
>
> On Tue, Jul 14, 2015 at 5:44 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> Please take a look at SPARK-2365 which is in progress.
>>
>> On Tue, Jul 14, 2015 at 5:18 PM, swetha <sw...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Is IndexedRDD available in Spark 1.4.0? We would like to use this in
>>> Spark Streaming to do lookups/updates/deletes in RDDs using keys by
>>> storing them as key/value pairs.
>>>
>>> Thanks,
>>> Swetha
>>
>

Re: Is IndexedRDD available in Spark 1.4.0?

Posted by Tathagata Das <td...@databricks.com>.
I do not recommend using IndexedRDD for state management in Spark Streaming.
What it does not solve out of the box is checkpointing of IndexedRDDs, which
is important because long-running streaming jobs can lead to an infinite
chain of RDDs. Spark Streaming solves this for the updateStateByKey
operation, which you can use and which gives you state management
capabilities. For the most flexible, arbitrary lookups, though, it's better
to use a dedicated system that is designed and optimized for long-term
storage of data, that is, key-value stores, databases, etc.
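
As a rough illustration of the updateStateByKey approach, here is a minimal
sketch of the per-key update function in plain Python, with no Spark
dependency. The "put"/"delete" event encoding and the names update_state and
apply_batch are assumptions made up for this example, and apply_batch only
simulates what the streaming engine would do for each micro-batch. It does
reflect one documented part of the contract: returning None from the update
function drops the key from the state.

```python
# Sketch only: simulates the update-function contract of updateStateByKey.
# The ("put", value) / ("delete", None) event encoding is hypothetical.

def update_state(new_events, last_state):
    """Fold one batch of events for a single key into that key's state.

    new_events: list of (op, value) tuples received for the key in this batch
    last_state: previous state for the key, or None if the key has no state
    Returns the new state; returning None removes the key.
    """
    state = last_state
    for op, value in new_events:
        if op == "put":
            state = value      # insert or overwrite
        elif op == "delete":
            state = None       # mark the key for removal
    return state


def apply_batch(state, batch):
    """Simulate one micro-batch: group events by key, then update each key."""
    grouped = {}
    for key, event in batch:
        grouped.setdefault(key, []).append(event)

    new_state = {}
    # The update function runs for every key with existing state or new events.
    for key in set(state) | set(grouped):
        result = update_state(grouped.get(key, []), state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state


state = {}
state = apply_batch(state, [("a", ("put", 1)), ("b", ("put", 2))])
state = apply_batch(state, [("a", ("put", 10)), ("b", ("delete", None))])
print(state)  # {'a': 10}
```

In an actual PySpark Streaming job, a function of this shape would be passed
to updateStateByKey on a DStream of (key, event) pairs, after setting a
checkpoint directory with StreamingContext.checkpoint, which stateful
operations require. Note that with this encoding a stored value of None
cannot be told apart from a deleted key.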

On Tue, Jul 14, 2015 at 5:44 PM, Ted Yu <yu...@gmail.com> wrote:

> Please take a look at SPARK-2365 which is in progress.
>
> On Tue, Jul 14, 2015 at 5:18 PM, swetha <sw...@gmail.com> wrote:
>
>> Hi,
>>
>> Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark
>> Streaming to do lookups/updates/deletes in RDDs using keys by storing them
>> as key/value pairs.
>>
>> Thanks,
>> Swetha
>

Re: Is IndexedRDD available in Spark 1.4.0?

Posted by Ted Yu <yu...@gmail.com>.
Please take a look at SPARK-2365 which is in progress.

On Tue, Jul 14, 2015 at 5:18 PM, swetha <sw...@gmail.com> wrote:

> Hi,
>
> Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark
> Streaming to do lookups/updates/deletes in RDDs using keys by storing them
> as key/value pairs.
>
> Thanks,
> Swetha