Posted to user@ignite.apache.org by Kira <me...@gmail.com> on 2015/12/29 12:30:23 UTC

Ignite Shared Rdd

Hi,

I am new to Apache Ignite and was hoping to find some help here with my
questions about it. I would like to know the motivation for sharing RDDs
across jobs and applications in Spark with Ignite, and also what types of
applications need this mechanism.

My other questions are how Ignite can improve the performance of Spark, and
in which context. Is there an example of a real-world application?

Is there any published paper on the shared RDD mechanism?

Waiting for your answers
Thank you
Regards



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-Shared-Rdd-tp2339.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Ignite Shared Rdd

Posted by Kira <me...@gmail.com>.
Hi Alexey,

Thank you for your answer, but I am interested in the shared RDD. I don't
know whether Ignite uses this mechanism in SQL indexing; if so, could you
please tell me more?

Regards




Re: Ignite Shared Rdd

Posted by Alexey Goncharuk <al...@gmail.com>.
Kira,

I also wanted to mention that probably the most obvious way you can gain a
performance benefit from Ignite compared to Spark is by using indexing for
SQL queries. Last time I checked, Spark did not have indexes, so each SQL
query in Spark implies a full scan of the data set. In Ignite you can index
a particular entry field, so a query against this field can be significantly
faster. You can also use indexed entities via IgniteRDD.

Please refer to the following section in the documentation:
https://apacheignite.readme.io/docs/sql-queries
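
To make the difference between a full scan and an indexed lookup concrete,
here is a minimal sketch in plain Python. It does not use the Ignite or
Spark APIs; the records, the city field, and the function names are all
hypothetical, and a dictionary merely stands in for an index.

```python
from collections import defaultdict

# Hypothetical data set: (id, name, city) records standing in for cache entries.
people = [
    (1, "Ann", "Paris"),
    (2, "Bob", "Berlin"),
    (3, "Cid", "Paris"),
    (4, "Dee", "Madrid"),
]


def full_scan(records, city):
    """Check every record, as a query without an index has to."""
    return [r for r in records if r[2] == city]


def build_index(records):
    """Map each city to its records (built once, reused by every query)."""
    index = defaultdict(list)
    for r in records:
        index[r[2]].append(r)
    return index


def indexed_lookup(index, city):
    """Go straight to the matching records without touching the rest."""
    return index.get(city, [])


city_index = build_index(people)
scan_result = full_scan(people, "Paris")
index_result = indexed_lookup(city_index, "Paris")
```

Both calls return the same records. The scan cost grows with the size of
the data set, while the lookup cost depends mainly on the number of matches,
at the price of building and maintaining the index.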

--AG

Re: Ignite Shared Rdd

Posted by Kira <me...@gmail.com>.
Hi,

Thank you all for your answers; I think I've got what I wanted. However, an
example of a real-world application would be very nice :p

Thank you again
Regards

2015-12-30 18:17 GMT+01:00 dsetrakyan [via Apache Ignite Users] <
ml-node+s70518n2359h12@n6.nabble.com>:

> I would add that the use case for a shared RDD is similar to the use case
> for an in-memory file system. Traditionally, users have shared state
> across Spark jobs or applications by persisting it to disk and reading
> it from another process. With a shared RDD you don’t need the disk anymore;
> you can share the state directly in memory.
>
> Additionally, unlike with a file system, a shared RDD allows users to work
> with their domain objects (not files) directly and query them in memory
> using the fast indexed SQL available in Ignite.
>
> D.





Re: Ignite Shared Rdd

Posted by Dmitriy Setrakyan <ds...@apache.org>.
I would add that the use case for a shared RDD is similar to the use case
for an in-memory file system. Traditionally, users have shared state
across Spark jobs or applications by persisting it to disk and reading
it from another process. With a shared RDD you don’t need the disk anymore;
you can share the state directly in memory.

Additionally, unlike with a file system, a shared RDD allows users to work
with their domain objects (not files) directly and query them in memory
using the fast indexed SQL available in Ignite.
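
The contrast between handing state over through disk and sharing it in
memory can be sketched roughly as follows. This is plain Python, not the
IgniteRDD API: the two job functions, the word counts, and the shared_store
dictionary (a stand-in for a shared RDD) are assumptions made for
illustration only, and both jobs run in a single process here.

```python
import json
import os
import tempfile


def producer_job():
    """First job: computes state worth reusing (hypothetical word counts)."""
    return {"spark": 3, "ignite": 5}


def consumer_job(state):
    """Second job: reuses the state produced earlier."""
    return sum(state.values())


# Traditional approach: persist the state to disk, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "state.json")
    with open(path, "w") as f:
        json.dump(producer_job(), f)
    with open(path) as f:
        total_via_disk = consumer_job(json.load(f))

# Shared in-memory approach: both jobs use one named in-memory store.
# `shared_store` is only a stand-in for a shared RDD; it is not Ignite code.
shared_store = {}
shared_store["word_counts"] = producer_job()
total_via_memory = consumer_job(shared_store["word_counts"])
```

In the disk variant the state has to be serialized to a file format and
parsed again; in the in-memory variant the consumer reads the same objects
directly.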

D.

On Wed, Dec 30, 2015 at 3:48 AM, Denis Magda <dm...@gridgain.com> wrote:

> Hi Kira,
>
> It can be any kind of application that needs an RDD shared among
> different Spark jobs.
> Such a shared RDD can preserve state or data that can be reused by many
> Spark jobs, in parallel or later.
> The shared RDD is a contribution from Ignite to Spark and not a
> replacement for the latter. [1]
>
> In general, Apache Ignite is an in-memory data fabric with many components
> [2], including a distributed computation engine, and many use cases [3],
> while Spark is a distributed computation engine.
>
> [1] https://ignite.apache.org/use-cases/spark/shared-memory-layer.html
> [2] https://apacheignite.readme.io/
> [3] https://ignite.apache.org/usecases.html
>
> --
> Denis

Re: Ignite Shared Rdd

Posted by Denis Magda <dm...@gridgain.com>.
Hi Kira,

It can be any kind of application that needs an RDD shared among
different Spark jobs.
Such a shared RDD can preserve state or data that can be reused by many
Spark jobs, in parallel or later.
The shared RDD is a contribution from Ignite to Spark and not a
replacement for the latter. [1]

In general, Apache Ignite is an in-memory data fabric with many components
[2], including a distributed computation engine, and many use cases [3],
while Spark is a distributed computation engine.

[1] https://ignite.apache.org/use-cases/spark/shared-memory-layer.html
[2] https://apacheignite.readme.io/
[3] https://ignite.apache.org/usecases.html

--
Denis

On 12/30/2015 2:02 PM, Kira wrote:
> Hi Denis,
>
> Thank you for your answer, but what I am looking for is at a higher level.
>
> I want to know what types of applications require shared RDDs
> (machine learning, graph processing, web algorithms, ...?). In other words,
> *why* does Ignite share RDDs among jobs? Or why would anyone choose Ignite
> over Spark?
>
> Regards


Re: Ignite Shared Rdd

Posted by Kira <me...@gmail.com>.
Hi Denis,

Thank you for your answer, but what I am looking for is at a higher level.

I want to know what types of applications require shared RDDs
(machine learning, graph processing, web algorithms, ...?). In other words,
*why* does Ignite share RDDs among jobs? Or why would anyone choose Ignite
over Spark?

Regards




Re: Ignite Shared Rdd

Posted by Denis Magda <dm...@gridgain.com>.
Hi,

There are specific documentation sections on the Apache Ignite Readme.io
site.

Most probably you'll find answers to all your questions in the following
articles:
https://apacheignite.readme.io/docs/shared-rdd
https://apacheignite.readme.io/docs/ignitecontext--igniterdd
https://apacheignite.readme.io/docs/installation--deployment
https://apacheignite.readme.io/docs/testing-integration-with-spark-shell

If you have any further questions after you've read the resources above,
don't hesitate to keep asking them in this thread.

Regards,
Denis

On 12/29/2015 2:30 PM, Kira wrote:
> Hi,
>
> I am new to Apache Ignite and was hoping to find some help here with my
> questions about it. I would like to know the motivation for sharing RDDs
> across jobs and applications in Spark with Ignite, and also what types of
> applications need this mechanism.
>
> My other questions are how Ignite can improve the performance of Spark, and
> in which context. Is there an example of a real-world application?
>
> Is there any published paper on the shared RDD mechanism?
>
> Waiting for your answers
> Thank you
> Regards