Posted to user@ignite.apache.org by phalverson <pe...@gmail.com> on 2016/05/24 21:39:37 UTC

Downsides of Spark-Ignite for extended cache management and access?

My company provides big data analytics for large banks (managing and
analyzing their loan portfolios). We have a number of applications that are
fundamentally grid-based, but which tend to use different frameworks to
handle grid computation. We are considering shifting these to a common
Hadoop stack to consolidate infrastructure and provide a more uniform way of
managing our services, as well as providing more options for different
classes of analytics (MR, Streaming, etc.).

One of these applications seems to be a good fit for Ignite (lots of
concurrent low-latency queries against a massive but highly-partitionable
dataset) and, possibly, Spark (distributed batch computing). It's the latter
I'm uncertain about. I understand the general concept of the IgniteRDD as a
bridge to a distributed Ignite cache (or set of them), but do I give
anything up by deploying our app as a Spark job, vs. a custom YARN app that
hosts Ignite nodes? I'm specifically looking at implications for:

- affinity (both Data with Data and Compute with Data)
- advanced SQL queries (cross-cache joins, aggregations, etc.)
- persistence (warm-up, write-through)
- transactions

If I can still have all the benefits of IgniteCache while going the virtual
RDD route, then Spark seems a good fit, but I want to be clear on any
limitations that such an abstraction might impose. I'd appreciate any
guidance here.

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Downsides-of-Spark-Ignite-for-extended-cache-management-and-access-tp5154.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Downsides of Spark-Ignite for extended cache management and access?

Posted by phalverson <pe...@gmail.com>.

Thanks. That was precisely my concern, and I appreciate the guidance.




Re: Downsides of Spark-Ignite for extended cache management and access?

Posted by vkulichenko <va...@gmail.com>.

Hi,

IgniteRDD is useful when you already have a Spark application and want to
use Ignite as the underlying storage to share state between different Spark
jobs (which is not possible with plain Spark) and to take advantage of the
fast indexed SQL queries provided by Ignite (there are no indexes in Spark).

IgniteRDD provides the full RDD API, but its cache API is limited. For
example, it doesn't expose transactions. If you're creating an application
from scratch, I would recommend using the Ignite API directly, with all its
features (cache, compute, streaming, etc.).

Makes sense?

-Val


