Posted to issues@ignite.apache.org by "Nikolay Izhikov (JIRA)" <ji...@apache.org> on 2017/11/28 10:33:00 UTC

[jira] [Commented] (IGNITE-3084) Spark Data Frames Support in Apache Ignite

    [ https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268523#comment-16268523 ] 

Nikolay Izhikov commented on IGNITE-3084:
-----------------------------------------

IgniteCatalog cannot be implemented against Spark 2.1.
So I propose updating the Spark dependencies of the {{spark}} module to 2.2.0 as part of this task.

1. To set up IgniteCatalog we need to override the `SharedState.externalCatalog` val, so that Spark can look up Ignite tables.
2. An overridden externalCatalog is still null while the SharedState instance is being initialized; see the Scala FAQ on initialization order: [https://docs.scala-lang.org/tutorials/FAQ/initialization-order.html]
3. In 2.1.2, externalCatalog is used in an internal initializer (quoted below) - [SharedState.scala|https://github.com/apache/spark/blob/v2.1.2/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L96]
4. In 2.2.0, SharedState.scala was changed in a way that allows externalCatalog to be overridden - [SharedState-2.2.0|https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L93]

{code:scala}
  {
    val defaultDbDefinition = CatalogDatabase(
      SessionCatalog.DEFAULT_DATABASE, "default database", warehousePath, Map())
    if (!externalCatalog.databaseExists(SessionCatalog.DEFAULT_DATABASE)) { // <-- Problem is here! externalCatalog == null if we override it.
      externalCatalog.createDatabase(defaultDbDefinition, ignoreIfExists = true)
    }
  }
{code}
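
The initialization-order problem can be reproduced outside Spark with a minimal sketch. All names here (Catalog, EagerState, LazyState and so on) are hypothetical stand-ins, not Spark or Ignite classes. The lazy variant only illustrates one common way to make such a member safely overridable; whether Spark 2.2.0 does exactly this should be checked against the linked source.

```scala
// Hypothetical stand-ins for illustration only; not Spark or Ignite classes.
trait Catalog {
  def databaseExists(name: String): Boolean
}

class DefaultCatalog extends Catalog {
  def databaseExists(name: String): Boolean = true
}

class CustomCatalog extends Catalog {
  def databaseExists(name: String): Boolean = true
}

// A strict val that is read again while the constructor is still running.
class EagerState {
  val catalog: Catalog = new DefaultCatalog

  // Records what the constructor observed. Calling a method on `catalog`
  // here would throw a NullPointerException in the overriding subclass.
  val seenDuringInit: Option[Catalog] = Option(catalog)
}

class OverridingEagerState extends EagerState {
  // This field is assigned only after EagerState's constructor has finished,
  // so the read above goes through the overridden accessor and sees null.
  override val catalog: Catalog = new CustomCatalog
}

// Same shape, but the member is lazy, so it is computed on first access.
class LazyState {
  lazy val catalog: Catalog = new DefaultCatalog

  val seenDuringInit: Option[Catalog] = Option(catalog)
}

class OverridingLazyState extends LazyState {
  // The first access from the parent constructor triggers this initializer.
  override lazy val catalog: Catalog = new CustomCatalog
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    // Empty: the parent constructor saw null for the overridden strict val.
    println(new OverridingEagerState().seenDuringInit)
    // Defined: the overridden lazy val was initialized on demand.
    println(new OverridingLazyState().seenDuringInit.isDefined)
  }
}
```

With the strict val, the parent constructor runs before the subclass assigns its own field, which matches the null externalCatalog described in the comment above.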

> Spark Data Frames Support in Apache Ignite
> ------------------------------------------
>
>                 Key: IGNITE-3084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3084
>             Project: Ignite
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Nikolay Izhikov
>              Labels: bigdata
>             Fix For: 2.4
>
>
> Apache Spark already benefits from integration with Apache Ignite. The latter provides shared RDDs, an implementation of Spark RDD, that help Spark share state between Spark workers and execute SQL queries much faster. The next logical step is to enable support for the modern Spark Data Frames API in a similar way.
> As a contributor, you will be fully in charge of the integration of Spark Data Frame API and Apache Ignite.
