Posted to dev@ignite.apache.org by Николай Ижиков <ni...@gmail.com> on 2017/10/05 14:05:15 UTC

Spark+Ignite SQL syntax proposal

Hello, guys.

I’m working on IGNITE-3084 [1] “Spark Data Frames Support in Apache Ignite”
and have a proposal to discuss.

I want to provide a consistent way to query Ignite key-value caches from
Spark SQL engine.

To implement it, I have to determine the Java classes of the key and
the value. They are required to calculate the schema of a Spark Data
Frame. As far as I know, there is currently no meta information for a
key-value cache in Ignite.

If a regular data source is used, a user can provide the key class and
value class through options. Example:

```
val df = spark.read
  .format(IGNITE)
  .option("config", CONFIG)
  .option("cache", CACHE_NAME)
  .option("keyClass", "java.lang.Long")
  .option("valueClass", "java.lang.String")
  .load()

df.printSchema()

df.createOrReplaceTempView("testCache")

val igniteDF = spark.sql("SELECT key, value FROM testCache WHERE key >= 2
AND value like '%0'")
```
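For a key-value cache, those two class names are the only information available for deriving the schema. The lookup could be sketched as follows — a hypothetical Java helper of my own (real code would build a Spark `StructType` via the Spark API; only a few well-known JDK types are covered here):

```java
import java.util.Map;

public class SchemaSketch {
    // Hypothetical mapping from JDK class names to Spark SQL type names.
    // A real implementation would return Spark DataType instances instead
    // of strings and cover many more types.
    static final Map<String, String> TYPES = Map.of(
        "java.lang.Integer", "IntegerType",
        "java.lang.Long",    "LongType",
        "java.lang.Double",  "DoubleType",
        "java.lang.String",  "StringType"
    );

    static String sparkType(String className) {
        String t = TYPES.get(className);
        if (t == null)
            throw new IllegalArgumentException("Unsupported type: " + className);
        return t;
    }

    public static void main(String[] args) {
        // Schema for keyClass=java.lang.Long, valueClass=java.lang.String:
        System.out.println("key: "   + sparkType("java.lang.Long"));
        System.out.println("value: " + sparkType("java.lang.String"));
    }
}
```

The resulting two-column schema (`key`, `value`) is what `df.printSchema()` would report in the example above.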

But if we use the Ignite implementation of the Spark catalog, we don't
want to register existing caches by hand.
Anton Vinogradov proposed a syntax that I personally like very much:

*Let’s use the following table name for a key-value cache:
`cacheName[keyClass,valueClass]`*

Example:

```
val df3 = igniteSession.sql("SELECT * FROM
`testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")

df3.printSchema()

df3.show()
```
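To make this work, the catalog would only need to recognize table names of that shape and split them into the cache name and the two class names. A minimal Java sketch of such parsing (class and method names are my own, not part of the actual patch):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableNameSketch {
    // Matches cacheName[keyClass,valueClass], e.g.
    // testCache[java.lang.Integer,java.lang.String]
    private static final Pattern KV_TABLE =
        Pattern.compile("^(\\w+)\\[([\\w.]+),([\\w.]+)]$");

    /** Returns {cacheName, keyClass, valueClass}, or null if the name
     *  does not follow the key-value convention. */
    static String[] parse(String tableName) {
        Matcher m = KV_TABLE.matcher(tableName);
        return m.matches()
            ? new String[] { m.group(1), m.group(2), m.group(3) }
            : null;
    }

    public static void main(String[] args) {
        String[] parts = parse("testCache[java.lang.Integer,java.lang.String]");
        System.out.println(parts[0]); // testCache
        System.out.println(parts[1]); // java.lang.Integer
        System.out.println(parts[2]); // java.lang.String
    }
}
```

A name that does not match the pattern would fall back to the normal SQL-table lookup, so ordinary table names stay unaffected. Note that the backticks in the SQL example are required because `[`, `]`, `,` and `.` are not valid in a bare Spark SQL identifier.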

Thoughts?

[1] https://issues.apache.org/jira/browse/IGNITE-3084

--
Nikolay Izhikov
NIzhikov.dev@gmail.com

Re: Spark+Ignite SQL syntax proposal

Posted by Nikolay Izhikov <ni...@gmail.com>.
Hello, Ray.

I think it can be done as a second step, after DataFrame support for the
current Spark release is merged.

Thoughts?

On 07.10.2017 16:42, Ray wrote:
> Hi Nikolay,
> 
> Could you also implement DataFrame support for the spark-2.10 module?
> There are some legacy Spark users still on Spark 1.6 who need the
> DataFrame features too.
> 
> Thanks
> 
> 
> 
> 
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> 

Re: Spark+Ignite SQL syntax proposal

Posted by Ray <ra...@cisco.com>.
Hi Nikolay,

Could you also implement DataFrame support for the spark-2.10 module?
There are some legacy Spark users still on Spark 1.6 who need the
DataFrame features too.

Thanks




--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: Spark+Ignite SQL syntax proposal

Posted by Николай Ижиков <ni...@gmail.com>.
OK, got it. I will remove key-value support from the catalog.

On Oct 6, 2017, 6:34 AM, "Denis Magda" <dm...@apache.org>
wrote:

> I tend to agree with Val that key-value support seems excessive. My
> suggestion is to consider Ignite as a SQL database for this specific
> integration, implementing only the relevant functionality.
>
> —
> Denis
>
> > On Oct 5, 2017, at 5:41 PM, Valentin Kulichenko <
> valentin.kulichenko@gmail.com> wrote:
> >
> > Nikolay,
> >
> > I don't think we need this, especially with this kind of syntax which is
> > very confusing. Main use case for data frames is SQL, so let's
> concentrate
> > on it. We should use Ignite's SQL engine capabilities as much as
> possible.
> > If we see other use cases down the road, we can always support them.
> >
> > -Val
> >
> > On Thu, Oct 5, 2017 at 10:57 AM, Николай Ижиков <ni...@gmail.com>
> > wrote:
> >
> >> Hello, Valentin.
> >>
> >> I implemented the ability to run Spark SQL queries against both:
> >>
> >> 1.  An Ignite SQL table. Internally, the table is described by a
> >> QueryEntity with meta information about the data.
> >> 2.  A key-value cache - a regular Ignite cache without meta information
> >> about the stored data.
> >>
> >> In the second case, we have to know which types the cache stores,
> >> so I propose the syntax described above.
> >>
> >>
> >> 2017-10-05 20:45 GMT+03:00 Valentin Kulichenko <
> >> valentin.kulichenko@gmail.com>:
> >>
> >>> Nikolay,
> >>>
> >>> I don't understand. Why do we require users to provide key and value
> >>> types in SQL? What is the issue you're trying to solve with this syntax?
> >>>
> >>> -Val
> >>>
> >>> On Thu, Oct 5, 2017 at 7:05 AM, Николай Ижиков <nizhikov.dev@gmail.com
> >
> >>> wrote:
> >>>
> >>>> Hello, guys.
> >>>>
> >>>> I’m working on IGNITE-3084 [1] “Spark Data Frames Support in Apache
> >>> Ignite”
> >>>> and have a proposal to discuss.
> >>>>
> >>>> I want to provide a consistent way to query Ignite key-value caches
> >> from
> >>>> Spark SQL engine.
> >>>>
> >>>> To implement it, I have to determine the Java classes of the key and
> >>>> the value. They are required to calculate the schema of a Spark Data
> >>>> Frame. As far as I know, there is currently no meta information for a
> >>>> key-value cache in Ignite.
> >>>>
> >>>> If a regular data source is used, a user can provide the key class
> >>>> and value class through options. Example:
> >>>>
> >>>> ```
> >>>> val df = spark.read
> >>>>  .format(IGNITE)
> >>>>  .option("config", CONFIG)
> >>>>  .option("cache", CACHE_NAME)
> >>>>  .option("keyClass", "java.lang.Long")
> >>>>  .option("valueClass", "java.lang.String")
> >>>>  .load()
> >>>>
> >>>> df.printSchema()
> >>>>
> >>>> df.createOrReplaceTempView("testCache")
> >>>>
> >>>> val igniteDF = spark.sql("SELECT key, value FROM testCache
> >>>> WHERE key >= 2 AND value like '%0'")
> >>>> ```
> >>>>
> >>>> But if we use the Ignite implementation of the Spark catalog, we
> >>>> don't want to register existing caches by hand.
> >>>> Anton Vinogradov proposed a syntax that I personally like very much:
> >>>>
> >>>> *Let’s use the following table name for a key-value cache:
> >>>> `cacheName[keyClass,valueClass]`*
> >>>>
> >>>> Example:
> >>>>
> >>>> ```
> >>>> val df3 = igniteSession.sql("SELECT * FROM
> >>>> `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")
> >>>>
> >>>> df3.printSchema()
> >>>>
> >>>> df3.show()
> >>>> ```
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/IGNITE-3084
> >>>>
> >>>> --
> >>>> Nikolay Izhikov
> >>>> NIzhikov.dev@gmail.com
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Nikolay Izhikov
> >> NIzhikov.dev@gmail.com
> >>
>
>

Re: Spark+Ignite SQL syntax proposal

Posted by Denis Magda <dm...@apache.org>.
I tend to agree with Val that key-value support seems excessive. My suggestion is to consider Ignite as a SQL database for this specific integration, implementing only the relevant functionality.

—
Denis

> On Oct 5, 2017, at 5:41 PM, Valentin Kulichenko <va...@gmail.com> wrote:
> 
> Nikolay,
> 
> I don't think we need this, especially with this kind of syntax which is
> very confusing. Main use case for data frames is SQL, so let's concentrate
> on it. We should use Ignite's SQL engine capabilities as much as possible.
> If we see other use cases down the road, we can always support them.
> 
> -Val
> 
> On Thu, Oct 5, 2017 at 10:57 AM, Николай Ижиков <ni...@gmail.com>
> wrote:
> 
>> Hello, Valentin.
>> 
>> I implemented the ability to run Spark SQL queries against both:
>> 
>> 1.  An Ignite SQL table. Internally, the table is described by a
>> QueryEntity with meta information about the data.
>> 2.  A key-value cache - a regular Ignite cache without meta information
>> about the stored data.
>> 
>> In the second case, we have to know which types the cache stores,
>> so I propose the syntax described above.
>> 
>> 
>> 2017-10-05 20:45 GMT+03:00 Valentin Kulichenko <
>> valentin.kulichenko@gmail.com>:
>> 
>>> Nikolay,
>>> 
>>> I don't understand. Why do we require users to provide key and value
>>> types in SQL? What is the issue you're trying to solve with this syntax?
>>> 
>>> -Val
>>> 
>>> On Thu, Oct 5, 2017 at 7:05 AM, Николай Ижиков <ni...@gmail.com>
>>> wrote:
>>> 
>>>> Hello, guys.
>>>> 
>>>> I’m working on IGNITE-3084 [1] “Spark Data Frames Support in Apache
>>> Ignite”
>>>> and have a proposal to discuss.
>>>> 
>>>> I want to provide a consistent way to query Ignite key-value caches
>> from
>>>> Spark SQL engine.
>>>> 
>>>> To implement it, I have to determine the Java classes of the key and
>>>> the value. They are required to calculate the schema of a Spark Data
>>>> Frame. As far as I know, there is currently no meta information for a
>>>> key-value cache in Ignite.
>>>> 
>>>> If a regular data source is used, a user can provide the key class
>>>> and value class through options. Example:
>>>> 
>>>> ```
>>>> val df = spark.read
>>>>  .format(IGNITE)
>>>>  .option("config", CONFIG)
>>>>  .option("cache", CACHE_NAME)
>>>>  .option("keyClass", "java.lang.Long")
>>>>  .option("valueClass", "java.lang.String")
>>>>  .load()
>>>> 
>>>> df.printSchema()
>>>> 
>>>> df.createOrReplaceTempView("testCache")
>>>> 
>>>> val igniteDF = spark.sql("SELECT key, value FROM testCache
>>>> WHERE key >= 2 AND value like '%0'")
>>>> ```
>>>> 
>>>> But if we use the Ignite implementation of the Spark catalog, we
>>>> don't want to register existing caches by hand.
>>>> Anton Vinogradov proposed a syntax that I personally like very much:
>>>> 
>>>> *Let’s use the following table name for a key-value cache:
>>>> `cacheName[keyClass,valueClass]`*
>>>> 
>>>> Example:
>>>> 
>>>> ```
>>>> val df3 = igniteSession.sql("SELECT * FROM
>>>> `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")
>>>> 
>>>> df3.printSchema()
>>>> 
>>>> df3.show()
>>>> ```
>>>> 
>>>> Thoughts?
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/IGNITE-3084
>>>> 
>>>> --
>>>> Nikolay Izhikov
>>>> NIzhikov.dev@gmail.com
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Nikolay Izhikov
>> NIzhikov.dev@gmail.com
>> 


Re: Spark+Ignite SQL syntax proposal

Posted by Valentin Kulichenko <va...@gmail.com>.
Nikolay,

I don't think we need this, especially with this kind of syntax which is
very confusing. Main use case for data frames is SQL, so let's concentrate
on it. We should use Ignite's SQL engine capabilities as much as possible.
If we see other use cases down the road, we can always support them.

-Val

On Thu, Oct 5, 2017 at 10:57 AM, Николай Ижиков <ni...@gmail.com>
wrote:

> Hello, Valentin.
>
> I implemented the ability to run Spark SQL queries against both:
>
> 1.  An Ignite SQL table. Internally, the table is described by a
> QueryEntity with meta information about the data.
> 2.  A key-value cache - a regular Ignite cache without meta information
> about the stored data.
>
> In the second case, we have to know which types the cache stores,
> so I propose the syntax described above.
>
>
> 2017-10-05 20:45 GMT+03:00 Valentin Kulichenko <
> valentin.kulichenko@gmail.com>:
>
> > Nikolay,
> >
> > I don't understand. Why do we require users to provide key and value
> > types in SQL? What is the issue you're trying to solve with this syntax?
> >
> > -Val
> >
> > On Thu, Oct 5, 2017 at 7:05 AM, Николай Ижиков <ni...@gmail.com>
> > wrote:
> >
> > > Hello, guys.
> > >
> > > I’m working on IGNITE-3084 [1] “Spark Data Frames Support in Apache
> > Ignite”
> > > and have a proposal to discuss.
> > >
> > > I want to provide a consistent way to query Ignite key-value caches
> from
> > > Spark SQL engine.
> > >
> > > To implement it, I have to determine the Java classes of the key and
> > > the value. They are required to calculate the schema of a Spark Data
> > > Frame. As far as I know, there is currently no meta information for a
> > > key-value cache in Ignite.
> > >
> > > If a regular data source is used, a user can provide the key class
> > > and value class through options. Example:
> > >
> > > ```
> > > val df = spark.read
> > >   .format(IGNITE)
> > >   .option("config", CONFIG)
> > >   .option("cache", CACHE_NAME)
> > >   .option("keyClass", "java.lang.Long")
> > >   .option("valueClass", "java.lang.String")
> > >   .load()
> > >
> > > df.printSchema()
> > >
> > > df.createOrReplaceTempView("testCache")
> > >
> > > val igniteDF = spark.sql("SELECT key, value FROM testCache
> > > WHERE key >= 2 AND value like '%0'")
> > > ```
> > >
> > > But if we use the Ignite implementation of the Spark catalog, we
> > > don't want to register existing caches by hand.
> > > Anton Vinogradov proposed a syntax that I personally like very much:
> > >
> > > *Let’s use the following table name for a key-value cache:
> > > `cacheName[keyClass,valueClass]`*
> > >
> > > Example:
> > >
> > > ```
> > > val df3 = igniteSession.sql("SELECT * FROM
> > > `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")
> > >
> > > df3.printSchema()
> > >
> > > df3.show()
> > > ```
> > >
> > > Thoughts?
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-3084
> > >
> > > --
> > > Nikolay Izhikov
> > > NIzhikov.dev@gmail.com
> > >
> >
>
>
>
> --
> Nikolay Izhikov
> NIzhikov.dev@gmail.com
>

Re: Spark+Ignite SQL syntax proposal

Posted by Николай Ижиков <ni...@gmail.com>.
Hello, Valentin.

I implemented the ability to run Spark SQL queries against both:

1.  An Ignite SQL table. Internally, the table is described by a
QueryEntity with meta information about the data.
2.  A key-value cache - a regular Ignite cache without meta information
about the stored data.

In the second case, we have to know which types the cache stores,
so I propose the syntax described above.


2017-10-05 20:45 GMT+03:00 Valentin Kulichenko <
valentin.kulichenko@gmail.com>:

> Nikolay,
>
> I don't understand. Why do we require users to provide key and value
> types in SQL? What is the issue you're trying to solve with this syntax?
>
> -Val
>
> On Thu, Oct 5, 2017 at 7:05 AM, Николай Ижиков <ni...@gmail.com>
> wrote:
>
> > Hello, guys.
> >
> > I’m working on IGNITE-3084 [1] “Spark Data Frames Support in Apache
> Ignite”
> > and have a proposal to discuss.
> >
> > I want to provide a consistent way to query Ignite key-value caches from
> > Spark SQL engine.
> >
> > To implement it, I have to determine the Java classes of the key and
> > the value. They are required to calculate the schema of a Spark Data
> > Frame. As far as I know, there is currently no meta information for a
> > key-value cache in Ignite.
> >
> > If a regular data source is used, a user can provide the key class and
> > value class through options. Example:
> >
> > ```
> > val df = spark.read
> >   .format(IGNITE)
> >   .option("config", CONFIG)
> >   .option("cache", CACHE_NAME)
> >   .option("keyClass", "java.lang.Long")
> >   .option("valueClass", "java.lang.String")
> >   .load()
> >
> > df.printSchema()
> >
> > df.createOrReplaceTempView("testCache")
> >
> > val igniteDF = spark.sql("SELECT key, value FROM testCache WHERE key >= 2
> > AND value like '%0'")
> > ```
> >
> > But if we use the Ignite implementation of the Spark catalog, we don't
> > want to register existing caches by hand.
> > Anton Vinogradov proposed a syntax that I personally like very much:
> >
> > *Let’s use the following table name for a key-value cache:
> > `cacheName[keyClass,valueClass]`*
> >
> > Example:
> >
> > ```
> > val df3 = igniteSession.sql("SELECT * FROM
> > `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")
> >
> > df3.printSchema()
> >
> > df3.show()
> > ```
> >
> > Thoughts?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-3084
> >
> > --
> > Nikolay Izhikov
> > NIzhikov.dev@gmail.com
> >
>



-- 
Nikolay Izhikov
NIzhikov.dev@gmail.com

Re: Spark+Ignite SQL syntax proposal

Posted by Valentin Kulichenko <va...@gmail.com>.
Nikolay,

I don't understand. Why do we require users to provide key and value
types in SQL? What is the issue you're trying to solve with this syntax?

-Val

On Thu, Oct 5, 2017 at 7:05 AM, Николай Ижиков <ni...@gmail.com>
wrote:

> Hello, guys.
>
> I’m working on IGNITE-3084 [1] “Spark Data Frames Support in Apache Ignite”
> and have a proposal to discuss.
>
> I want to provide a consistent way to query Ignite key-value caches from
> Spark SQL engine.
>
> To implement it, I have to determine the Java classes of the key and
> the value. They are required to calculate the schema of a Spark Data
> Frame. As far as I know, there is currently no meta information for a
> key-value cache in Ignite.
>
> If a regular data source is used, a user can provide the key class and
> value class through options. Example:
>
> ```
> val df = spark.read
>   .format(IGNITE)
>   .option("config", CONFIG)
>   .option("cache", CACHE_NAME)
>   .option("keyClass", "java.lang.Long")
>   .option("valueClass", "java.lang.String")
>   .load()
>
> df.printSchema()
>
> df.createOrReplaceTempView("testCache")
>
> val igniteDF = spark.sql("SELECT key, value FROM testCache WHERE key >= 2
> AND value like '%0'")
> ```
>
> But if we use the Ignite implementation of the Spark catalog, we don't
> want to register existing caches by hand.
> Anton Vinogradov proposed a syntax that I personally like very much:
>
> *Let’s use the following table name for a key-value cache:
> `cacheName[keyClass,valueClass]`*
>
> Example:
>
> ```
> val df3 = igniteSession.sql("SELECT * FROM
> `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")
>
> df3.printSchema()
>
> df3.show()
> ```
>
> Thoughts?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-3084
>
> --
> Nikolay Izhikov
> NIzhikov.dev@gmail.com
>