Posted to user@spark.apache.org by Ashish Mukherjee <as...@gmail.com> on 2015/03/24 08:57:24 UTC

Question about Data Sources API

Hello,

I have some questions related to the Data Sources API -

1. Is the Data Source API stable as of Spark 1.3.0?

2. The Data Source API seems to be available only in Scala. Is there any
plan to make it available for Java too?

3.  Are only filters and projections pushed down to the data source and all
the data pulled into Spark for other processing?

Regards,
Ashish

Re: Question about Data Sources API

Posted by Michael Armbrust <mi...@databricks.com>.
>
> My question wrt Java/Scala was related to extending the classes to support
> new custom data sources, so I was wondering if those could be written in
> Java, since our company is a Java shop.
>

Yes, you should be able to extend the required interfaces using Java.
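
For what it's worth, here is a minimal sketch in Scala of what extending
those interfaces looks like (the class and column names are made up for
illustration); the traits involved are plain abstract interfaces, so an
equivalent implementation in Java should also be possible:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Provider class, looked up by name when the source is loaded.
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new ExampleRelation(sqlContext)
}

// A trivial relation that returns two hard-coded rows.
class ExampleRelation(val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  override def schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)))

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row(1, "a"), Row(2, "b")))
}

Assuming the classes above live in a package named com.example, you could
then load the relation with something like
sqlContext.load("com.example", Map.empty[String, String]).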

> The additional push downs I am looking for are aggregations with grouping
> and sorting.
> Essentially, I am trying to evaluate if this API can give me much of what
> is possible with the Apache MetaModel project.
>

We don't currently push those down, as our initial focus is on getting data
into Spark so that you can join it with other sources and then do such
processing.  It's possible we will extend the pushdown API in the future,
though.

Re: Question about Data Sources API

Posted by Ashish Mukherjee <as...@gmail.com>.
Hello Michael,

Thanks for your quick reply.

My question wrt Java/Scala was related to extending the classes to support
new custom data sources, so I was wondering if those could be written in
Java, since our company is a Java shop.

The additional push downs I am looking for are aggregations with grouping
and sorting.

Essentially, I am trying to evaluate if this API can give me much of what
is possible with the Apache MetaModel project.

Regards,
Ashish

On Tue, Mar 24, 2015 at 1:57 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> On Tue, Mar 24, 2015 at 12:57 AM, Ashish Mukherjee <
> ashish.mukherjee@gmail.com> wrote:
>>
>> 1. Is the Data Source API stable as of Spark 1.3.0?
>>
>
> It is marked DeveloperApi, but in general we do not plan to change even
> these APIs unless there is a very compelling reason to.
>
>
>> 2. The Data Source API seems to be available only in Scala. Is there any
>> plan to make it available for Java too?
>>
>
> We tried to make all the suggested interfaces (other than CatalystScan
> which exposes internals and is only for experimentation) usable from Java.
> Is there something in particular you are having trouble with?
>
>
>> 3.  Are only filters and projections pushed down to the data source and
>> all the data pulled into Spark for other processing?
>>
>
> For now, this is all that is provided by the public stable API.  We left a
> hook for more powerful push downs
> (sqlContext.experimental.extraStrategies), and would be interested in
> feedback on other operations we should push down as we expand the API.
>

Re: Question about Data Sources API

Posted by Michael Armbrust <mi...@databricks.com>.
On Tue, Mar 24, 2015 at 12:57 AM, Ashish Mukherjee <
ashish.mukherjee@gmail.com> wrote:
>
> 1. Is the Data Source API stable as of Spark 1.3.0?
>

It is marked DeveloperApi, but in general we do not plan to change even
these APIs unless there is a very compelling reason to.


> 2. The Data Source API seems to be available only in Scala. Is there any
> plan to make it available for Java too?
>

We tried to make all the suggested interfaces (other than CatalystScan
which exposes internals and is only for experimentation) usable from Java.
Is there something in particular you are having trouble with?


> 3.  Are only filters and projections pushed down to the data source and
> all the data pulled into Spark for other processing?
>

For now, this is all that is provided by the public stable API.  We left a
hook for more powerful push downs
(sqlContext.experimental.extraStrategies), and would be interested in
feedback on other operations we should push down as we expand the API.
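
To make the filter/projection pushdown concrete, a relation can mix in
PrunedFilteredScan; Spark SQL then hands it the columns it needs and the
filters it was able to translate.  The pushdown is a best-effort
optimization: Spark still evaluates the filters on the returned rows, so a
source that ignores some of them remains correct.  A rough sketch, with the
external-store details left out:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation backed by some external store.
class PushdownRelation(val sqlContext: SQLContext, override val schema: StructType)
    extends BaseRelation with PrunedFilteredScan {

  override def buildScan(
      requiredColumns: Array[String],  // projection pushed down by Spark SQL
      filters: Array[Filter]): RDD[Row] = {
    // Translate whatever filters the store understands into its query
    // language, e.g. a WHERE clause fragment.
    val where = filters.collect {
      case EqualTo(attr, value) => s"$attr = '$value'"
    }.mkString(" AND ")
    // ... query the store for requiredColumns restricted by `where` ...
    sqlContext.sparkContext.emptyRDD[Row]  // placeholder result
  }
}

The experimental hook mentioned above (sqlContext.experimental.extraStrategies)
is a sequence of planner strategies you can append to, but writing one means
working against Catalyst's logical plans, which are internal and not part of
the stable API.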