Posted to dev@spark.apache.org by Ryan Blue <rb...@netflix.com.INVALID> on 2018/11/30 20:37:01 UTC

Public v2 interface location

Hi everyone,

In the DSv2 sync this week, we discussed adding a new SQL module, sql-api,
that would contain the interfaces for authors to plug in external sources.
The rationale for adding this module is that the common logical plans and
the rules that validate those plans should live in Catalyst, but no classes
in Catalyst are currently public for plugin authors to extend. Catalyst
would depend on the sql-api module to pull in the interfaces that plugin
authors implement.
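
For illustration, the dependency direction we discussed would look roughly
like this in sbt terms (a sketch only; the module names and wiring below
are my own shorthand, not Spark's actual build definitions):

    // Hypothetical layout: sql-api holds the public plugin interfaces and
    // depends on nothing else in Spark SQL; catalyst can then see them.
    lazy val sqlApi   = project.in(file("sql/api"))
    lazy val catalyst = project.in(file("sql/catalyst")).dependsOn(sqlApi)
    lazy val core     = project.in(file("sql/core")).dependsOn(catalyst)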

I was just working on moving the proposed TableCatalog interface into a new
sql-api module, but I ran into a problem: the new APIs still need to
reference classes in Catalyst, like DataType/StructType, AnalysisException,
and Statistics.
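
To make that concrete, here is a trimmed sketch of the kind of interface I
mean (loosely adapted from the TableCatalog proposal; the names and
signatures are illustrative, not the final API). Every imported type below
resolves to the catalyst module:

    // Illustrative only: a cut-down TableCatalog as it might sit in sql-api.
    // Both imports resolve to catalyst, which creates the circular dependency.
    import org.apache.spark.sql.AnalysisException
    import org.apache.spark.sql.types.StructType

    // Minimal stand-in for the proposed Table; its schema is a catalyst type.
    trait Table {
      def schema: StructType
    }

    trait TableCatalog {
      // The schema parameter and the declared exception are catalyst classes.
      @throws[AnalysisException]
      def createTable(name: String, schema: StructType): Table

      @throws[AnalysisException]
      def loadTable(name: String): Table
    }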

I don’t think it makes sense to move all of the referenced classes into
sql-api as well, but I could be convinced otherwise. If we decide not to
move them, then that leaves us back where we started: we can either expose
the v2 API from the catalyst package, or we can keep the v2 API, logical
plans, and rules in core instead of catalyst.

Anyone want to weigh in with a preference for how to move forward?

rb
-- 
Ryan Blue
Software Engineer
Netflix

Re: Public v2 interface location

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Jacky,

The proposal to add a sql-api module was based on the need to make the SQL
API classes, like `Table`, available to Catalyst so we can have logical
plans and analyzer rules in that module. But nothing in Catalyst is public,
so it doesn't currently contain user-implemented APIs. There are three
options to solve that problem:

1. Add a module with the APIs, sql-api, that catalyst depends on. But I ran
into the problem I described above: the API classes themselves need to
depend on Catalyst classes (the resulting cycle is spelled out after this
list).
2. Add the API to catalyst. The problem is adding publicly visible API
classes to a previously non-public module.
3. Add the API to core. The problem here is that it becomes more difficult
to keep the rules and logical plans in catalyst, where I would expect them
to be.
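
Spelled out, the cycle behind option #1 looks like this, using the module
layout sketched in my first mail (which is itself only an assumption):

    catalyst --> sql-api     (logical plans and rules reference TableCatalog)
    sql-api  --> catalyst    (TableCatalog references StructType, Statistics)

Neither Maven nor sbt allows a dependency cycle between modules, so one
side's classes would have to move.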

I'm not sure which option is the right one, but I no longer think that
option #1 is very promising.

On Fri, Nov 30, 2018 at 10:47 PM JackyLee <qc...@163.com> wrote:

> Hi, Ryan Blue.
>
> I don't think it would be a good idea to add the sql-api module.
> I would prefer to add the sql-api classes to sql/core. SQL is just another
> representation of a Dataset, so there is no need to add a new module for
> this. Besides, it would be easier to add the sql-api classes in core.
>
> By the way, I don't think this is a good time to add a SQL API; many
> details of the DataSource V2 API have not yet been determined.
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Public v2 interface location

Posted by JackyLee <qc...@163.com>.
Hi, Ryan Blue.

I don't think it would be a good idea to add the sql-api module.
I would prefer to add the sql-api classes to sql/core. SQL is just another
representation of a Dataset, so there is no need to add a new module for
this. Besides, it would be easier to add the sql-api classes in core.

By the way, I don't think this is a good time to add a SQL API; many
details of the DataSource V2 API have not yet been determined.


