Posted to user@spark.apache.org by "Lavelle, Shawn" <Sh...@osii.com> on 2020/08/03 12:27:18 UTC

DataSource API v2 & Spark-SQL

Hello Spark community,
   I have a custom data source in the v1 API that I'm trying to port to the v2 API, in Java.  Currently I have the data source registered via catalog.createTable(name, <package>, schema, options map).  When trying to do this in Data Source API v2, I get an error saying my class (package) isn't a valid data source.  Can you help me out?

Spark version is 3.0.0 with Scala 2.12; artifacts are spark-core, spark-sql, spark-hive, spark-hive-thriftserver, and spark-catalyst.

Here's the data source definition:  public class LogTableSource implements TableProvider, SupportsRead, DataSourceRegister, Serializable

I'm guessing that I am missing one of the required interfaces. Note that I did try this with LogTableSource renamed to "DefaultSource", but the behavior is the same.  Also, I keep reading about a DataSourceV2 marker interface, but it seems deprecated?
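For reference, in the Spark 3.0 DSv2 API the SupportsRead mix-in belongs on the Table that the provider returns, not on the TableProvider itself. A minimal sketch of that shape follows; the class names, short name, and one-column schema are illustrative, not the poster's actual code, and the scan itself is stubbed out:

```java
import java.util.Map;
import java.util.Set;

import org.apache.spark.sql.connector.catalog.SupportsRead;
import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableCapability;
import org.apache.spark.sql.connector.catalog.TableProvider;
import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.connector.read.ScanBuilder;
import org.apache.spark.sql.sources.DataSourceRegister;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

// The provider itself only needs TableProvider (plus DataSourceRegister for a short name).
public class LogTableSource implements TableProvider, DataSourceRegister {

    @Override
    public String shortName() { return "logtable"; }

    @Override
    public StructType inferSchema(CaseInsensitiveStringMap options) {
        // Hypothetical one-column schema for illustration.
        return new StructType().add("line", "string");
    }

    @Override
    public Table getTable(StructType schema, Transform[] partitioning,
                          Map<String, String> properties) {
        return new LogTable(schema);
    }

    // SupportsRead is a mix-in on the returned Table, not on the provider.
    static class LogTable implements Table, SupportsRead {
        private final StructType schema;

        LogTable(StructType schema) { this.schema = schema; }

        @Override public String name() { return "logtable"; }
        @Override public StructType schema() { return schema; }

        @Override public Set<TableCapability> capabilities() {
            return Set.of(TableCapability.BATCH_READ);
        }

        @Override
        public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
            // A real implementation returns a ScanBuilder whose Scan produces a Batch.
            throw new UnsupportedOperationException("scan not sketched here");
        }
    }
}
```

Note also that for the short name to resolve, the class must be listed in META-INF/services/org.apache.spark.sql.sources.DataSourceRegister on the classpath.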

Also, I tried to add DataSourceV2ScanRelation, but that won't compile:
output() in DataSourceV2ScanRelation cannot override output() in QueryPlan: return type Seq<AttributeReference> is not compatible with Seq<Attribute>

  I'm fairly stumped - everything I've read online says there's a marker interface of some kind and yet I can't find it in my package list.

  Looking forward to hearing from you,

~ Shawn







Shawn Lavelle

Software Development

4101 Arrowhead Drive
Medina, Minnesota 55340-9457
Phone: 763 551 0559
Email: Shawn.Lavelle@osii.com
Website: www.osii.com

RE: DataSource API v2 & Spark-SQL

Posted by "Lavelle, Shawn" <Sh...@osii.com.INVALID>.
Thanks for following up, I will give this a go!

 ~  Shawn

-----Original Message-----
From: roizaig <ro...@gmail.com>
Sent: Thursday, April 29, 2021 7:42 AM
To: user@spark.apache.org
Subject: Re: DataSource API v2 & Spark-SQL

You can create a custom data source following this blog <http://roizaig.blogspot.com/2021/04/create-custom-data-source-with-spark-3x.html>. It shows how to read a Java log file using the Spark v3 API as an example.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org







Re: DataSource API v2 & Spark-SQL

Posted by roizaig <ro...@gmail.com>.
You can create a custom data source following this blog <http://roizaig.blogspot.com/2021/04/create-custom-data-source-with-spark-3x.html>. It shows how to read a Java log file using the Spark v3 API as an example.






RE: DataSource API v2 & Spark-SQL

Posted by "Lavelle, Shawn" <Sh...@osii.com>.
Thanks for clarifying, Russell.  Is Spark native catalog support on the roadmap for DSv2, or should I be trying to use something else?
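For reference: Spark 3.0's DSv2 does ship catalog integration, via the CatalogPlugin/TableCatalog interfaces in org.apache.spark.sql.connector.catalog, rather than through the V1 session catalog. A custom catalog implementation is wired in purely by configuration; the catalog name and class name below are hypothetical:

```
spark.sql.catalog.log_cat=com.example.LogTableCatalog
```

Tables behind such a catalog are then addressed as log_cat.<namespace>.<table> from SQL or the DataFrame API.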

~ Shawn

From: Russell Spitzer [mailto:russell.spitzer@gmail.com]
Sent: Monday, August 3, 2020 8:27 AM
To: Lavelle, Shawn <Sh...@osii.com>
Cc: user <us...@spark.apache.org>
Subject: Re: DataSource API v2 & Spark-SQL


That's a bad error message. Basically, you can't make a Spark native catalog reference for a DSv2 source. You have to use that data source's catalog or use the programmatic API. Both DSv1 and DSv2 programmatic APIs work (plus or minus some options).


Re: DataSource API v2 & Spark-SQL

Posted by Russell Spitzer <ru...@gmail.com>.
That's a bad error message. Basically, you can't make a Spark native catalog
reference for a DSv2 source. You have to use that data source's catalog or
use the programmatic API. Both DSv1 and DSv2 programmatic APIs work (plus
or minus some options).
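The programmatic route described above might look like this in Java; the format name, option key, and path are hypothetical, and the custom source must be on the classpath (with its short name listed in META-INF/services/org.apache.spark.sql.sources.DataSourceRegister for format("logtable") to resolve):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadLogTable {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("logtable-demo")
                .master("local[*]")
                .getOrCreate();

        // Programmatic DSv2 read: no session-catalog registration needed.
        Dataset<Row> logs = spark.read()
                .format("logtable")                  // short name, or the fully-qualified class name
                .option("path", "/var/log/app.log")  // hypothetical option understood by the source
                .load();

        logs.show();
        spark.stop();
    }
}
```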
