Posted to issues@spark.apache.org by "Michael Nitschinger (JIRA)" <ji...@apache.org> on 2015/06/26 12:21:04 UTC

[jira] [Updated] (SPARK-8655) DataFrameReader#option supports more than String as value

     [ https://issues.apache.org/jira/browse/SPARK-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Nitschinger updated SPARK-8655:
---------------------------------------
    Description: 
I'm working on a custom data source, porting it from 1.3 to 1.4.

On 1.3 I could easily extend the SparkSQL imports and get access to it, which meant I could use custom options right away. One of them passes a Filter down to my Relation for tighter schema inference against a schemaless database.

So I would have something like:

n1ql(filter: Filter = null, userSchema: StructType = null, bucketName: String = null)

Since I want to move my API behind the DataFrameReader, the SQLContext is no longer directly available; it only arrives through the RelationProvider, which I've implemented and which works nicely.

The only problem I have now is that while I can pass in custom options, they are all String-typed. So I have no way to pass down my optional Filter anymore (since parameters is a Map[String, String]).

Would it be possible to extend the options so that more than just Strings can be passed in? Right now I probably need to work around that by documenting how people can pass in a string which I turn into a Filter, but that's somewhat hacky.
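As a sketch, the string workaround could look something like the following. This is not Spark code: the Filter/EqualTo types below are stand-ins for Spark's org.apache.spark.sql.sources classes, and the "column = value" wire format is a made-up convention for illustration only.

```scala
// Stand-in for org.apache.spark.sql.sources.Filter (illustrative, not Spark's class).
sealed trait Filter
case class EqualTo(attribute: String, value: String) extends Filter

object FilterCodec {
  // Caller side: turn the typed filter into the only thing
  // DataFrameReader#option accepts -- a String.
  def encode(f: Filter): String = f match {
    case EqualTo(attr, v) => s"$attr = $v"
  }

  // Provider side: recover the typed filter from the String value
  // found in parameters: Map[String, String].
  def decode(s: String): Option[Filter] = s.split("=", 2) match {
    case Array(attr, v) => Some(EqualTo(attr.trim, v.trim))
    case _              => None
  }
}
```

The round trip works (FilterCodec.decode(FilterCodec.encode(EqualTo("type", "airline"))) yields the original filter back), but every data source has to invent and document its own encoding, which is exactly the hacky part.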

Note that built-in implementations like JSON and JDBC don't hit this problem: since they can access the (private) SQLContext directly, they don't need to go through the decoupled RelationProvider and can accept whatever custom arguments they want on their methods.

  was:
I'm working on a custom data source, porting it from 1.3 to 1.4.

On 1.3 I could easily extend the SparkSQL imports and get access to it, which meant I could use custom options right away. One of them passes a Filter down to my Relation for tighter schema inference against a schemaless database.

So I would have something like:

n1ql(filter: Filter = null, userSchema: StructType = null, bucketName: String = null)

Since I want to move my API behind the DataFrameReader, the SQLContext is no longer directly available; it only arrives through the RelationProvider, which I've implemented and which works nicely.

The only problem I have now is that while I can pass in custom options, they are all String-typed. So I have no way to pass down my optional Filter anymore (since parameters is a Map[String, String]).

Would it be possible to extend the options so that more than just Strings can be passed in? Right now I probably need to work around that by documenting how people can pass in a string which I turn into a Filter, but that's somewhat hacky.


> DataFrameReader#option supports more than String as value
> ---------------------------------------------------------
>
>                 Key: SPARK-8655
>                 URL: https://issues.apache.org/jira/browse/SPARK-8655
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Michael Nitschinger
>
> I'm working on a custom data source, porting it from 1.3 to 1.4.
> On 1.3 I could easily extend the SparkSQL imports and get access to it, which meant I could use custom options right away. One of them passes a Filter down to my Relation for tighter schema inference against a schemaless database.
> So I would have something like:
> n1ql(filter: Filter = null, userSchema: StructType = null, bucketName: String = null)
> Since I want to move my API behind the DataFrameReader, the SQLContext is no longer directly available; it only arrives through the RelationProvider, which I've implemented and which works nicely.
> The only problem I have now is that while I can pass in custom options, they are all String-typed. So I have no way to pass down my optional Filter anymore (since parameters is a Map[String, String]).
> Would it be possible to extend the options so that more than just Strings can be passed in? Right now I probably need to work around that by documenting how people can pass in a string which I turn into a Filter, but that's somewhat hacky.
> Note that built-in implementations like JSON and JDBC don't hit this problem: since they can access the (private) SQLContext directly, they don't need to go through the decoupled RelationProvider and can accept whatever custom arguments they want on their methods.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org