Posted to user@spark.apache.org by Aditya <ad...@gmail.com> on 2022/05/05 06:39:13 UTC

Disable/Remove datasources in Spark

Hi,
I am trying to force all users to use only one datasource (a custom
datasource I plan to write) to read/write data.

So, I was looking at the DataSource API in Spark:
1. I was able to figure out how to create my own datasource (reference:
<https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala>)
2. But I am not able to figure out how to "disable" all other data sources.

Is there any way to disable the existing datasources, or some other
approach to override them?
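
For point 1, here is a minimal sketch of the kind of custom datasource I
mean, built on the DataSource V1 interfaces from the interfaces.scala file
linked above. The package com.example.myds, the short name "myds", and the
classes DefaultSource/MyRelation are illustrative names only, not anything
that ships with Spark:

// Hypothetical "myds" source using the V1 interfaces (DataSourceRegister,
// RelationProvider, BaseRelation, TableScan).
package com.example.myds

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

class DefaultSource extends DataSourceRegister with RelationProvider {
  // Lets users write spark.read.format("myds") once the jar is on the
  // classpath and the class is listed in
  // META-INF/services/org.apache.spark.sql.sources.DataSourceRegister.
  override def shortName(): String = "myds"

  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new MyRelation(sqlContext)
}

class MyRelation(val sqlContext: SQLContext) extends BaseRelation with TableScan {
  override def schema: StructType =
    StructType(Seq(StructField("value", StringType)))

  // Toy scan that returns a single row; a real implementation would read
  // from whatever governed store this datasource wraps.
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row("hello from myds")))
}

Note that registering a source this way only adds it alongside the built-in
ones (parquet, csv, json, ...); it does not remove any of them, which is
exactly point 2.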

Re: Disable/Remove datasources in Spark

Posted by wilson <wi...@4shield.net>.
By the way, I use Drill to query webserver logs only, because Drill has a
storage plugin for the httpd server log.

But I found Spark is also convenient for querying webserver logs, for which
I wrote a note:

https://notes.4shield.net/how-to-query-webserver-log-with-spark.html

Thanks

wilson wrote:
> Though this is off-topic, Apache Drill can do that. For instance,
> you can keep only the csv storage plugin in the configuration and
> remove all other storage plugins; then users on Drill can query csv only.
> 
> regards
> 
> 
> Aditya wrote:
>> So, is there a way for me to get a list of "leaf" dataframes/RDDs that
>> they are using in their logic?



Re: Disable/Remove datasources in Spark

Posted by wilson <wi...@4shield.net>.
Though this is off-topic, Apache Drill can do that. For instance,
you can keep only the csv storage plugin in the configuration and
remove all other storage plugins; then users on Drill can query csv only.

regards


Aditya wrote:
> So, is there a way for me to get a list of "leaf" dataframes/RDDs that
> they are using in their logic?



Re: Disable/Remove datasources in Spark

Posted by Aditya <ad...@gmail.com>.
My understanding is that if I can disable the parquet datasource, the user
will get an error when they try spark.read.parquet().

To give some context, my main objective is to provide a few dataframes to
my users, and I don't want them to be able to access any data other than
these specific dataframes.
So, is there a way for me to get a list of "leaf" dataframes/RDDs that they
are using in their logic?
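
If it helps, one way I could imagine checking this (a sketch only; it leans
on Spark's internal Catalyst classes such as LogicalRelation and
df.queryExecution, which are developer APIs and may change between versions)
is to collect the leaves of a DataFrame's analyzed plan:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.datasources.LogicalRelation

// List the leaf relations a DataFrame's plan ultimately scans.
def leafSources(df: DataFrame): Seq[String] =
  df.queryExecution.analyzed.collectLeaves().map {
    case lr: LogicalRelation =>
      // For file-based sources this shows the format and paths being read.
      lr.relation.toString
    case other =>
      other.nodeName
  }

Calling leafSources(...) on whatever DataFrame a user hands back would then
show whether it is built only on the DataFrames I provide or has picked up
additional scans.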

I would appreciate any other approaches/inputs to handle this.

Thanks


On Thu, May 5, 2022 at 1:08 PM wilson <wi...@4shield.net> wrote:

> It's maybe impossible to disable that? A user can run spark.read... to
> read any datasource they can reach.
>
>
> Aditya wrote:
> > 2. But I am not able to figure out how to "disable" all other data
> sources

Re: Disable/Remove datasources in Spark

Posted by wilson <wi...@4shield.net>.
It's maybe impossible to disable that? A user can run spark.read... to
read any datasource they can reach.
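
For example, in spark-shell anyone holding the provided SparkSession can do
something like the following (the path here is just a placeholder):

// Nothing in a stock Spark build blocks the generic reader API:
val df = spark.read
  .format("csv")
  .option("header", "true")
  .load("/any/path/the/user/can/reach")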


Aditya wrote:
> 2. But I am not able to figure out how to "disable" all other data sources
