Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2016/05/25 12:05:57 UTC

Discussion - "Hidden" Workspaces

Prior to opening a JIRA on this, I was curious what the community thought.
I'd like to have a setting for workspaces that would indicate "hidden"
(defaulting to false if not specified, so as not to break any existing
workspace definitions).

For example:

"workspaces": {
   "dev": {
       "location": "/mydev",
       "writable": true,
       "defaultInputFormat": null,
       "hidden": true
   }
}

This would have the effect that when running "show schemas" this workspace
would not show up in the list.
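
To make the default concrete, here is a rough Python sketch of the listing behavior, not Drill code. The "hidden" key is the proposed flag and does not exist in Drill today; the "dfs" plugin name, the "prod" workspace, and the visible_schemas helper are made up for illustration.

```python
import json

# Hypothetical storage plugin config. "hidden" is the proposed flag, not an
# existing Drill option. "prod" omits it on purpose to show the default.
PLUGIN_CONFIG = """
{
  "workspaces": {
    "dev":  {"location": "/mydev",  "writable": true,
             "defaultInputFormat": null, "hidden": true},
    "prod": {"location": "/myprod", "writable": false,
             "defaultInputFormat": null}
  }
}
"""


def visible_schemas(plugin_name, config_text):
    """Return the schema names a "show schemas" listing would include."""
    workspaces = json.loads(config_text)["workspaces"]
    return sorted(
        f"{plugin_name}.{ws_name}"
        for ws_name, ws in workspaces.items()
        if not ws.get("hidden", False)  # a missing flag means "not hidden"
    )


print(visible_schemas("dfs", PLUGIN_CONFIG))
```

Because a missing flag is read as false, existing workspace definitions would keep showing up exactly as they do now.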

Reasoning: When organizing a large enterprise data
lake/ocean/cistern/swamp, a limited set of "functional" options presented to
the user is better than "all" the options. For example, as an administrator, I
may want to define workspaces to help clarify ETL processes or service
loads. If the user HAS filesystem access they CAN access these, but they
will never want to; instead, the user would focus on cleaned/enriched
data. Similarly, my users would rarely use the "cp" plugin, but I don't want
to eliminate it. Basically, a hidden workspace doesn't show in "show
schemas", but it can still be used both directly in queries and through the
"use" command.
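
The split between "listed" and "usable" could be modeled roughly as follows. This is a toy Python sketch under my own assumptions, not how Drill resolves schemas: the WorkspaceRegistry class, its method names, and the workspace names and paths are all hypothetical.

```python
class WorkspaceRegistry:
    """Toy model: hidden workspaces are skipped when listing, not when resolving."""

    def __init__(self, workspaces):
        # workspaces: schema name -> dict of settings, mirroring the JSON config
        self._workspaces = dict(workspaces)

    def show_schemas(self):
        # Listing honors the proposed "hidden" flag (default: not hidden).
        return sorted(
            name for name, ws in self._workspaces.items()
            if not ws.get("hidden", False)
        )

    def use(self, name):
        # Resolution ignores "hidden"; only undefined names fail.
        if name not in self._workspaces:
            raise KeyError(f"schema not found: {name}")
        return self._workspaces[name]["location"]


registry = WorkspaceRegistry({
    "dfs.enriched": {"location": "/data/enriched"},
    "dfs.etl_staging": {"location": "/data/etl", "hidden": True},
})
print(registry.show_schemas())
print(registry.use("dfs.etl_staging"))
```

The point of the sketch is that "hidden" only affects discovery. It is not an access control: anyone who knows the name and has filesystem access can still use the workspace.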

Another example: I create home schemas based on the home directory of every
user. Users will know their schema is there and can easily access it, but
having it show up in "show schemas" doesn't provide value and just clutters
the data returned in the response. I want to provide a clean interface and
depiction of valuable schemas to my users via workspaces, and I believe this
small flag would be a low-impact way to do that.

I would love discussion on this. If others would find this valuable, I will
happily make a JIRA.

John

Re: Discussion - "Hidden" Workspaces

Posted by John Omernik <jo...@omernik.com>.
Ah, good points. I think this also factors into the Workspace Security
topic I bumped up. I think ensuring we have the proper tools to
holistically manage our data environment, as presented to the user by Drill,
is important for any admin.

On Wed, May 25, 2016 at 9:34 AM, Andries Engelbrecht <
aengelbrecht@maprtech.com> wrote:

> It is an interesting idea, but it may warrant more discussion in the context
> of overall Drill metadata management.
>
> For example, how will it affect other storage plugins (SPs) that are not DFS?
> How will it be represented/managed in INFORMATION_SCHEMA when tools are
> used to work with Drill metadata?
>
> I agree that this is a good idea, but we need to take all the aspects into
> consideration, as Drill is a very powerful tool for data discovery and we
> need to consider the overall ecosystem.
>
> --Andries

Re: Discussion - "Hidden" Workspaces

Posted by Andries Engelbrecht <ae...@maprtech.com>.
It is an interesting idea, but it may warrant more discussion in the context of overall Drill metadata management.

For example, how will it affect other storage plugins (SPs) that are not DFS?
How will it be represented/managed in INFORMATION_SCHEMA when tools are used to work with Drill metadata?

I agree that this is a good idea, but we need to take all the aspects into consideration, as Drill is a very powerful tool for data discovery and we need to consider the overall ecosystem.

--Andries




Re: Discussion - "Hidden" Workspaces

Posted by Charles Givre <cg...@gmail.com>.
+2
I really like this idea.  
—C

> On May 25, 2016, at 08:52, Jim Scott <js...@maprtech.com> wrote:
> 
> +1
> 


Re: Discussion - "Hidden" Workspaces

Posted by Jim Scott <js...@maprtech.com>.
+1




-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal <https://twitter.com/kingmesal>
