Posted to user@drill.apache.org by Charles Givre <cg...@gmail.com> on 2018/12/17 16:53:47 UTC

Drill on YARN Questions

Hello all, 
We are trying to set up a Drill cluster on our corporate data lake.  Our cluster requires dynamic YARN queue allocation for a multi-tenant environment.  Is this something that Drill supports, or is there a workaround?
Thanks!
—C

Re: Drill on YARN Questions

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Charles,
I'm not quite sure what "dynamic queue allocation" means: all YARN containers are allocated dynamically through YARN via queues. 
It may be helpful to review how Drill-on-YARN (DoY) works. DoY does NOT attempt to use YARN for each query. Impala tried that with Llama and discovered that the latency of YARN allocation is not compatible with the needs of a query engine: YARN takes tens of seconds to launch containers, while queries must complete in fractions of a second.
Instead, DoY treats YARN as a resource manager for long-running applications. Think of it as an old-school Kubernetes. That is, you use YARN to launch Drill, and to account for the cluster resources used by Drill. The DoY UI allows you to grow/shrink the cluster, which turns around and asks YARN for more or fewer containers.
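For example, resizing is also available from the DoY command line (the argument is the desired number of Drillbits; the documented +n/-n forms adjust relative to the current size):

    $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE resize 10

DoY then asks YARN for additional containers, or releases some, to reach the requested size.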
(Jyothsna, we should integrate your graceful shutdown work into DoY for cluster shrinking.)
To be very clear, Drill is long-running, and clusters grow or shrink over long periods of time (perhaps over a day: more Drill during the day, less at night). Queries are rapid-fire and run on the available Drillbits.

DoY is designed for a multi-tenant setup. The only trick is that each tenant cluster must be assigned distinct ports and a distinct ZK root. The details are spelled out in the DoY docs. (K8s avoids the need for mucking with ports via an overlay network, something that YARN does not provide.)
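For illustration only, a second tenant's drill-override.conf might look something like the sketch below; the key names are Drill's standard port and ZK options, and the values are placeholders:

    drill.exec: {
      cluster-id: "drill-tenant2",        # unique per tenant cluster
      zk.root: "drill-tenant2",           # distinct ZK root (default: "drill")
      rpc.user.server.port: 32010,        # default: 31010
      rpc.bit.server.port: 32011,         # default: 31011
      http.port: 8147                     # default: 8047
    }

Each tenant cluster then runs from its own site directory containing its own copy of this file.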

If the above leaves questions open, please do provide a bit more detail about what you want to achieve.

Thanks,
- Paul

 

    On Monday, December 17, 2018, 8:53:54 AM PST, Charles Givre <cg...@gmail.com> wrote:  
 
 Hello all, 
We are trying to set up a Drill cluster on our corporate data lake.  Our cluster requires dynamic YARN queue allocation for a multi-tenant environment.  Is this something that Drill supports, or is there a workaround?
Thanks!
—C  


Re: Drill on YARN Questions

Posted by Kwizera hugues Teddy <nb...@gmail.com>.
Hi Paul Rogers,

Sorry, your answer didn't give me the information I need.
When @Charles said "YARN queue allocation", I understood it to mean the "fair and capacity schedulers"
<https://stackoverflow.com/questions/26546613/what-is-the-difference-between-the-fair-and-capacity-schedulers>,
two important features for a multi-tenant cluster.
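These schedulers are configured on the YARN side, not in Drill. As a sketch, a capacity-scheduler.xml defining two tenant queues might look like this (queue names and capacities are placeholders):

    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>tenant1,tenant2</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.tenant1.capacity</name>
      <value>60</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.tenant2.capacity</name>
      <value>40</value>
    </property>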

Thanks,

Kwizera

On Tue, Dec 18, 2018 at 4:01 AM Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi Kwizera,
> I hope my answer to Charles gave you the information you need. If not,
> please check out the DoY documentation or ask follow-up questions.
> Key thing to remember: Drill is a long-running YARN service; queries DO
> NOT go through YARN queues, they go through Drill directly.
>
> Thanks,
> - Paul
>
>
>
>     On Monday, December 17, 2018, 11:01:04 AM PST, Kwizera hugues Teddy <
> nbted2017@gmail.com> wrote:
>
>  Hello,
> Same questions ,
> I would like to know how drill deal with this yarn fonctionality?
> Cheers.
>
> On Mon, Dec 17, 2018, 17:53 Charles Givre <cgivre@gmail.com wrote:
>
> > Hello all,
> > We are trying to set up a Drill cluster on our corporate data lake.  Our
> > cluster requires dynamic YARN queue allocation for multi-tenant
> > environment.  Is this something that Drill supports or is there a
> > workaround?
> > Thanks!
> > —C

Re: Drill on YARN Questions

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi,

Sorry for the delay, was traveling.

If you can access the web UI from a browser, this means that 1) your server has the right certificates, and 2) your browser has the correct files to validate the certificates.

If TLS does not work from Java, this may simply mean that you have to configure TLS for the Java install used to run the DoY command line. I'm not an expert on this, but many articles exist. Basically, you need to ensure that the certificate for your signing authority is available to Java; if you are using self-signed certificates, you need to add the authority for your internal certificates to Java's trust store.

I'm skating on thin ice here as I've not done this in a while. Abhishek or Sorabh, can you provide more details?

The way to check this is to write a very simple Java program that sends that resize URL to the DoY app master. If you get that to work in your own test app, then it will work in the DoY client. All the DoY client does for resize is parse the command, work through some config settings, and issue a REST call to the AM.
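As a rough sketch (the host, port, and trust store path below are placeholders; javax.net.ssl.trustStore and trustStorePassword are the standard JSSE properties, nothing DoY-specific):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import javax.net.ssl.HttpsURLConnection;

    public class DoYStatusProbe {
      public static void main(String[] args) throws Exception {
        // Trust store holding your signing authority's certificate
        // (or the self-signed certificate itself). Path is a placeholder.
        System.setProperty("javax.net.ssl.trustStore", "/path/to/truststore.jks");
        System.setProperty("javax.net.ssl.trustStorePassword", "changeit");

        // The same kind of endpoint the DoY client calls for "status".
        URL url = new URL("https://am-host:9048/rest/status");
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        System.out.println("HTTP " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
            new InputStreamReader(conn.getInputStream()))) {
          String line;
          while ((line = in.readLine()) != null) {
            System.out.println(line);
          }
        }
      }
    }

If this fails with an SSLHandshakeException, the trust store most likely does not contain your CA; if it succeeds, the problem is somewhere in the DoY client's own configuration.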

Thanks,
- Paul

 

    On Tuesday, January 15, 2019, 2:39:40 AM PST, Kwizera hugues Teddy <nb...@gmail.com> wrote:  
 
 hello Paul,

Yes, I can access AM web UI.

I noticed that the problem is caused by SSL/TLS access being enabled (ssl-enabled: true).

- https://xxxxxxxxx:10048/rest/status : works fine in the browser

I think I have to deal with certificates on the AM host. Do you have an idea?

Thanks.

On Mon, Jan 14, 2019 at 4:53 PM Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi,
>
> Can you reach the AM web UI? The Web UI URL was shown below. It also
> should have been given when you started DoY.
>
> I notice that you're using SSL/TLS access. Doing so requires the right
> certificates on the AM host. Again, trying to connect via your browser may
> help identify if that works.
>
> If the Web UI works, then check the host name and port number in your
> browser compared to that shown in the error message.
>
> The resize command on the command line does nothing other than some
> validation, then it sends the URL shown below. You can try entering the URL
> directly into your browser. Again, if that fails, there is something amiss
> with your config. If that works, then we'll have to figure out what might
> be wrong with the DoY command line tool.
>
> Please try out the above and let us know what you learn.
>
> Thanks,
> - Paul
>
>
>
>    On Monday, January 14, 2019, 7:30:44 AM PST, Kwizera hugues Teddy <
> nbted2017@gmail.com> wrote:
>
>  Hello all,
>
> I am experiencing an error on Resize and Status .
> The errors are from the REST call on the AM.
>
> command : $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE status
> Result:
> Application ID: xxxxxxxxxxxxxxxx Application State: RUNNING Host:
> xxxxxxxxxxxxxxxx Queue: root.xxxxx.default User: xxxxxxxx Start Time:
> 2019-01-14 14:56:29 Application Name: Drill-on-YARN-cluster_01 Tracking
> URL: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Failed to get AM
> status
> REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/status
>
> Command : $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE resize
> Result :
>      Resizing cluster for Application ID:
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>      Resize failed: REST request failed:
> https://xxxxxxxxxxxxxxx:9048/rest/shrink/1
>
>  I didn't found how I can resolve this issue. maybe someone can help me
>
> Thanks.
>
>
>
> On Sat, Jan 12, 2019 at 8:30 AM Kwizera hugues Teddy <nb...@gmail.com>
> wrote:
>
> > Hello ,
> >
> > Other option work .
> >
> > As you say an update is needed in docs  and the remove of wrong
> > information.
> >
> > Thanks.
> >
> > On Sat, Jan 12, 2019, 08:10 Abhishek Girish <agirish@apache.org wrote:
> >
> >> Hello Teddy,
> >>
> >> I don't recollect a restart option for the drill-on-yarn.sh script. I've
> >> always used a combination of stop and start, like Paul mentions. Could
> you
> >> please try if that works and get back to us? We could certainly have a
> >> minor enhancement to support restart - until then i'll request Bridget
> to
> >> update the documentation.
> >>
> >> Regards,
> >> Abhishek
> >>
> >> On Fri, Jan 11, 2019 at 11:05 PM Kwizera hugues Teddy <
> >> nbted2017@gmail.com>
> >> wrote:
> >>
> >> > Hello Paul ,
> >> >
> >> > Thanks you for your response with some interesting information(files
> in
> >> > /tmp).
> >> >
> >> > For my side all other command line  work
> normally(start|stop|status...|)
> >> > but no restart(this option not recognized). I tried to search the code
> >> > source and I found that the restart command is not implemented . then
> I
> >> > wonder why the documentation does not match the source code ?.
> >> >
> >> > Thanks .Teddy
> >> >
> >> >
> >> > On Sat, Jan 12, 2019, 02:39 Paul Rogers <par0328@yahoo.com.invalid
> >> wrote:
> >> >
> >> > > Let's try to troubleshoot. Does the combination of stop and start
> >> work?
> >> > If
> >> > > so, then there could be a bug with the restart command itself.
> >> > >
> >> > > If neither start nor stop work, it could be that you are missing the
> >> > > application ID file created when you first started DoY. Some
> >> background.
> >> > >
> >> > > When we submit an app to YARN, YARN gives us an app ID. We need this
> >> in
> >> > > order to track down the app master for DoY so we can send it
> commands
> >> > later.
> >> > >
> >> > > When the command line tool starts DoY, it writes the YARN app ID to
> a
> >> > > file. Can't remember the details, but it is probably in the
> >> $DRILL_SITE
> >> > > directory. The contents are, as I recall, a long hexadecimal string.
> >> > >
> >> > > When you invoke the command line, the tool reads this file to figure
> >> to
> >> > > track down the DoY app master. The tool then sends commands to the
> app
> >> > > master: in this case, a request to shut down. Then, for reset, the
> >> tool
> >> > > will communicate with YARN to start a new instance.
> >> > >
> >> > > The tool is suppose to give detailed error messages. Did you get
> any?
> >> > That
> >> > > might tell us which of these steps failed.
> >> > >
> >> > > Can you connect to the DoY Web UI at the URL provided when you
> started
> >> > > DoY? If you can, this means that the DoY App Master is up and
> running.
> >> > >
> >> > > Are you running the client from the same node on which you started
> it?
> >> > > That file I mentioned is local to the "DoY client" machine; it is
> not
> >> in
> >> > > DFS.
> >> > >
> >> > > Then, there is one more very obscure bug you can check. On some
> >> > > distributions, the YARN task files are written to the /tmp
> directory.
> >> > Some
> >> > > Linux systems remove these files from time to time. Once the files
> are
> >> > > gone, YARN can no longer control its containers: it won't be able to
> >> stop
> >> > > the app master or the Drillbit containers. There are two fixes.
> >> First, go
> >> > > kill all the processes by hand. Then, move the YARN state files out
> of
> >> > > /tmp, or exclude YARN's files from the periodic cleanup.
> >> > >
> >> > > Try some of the above and let us know what you find.
> >> > >
> >> > > Also, perhaps Abhishek can offer some suggestions as he tested the
> >> heck
> >> > > out of the feature and may have additional suggestions.
> >> > >
> >> > > Thanks,
> >> > > - Paul
> >> > >
> >> > >
> >> > >
> >> > >    On Friday, January 11, 2019, 7:46:55 AM PST, Kwizera hugues Teddy
> >> <
> >> > > nbted2017@gmail.com> wrote:
> >> > >
> >> > >  hello,
> >> > >
> >> > >  2 weeks ago, I began to discover DoY. Today by reading drill
> >> documents (
> >> > > https://drill.apache.org/docs/appendix-a-release-note-issues/ ) I
> saw
> >> > that
> >> > > we can restart drill cluster by :
> >> > >
> >> > >  $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE restart
> >> > >
> >> > > But doesn't work when I tested it.
> >> > >
> >> > > No idea about it?
> >> > >
> >> > > Thanks.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Jan 2, 2019 at 3:18 AM Paul Rogers
> <par0328@yahoo.com.invalid
> >> >
> >> > > wrote:
> >> > >
> >> > > > Hi Charles,
> >> > > >
> >> > > > Your engineers have identified a common need, but one which is
> very
> >> > > > difficult to satisfy.
> >> > > >
> >> > > > TL;DR: DoY gets as close to the requirements as possible within
> the
> >> > > > constraints of YARN and Drill. But, future projects could do more.
> >> > > >
> >> > > > Your engineers want resource segregation among tenants:
> >> multi-tenancy.
> >> > > > This is very difficult to achieve at the application level.
> Consider
> >> > > Drill.
> >> > > > It would need some way to identify users to know which tenant they
> >> > belong
> >> > > > to. Then, Drill would need a way to enqueue users whose queries
> >> would
> >> > > > exceed the memory or CPU limit for that tenant. Plus, Drill would
> >> have
> >> > to
> >> > > > be able to limit memory and CPU for each query. Much work has been
> >> done
> >> > > to
> >> > > > limit memory, but CPU is very difficult. Mature products such as
> >> > Teradata
> >> > > > can do this, but Teradata has 40 years of effort behind it.
> >> > > >
> >> > > > Since it is hard to build multi-tenancy in at the app level (not
> >> > > > impossible, just very, very hard), the thought is to apply it at
> the
> >> > > > cluster level. This is done in YARN via limiting the resources
> >> > available
> >> > > to
> >> > > > processes (typically map/reduce) and to limit the number of
> running
> >> > > > processes. Works for M/R because each map task uses disk to
> shuffle
> >> > > results
> >> > > > to a reduce task, so map and reduce tasks can run asynchronously.
> >> > > >
> >> > > > For tools such as Drill, which do in-memory processing (really,
> >> > > > across-the-network exchanges), both the sender and receiver have
> to
> >> run
> >> > > > concurrently. This is much harder to schedule than async m/r
> tasks:
> >> it
> >> > > > means that the entire Drill cluster (of whatever size) be up and
> >> > running
> >> > > to
> >> > > > run a query.
> >> > > >
> >> > > > The start-up time for Drill is far, far longer than a query. So,
> it
> >> is
> >> > > not
> >> > > > feasible to use YARN to launch a Drill cluster for each query the
> >> way
> >> > you
> >> > > > would do with Spark. Instead, under YARN, Drill is a long running
> >> > service
> >> > > > that handles many queries.
> >> > > >
> >> > > > Obviously, this is not ideal: I'm sure your engineers want to use
> a
> >> > > > tenant's resources for Drill when running queries, else for Spark,
> >> > Hive,
> >> > > or
> >> > > > maybe TensorFlow. If Drill has to be long-running, I'm sure they's
> >> like
> >> > > to
> >> > > > slosh resources between tenants as is done in YARN. As noted
> above,
> >> > this
> >> > > is
> >> > > > a hard problem that DoY did not attempt to solve.
> >> > > >
> >> > > > One might suggest that Drill grab resources from YARN when Tenant
> A
> >> > wants
> >> > > > to run a query, and release them when that tenant is done,
> grabbing
> >> new
> >> > > > resources when Tenant B wants to run. Impala tried this with Llama
> >> and
> >> > > > found it did not work. (This is why DoY is quite a bit simpler; no
> >> > reason
> >> > > > to rerun a failed experiment.)
> >> > > >
> >> > > > Some folks are looking to Kubernetes (K8s) as a solution. But,
> that
> >> > just
> >> > > > replaces YARN with K8s: Drill is still a long-running process.
> >> > > >
> >> > > > To solve the problem you identify, you'll need either:
> >> > > >
> >> > > > * A bunch of work in Drill to build multi-tenancy into Drill, or
> >> > > > * A cloud-like solution in which each tenant spins up a Drill
> >> cluster
> >> > > > within its budget, spinning it down, or resizing it, to stay with
> an
> >> > > > overall budget.
> >> > > >
> >> > > > The second option can be achieved under YARN with DoY, assuming
> that
> >> > DoY
> >> > > > added support for graceful shutdown (or the cluster is reduced in
> >> size
> >> > > only
> >> > > > when no queries are active.) Longer-term, a more modern solution
> >> would
> >> > be
> >> > > > Drill-on-Kubernetes (DoK?) which Abhishek started on.
> >> > > >
> >> > > > Engineering is the art of compromise. The question for your
> >> engineers
> >> > is
> >> > > > how to achieve the best result given the limitations of the
> software
> >> > > > available today. At the same time, helping the Drill community
> >> improve
> >> > > the
> >> > > > solutions over time.
> >> > > >
> >> > > > Thanks,
> >> > > > - Paul
> >> > > >
> >> > > >
> >> > > >
> >> > > >    On Sunday, December 30, 2018, 9:38:04 PM PST, Charles Givre <
> >> > > > cgivre@gmail.com> wrote:
> >> > > >
> >> > > >  Hi Paul,
> >> > > > Here’s what our engineers said:
> >> > > >
> >> > > > From Paul’s response, I understand that there is a slight
> confusion
> >> > > around
> >> > > > how multi-tenancy has been enabled in our data lake.
> >> > > >
> >> > > > Some more details on this –
> >> > > >
> >> > > > Drill already has the concept of multitenancy where we can have
> >> > multiple
> >> > > > drill clusters running on the same data lake enabled through
> >> different
> >> > > > ports and zookeeper. But, all of this is launched through the same
> >> hard
> >> > > > coded yarn queue that we provide as a config parameter.
> >> > > >
> >> > > > In our data lake, each tenant has a certain amount of compute
> >> capacity
> >> > > > allotted to them which they can use for their project work. This
> is
> >> > > > provisioned through individual YARN queues for each tenant
> (resource
> >> > > > caging). This restricts the tenants from using cluster resources
> >> > beyond a
> >> > > > certain limit and not impacting other tenants at the same time.
> >> > > >
> >> > > > Access to these YARN queues is provisioned through ACL
> memberships.
> >> > > >
> >> > > > ——
> >> > > >
> >> > > > Does this make sense?  Is this possible to get Drill to work in
> this
> >> > > > manner, or should we look into opening up JIRAs and working on new
> >> > > > capabilities?
> >> > > >
> >> > > >
> >> > > >
> >> > > > > On Dec 17, 2018, at 21:59, Paul Rogers
> <par0328@yahoo.com.INVALID
> >> >
> >> > > > wrote:
> >> > > > >
> >> > > > > Hi Kwizera,
> >> > > > > I hope my answer to Charles gave you the information you need.
> If
> >> > not,
> >> > > > please check out the DoY documentation or ask follow-up questions.
> >> > > > > Key thing to remember: Drill is a long-running YARN service;
> >> queries
> >> > DO
> >> > > > NOT go through YARN queues, they go through Drill directly.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > - Paul
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >    On Monday, December 17, 2018, 11:01:04 AM PST, Kwizera hugues
> >> > Teddy
> >> > > <
> >> > > > nbted2017@gmail.com> wrote:
> >> > > > >
> >> > > > > Hello,
> >> > > > > Same questions ,
> >> > > > > I would like to know how drill deal with this yarn
> fonctionality?
> >> > > > > Cheers.
> >> > > > >
> >> > > > > On Mon, Dec 17, 2018, 17:53 Charles Givre <cgivre@gmail.com
> >> wrote:
> >> > > > >
> >> > > > >> Hello all,
> >> > > > >> We are trying to set up a Drill cluster on our corporate data
> >> lake.
> >> > > Our
> >> > > > >> cluster requires dynamic YARN queue allocation for multi-tenant
> >> > > > >> environment.  Is this something that Drill supports or is
> there a
> >> > > > >> workaround?
> >> > > > >> Thanks!
> >> > > > >> —C
> >> > > >
> >> >
> >>
> >  

Re: Drill on YARN Questions

Posted by Kwizera hugues Teddy <nb...@gmail.com>.
hello Paul,

Yes, I can access AM web UI.

I noticed that the problem is caused by SSL/TLS access being enabled (ssl-enabled: true).

- https://xxxxxxxxx:10048/rest/status : works fine in the browser

I think I have to deal with certificates on the AM host. Do you have an idea?

Thanks.


Re: Drill on YARN Questions

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi,

Can you reach the AM web UI? The Web UI URL was shown below. It also should have been given when you started DoY.

I notice that you're using SSL/TLS access. Doing so requires the right certificates on the AM host. Again, trying to connect via your browser may help identify if that works.

If the Web UI works, then check the host name and port number in your browser compared to that shown in the error message.

The resize command on the command line does nothing other than some validation, then it sends the URL shown below. You can try entering the URL directly into your browser. Again, if that fails, there is something amiss with your config. If that works, then we'll have to figure out what might be wrong with the DoY command line tool.

Please try out the above and let us know what you learn.

Thanks,
- Paul

 

    On Monday, January 14, 2019, 7:30:44 AM PST, Kwizera hugues Teddy <nb...@gmail.com> wrote:  
 
 Hello all,

I am experiencing errors with the Resize and Status commands.
The errors come from the REST call to the AM.

Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE status
Result:
    Application ID: xxxxxxxxxxxxxxxx
    Application State: RUNNING
    Host: xxxxxxxxxxxxxxxx
    Queue: root.xxxxx.default
    User: xxxxxxxx
    Start Time: 2019-01-14 14:56:29
    Application Name: Drill-on-YARN-cluster_01
    Tracking URL: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Failed to get AM status
    REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/status

Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE resize
Result:
    Resizing cluster for Application ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Resize failed: REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/shrink/1

I haven't found how to resolve this issue; maybe someone can help me.

Thanks.




Re: Drill on YARN Questions

Posted by Kwizera hugues Teddy <nb...@gmail.com>.
Hello all,

I am experiencing errors with the Resize and Status commands.
The errors come from the REST call to the AM.

Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE status
Result:
    Application ID: xxxxxxxxxxxxxxxx
    Application State: RUNNING
    Host: xxxxxxxxxxxxxxxx
    Queue: root.xxxxx.default
    User: xxxxxxxx
    Start Time: 2019-01-14 14:56:29
    Application Name: Drill-on-YARN-cluster_01
    Tracking URL: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Failed to get AM status
    REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/status

Command: $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE resize
Result:
    Resizing cluster for Application ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Resize failed: REST request failed: https://xxxxxxxxxxxxxxx:9048/rest/shrink/1

I haven't found how to resolve this issue; maybe someone can help me.

Thanks.



>> wrote:
>> > > > >
>> > > > >> Hello all,
>> > > > >> We are trying to set up a Drill cluster on our corporate data
>> lake.
>> > > Our
>> > > > >> cluster requires dynamic YARN queue allocation for multi-tenant
>> > > > >> environment.  Is this something that Drill supports or is there a
>> > > > >> workaround?
>> > > > >> Thanks!
>> > > > >> —C
>> > > >
>> >
>>
>

Re: Drill on YARN Questions

Posted by Kwizera hugues Teddy <nb...@gmail.com>.
Hello,

The other option (stop followed by start) works.

As you say, an update is needed in the docs to remove the wrong information.

Thanks.


Re: Drill on YARN Questions

Posted by Abhishek Girish <ag...@apache.org>.
Hello Teddy,

I don't recollect a restart option for the drill-on-yarn.sh script. I've
always used a combination of stop and start, like Paul mentions. Could you
please try whether that works and get back to us? We could certainly add a
minor enhancement to support restart - until then I'll request Bridget to
update the documentation.
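
For reference, a minimal sketch of the stop/start workaround (the start,
stop, and status subcommands are the ones reported to work in this thread):

  # Stop the running Drill-on-YARN cluster, then launch a fresh one:
  $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE stop
  $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE start

  # Optionally confirm the new cluster came up:
  $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE status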

Regards,
Abhishek


Re: Drill on YARN Questions

Posted by Kwizera hugues Teddy <nb...@gmail.com>.
Hello Paul,

Thank you for your response, with some interesting information (the files in
/tmp).

On my side, all the other command-line options work normally
(start|stop|status|...), but not restart (that option is not recognized). I
searched the source code and found that the restart command is not
implemented, so I wonder why the documentation does not match the source code.

Thanks, Teddy



Re: Drill on YARN Questions

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Let's try to troubleshoot. Does the combination of stop and start work? If so, then there could be a bug with the restart command itself.

If neither start nor stop work, it could be that you are missing the application ID file created when you first started DoY. Some background.

When we submit an app to YARN, YARN gives us an app ID. We need this in order to track down the app master for DoY so we can send it commands later.

When the command line tool starts DoY, it writes the YARN app ID to a file. Can't remember the details, but it is probably in the $DRILL_SITE directory. The contents are, as I recall, a long hexadecimal string.
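
A quick cross-check (a sketch: the .appid file name under $DRILL_SITE is an
assumption from memory; the yarn CLI commands are standard):

  # Look for the app ID file the DoY client wrote at start time:
  cat $DRILL_SITE/*.appid

  # Independently, ask YARN what is running and compare application IDs:
  yarn application -list
  yarn application -status <application-id>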

When you invoke the command line, the tool reads this file to track down the DoY app master. The tool then sends commands to the app master: in this case, a request to shut down. Then, for restart, the tool will communicate with YARN to start a new instance.

The tool is supposed to give detailed error messages. Did you get any? That might tell us which of these steps failed.

Can you connect to the DoY Web UI at the URL provided when you started DoY? If you can, this means that the DoY App Master is up and running.
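
For example (hedged: 8048 is, as I recall, the default DoY App Master web UI
port; substitute the URL printed when you started DoY):

  # A 200 here means the App Master's web server is alive:
  curl -s -o /dev/null -w "%{http_code}\n" http://<am-host>:8048/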

Are you running the client from the same node on which you started it? That file I mentioned is local to the "DoY client" machine; it is not in DFS.

Then, there is one more very obscure bug you can check. On some distributions, the YARN task files are written to the /tmp directory. Some Linux systems remove these files from time to time. Once the files are gone, YARN can no longer control its containers: it won't be able to stop the app master or the Drillbit containers. There are two fixes. First, go kill all the processes by hand. Then, move the YARN state files out of /tmp, or exclude YARN's files from the periodic cleanup.
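
A rough way to check and clean up (a sketch: hadoop.tmp.dir and
yarn.nodemanager.local-dirs are standard Hadoop/YARN settings, but the paths
and process names below are illustrative and vary by distribution):

  # See where Hadoop keeps temp/state files (the default lands under /tmp):
  hdfs getconf -confKey hadoop.tmp.dir

  # If containers are orphaned, find and kill the stray processes by hand:
  jps | grep -E 'Drillbit|ApplicationMaster'
  kill <pid>

  # Longer term, point yarn.nodemanager.local-dirs (in yarn-site.xml) at a
  # directory outside /tmp, or exclude YARN's files from the periodic cleanup.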

Try some of the above and let us know what you find.

Also, perhaps Abhishek can offer some suggestions as he tested the heck out of the feature and may have additional suggestions.

Thanks,
- Paul

 


Re: Drill on YARN Questions

Posted by Kwizera hugues Teddy <nb...@gmail.com>.
Hello,

Two weeks ago, I began to explore DoY. Today, while reading the Drill
documentation ( https://drill.apache.org/docs/appendix-a-release-note-issues/ ),
I saw that we can restart the Drill cluster with:

 $DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE restart

But it didn't work when I tested it.

Any ideas about it?

Thanks.

Re: Drill on YARN Questions

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Charles,

Your engineers have identified a common need, but one which is very difficult to satisfy.

TL;DR: DoY gets as close to the requirements as possible within the constraints of YARN and Drill. But, future projects could do more.

Your engineers want resource segregation among tenants: multi-tenancy. This is very difficult to achieve at the application level. Consider Drill. It would need some way to identify users to know which tenant they belong to. Then, Drill would need a way to enqueue users whose queries would exceed the memory or CPU limit for that tenant. Plus, Drill would have to be able to limit memory and CPU for each query. Much work has been done to limit memory, but CPU is very difficult. Mature products such as Teradata can do this, but Teradata has 40 years of effort behind it.

Since it is hard to build multi-tenancy into the app level (not impossible, just very, very hard), the thought is to apply it at the cluster level. This is done in YARN by limiting the resources available to processes (typically map/reduce) and by limiting the number of running processes. This works for M/R because each map task uses disk to shuffle results to a reduce task, so map and reduce tasks can run asynchronously.

For tools such as Drill, which do in-memory processing (really, across-the-network exchanges), both the sender and receiver have to run concurrently. This is much harder to schedule than async m/r tasks: it means that the entire Drill cluster (of whatever size) must be up and running to run a query.

The start-up time for Drill is far, far longer than a query. So, it is not feasible to use YARN to launch a Drill cluster for each query the way you would do with Spark. Instead, under YARN, Drill is a long running service that handles many queries.

Obviously, this is not ideal: I'm sure your engineers want to use a tenant's resources for Drill when running queries, and otherwise for Spark, Hive, or maybe TensorFlow. If Drill has to be long-running, I'm sure they'd like to slosh resources between tenants as is done in YARN. As noted above, this is a hard problem that DoY did not attempt to solve.

One might suggest that Drill grab resources from YARN when Tenant A wants to run a query, and release them when that tenant is done, grabbing new resources when Tenant B wants to run. Impala tried this with Llama and found it did not work. (This is why DoY is quite a bit simpler; no reason to rerun a failed experiment.)

Some folks are looking to Kubernetes (K8s) as a solution. But, that just replaces YARN with K8s: Drill is still a long-running process.

To solve the problem you identify, you'll need either:

* A bunch of work in Drill to build multi-tenancy into Drill, or
* A cloud-like solution in which each tenant spins up a Drill cluster within its budget, spinning it down, or resizing it, to stay within an overall budget.

The second option can be achieved under YARN with DoY, assuming that DoY added support for graceful shutdown (or the cluster is reduced in size only when no queries are active.) Longer-term, a more modern solution would be Drill-on-Kubernetes (DoK?) which Abhishek started on.
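
As a rough sketch of the second option (the resize subcommand is in the DoY
client as I recall; verify the exact syntax against drill-on-yarn.sh's usage
output):

  # Each tenant runs its own DoY cluster from its own site directory:
  $DRILL_HOME/bin/drill-on-yarn.sh --site $TENANT_SITE start

  # Grow or shrink that tenant's cluster to stay within its budget:
  $DRILL_HOME/bin/drill-on-yarn.sh --site $TENANT_SITE resize 5

  # Spin it down entirely when idle:
  $DRILL_HOME/bin/drill-on-yarn.sh --site $TENANT_SITE stop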

Engineering is the art of compromise. The question for your engineers is how to achieve the best result given the limitations of the software available today, while at the same time helping the Drill community improve the solutions over time.

Thanks,
- Paul

 


Re: Drill on YARN Questions

Posted by Charles Givre <cg...@gmail.com>.
Hi Paul, 
Here’s what our engineers said:

From Paul’s response, I understand that there is a slight confusion around how multi-tenancy has been enabled in our data lake.

Some more details on this – 

Drill already has the concept of multi-tenancy, in that we can have multiple Drill clusters running on the same data lake, separated by different ports and ZooKeeper roots. But all of them are launched through the same hard-coded YARN queue that we provide as a config parameter.
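
For illustration, a per-tenant launch might look like this sketch (the paths
are hypothetical, and the queue key name drill.yarn.am.queue is an assumption
from memory of the DoY configuration appendix - verify it against your
drill-on-yarn.conf template):

  # Hypothetical per-tenant site directory:
  #   /etc/drill/site-tenantA/drill-on-yarn.conf
  #       drill.yarn.am.queue: "root.tenantA"    # tenant's YARN queue
  #   /etc/drill/site-tenantA/drill-override.conf
  #       drill.exec.zk.root: "drill-tenantA"    # distinct ZK root and ports
  $DRILL_HOME/bin/drill-on-yarn.sh --site /etc/drill/site-tenantA start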

In our data lake, each tenant has a certain amount of compute capacity allotted to them which they can use for their project work. This is provisioned through individual YARN queues for each tenant (resource caging). This keeps tenants from using cluster resources beyond a certain limit, so they do not impact other tenants.

Access to these YARN queues is provisioned through ACL memberships. 

——

Does this make sense?  Is it possible to get Drill to work in this manner, or should we look into opening JIRAs and working on new capabilities?





Re: Drill on YARN Questions

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Kwizera,
I hope my answer to Charles gave you the information you need. If not, please check out the DoY documentation or ask follow-up questions.
Key thing to remember: Drill is a long-running YARN service; queries DO NOT go through YARN queues, they go through Drill directly.
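(For example, a client connects straight to the Drill cluster through
ZooKeeper - a sketch with a placeholder host and the default ZK root and
cluster ID:

  $DRILL_HOME/bin/sqlline -u "jdbc:drill:zk=<zk-host>:2181/drill/drillbits1"

YARN hosts the Drillbit processes but plays no part in query dispatch.)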

Thanks,
- Paul

 


Re: Drill on YARN Questions

Posted by Kwizera hugues Teddy <nb...@gmail.com>.
Hello,
Same questions here.
I would like to know how Drill deals with this YARN functionality.
Cheers.
