Posted to user@drill.apache.org by Tom Seddon <mr...@gmail.com> on 2013/11/02 16:24:14 UTC

Re: Distributed Drill question

Thanks Jacques, yes, that answers it.  I'm researching as much as I can
about Drill for my master's project.  Perhaps I'll be in a position to
contribute documentation at a later stage, but I'm not sure of the best
way to approach it.  I'm currently trawling through the various
presentations, but I'd be grateful for any suggestions on how else to
approach it.  Perhaps by analysing the codebase or generating Javadoc?
On 30 Oct 2013 22:11, "Jacques Nadeau" <ja...@apache.org> wrote:

> We're a bit lacking in docs, sorry about that.
>
> Drill maintains the concept of host affinity for individual operations.  In
> the case of scans, this is typically associated with the locality
> information of the HDFS blocks or HBase region servers.  Drillbits are
> designed to be run next to the storage processes and have awareness of this
> information.
>
> Does that answer your question?
>
> Thanks,
> Jacques
>
>
> On Wed, Oct 30, 2013 at 2:43 AM, Tom Seddon <mr...@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to know more about how Drill's parallel processing of
> > queries relates, if at all, to the parallel nature of a data source
> > such as Hadoop.  Am I correct in thinking that if a Drill cluster is
> > querying data from a Hadoop cluster, the drillbits are unaware of
> > where the data resides in HDFS, as their interaction is through the
> > NameNode?  If this is the case, how does scaling Drill out help
> > performance if it's always having to route through the NameNode?
> >
> > Sorry if this is a silly question.  I've tried to find the answer by
> > reading the documentation and the mailing list, but I'm still not
> > clear on it.
> >
> > Thanks,
> >
> > Tom
> >
>
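
To make the host affinity idea from Jacques's reply concrete, here is a minimal sketch in Python. It is not Drill code and does not use any Drill or Hadoop API: the function name `assign_scans`, the block ids, and the host names are all hypothetical. It assumes each block's replica hosts are already known (in HDFS that locality information is metadata, while the block contents are read from the nodes that store them) and assigns each scan to a drillbit running on one of those hosts, falling back to the least-loaded drillbit when no replica is local.

```python
from collections import defaultdict


def assign_scans(block_hosts, drillbit_hosts):
    """Assign each block to a drillbit, preferring a co-located one.

    block_hosts:    dict of block id -> list of hosts holding a replica
    drillbit_hosts: list of hosts that run a drillbit
    Returns a dict of drillbit host -> list of assigned block ids.
    (Hypothetical helper for illustration; not part of Drill.)
    """
    load = {host: 0 for host in drillbit_hosts}
    assignment = defaultdict(list)
    for block, replicas in sorted(block_hosts.items()):
        # Drillbits that sit next to a replica of this block.
        local = [h for h in replicas if h in load]
        # No local drillbit: any drillbit may take it (remote read).
        candidates = local or drillbit_hosts
        # Pick the least-loaded candidate; break ties by host name.
        target = min(candidates, key=lambda h: (load[h], h))
        assignment[target].append(block)
        load[target] += 1
    return dict(assignment)


# Made-up cluster: three drillbits, four blocks, one block with no
# replica on a drillbit host.
blocks = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node2", "node3"],
    "blk_3": ["node1", "node3"],
    "blk_4": ["node4"],
}
drillbits = ["node1", "node2", "node3"]
plan = assign_scans(blocks, drillbits)
print(plan)
```

In this toy setup the first three blocks are each scanned on a host that stores a replica, and only blk_4 needs a remote read. Under these assumptions, adding drillbits next to more storage nodes increases the share of scans that can be served locally.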