Posted to user@drill.apache.org by Shankar Mane <sh...@games24x7.com> on 2016/05/06 07:50:02 UTC

Drill (CTAS) Default hadoop Replication factor on HDFS ?

We have a Hadoop cluster where the default replication factor (dfs.replication)
is set to 1 (this cluster is just plug and play, so we don't need to
store more than one copy).

When we used Drill *CTAS*, it created the table on *HDFS* with its
own *replication factor of 3*.

*Questions:*
1. Why can't it use the Hadoop default replication factor?
2. Is there any setting in Drill to change the Hadoop replication factor
at runtime?

Re: Drill (CTAS) Default hadoop Replication factor on HDFS ?

Posted by Jason Altekruse <ja...@dremio.com>.
I think this is a bug with the config block feature. We currently apply
this at the storage plugin level, but the writers do not appear to source
their configuration from it; instead, each of our three current record
writers creates a new configuration. I have filed a bug to investigate the
current design further and fix this [1].

[1] - https://issues.apache.org/jira/browse/DRILL-4663

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Tue, May 10, 2016 at 12:54 AM, Shankar Mane <sh...@games24x7.com>
wrote:

> Thanks, Abhishek Girish. Copying hdfs-site.xml into the Drill conf directory
> (on all nodes) works for me.
>
> I also tried the config options setting. It does get applied at the storage
> plugin level, but it has no effect.
>
> On Sat, May 7, 2016 at 11:29 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
>
> > My suggestion would be to use Drill's capability to have config options in
> > the storage plugin rather than copying the hdfs-site.xml everywhere. Keeps
> > it in one place and allows you to tune per system you are interacting with
> > (instead of globally). See here for more detail:
> >
> > https://issues.apache.org/jira/browse/DRILL-4383
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, May 6, 2016 at 10:17 AM, Abhishek Girish <abhishek.girish@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > Assuming you have defined your replication factor setting inside your
> > > cluster hdfs-site.xml, it might be worth a try to copy this config file
> > > into your Drill conf directory (on all nodes). While I haven't tried this
> > > myself, I'm hoping this could help.
> > >
> > > -Abhishek
> > >
> > > On Fri, May 6, 2016 at 12:50 AM, Shankar Mane <shankar.mane@games24x7.com>
> > > wrote:
> > >
> > > > We have a Hadoop cluster where the default replication factor
> > > > (dfs.replication) is set to 1 (this cluster is just plug and play, so
> > > > we don't need to store more than one copy).
> > > >
> > > > When we used Drill *CTAS*, it created the table on *HDFS* with its
> > > > own *replication factor of 3*.
> > > >
> > > > *Questions:*
> > > > 1. Why can't it use the Hadoop default replication factor?
> > > > 2. Is there any setting in Drill to change the Hadoop replication factor
> > > > at runtime?
> > > >
> > >
> >
>

Re: Drill (CTAS) Default hadoop Replication factor on HDFS ?

Posted by Shankar Mane <sh...@games24x7.com>.
Thanks, Abhishek Girish. Copying hdfs-site.xml into the Drill conf directory
(on all nodes) works for me.

I also tried the config options setting. It does get applied at the storage
plugin level, but it has no effect.

Re: Drill (CTAS) Default hadoop Replication factor on HDFS ?

Posted by Jacques Nadeau <ja...@dremio.com>.
My suggestion would be to use Drill's capability to have config options in
the storage plugin rather than copying the hdfs-site.xml everywhere. Keeps
it in one place and allows you to tune per system you are interacting with
(instead of globally). See here for more detail:

https://issues.apache.org/jira/browse/DRILL-4383
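
As a rough sketch of what such a config block might look like, the fragment
below adds dfs.replication to a file system storage plugin definition. The
connection host and port are hypothetical placeholders, and the workspaces
and formats sections that a real plugin definition would carry are left out.
Note that elsewhere in this thread the setting was reported to be applied at
the plugin level without affecting CTAS output, so treat this as an
illustration of the feature rather than a confirmed fix.

```json
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://namenode.example.com:8020/",
  "config": {
    "dfs.replication": "1"
  }
}
```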

--
Jacques Nadeau
CTO and Co-Founder, Dremio

Re: Drill (CTAS) Default hadoop Replication factor on HDFS ?

Posted by Abhishek Girish <ab...@gmail.com>.
Hello,

Assuming you have defined your replication factor setting inside your
cluster hdfs-site.xml, it might be worth a try to copy this config file
into your Drill conf directory (on all nodes). While I haven't tried this
myself, I'm hoping this could help.
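
For illustration, a minimal hdfs-site.xml carrying this setting might look
like the fragment below. The property name dfs.replication and the value 1
come from the original question; a real cluster file will usually contain
other properties as well, and the location of the Drill conf directory
depends on the installation.

```xml
<?xml version="1.0"?>
<!-- Minimal sketch: only the replication setting discussed in this thread. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- Value taken from the question: keep a single copy of each block. -->
    <value>1</value>
  </property>
</configuration>
```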

-Abhishek
