
Posted to dev@drill.apache.org by Lisen Mu <im...@gmail.com> on 2013/04/07 13:01:21 UTC

Re: Storage file format

Ted,

Sorry to bring up an old thread, but could you provide more information
on this? Data collocation might be essential to the performance of operators
like join.

> For some file systems such as the one in the MapR distribution, you can
> force files to have identical locality but on less advanced systems like
> HDFS, this is not typically possible.

Re: Storage file format

Posted by Lisen Mu <im...@gmail.com>.

Jason,

Thanks for your information!




On Wed, Apr 10, 2013 at 6:09 AM, Jason Frantz <jf...@maprtech.com> wrote:

> Jumping in for Ted... This can be done in a MapR-specific way by setting the
> chunk size to 0. See the answer here for a slightly longer explanation:
>
>
> http://answers.mapr.com/questions/3600/large-number-of-small-files-optimal-chunk-size
>

Re: Storage file format

Posted by Ted Dunning <te...@gmail.com>.

On Thu, Apr 11, 2013 at 6:52 PM, Lisen Mu <im...@gmail.com> wrote:

> Ted,
>
> Thanks for the reply!
>
> > A 1 GbE link and 12 drives, however, gives about 100 MB/s of network and
> > nearly 1 GB/s of disk bandwidth, which is a large imbalance.
>
> 1 Gb, sadly... with 4 SATA drives.
>

So your disk is 4x faster than your network.

Locality will be very important to you.


> > Presumably.  Note that the real problem here is how to get HBase to
> > collocate the regions.  That will be much harder than telling Drill.  The
> > problem is that regions are randomly assigned to region-servers.
>
> Exactly. I'm still figuring it out. It's something deep in HBase, or an
> alternative to HBase.
>

Re: Storage file format

Posted by Lisen Mu <im...@gmail.com>.

Ted,

Thanks for the reply!

> A 1 GbE link and 12 drives, however, gives about 100 MB/s of network and
> nearly 1 GB/s of disk bandwidth, which is a large imbalance.

1 Gb, sadly... with 4 SATA drives.

> Presumably.  Note that the real problem here is how to get HBase to
> collocate the regions.  That will be much harder than telling Drill.  The
> problem is that regions are randomly assigned to region-servers.

Exactly. I'm still figuring it out. It's something deep in HBase, or an
alternative to HBase.

On Thu, Apr 11, 2013 at 10:09 PM, Ted Dunning <te...@gmail.com> wrote:

> On Thu, Apr 11, 2013 at 12:53 AM, Lisen Mu <im...@gmail.com> wrote:
>
> > ...
> > We have Gbps adapters and SATA drives, so our network is indeed not slower
> > than disk on each machine.
>
>
> Is that 1 Gbps or 10 Gbps?  Is that one SATA disk or many?
>
> A relatively standard configuration is to use one or two 10 GbE links and 12 SATA
> drives per box.  This gives about 1 GB/s each for network and disk.
>
> A 1 GbE link and 12 drives, however, gives about 100 MB/s of network and
> nearly 1 GB/s of disk bandwidth, which is a large imbalance.
>
>
> > However, our services are hosted at a public ISP, and
> > we have little control over the environment (rack, switch, etc.). Will that
> > be a problem?  We assume yes...
> >
>
> It can be, but can also work reasonably well.
>
>
> >
> > So we are still taking data collocation into consideration, which raises
> > another question: how do we tell Drill about this data collocation info?
> >
>
> In many cases it won't be necessary to tell Drill about anything.  If you
> have collocated data, it will just run much faster.
>
>
> > .... But if I have 2 partitioned data sources R & S (2
> > HTables, for example) with a site dependency on the join condition (R[i] JOIN
> > S[j] is empty if i != j, and R[i] and S[i] are on the same machine), then I can
> > do a distributed join R[i] JOIN S[i] on each site and UNION much less data
> > together. Again, how to tell Drill about this information is a problem. I
> > assume that with a good PartitionDef it's doable.
> >
>
> Presumably.  Note that the real problem here is how to get HBase to
> collocate the regions.  That will be much harder than telling Drill.  The
> problem is that regions are randomly assigned to region-servers.
>

Re: Storage file format

Posted by Ted Dunning <te...@gmail.com>.

On Thu, Apr 11, 2013 at 12:53 AM, Lisen Mu <im...@gmail.com> wrote:

> ...
> We have Gbps adapters and SATA drives, so our network is indeed not slower
> than disk on each machine.

Is that 1 Gbps or 10 Gbps?  Is that one SATA disk or many?

A relatively standard configuration is to use one or two 10 GbE links and 12 SATA
drives per box.  This gives about 1 GB/s each for network and disk.

A 1 GbE link and 12 drives, however, gives about 100 MB/s of network and
nearly 1 GB/s of disk bandwidth, which is a large imbalance.
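The arithmetic behind these figures can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not a benchmark: it assumes roughly 100 MB/s of sequential throughput per SATA drive and roughly 100 MB/s of usable bandwidth per 1 Gb/s of Ethernet (raw 1 Gb/s is 125 MB/s before protocol overhead). Both constants are assumptions; substitute measured numbers for real hardware.

```python
# Back-of-the-envelope per-node bandwidth balance.
# ASSUMED rule-of-thumb constants (not measurements):
SATA_MB_PER_S = 100             # sequential throughput of one SATA drive
USABLE_MB_PER_S_PER_GBIT = 100  # usable payload per 1 Gb/s of Ethernet (raw is 125 MB/s)


def disk_mb_per_s(num_drives: int) -> float:
    """Aggregate local disk bandwidth, assuming all drives stream in parallel."""
    return float(num_drives * SATA_MB_PER_S)


def network_mb_per_s(link_gbit: float, num_links: int = 1) -> float:
    """Aggregate usable network bandwidth for num_links links of link_gbit each."""
    return float(link_gbit * num_links * USABLE_MB_PER_S_PER_GBIT)


def disk_to_network_ratio(num_drives: int, link_gbit: float, num_links: int = 1) -> float:
    """A ratio well above 1 means a node reads local data much faster than it can receive remote data."""
    return disk_mb_per_s(num_drives) / network_mb_per_s(link_gbit, num_links)


if __name__ == "__main__":
    for label, drives, gbit in [
        ("one 10 GbE link, 12 SATA drives", 12, 10),
        ("one 1 GbE link, 12 SATA drives", 12, 1),
        ("one 1 GbE link, 4 SATA drives", 4, 1),
    ]:
        print(
            f"{label}: disk ~{disk_mb_per_s(drives):.0f} MB/s, "
            f"network ~{network_mb_per_s(gbit):.0f} MB/s, "
            f"disk/network ~{disk_to_network_ratio(drives, gbit):.1f}x"
        )
```

Under these assumptions the 10 GbE box is roughly balanced (about 1.2x), the 1 GbE box with 12 drives is about 12x disk-heavy, and the 1 Gb / 4 SATA setup mentioned elsewhere in this thread comes out near 4x, which is where locality starts to matter a lot.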

> However, our services are hosted at a public ISP, and
> we have little control over the environment (rack, switch, etc.). Will that
> be a problem?  We assume yes...
>

It can be, but can also work reasonably well.

>
> So we are still taking data collocation into consideration, which raises
> another question: how do we tell Drill about this data collocation info?
>

In many cases it won't be necessary to tell Drill about anything.  If you
have collocated data, it will just run much faster.

> .... But if I have 2 partitioned data sources R & S (2
> HTables, for example) with a site dependency on the join condition (R[i] JOIN
> S[j] is empty if i != j, and R[i] and S[i] are on the same machine), then I can
> do a distributed join R[i] JOIN S[i] on each site and UNION much less data
> together. Again, how to tell Drill about this information is a problem. I
> assume that with a good PartitionDef it's doable.
>

Presumably.  Note that the real problem here is how to get HBase to
collocate the regions.  That will be much harder than telling Drill.  The
problem is that regions are randomly assigned to region-servers.
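To see why random placement undermines collocation, consider a deliberately simplified model (an assumption; HBase's real balancer is not uniform random): each region of R and each region of S is placed independently and uniformly on one of N region servers. Then R[i] and S[i] share a server with probability 1/N, so on a 20-server cluster only about 5% of region pairs would be local. The sketch computes that expectation and checks it with a seeded simulation.

```python
import random


def expected_collocated_fraction(num_servers: int) -> float:
    """Under independent, uniform placement, S[i] lands on R[i]'s server with probability 1/N."""
    return 1.0 / num_servers


def simulated_collocated_fraction(num_servers: int, num_region_pairs: int, seed: int = 42) -> float:
    """Place each (R[i], S[i]) region pair independently at random and count same-server pairs."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = 0
    for _ in range(num_region_pairs):
        server_of_r = rng.randrange(num_servers)
        server_of_s = rng.randrange(num_servers)
        if server_of_r == server_of_s:
            hits += 1
    return hits / num_region_pairs


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(
            f"{n} region servers: expected {expected_collocated_fraction(n):.3f}, "
            f"simulated {simulated_collocated_fraction(n, 100_000):.3f}"
        )
```

Pushing that fraction toward 1 requires controlling region placement on the HBase side; the query engine can exploit collocation but cannot create it.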

Re: Storage file format

Posted by Lisen Mu <im...@gmail.com>.

Thanks Ted!

We have Gbps adapters and SATA drives, so our network is indeed not slower
than disk on each machine. However, our services are hosted at a public ISP,
and we have little control over the environment (rack, switch, etc.). Will
that be a problem?  We assume yes...

So we are still taking data collocation into consideration, which raises
another question: how do we tell Drill about this data collocation info?

About -setchunksize: it may not be exactly what I have in mind. If I have 2
data files within the same directory, I can get a much faster map-side join
with -setchunksize. But if I have 2 partitioned data sources R & S (2
HTables, for example) with a site dependency on the join condition (R[i] JOIN
S[j] is empty if i != j, and R[i] and S[i] are on the same machine), then I can
do a distributed join R[i] JOIN S[i] on each site and UNION much less data
together. Again, how to tell Drill about this information is a problem. I
assume that with a good PartitionDef it's doable.
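The per-site join described above can be sketched in plain Python. This only illustrates the idea; it does not use Drill's PartitionDef or any HBase API. It assumes both tables are split with the same hypothetical partitioner (`site_of`: hash of the join key modulo the number of sites). HBase actually splits tables into key ranges, which works equally well as long as R and S share the same boundaries. Because matching keys always land on the same site, R[i] JOIN S[j] is empty for i != j, so joining each pair locally and concatenating the results gives the same answer as a global join.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical row layout: (join_key, payload).
Row = Tuple[Hashable, str]
Joined = Tuple[Hashable, str, str]


def site_of(key: Hashable, num_sites: int) -> int:
    """Hypothetical partitioner. R and S MUST share it for the per-site join to be correct."""
    return hash(key) % num_sites


def partition(rows: List[Row], num_sites: int) -> List[List[Row]]:
    """Split a table into num_sites partitions; partition i lives on site i."""
    sites: List[List[Row]] = [[] for _ in range(num_sites)]
    for row in rows:
        sites[site_of(row[0], num_sites)].append(row)
    return sites


def local_hash_join(r: List[Row], s: List[Row]) -> List[Joined]:
    """Ordinary in-memory hash join: build a table on S, probe it with R."""
    s_by_key: Dict[Hashable, List[str]] = defaultdict(list)
    for key, s_val in s:
        s_by_key[key].append(s_val)
    return [(key, r_val, s_val) for key, r_val in r for s_val in s_by_key.get(key, [])]


def collocated_join(r_sites: List[List[Row]], s_sites: List[List[Row]]) -> List[Joined]:
    """Join R[i] with S[i] on each site, then UNION ALL; no rows cross sites."""
    result: List[Joined] = []
    for r_i, s_i in zip(r_sites, s_sites):
        result.extend(local_hash_join(r_i, s_i))
    return result


if __name__ == "__main__":
    R = [(1, "r1"), (2, "r2"), (3, "r3"), (4, "r4")]
    S = [(2, "s2"), (3, "s3a"), (3, "s3b"), (5, "s5")]
    print(sorted(collocated_join(partition(R, 3), partition(S, 3))))
    # [(2, 'r2', 's2'), (3, 'r3', 's3a'), (3, 'r3', 's3b')]
```

Only the final, already-joined rows need to move to wherever the UNION is assembled, which is the "much less data" in the message above.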

On Thu, Apr 11, 2013 at 3:19 AM, Ted Dunning <te...@gmail.com> wrote:

> Thanks Jason!
>
> Lisen,
>
> Yes.  This is exactly why our customer does this.  The end result is more
> than an order of magnitude speedup in the join because a complex join can
> be done entirely in the input format or in the mapper.
>
> Note that a significant part of the massive speedup seen in this case is due to
> the limitations of Hadoop map-reduce.  Drill won't have that problem.
> The speedup should still be quite significant, especially on machines
> where disk is much faster than network.
>

Re: Storage file format

Posted by Ted Dunning <te...@gmail.com>.

Thanks Jason!

Lisen,

Yes.  This is exactly why our customer does this.  The end result is more
than an order of magnitude speedup in the join because a complex join can
be done entirely in the input format or in the mapper.

Note that a significant part of the massive speedup seen in this case is due to
the limitations of Hadoop map-reduce.  Drill won't have that problem.
The speedup should still be quite significant, especially on machines
where disk is much faster than network.
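One common way to do a join "entirely in the input format or in the mapper" is a map-side merge join: if the matching pieces of both inputs are stored on the same node and are already sorted by the join key (both are assumptions here), a single task can stream them side by side with no shuffle at all. Below is a minimal Python sketch of that merge step, with a hypothetical `(join_key, payload)` row layout; it is not MapR's or Hadoop's implementation.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical row layout: (join_key, payload); both inputs sorted by join_key.
Row = Tuple[int, str]
Joined = Tuple[int, str, str]


def merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Joined]:
    """Inner join of two key-sorted streams in one pass, buffering only the
    right-hand rows that share the current key."""
    left_it, right_it = iter(left), iter(right)
    l: Optional[Row] = next(left_it, None)
    r: Optional[Row] = next(right_it, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left_it, None)
        elif l[0] > r[0]:
            r = next(right_it, None)
        else:
            key = l[0]
            # Collect the run of right rows with this key.
            right_run: List[str] = []
            while r is not None and r[0] == key:
                right_run.append(r[1])
                r = next(right_it, None)
            # Pair every left row with this key against the buffered run.
            while l is not None and l[0] == key:
                for r_val in right_run:
                    yield (key, l[1], r_val)
                l = next(left_it, None)


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
    print(list(merge_join(left, right)))
    # [(2, 'b', 'x'), (2, 'b', 'y'), (2, 'c', 'x'), (2, 'c', 'y'), (4, 'd', 'w')]
```

Only one key's worth of right-hand rows is held in memory at a time, so memory use stays small unless a single key is very frequent.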

On Tue, Apr 9, 2013 at 3:09 PM, Jason Frantz <jf...@maprtech.com> wrote:

> Jumping in for Ted... This can be done in a MapR-specific way by setting the
> chunk size to 0. See the answer here for a slightly longer explanation:
>
>
> http://answers.mapr.com/questions/3600/large-number-of-small-files-optimal-chunk-size
>

Re: Storage file format

Posted by Jason Frantz <jf...@maprtech.com>.

Jumping in for Ted... This can be done in a MapR-specific way by setting the
chunk size to 0. See the answer here for a slightly longer explanation:

http://answers.mapr.com/questions/3600/large-number-of-small-files-optimal-chunk-size

On Sun, Apr 7, 2013 at 4:01 AM, Lisen Mu <im...@gmail.com> wrote:

> Ted,
>
> Sorry to bring up an old thread, but could you provide more information
> on this? Data collocation might be essential to the performance of operators
> like join.
>
> > For some file systems such as the one in the MapR distribution, you can
> > force files to have identical locality but on less advanced systems like
> > HDFS, this is not typically possible.
>