Posted to dev@hbase.apache.org by 谢良 <xi...@xiaomi.com> on 2014/07/29 05:03:14 UTC

About "dfs.client-write-packet-size" setting

The default dfs.client-write-packet-size value is 64k, at least in my Hadoop 2 environment.
I ran a benchmark with YCSB, loading 2 million records (3*200 bytes):
1) dfs.client-write-packet-size=64k: ygc count 399, ygct 4.208s
2) dfs.client-write-packet-size=8k: ygc count 163, ygct 2.644s
As you can see, that is roughly a 37% reduction in young GC time, with about 59% fewer young GCs :)
The reason: in the DFSOutputStream.Packet class, every "Create a new packet" operation
calls "buf = new byte[PacketHeader.PKT_MAX_HEADER_LEN + pktSize];",
where "pktSize" comes from the dfs.client-write-packet-size setting. In the HBase write path
we sync the WAL as soon as possible, so the new packets are all very small
(in my YCSB test, most carried only hundreds of bytes or a few kilobytes)
and rarely reach 64k, so always allocating a 64k array is just a waste.
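To make the waste concrete, here is a rough Java model of the allocation pattern described above. It is a sketch, not the DFSOutputStream code. It assumes every WAL sync ships its packet so the next write starts a fresh one, it uses a hypothetical HEADER_LEN in place of PacketHeader.PKT_MAX_HEADER_LEN, and the payload sizes are made up to fall in the "hundreds of bytes to a few KB" range.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model (not HDFS code) of the allocation described above: each new
// packet gets a buffer of HEADER_LEN + pktSize bytes, however little data it carries.
public class PacketWaste {
    // Hypothetical stand-in for PacketHeader.PKT_MAX_HEADER_LEN; the real value
    // is defined inside HDFS and is not reproduced here.
    static final int HEADER_LEN = 33;

    // Total bytes allocated when each payload (e.g. one WAL sync) starts a new
    // packet; payloads larger than pktSize are split across several packets.
    static long allocated(List<Integer> payloads, int pktSize) {
        long total = 0;
        for (int p : payloads) {
            int packets = Math.max(1, (p + pktSize - 1) / pktSize); // ceil(p / pktSize)
            total += (long) packets * (HEADER_LEN + pktSize);
        }
        return total;
    }

    // Total payload bytes actually written into those buffers.
    static long used(List<Integer> payloads) {
        long sum = 0;
        for (int p : payloads) sum += p;
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical payload sizes in the "hundreds of bytes to a few KB" range.
        List<Integer> payloads = Arrays.asList(600, 600, 1200, 1800, 3000);
        for (int pktSize : new int[] {64 * 1024, 8 * 1024}) {
            System.out.printf("pktSize=%d allocated=%d used=%d%n",
                    pktSize, allocated(payloads, pktSize), used(payloads));
        }
    }
}
```

With these inputs the model allocates 327845 bytes at 64k versus 41125 bytes at 8k, to carry the same 7200 bytes of payload.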
It would be good to add this to the refguide as a note :)

PS: 8k is just a test setting; we should set it according to the real KV size pattern.
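One way to act on the PS is to size packets from measured WAL sync sizes. The sketch below is a hypothetical heuristic, not anything HBase or HDFS provides: it takes the 99th percentile of observed sync payloads, rounds up to a power of two, and clamps the result to an assumed 4k floor and the 64k default. Applying it would presumably mean setting dfs.client-write-packet-size in the configuration the RegionServer's HDFS client reads (for example hbase-site.xml); the Hadoop Configuration.setInt call appears only as a comment so the block runs without Hadoop on the classpath.

```java
import java.util.Arrays;

// Hypothetical helper (not an HBase or HDFS API) for picking a packet size from
// observed WAL sync payload sizes, per "set it according to the real KV size pattern".
public class PacketSizeAdvisor {
    static final int MIN_SIZE = 4 * 1024;   // assumed floor
    static final int MAX_SIZE = 64 * 1024;  // the default reported in this thread

    // Smallest power of two covering the given percentile of the samples,
    // clamped to [MIN_SIZE, MAX_SIZE].
    static int suggest(int[] syncSizes, double percentile) {
        if (syncSizes.length == 0) return MAX_SIZE;  // no data: keep the default
        int[] sorted = syncSizes.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(percentile / 100.0 * sorted.length) - 1;
        int target = sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
        if (target >= MAX_SIZE) return MAX_SIZE;
        int size = Integer.highestOneBit(Math.max(target, 1));
        if (size < target) size <<= 1;  // round up to the next power of two
        return Math.max(MIN_SIZE, size);
    }

    public static void main(String[] args) {
        int[] observed = {300, 450, 700, 1200, 2500, 5200};  // hypothetical samples
        int size = suggest(observed, 99.0);
        // With Hadoop's Configuration available, the value could then be applied as
        // conf.setInt("dfs.client-write-packet-size", size);
        System.out.println("dfs.client-write-packet-size=" + size);
    }
}
```

The percentile and the power-of-two rounding are arbitrary choices for illustration; a real deployment would tune them against its own GC measurements, as the benchmark above does.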

Thanks,

Re: About "dfs.client-write-packet-size" setting

Posted by Stack <st...@duboce.net>.
On Tue, Jul 29, 2014 at 10:09 AM, Todd Lipcon <to...@cloudera.com> wrote:

> What about a patch to HDFS to reuse these buffers in a pool?
>
>
Makes sense, Todd. A bit of recycling should make the numbers better again
and remove the need for explicit sizing. I can try and have a go at this one.
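The pooling idea can be sketched with a small Java class. PacketBufferPool is hypothetical, not an HDFS API: a buffer is borrowed when a packet is created and released once the packet is done (assumed here to be after it is acknowledged), and a cap keeps the pool from growing without bound. Reused arrays are not zeroed, so callers must track how many bytes they wrote.

```java
import java.util.ArrayDeque;

// Minimal sketch of the pooling idea: packets borrow a fixed-size buffer and
// return it when done, so steady-state writes stop allocating new arrays.
// Hypothetical class, not part of HDFS.
public class PacketBufferPool {
    private final int bufferSize;
    private final int maxPooled;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    public PacketBufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    // Reuse a pooled buffer if one is available, otherwise allocate a new one.
    // Reused buffers still hold old bytes; they are not cleared.
    public synchronized byte[] acquire() {
        byte[] buf = free.pollFirst();
        return buf != null ? buf : new byte[bufferSize];
    }

    // Return a buffer once its packet is finished; wrong-sized arrays and
    // anything beyond maxPooled are dropped and left for the GC.
    public synchronized void release(byte[] buf) {
        if (buf.length == bufferSize && free.size() < maxPooled) {
            free.addFirst(buf);
        }
    }

    // Number of buffers currently waiting for reuse.
    public synchronized int pooled() {
        return free.size();
    }

    public static void main(String[] args) {
        PacketBufferPool pool = new PacketBufferPool(64 * 1024, 16);
        byte[] a = pool.acquire();
        pool.release(a);
        byte[] b = pool.acquire();  // same array handed back, no new allocation
        System.out.println(a == b);  // prints "true"
    }
}
```

Because recycled buffers are not reallocated, the allocation rate stops depending on pktSize, which is why recycling could remove the need for explicit sizing; the trade-off is that up to maxPooled buffers stay permanently reachable.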

St.Ack

Re: About "dfs.client-write-packet-size" setting

Posted by Todd Lipcon <to...@cloudera.com>.
What about a patch to HDFS to reuse these buffers in a pool?


On Tue, Jul 29, 2014 at 10:08 AM, Stack <st...@duboce.net> wrote:

> You the man @liang xie. Let me try your suggestion here on my little test
> bench. Let's get this into the refguide also.
> St.Ack

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: About "dfs.client-write-packet-size" setting

Posted by Stack <st...@duboce.net>.
You the man @liang xie. Let me try your suggestion here on my little test
bench. Let's get this into the refguide also.
St.Ack

