Posted to dev@hbase.apache.org by Anoop Sam John <an...@huawei.com> on 2012/05/29 14:55:26 UTC

Regarding memstoreTS in bulkloaded HFiles

Hi Devs

            In HFile V2 we have introduced writing the memstore TS to the HFiles. In the case of bulk load too, we are now writing a long value as part of every KV. I think the memstore TS has no meaning in the bulk loading case. Can we avoid this?



As of now we are not able to set any block encoder algorithm as part of bulk loading, but I have created HBASE-6040 which solves this. I have checked the currently available encoder algorithms, but none of them handles the memstoreTS as such. There is an issue open for a new trie-type encoder, and it seems it will handle this kind of scenario: only one long value will get stored as the memstoreTS for a whole block. Still, all this makes it mandatory that some block encoding scheme be used.



Do we need to think about making the memstoreTS write into the HFile (in version 2) configurable in some way? In the case of bulk loading we could turn it OFF. Please correct me if my understanding is wrong.



-Anoop-

RE: Regarding memstoreTS in bulkloaded HFiles

Posted by Anoop Sam John <an...@huawei.com>.
@Stack
> As of now we are not able to set any Block encoder algo as part of bulk loading.
HBASE-6040 will address this issue  :)

>How would we Anoop?  If we create KVs during an upload, the KV
>instance will have a memstorets data member?
 Yes Stack. KV instances have a long member memstoreTS, which defaults to 0L. In the case of bulk load this value will remain as is.
When we use HFile V2, the writer includes the memstoreTS in the bytes written as part of the KVs.
As I said in my last mail, this won't use 8 bytes per KV, but 1 byte per KV when the value is 0, since we write it as a VLong.
(Initially I thought it would be 8 bytes, which would be a huge wastage of space.) Should even one byte be a concern? When the number of records and the KVs per record are high, this may take up noticeable space...
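A minimal sketch of the general variable-length ("varint") idea described above: 7 value bits per byte, with a continuation bit. This is not HBase's or Hadoop's exact VLong on-disk format, but it shows why a memstoreTS of 0 costs exactly one byte while a large live memstoreTS costs several.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class VLongSketch {
    // Write a non-negative long as a variable-length sequence:
    // 7 value bits per byte, high bit set on all but the last byte.
    static void writeVLong(DataOutputStream out, long v) throws IOException {
        while ((v & ~0x7FL) != 0) {
            out.writeByte((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.writeByte((int) v);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream zero = new ByteArrayOutputStream();
        writeVLong(new DataOutputStream(zero), 0L);            // a bulk-loaded KV's memstoreTS
        ByteArrayOutputStream big = new ByteArrayOutputStream();
        writeVLong(new DataOutputStream(big), 1234567890123L); // a large live memstoreTS
        System.out.println(zero.size()); // 1
        System.out.println(big.size());  // 6
    }
}
```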

-Anoop-
________________________________________
From: saint.ack@gmail.com [saint.ack@gmail.com] on behalf of Stack [stack@duboce.net]
Sent: Thursday, May 31, 2012 12:26 AM
To: dev@hbase.apache.org
Subject: Re: Regarding memstoreTS in bulkloaded HFiles

On Tue, May 29, 2012 at 5:55 AM, Anoop Sam John <an...@huawei.com> wrote:
>            In HFile V2 we have introduced the memstore TS to be getting written to the HFiles. In case of bulk load also, now we are writing a long value as part of every KV. I think in case of the bulk loading there is no meaning for the memstore TS. Can we avoid this?
>

How would we Anoop?  If we create KVs during an upload, the KV
instance will have a memstorets data member?

> As of now we are not able to set any Block encoder algo as part of bulk loading.

This would be a nice feature.  Some of the encodings are intensive so
doing it offline would be sweet for read-heavy deploys.

> Do we need to think making the memstoreTS write into the HFile (in version 2) as some way configurable? In case of bulk loading we can turn it OFF. Pls correct me if my understanding is wrong
>

What are you thinking?  This could be a nice addition.
St.Ack

Re: Regarding memstoreTS in bulkloaded HFiles

Posted by Stack <st...@duboce.net>.
On Tue, May 29, 2012 at 5:55 AM, Anoop Sam John <an...@huawei.com> wrote:
>            In HFile V2 we have introduced the memstore TS to be getting written to the HFiles. In case of bulk load also, now we are writing a long value as part of every KV. I think in case of the bulk loading there is no meaning for the memstore TS. Can we avoid this?
>

How would we Anoop?  If we create KVs during an upload, the KV
instance will have a memstorets data member?

> As of now we are not able to set any Block encoder algo as part of bulk loading.

This would be a nice feature.  Some of the encodings are intensive so
doing it offline would be sweet for read-heavy deploys.

> Do we need to think making the memstoreTS write into the HFile (in version 2) as some way configurable? In case of bulk loading we can turn it OFF. Pls correct me if my understanding is wrong
>

What are you thinking?  This could be a nice addition.
St.Ack

RE: Regarding memstoreTS in bulkloaded HFiles

Posted by Anoop Sam John <an...@huawei.com>.
Hi All,
            One correction here is
We store memstoreTS as a VLong. In case of bulk load all the KVs will have a memstoreTS=0. So this will result in one byte getting written per KV. [Not 8 bytes per KV].  

Do we still need to consider avoiding this 1 byte? Suggestions from those who use the bulk loading feature?
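To put that single byte in perspective, here is a back-of-envelope calculation with purely hypothetical table sizes (the row and column counts are illustrative assumptions, not measurements):

```java
public class BulkLoadOverhead {
    public static void main(String[] args) {
        long rows = 1_000_000_000L;    // hypothetical: one billion rows
        long kvsPerRow = 5;            // hypothetical: five columns per row
        // One VLong byte per KV when memstoreTS == 0 (before any compression).
        long bytes = rows * kvsPerRow;
        System.out.printf("extra on-disk bytes: %d (~%.2f GiB)%n",
                bytes, bytes / (1024.0 * 1024 * 1024));
    }
}
```

So even at a billion rows the overhead is on the order of a few GiB, which block compression would likely shrink further since the bytes are all identical.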
   
-Anoop-
_________________________________
From: Anoop Sam John [anoopsj@huawei.com]
Sent: Wednesday, May 30, 2012 9:27 AM
To: dev@hbase.apache.org
Subject: RE: Regarding memstoreTS in bulkloaded HFiles

Matt,
        Thanks for your reply. Yes, I have seen that your algorithm deals with this memstoreTS. What I meant is that maybe we should allow excluding it in bulk load (just allow);
it should not be that, for this reason alone, someone has to use one of the encoding schemes. Depending on the use case, someone may not wish to use any encoding at all. In that case too we could easily save these bytes per KV. As per my study of the code, storing this might not be needed at all in the bulk load case. And now we will write all these values as 0 only, as there is no memstoreTS in bulk load.

This memstoreTS was added for reader consistency when the memstore is getting flushed in between a read (correct me if my understanding is wrong, please).

Meanwhile, good work Matt on the new encoder. Looking forward to getting it included in HBase.  :)

-Anoop-
________________________________________
From: Matt Corgan [mcorgan@hotpads.com]
Sent: Wednesday, May 30, 2012 1:57 AM
To: dev@hbase.apache.org
Subject: Re: Regarding memstoreTS in bulkloaded HFiles

Hi Anoop,

I'm working on the Trie encoding you mentioned.  Just to confirm - it does
support encoding the memstore timestamp, and in the case that they are all
0, it will not take up any space.

I think the other DataBlockEncoders also write it to disk.  See
PrefixTrieDataBlockEncoder.afterEncodingKeyValue(..)

As for whether it's ever needed in a bulk load, I unfortunately don't know.
 My guess would be no, or that it's too exotic of a use case to worry
about.  Maybe someone else can confirm.  But, I'd say you might as well
support the option to include it since it will not take up any space after
encoded.

Matt

On Tue, May 29, 2012 at 5:55 AM, Anoop Sam John <an...@huawei.com> wrote:

> Hi Devs
>
>            In HFile V2 we have introduced the memstore TS to be getting
> written to the HFiles. In case of bulk load also, now we are writing a long
> value as part of every KV. I think in case of the bulk loading there is no
> meaning for the memstore TS. Can we avoid this?
>
>
>
> As of now we are not able to set any Block encoder algo as part of bulk
> loading. But I have created HBASE-6040 which solves this. I have checked
> the current available encoder algos but none of them handles the memstoreTS
> as such. There is a new type of trie encoder issue open. In this it seems
> it will handle this kind of scenario. Only one long value will get stored
> as memstoreTS for one block.    Still thes all makes it mandatory that some
> block encoder scheme to be used.
>
>
>
> Do we need to think making the memstoreTS write into the HFile (in version
> 2) as some way configurable? In case of bulk loading we can turn it OFF.
> Pls correct me if my understanding is wrong
>
>
>
> -Anoop-
>

RE: Regarding memstoreTS in bulkloaded HFiles

Posted by Anoop Sam John <an...@huawei.com>.
Matt,
        Thanks for your reply. Yes, I have seen that your algorithm deals with this memstoreTS. What I meant is that maybe we should allow excluding it in bulk load (just allow);
it should not be that, for this reason alone, someone has to use one of the encoding schemes. Depending on the use case, someone may not wish to use any encoding at all. In that case too we could easily save these bytes per KV. As per my study of the code, storing this might not be needed at all in the bulk load case. And now we will write all these values as 0 only, as there is no memstoreTS in bulk load.

This memstoreTS was added for reader consistency when the memstore is getting flushed in between a read (correct me if my understanding is wrong, please).

Meanwhile, good work Matt on the new encoder. Looking forward to getting it included in HBase.  :)

-Anoop-
________________________________________
From: Matt Corgan [mcorgan@hotpads.com]
Sent: Wednesday, May 30, 2012 1:57 AM
To: dev@hbase.apache.org
Subject: Re: Regarding memstoreTS in bulkloaded HFiles

Hi Anoop,

I'm working on the Trie encoding you mentioned.  Just to confirm - it does
support encoding the memstore timestamp, and in the case that they are all
0, it will not take up any space.

I think the other DataBlockEncoders also write it to disk.  See
PrefixTrieDataBlockEncoder.afterEncodingKeyValue(..)

As for whether it's ever needed in a bulk load, I unfortunately don't know.
 My guess would be no, or that it's too exotic of a use case to worry
about.  Maybe someone else can confirm.  But, I'd say you might as well
support the option to include it since it will not take up any space after
encoded.

Matt

On Tue, May 29, 2012 at 5:55 AM, Anoop Sam John <an...@huawei.com> wrote:

> Hi Devs
>
>            In HFile V2 we have introduced the memstore TS to be getting
> written to the HFiles. In case of bulk load also, now we are writing a long
> value as part of every KV. I think in case of the bulk loading there is no
> meaning for the memstore TS. Can we avoid this?
>
>
>
> As of now we are not able to set any Block encoder algo as part of bulk
> loading. But I have created HBASE-6040 which solves this. I have checked
> the current available encoder algos but none of them handles the memstoreTS
> as such. There is a new type of trie encoder issue open. In this it seems
> it will handle this kind of scenario. Only one long value will get stored
> as memstoreTS for one block.    Still thes all makes it mandatory that some
> block encoder scheme to be used.
>
>
>
> Do we need to think making the memstoreTS write into the HFile (in version
> 2) as some way configurable? In case of bulk loading we can turn it OFF.
> Pls correct me if my understanding is wrong
>
>
>
> -Anoop-
>

Re: Regarding memstoreTS in bulkloaded HFiles

Posted by Matt Corgan <mc...@hotpads.com>.
Hi Anoop,

I'm working on the Trie encoding you mentioned.  Just to confirm - it does
support encoding the memstore timestamp, and in the case that they are all
0, it will not take up any space.

I think the other DataBlockEncoders also write it to disk.  See
PrefixTrieDataBlockEncoder.afterEncodingKeyValue(..)

As for whether it's ever needed in a bulk load, I unfortunately don't know.
 My guess would be no, or that it's too exotic of a use case to worry
about.  Maybe someone else can confirm.  But, I'd say you might as well
support the option to include it since it will not take up any space after
encoded.

Matt

On Tue, May 29, 2012 at 5:55 AM, Anoop Sam John <an...@huawei.com> wrote:

> Hi Devs
>
>            In HFile V2 we have introduced the memstore TS to be getting
> written to the HFiles. In case of bulk load also, now we are writing a long
> value as part of every KV. I think in case of the bulk loading there is no
> meaning for the memstore TS. Can we avoid this?
>
>
>
> As of now we are not able to set any Block encoder algo as part of bulk
> loading. But I have created HBASE-6040 which solves this. I have checked
> the current available encoder algos but none of them handles the memstoreTS
> as such. There is a new type of trie encoder issue open. In this it seems
> it will handle this kind of scenario. Only one long value will get stored
> as memstoreTS for one block.    Still thes all makes it mandatory that some
> block encoder scheme to be used.
>
>
>
> Do we need to think making the memstoreTS write into the HFile (in version
> 2) as some way configurable? In case of bulk loading we can turn it OFF.
> Pls correct me if my understanding is wrong
>
>
>
> -Anoop-
>