Posted to dev@hbase.apache.org by ramkrishna vasudevan <ra...@gmail.com> on 2013/07/18 19:14:56 UTC

DISCUSS : HFile V3 proposal for tags in 0.96

For the past couple of months, we have been working through various
prototypes for supporting inline storage of tags in cells as persisted on
disk. Our goals are to support optional use of tags with minimal changes to
core code while also avoiding performance impacts to users who do not use
tags.

 For background, refer to the comments in

https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228

and

https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653

We have iterated on a couple of prototypes that implement tag awareness,
first in DataBlockEncoders and later as a new type of Codec for Cells. This
point is discussed in the above comments in HBASE-8496.

We think that tag awareness in Cell Codecs is the right way, but there are
some shortcomings with the current interfaces internal to HFile that need
to be addressed in order to avoid any performance impact for those who do
not want to use inline tags, and that may involve a substantial amount of
code change.

By introducing inline tags in a new V3 version of HFile, we can avoid
several problems with HFile V2 internals as well as backwards compatibility
concerns. Users who do not want tag support see no performance impact and
low risk, while inline tag capabilities still become available in a
shipping version of HBase.

The new V3 version of HFile differs from earlier versions by supporting
inline tag storage.  It does not change the HFileBlock format; it only
serializes and deserializes the Tag information that is persisted in the
HFile.  Having HFile V3 also helps to keep Tags optional, so that existing
cases where there are no tags are totally unaffected.  We also keep the
changes outside of the V3 reader and writer minimal.  Compatibility would
not be a problem with future versions when we go with Cell Codecs.  The
Codec used for writing a file will be persisted in the HFile header.  For
now, for files that are either V2 or V3, we will instantiate two default
codecs that know how to deal with serialization with and without tags.

There have been earlier thoughts on an HFile V3, e.g.:

https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653

We have been working on this and will have a clean patch with a good
amount of testing in time for 0.96.

Although our focus is on performance-neutral persistence of inline cell
tags in 0.96 to enable a couple of security coprocessor users, introducing
an HFile V3 provides design freedom for some other features and problems
too that can be developed through the 0.96 cycle into 0.98.

Please voice your opinion on this so that we can settle the approach and
perhaps define the scope of the patch.  Also feel free to comment on
HBASE-8496 with your thoughts and ideas.

Regards

Ram

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Ted Yu <yu...@gmail.com>.
bq. By default code will go with V2.

Good.

Looking forward to the patch.

On Thu, Jul 18, 2013 at 9:57 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> >> Any consideration that the tags are serialized before the memstoreTS
> instead of after?
> The argument is simple: memstoreTS is optional and appears only in the
> HFile, not in the KV.  In the current design the tags come after the Value
> in the KV structure, so it is better to apply the same order in HFiles.
> >> When would PrefixTree be able to handle tags?
> Maybe my statement confused you.  Please see the point on
> PrefixTreeEncoders in the previous mail.  I meant that, as per the current
> design, PrefixKey, DiffKey and FastDiff extend BufferedDataBlockEncoder,
> and hence BufferedDataBlockEncoder is made tag aware.
>
> The PrefixTree codec has been handled separately to make it work with tags.
> >> Put another way, after this feature goes in, would
> HFile V3 always be written?
> By default code will go with V2.  When a user needs V3, they would need
> to set hfile.format.version to 3.  This would ensure that the system
> uses V3.
>
> Thanks Ted.
>
> Regards
> Ram

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by ramkrishna vasudevan <ra...@gmail.com>.
>> Any consideration that the tags are serialized before the memstoreTS
instead of after?
The argument is simple: memstoreTS is optional and appears only in the
HFile, not in the KV.  In the current design the tags come after the Value
in the KV structure, so it is better to apply the same order in HFiles.
>> When would PrefixTree be able to handle tags?
Maybe my statement confused you.  Please see the point on
PrefixTreeEncoders in the previous mail.  I meant that, as per the current
design, PrefixKey, DiffKey and FastDiff extend BufferedDataBlockEncoder,
and hence BufferedDataBlockEncoder is made tag aware.

The PrefixTree codec has been handled separately to make it work with tags.
>> Put another way, after this feature goes in, would
HFile V3 always be written?
By default code will go with V2.  When a user needs V3, they would need
to set hfile.format.version to 3.  This would ensure that the system
uses V3.
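
The version selection described above could be sketched roughly as follows.
This is only an illustration: java.util.Properties stands in for the HBase
configuration object, and the class and method names (FormatVersionSketch,
formatVersion, mayWriteTags) are hypothetical.  Only the property name
hfile.format.version and the default of V2 come from this thread.  In a
real deployment the property would normally be set in hbase-site.xml.

```java
import java.util.Properties;

// Illustrative sketch only: Properties is a stand-in for the real
// configuration class, and all names except the property key are assumed.
final class FormatVersionSketch {
    static final String FORMAT_VERSION_KEY = "hfile.format.version";
    static final int DEFAULT_VERSION = 2; // V2 unless the user opts in to V3

    /** Returns the configured HFile format version, defaulting to 2. */
    static int formatVersion(Properties conf) {
        String raw = conf.getProperty(FORMAT_VERSION_KEY);
        return raw == null ? DEFAULT_VERSION : Integer.parseInt(raw.trim());
    }

    /** Tags can only be persisted when the user has selected V3 or later. */
    static boolean mayWriteTags(Properties conf) {
        return formatVersion(conf) >= 3;
    }
}
```

With nothing configured the sketch reports version 2, so existing users
stay on the V2 path unless they opt in.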

Thanks Ted.

Regards
Ram


On Fri, Jul 19, 2013 at 10:10 AM, Ted Yu <yu...@gmail.com> wrote:

> bq. V3 would now also serialize the tags, after the Value part and before
> the memstoreTS
>
> Any consideration that the tags are serialized before the memstoreTS
> instead of after?
>
> bq. The BufferedDataBlockEncoder, being the base class for all encoders
> other than PrefixTree, would now be tag aware.
>
> When would PrefixTree be able to handle tags?
>
> When a new HFile is opened, would the user be able to specify that there
> is no tagging involved? Put another way, after this feature goes in, would
> HFile V3 always be written?
>
> Thanks

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Ted Yu <yu...@gmail.com>.
bq. V3 would now also serialize the tags, after the Value part and before
the memstoreTS

Any consideration that the tags are serialized before the memstoreTS
instead of after?

bq. The BufferedDataBlockEncoder, being the base class for all encoders
other than PrefixTree, would now be tag aware.

When would PrefixTree be able to handle tags?

When a new HFile is opened, would the user be able to specify that there
is no tagging involved? Put another way, after this feature goes in, would
HFile V3 always be written?

Thanks

On Thu, Jul 18, 2013 at 9:29 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> The changes/differences that we would be introducing in the V3 format are
> listed below, in words, under subcategories.
>
> To reduce code duplication we would subclass ReaderV3 and WriterV3 from
> ReaderV2 and WriterV2 respectively.
> *HFileBlockFormat*
> *=============*
> No change in V2 and V3.
>
> *KV serialization*
> *============*
> V2: no change.
> V3 would now also serialize the tags, after the Value part and before the
> memstoreTS.
>
> *FixedFileTrailer*
> *===========*
> Introduces a new piece of information into the trailer, which can be used
> in V3 to make tags optional.  Suppose the user selects V3 but one CF has no
> tags.  Then we would write the tag bytes while flushing, but during
> compaction we would use this trailer info to avoid writing tags in the
> compacted files.  This would mean no impact on read performance after the
> compaction has completed.
> The V2 code also tries to get this trailer info, but since it is null
> there is no impact on any of the existing code.
>
> *WriterV3 and ReaderV3*
> *=================*
> These handle the tags based on the metadata from the trailer info.  All
> the APIs like seekTo(), next() and getKeyValue() are now able to handle
> tags based on the flag passed during construction of the Readers and
> Writers.  We can be sure that for any V2 instance the includeTags flag
> would always be false.
>
> *DataBlockEncoders*
> *==============*
> Additional arguments are added to the APIs in the interfaces related to
> HFileDataBlockEncoders, BufferedDataBlockEncoders,
> HFileDataBlockEncodingContext etc.  Again, for V2 the new APIs would still
> behave the same way and there would be no impact for V2-based use cases.
> The BufferedDataBlockEncoder, being the base class for all encoders other
> than PrefixTree, would now be tag aware.
>
> *PrefixTreeEncoders*
> *==============*
> We are trying to keep changes minimal here, and would ensure that there
> are no behavioural changes while using PrefixTree with V2.
>
> *KeyValue class*
> *===========*
> Will include changes to have a Tag class inside it.  APIs to identify tags
> in a KV would be needed.  There would also be changes to util methods.
>
> For the V2-based read/write flow the existing code path applies with
> no/minimal changes.
>
> Many test cases have to be changed to accommodate the API changes
> happening to the internal interfaces.
> I have listed the changes at a high level; maybe once you see a patch it
> will give more clarity.  Let me know if further information is needed.
>
> Regards
> Ram

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by ramkrishna vasudevan <ra...@gmail.com>.
The changes/differences that we would be introducing in the V3 format are
listed below, in words, under subcategories.

To reduce code duplication we would subclass ReaderV3 and WriterV3 from
ReaderV2 and WriterV2 respectively.
*HFileBlockFormat*
*=============*
No change in V2 and V3.

*KV serialization*
*============*
V2: no change.
V3 would now also serialize the tags, after the Value part and before the
memstoreTS.
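
As a rough illustration of the ordering described above, the sketch below
writes one record in a V2-like and a V3-like layout.  The field widths
(4-byte key and value lengths, a 2-byte tags length, a fixed 8-byte
memstoreTS) and all class and method names are assumptions made for the
example, not the actual on-disk format.  The only point taken from the
proposal is that the tags sit after the Value and before the memstoreTS.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: field widths and names are assumptions,
// not the actual HFile on-disk format.
final class CellLayoutSketch {

    /** V2-style record: keyLen, valueLen, key, value, memstoreTS. */
    static byte[] writeV2(byte[] key, byte[] value, long memstoreTs) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + key.length + value.length + 8);
        buf.putInt(key.length).putInt(value.length).put(key).put(value);
        buf.putLong(memstoreTs);
        return buf.array();
    }

    /** V3-style record: tagsLen and tags sit between value and memstoreTS. */
    static byte[] writeV3(byte[] key, byte[] value, byte[] tags, long memstoreTs) {
        if (tags.length > 0xFFFF) {
            throw new IllegalArgumentException("tags too long for a 2-byte length");
        }
        ByteBuffer buf = ByteBuffer.allocate(
                4 + 4 + key.length + value.length + 2 + tags.length + 8);
        buf.putInt(key.length).putInt(value.length).put(key).put(value);
        buf.putShort((short) tags.length).put(tags);
        buf.putLong(memstoreTs);
        return buf.array();
    }

    /** Reads the tag bytes back out of a V3-style record. */
    static byte[] readTagsV3(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int keyLen = buf.getInt();
        int valueLen = buf.getInt();
        buf.position(buf.position() + keyLen + valueLen); // skip key and value
        int tagsLen = buf.getShort() & 0xFFFF;
        byte[] tags = new byte[tagsLen];
        buf.get(tags);
        return tags;
    }
}
```

In this sketch a V3 record is larger than the V2 record for the same cell
by exactly the tags length field plus the tag bytes, and the memstoreTS
remains the last field in both layouts.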

*FixedFileTrailer*
*===========*
Introduces a new piece of information into the trailer, which can be used
in V3 to make tags optional.  Suppose the user selects V3 but one CF has no
tags.  Then we would write the tag bytes while flushing, but during
compaction we would use this trailer info to avoid writing tags in the
compacted files.  This would mean no impact on read performance after the
compaction has completed.
The V2 code also tries to get this trailer info, but since it is null
there is no impact on any of the existing code.
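
One way to picture the compaction decision is the sketch below.  It
assumes, purely for illustration, that the extra trailer information is a
"maximum tags length" value that is absent (null) for V2 files; the
TrailerInfo class, its fields and the writeTagsOnCompaction method are all
hypothetical names, not part of the patch.

```java
import java.util.List;

// Illustrative sketch only: the trailer field and every name here are
// assumptions made for the example.
final class TrailerTagSketch {

    /** Hypothetical stand-in for the extra trailer information. */
    static final class TrailerInfo {
        final int formatVersion;
        final Integer maxTagsLength; // null for V2 files, which never record it

        TrailerInfo(int formatVersion, Integer maxTagsLength) {
            this.formatVersion = formatVersion;
            this.maxTagsLength = maxTagsLength;
        }
    }

    /**
     * Decides whether the compacted file needs tag bytes at all: only if
     * at least one V3 input file actually recorded a non-empty tag.
     */
    static boolean writeTagsOnCompaction(List<TrailerInfo> inputs) {
        for (TrailerInfo t : inputs) {
            if (t.formatVersion >= 3 && t.maxTagsLength != null && t.maxTagsLength > 0) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions a CF whose flushed V3 files never saw a tag ends up
with compacted files that carry no tag bytes, and V2 inputs never switch
tags on.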

*WriterV3 and ReaderV3*
*=================*
These handle the tags based on the metadata from the trailer info.  All
the APIs like seekTo(), next() and getKeyValue() are now able to handle
tags based on the flag passed during construction of the Readers and
Writers.  We can be sure that for any V2 instance the includeTags flag
would always be false.
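
The subclassing and the includeTags flag could look roughly like the
sketch below.  The class names, the cellLength helper and the record
layout (4-byte key and value lengths, 2-byte tags length, memstoreTS
omitted) are assumptions for the example only.  What it is meant to show
is that V2 always constructs with includeTags = false, so its code path is
unchanged, while V3 reuses the same logic with the flag supplied by the
caller.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: names and record layout are assumptions.
class ReaderV2Sketch {
    protected final boolean includeTags;

    /** V2 readers never see tags. */
    ReaderV2Sketch() {
        this(false);
    }

    protected ReaderV2Sketch(boolean includeTags) {
        this.includeTags = includeTags;
    }

    /** Length of the cell starting at offset (memstoreTS omitted for brevity). */
    int cellLength(ByteBuffer block, int offset) {
        int keyLen = block.getInt(offset);
        int valueLen = block.getInt(offset + 4);
        int len = 4 + 4 + keyLen + valueLen;
        if (includeTags) {
            // Tags follow the value: a 2-byte length, then the tag bytes.
            int tagsLen = block.getShort(offset + len) & 0xFFFF;
            len += 2 + tagsLen;
        }
        return len;
    }
}

/** V3 reuses the V2 logic and only differs in the flag it is built with. */
final class ReaderV3Sketch extends ReaderV2Sketch {
    ReaderV3Sketch(boolean includeTags) {
        super(includeTags);
    }
}
```

Keeping the tag handling behind a single flag in the shared base class is
what lets V3 subclass V2 without duplicating the seek and scan code.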

*DataBlockEncoders*
*==============*
Additional arguments are added to the APIs in the interfaces related to
HFileDataBlockEncoders, BufferedDataBlockEncoders,
HFileDataBlockEncodingContext etc.  Again, for V2 the new APIs would still
behave the same way and there would be no impact for V2-based use cases.
The BufferedDataBlockEncoder, being the base class for all encoders other
than PrefixTree, would now be tag aware.

*PrefixTreeEncoders*
*==============*
Trying to keep changes minimal here, but we would ensure that there are no
behavioural changes while using PrefixTree with V2.

*KeyValue class*
*===========*
Will include changes to have a Tag class inside this.  APIs to identify tags
in a KV would be needed.  There would also be changes to util methods.

For V2 based read/write flow the existing code path applies with no/minimal
changes.

Many test cases have to be changed to accommodate the API changes happening
to the internal interfaces.
I have listed the changes at a high level; once you can see a patch, that
should give more clarity. Let me know if further information is needed.

Regards
Ram


On Thu, Jul 18, 2013 at 11:25 PM, Jimmy Xiang <jx...@cloudera.com> wrote:

> Can you share some more details about it?  A graph/chart/table showing the
> specific difference will be helpful.
>
> Thanks,
> Jimmy
>
>
> On Thu, Jul 18, 2013 at 10:23 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > I have been following comments on HBASE-8496.
> >
> > I think introducing cell tagging through HFile v3 is acceptable.
> >
> > Looking forward to seeing your implementation.
> >
> > Cheers
> >
> > On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > For the past couple of months, we have been working through various
> > > prototypes for supporting inline storage of tags in cells as persisted
> on
> > > disk. Our goals are to support optional use of tags with minimal
> changes
> > to
> > > core code while also avoiding performance impacts to users who do not
> use
> > > tags.
> > >
> > >  For background, refer to the comments in
> > >
> > >
> > >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
> > >
> > > and
> > >
> > >
> > >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > >
> > >  We have iterated on a couple of prototypes that implement tag
> awareness
> > in
> > > DataBlockEncoders, later as a new type of Codec for Cells. This point
> is
> > > discussed in the above comments in HBASE-8496.
> > >
> > > We think that tag awareness in Cell Codecs is the right way, but there
> > are
> > > some shortcomings with the current interfaces internal to HFile that
> need
> > > to addressed in order to avoid any performance impacts for those who do
> > not
> > > want to use inline tags, and that may involve a drastic amount of code
> > > change.
> > >
> > >  We can avoid several problems with HFile V2 internals, and backwards
> > > compatibility concerns, and allow for working tags support with no
> > > performance impact and low risk to all HBase users who do not want tag
> > > support, while still allowing for inline tags capabilities in a
> shipping
> > > version of HBase, by introducing this in a new V3 version for HFile.
> > >
> > >  The new V3 version for HFile differs from earlier versions by
> supporting
> > > inline tag storage.  This version does not change the HFileBlock format
> > > whereas it just serializes and deserializes the Tag information that
> > would
> > > be persisted in the HFile. Having HFile V3 would also help to keep Tags
> > > optional such that the existing cases where there are no tags are
> totally
> > > unaffected.  Also we ensure that we keep the changes outside of the V3
> > > reader and writer minimal.  Compatibility would not be a problem with
> > > future versions when we go with Cell Codecs.  What Codecs used for
> > writing
> > > the file will be persisted in the HFile header.  Now for files that are
> > > either V2 or V3 we will instantiate two default codecs that know to
> deal
> > > with serializations with and without tags.
> > >
> > >  There have been thoughts on an HFile V3 prior, e.g.:
> > >
> > >
> > >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > >
> > >  We have been working on this and will have a clean patch with good
> > amount
> > > of testing in time for 0.96.
> > >
> > > Although our focus is on performance-neutral persistence of inline cell
> > > tags in 0.96 to enable a couple of security coprocessor users,
> > introducing
> > > an HFile V3 provides design freedom for some other features and
> problems
> > > too that can be developed through the 0.96 cycle into 0.98.
> > >
> > > Pls voice your opinion on this so that we can make this clear and may
> be
> > > define the scope of the patch.  Also feel free to comment on HBASE-8496
> > on
> > > your thoughts and ideas.
> > >
> > > Regards
> > >
> > > Ram
> > >
> >
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Jimmy Xiang <jx...@cloudera.com>.
Can you share some more details about it?  A graph/chart/table showing the
specific difference will be helpful.

Thanks,
Jimmy


On Thu, Jul 18, 2013 at 10:23 AM, Ted Yu <yu...@gmail.com> wrote:

> I have been following comments on HBASE-8496.
>
> I think introducing cell tagging through HFile v3 is acceptable.
>
> Looking forward to seeing your implementation.
>
> Cheers
>
> On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > For the past couple of months, we have been working through various
> > prototypes for supporting inline storage of tags in cells as persisted on
> > disk. Our goals are to support optional use of tags with minimal changes
> to
> > core code while also avoiding performance impacts to users who do not use
> > tags.
> >
> >  For background, refer to the comments in
> >
> >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
> >
> > and
> >
> >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> >
> >  We have iterated on a couple of prototypes that implement tag awareness
> in
> > DataBlockEncoders, later as a new type of Codec for Cells. This point is
> > discussed in the above comments in HBASE-8496.
> >
> > We think that tag awareness in Cell Codecs is the right way, but there
> are
> > some shortcomings with the current interfaces internal to HFile that need
> > to addressed in order to avoid any performance impacts for those who do
> not
> > want to use inline tags, and that may involve a drastic amount of code
> > change.
> >
> >  We can avoid several problems with HFile V2 internals, and backwards
> > compatibility concerns, and allow for working tags support with no
> > performance impact and low risk to all HBase users who do not want tag
> > support, while still allowing for inline tags capabilities in a shipping
> > version of HBase, by introducing this in a new V3 version for HFile.
> >
> >  The new V3 version for HFile differs from earlier versions by supporting
> > inline tag storage.  This version does not change the HFileBlock format
> > whereas it just serializes and deserializes the Tag information that
> would
> > be persisted in the HFile. Having HFile V3 would also help to keep Tags
> > optional such that the existing cases where there are no tags are totally
> > unaffected.  Also we ensure that we keep the changes outside of the V3
> > reader and writer minimal.  Compatibility would not be a problem with
> > future versions when we go with Cell Codecs.  What Codecs used for
> writing
> > the file will be persisted in the HFile header.  Now for files that are
> > either V2 or V3 we will instantiate two default codecs that know to deal
> > with serializations with and without tags.
> >
> >  There have been thoughts on an HFile V3 prior, e.g.:
> >
> >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> >
> >  We have been working on this and will have a clean patch with good
> amount
> > of testing in time for 0.96.
> >
> > Although our focus is on performance-neutral persistence of inline cell
> > tags in 0.96 to enable a couple of security coprocessor users,
> introducing
> > an HFile V3 provides design freedom for some other features and problems
> > too that can be developed through the 0.96 cycle into 0.98.
> >
> > Pls voice your opinion on this so that we can make this clear and may be
> > define the scope of the patch.  Also feel free to comment on HBASE-8496
> on
> > your thoughts and ideas.
> >
> > Regards
> >
> > Ram
> >
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Ted Yu <yu...@gmail.com>.
I have been following comments on HBASE-8496.

I think introducing cell tagging through HFile v3 is acceptable.

Looking forward to seeing your implementation.

Cheers

On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> For the past couple of months, we have been working through various
> prototypes for supporting inline storage of tags in cells as persisted on
> disk. Our goals are to support optional use of tags with minimal changes to
> core code while also avoiding performance impacts to users who do not use
> tags.
>
>  For background, refer to the comments in
>
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
>
> and
>
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
>
>  We have iterated on a couple of prototypes that implement tag awareness in
> DataBlockEncoders, later as a new type of Codec for Cells. This point is
> discussed in the above comments in HBASE-8496.
>
> We think that tag awareness in Cell Codecs is the right way, but there are
> some shortcomings with the current interfaces internal to HFile that need
> to addressed in order to avoid any performance impacts for those who do not
> want to use inline tags, and that may involve a drastic amount of code
> change.
>
>  We can avoid several problems with HFile V2 internals, and backwards
> compatibility concerns, and allow for working tags support with no
> performance impact and low risk to all HBase users who do not want tag
> support, while still allowing for inline tags capabilities in a shipping
> version of HBase, by introducing this in a new V3 version for HFile.
>
>  The new V3 version for HFile differs from earlier versions by supporting
> inline tag storage.  This version does not change the HFileBlock format
> whereas it just serializes and deserializes the Tag information that would
> be persisted in the HFile. Having HFile V3 would also help to keep Tags
> optional such that the existing cases where there are no tags are totally
> unaffected.  Also we ensure that we keep the changes outside of the V3
> reader and writer minimal.  Compatibility would not be a problem with
> future versions when we go with Cell Codecs.  What Codecs used for writing
> the file will be persisted in the HFile header.  Now for files that are
> either V2 or V3 we will instantiate two default codecs that know to deal
> with serializations with and without tags.
>
>  There have been thoughts on an HFile V3 prior, e.g.:
>
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
>
>  We have been working on this and will have a clean patch with good amount
> of testing in time for 0.96.
>
> Although our focus is on performance-neutral persistence of inline cell
> tags in 0.96 to enable a couple of security coprocessor users, introducing
> an HFile V3 provides design freedom for some other features and problems
> too that can be developed through the 0.96 cycle into 0.98.
>
> Pls voice your opinion on this so that we can make this clear and may be
> define the scope of the patch.  Also feel free to comment on HBASE-8496 on
> your thoughts and ideas.
>
> Regards
>
> Ram
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Ted Yu <yu...@gmail.com>.
I was reading Owen's presentation at Hadoop Summit on ORC.

Slide #14 describes how codecs are used for generic compression.

I think we can adopt some of their ideas in HFile v3.

Cheers

On Fri, Jul 19, 2013 at 9:48 AM, Andrew Purtell <ap...@apache.org> wrote:

> On Fri, Jul 19, 2013 at 4:23 AM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > If tags are activated but empty, is it going to be the
> > same thing? Or are we going to have all the tags overhead? Like can we
> have
> > a byte to say "no tags in that file" in addition to "tags are activated
> for
> > that file"?
> >
>
> This reminds me of an interesting discussion we had. So like with
> memstoreTS, if we determine that no cells in a file have tags (or
> timestamps) then we can flag that in file metadata and turn off any related
> persistence when writing out the data blocks. With millions of KVs in a
> file that can achieve substantial space savings. Having a new file format
> on the table also opens up possibilities like block headers: an N-byte
> structure (where N is something like 4 or 8 bytes maybe) at the start of
> each block that describes the encoding strategy taken for the block:
> whether tags are present or not, if we used FAST_DIFF, or some new packing
> together of related values (we put the keys up front with one or two byte
> pointers into the block where their values are, de-dup values in the latter
> part of the block), or a dictionary scheme (and with which dictionary in
> what meta block) etc. We might borrow ideas from Parquet or ORC. We can
> stop serializing HFile blocks as individual cells into streams and look at
> them as a group of cells to write into a bytebuffer, providing a lot more
> freedom for efficiently structuring the internal details of the block. Let
> me make sure this point makes it out into the public discussion, to
> highlight the additional benefit of having an experimental file format
> available in the 0.96 cycle - it's a place where we and users can go off on
> new directions far beyond inline tags. Of course such changes in unreleased
> trunk code could make that possible too, but what I have observed is
> "professional" HBase devs are much more likely to look at trunk than a
> user. Users really want to work on and contribute a patch for what they are
> running in production. Consider recent contributions from Yahoo and Taobao
> as an example of what I mean. The bar for putting something into V2 is
> extremely high as it should be on account of how performance critical that
> code is. I'm not suggesting less rigor for V3, what I am suggesting is V3
> can provide design freedom by going in different directions than the legacy
> V2 code.
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Andrew Purtell <ap...@apache.org>.
On Fri, Jul 19, 2013 at 4:23 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> If tags are activated but empty, is it going to be the
> same thing? Or are we going to have all the tags overhead? Like can we have
> a byte to say "no tags in that file" in addition to "tags are activated for
> that file"?
>

This reminds me of an interesting discussion we had. So like with
memstoreTS, if we determine that no cells in a file have tags (or
timestamps) then we can flag that in file metadata and turn off any related
persistence when writing out the data blocks. With millions of KVs in a
file that can achieve substantial space savings. Having a new file format
on the table also opens up possibilities like block headers: an N-byte
structure (where N is something like 4 or 8 bytes maybe) at the start of
each block that describes the encoding strategy taken for the block:
whether tags are present or not, if we used FAST_DIFF, or some new packing
together of related values (we put the keys up front with one or two byte
pointers into the block where their values are, de-dup values in the latter
part of the block), or a dictionary scheme (and with which dictionary in
what meta block) etc. We might borrow ideas from Parquet or ORC. We can
stop serializing HFile blocks as individual cells into streams and look at
them as a group of cells to write into a bytebuffer, providing a lot more
freedom for efficiently structuring the internal details of the block. Let
me make sure this point makes it out into the public discussion, to
highlight the additional benefit of having an experimental file format
available in the 0.96 cycle - it's a place where we and users can go off on
new directions far beyond inline tags. Of course such changes in unreleased
trunk code could make that possible too, but what I have observed is
"professional" HBase devs are much more likely to look at trunk than a
user. Users really want to work on and contribute a patch for what they are
running in production. Consider recent contributions from Yahoo and Taobao
as an example of what I mean. The bar for putting something into V2 is
extremely high as it should be on account of how performance critical that
code is. I'm not suggesting less rigor for V3, what I am suggesting is V3
can provide design freedom by going in different directions than the legacy
V2 code.
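
To make the block header idea a little more concrete, here is a minimal
sketch of packing an encoding id and a few flags into a 4-byte header.  The
bit layout, the flag names and the class name are all hypothetical; nothing
here reflects an agreed format.

```java
final class BlockHeaderSketch {
    static final int FLAG_HAS_TAGS = 1;             // bit 0: cells carry tags
    static final int FLAG_HAS_MEMSTORE_TS = 1 << 1; // bit 1: cells carry memstoreTS

    /** Upper 16 bits hold the encoding id, lower 16 bits hold the flags. */
    static int pack(int encodingId, int flags) {
        return ((encodingId & 0xFFFF) << 16) | (flags & 0xFFFF);
    }

    static int encodingId(int header) {
        return header >>> 16;
    }

    static boolean hasTags(int header) {
        return (header & FLAG_HAS_TAGS) != 0;
    }

    static boolean hasMemstoreTS(int header) {
        return (header & FLAG_HAS_MEMSTORE_TS) != 0;
    }
}
```

A reader would inspect the header once per block and then pick the matching
decoding path, rather than checking for tags on every cell.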

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by ramkrishna vasudevan <ra...@gmail.com>.
>>What sort of changes do you foresee necessary in core to support
cellcodecs?  Between rpc and hfilev3?

Agreed that a codec provides the maximum flexibility to define how the cells
are serialized/deserialized, be it for the RPC or the WAL.

The RPC or the WAL codec just deals with the KVs in a sequential way.
Every KV that comes in is just persisted and read back in the same way.
Also, in the RPC and WAL the mvcc (memstoreTS) does not have any significance.

When we come to the HFile layer we end up having seekable reads, and access
is not always sequential.  Also the HFile reader/writer and the block encoder
are tightly coupled with the KeyValue structure.
The datablock encoders have a seeker inside them.  We need something similar
to that inside the HFile codecs.

This is how the HFile datablock currently works:

Current write path
==================
HFileWriter -> append(kv) -> form HFileBlock byte buffer -> encoders read the
byte buffer -> encode (based on algo) per KV into a new byte buffer -> the new
byte buffer is persisted.

The read path creates an EncodedScanner for encoded buffers or a plain
Scanner for no encoding.
Every algo has its own Seeker interfaces.
Introducing a Codec in this read path would require a Decoder that is
seekable.  We would also need a different Decoder impl for each of the
algos.

Now introducing codec in the current write path would mean that

HFileWriter -> Codec.encode(kv) -> form HFileBlock byte buffer -> use
codec.decoder to read the byte buffer -> encode (based on algo) per KV into a
new byte buffer -> the new byte buffer is persisted.

With the current seeker interfaces it is difficult to perform rewind(),
next(), seekToKeyInBlock() with a codec (on the current code).


To support cell codecs in HFile we basically need to modify the existing
interfaces.
The current codec API for encoding/decoding basically has an advance() and a
current().  If we do not want to modify the basic Decoder interface then we
would be creating an HFileEncoder/HFileDecoder, and all the seeker related
APIs would go into them.  For every codec that we write we would need to
ensure that the HFileEncoder/Decoder has its own version of each
datablock encoder algo, like KVcodecFastdiff, KVCodecDiffKey etc.
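
A rough sketch of what such an HFileDecoder could look like is given here.
Apart from advance() and current(), which are mentioned above, every name in
it is hypothetical, and the toy implementation works on a sorted in-memory
list of string keys rather than on real encoded blocks.

```java
import java.util.Collections;
import java.util.List;

final class SeekableDecoderSketch {
    /** The sequential contract: advance() to the next cell, then current(). */
    interface Decoder<T> {
        boolean advance();
        T current();
    }

    /** Hypothetical HFile-side extension carrying the seeker style operations. */
    interface HFileDecoder<T> extends Decoder<T> {
        void rewind();
        /** Positions on the last cell whose key is <= key; false if there is none. */
        boolean seekToKeyInBlock(String key);
    }

    /** Toy implementation over sorted keys, standing in for a decoded block. */
    static final class ListDecoder implements HFileDecoder<String> {
        private final List<String> sortedKeys;
        private int pos = -1; // -1 means "before the first cell"

        ListDecoder(List<String> sortedKeys) {
            this.sortedKeys = sortedKeys;
        }

        public boolean advance() {
            if (pos + 1 >= sortedKeys.size()) {
                return false;
            }
            pos++;
            return true;
        }

        public String current() {
            return sortedKeys.get(pos);
        }

        public void rewind() {
            pos = -1;
        }

        public boolean seekToKeyInBlock(String key) {
            int idx = Collections.binarySearch(sortedKeys, key);
            // For a missing key, binarySearch returns -(insertionPoint) - 1.
            int target = idx >= 0 ? idx : -idx - 2;
            if (target < 0) {
                return false; // every key in the block is greater than the target
            }
            pos = target;
            return true;
        }
    }
}
```

The point of the split is that the plain Decoder contract stays untouched
for RPC and WAL use, while the seek operations live only in the HFile
specific interface.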

So whenever the codec changes we need to create our own algos that know
how to deal with the codec's way of decoding.
This also means that when the NONE encoding algo is used, the codec should
create a seeker that would help to deal with plain byte buffers.
This would mean that the current BufferedDataBlockEncoder would not directly
help us, as it is tightly coupled with the KV structure.

*Thanks to Anoop for his advice on the above design when I was stuck on
how to proceed with codecs.*

Now suppose we don't change any of the interfaces but just introduce a V3
that works with Codecs.  To illustrate the problem, consider how to
implement next(), blockSeek(), readKeyValueLen() in ScannerV3 (just taking
an example).

We would just be doing
{code}
private final void readKeyValueLen() {
  try {
    // Mark the position, decode one full cell, then rewind to the mark.
    // TODO : Can we specify the max value here
    decoder.mark(Integer.MAX_VALUE);
    decoder.advance();
    currKV = decoder.current();
    decoder.reset();
  } catch (IOException e) {
    throw new RuntimeException("Error while reading the keyvalue len", e);
  }
}
{code}

But the thing here is that for seekBefore() and seekTo() we would need to
read a full KV using the decoder every time and then decide whether to go
back or move to the next one.  Basically this would be a costly operation
considering the criticality of the read path.

Hence we would need some seekers, as mentioned above, that work hand in
hand with the codec that is specified.  This means that changing HFileV3 to
use codecs would affect a major portion of the existing code, and hence we
are avoiding it for now.

PrefixTreeCodec has its own version of a reversible scanner that knows how to
deal with the prefix tree structure.  We may need something similar for HFiles,
with some seekable()/rewindable() type of APIs.  And these seekable()
things should be customised per codec.

We should also think about introducing codecs that are easier to work with
Cells.  That is another area that would need to be checked before
we finalise the APIs for HFile related codecs.  Basically, Cells do not deal
with keys the way we do now in seekTo() and seekBefore().  We can have this
in another discussion.

Regards
Ram








On Sat, Jul 20, 2013 at 5:01 AM, Stack <st...@duboce.net> wrote:

> On Fri, Jul 19, 2013 at 3:34 PM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> > On Fri, Jul 19, 2013 at 2:02 PM, Elliott Clark <ec...@apache.org>
> wrote:
> >
> > > That should mean that it's possible to make this change in a later
> > version
> > > without holding up 0.96.
> > >
> >
> > Will this hold up 0.96? What would you suggest we do, besides making a
> > patch available for review along with test results, which is in the
> works.
> > As you say this work can co-exist perfectly well with all code in 0.96
> > without interference with it. (Although it is reasonable to be skeptical
> of
> > my claims until viewing a patch.)
> >
> >
> 0.96 is way too late already.  I am shooting for end of July as cut-off
> point when I intend rolling a 0.95.2 RC.  I do not want to take on new
> features after 0.95.2.
>
> What I would suggest you fellows do is given you have enough votes and
> reviewers between you, then I would drive hard at getting the needed core
> and any api breaking changes committed over the next week or two so you
> have these in before the gate comes down.  Perhaps the new file format will
> make it in in time but I do not think we should hold up 0.96 till it is
> baked; it can come in post 0.96 especially if it is all additions and not
> core amendments.
>
> Is there a design we can be checking out meantime?
>
> St.Ack
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by ramkrishna vasudevan <ra...@gmail.com>.
Uploaded a patch for feedback in HBASE-8496.  Please feel free to provide your
comments/suggestions to take this discussion forward.
We are running the Integration suites with Tags from our end.


Regards
Ram


On Wed, Jul 24, 2013 at 11:03 PM, Andrew Purtell <ap...@apache.org> wrote:

> On Tue, Jul 23, 2013 at 3:43 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Any idea when this V3 will be in for testing?
> >
>
> We are testing the patch internally and it will be up on a JIRA this week.
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Andrew Purtell <ap...@apache.org>.
On Tue, Jul 23, 2013 at 3:43 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Any idea when this V3 will be in for testing?
>

We are testing the patch internally and it will be up on a JIRA this week.


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
All of this looks promising and I agree that we should not add overhead to
users who don't need those specific features.

End of July is coming quickly... Any idea when this V3 will be in for
testing?

JMS
On 2013-07-22 13:24, "Andrew Purtell" <an...@gmail.com> wrote:

> [ Reposting from a different account - looks like infra borked LDAP
> somehow. ]
>
> On Fri, Jul 19, 2013 at 4:31 PM, Stack <st...@duboce.net> wrote:
>
> > 0.96 is way too late already.  I am shooting for end of July as cut-off
> > point when I intend rolling a 0.95.2 RC.
> >
>
> We are shooting for a HFileV3 in time for inclusion in this RC.
>
> > What I would suggest you fellows do is given you have enough votes
> > and reviewers between you, then I would drive hard at getting the
> > needed core and any api breaking changes committed over the next
> > week or two so you have these in before the gate comes down.
>
> We were planning to do it this way up until it became clear that 0.96
> WILL GO OUT at the end of this month. :-) We do not want to change APIs or
> make invasive changes in 0.95/0.96 now because it is so close to going out.
>
> We would be unhappy with a release of 0.96 if we would be unable to make
> progress with security use cases in a shipping 0.96. I'm not sure how long
> we would have to wait for 0.98. At the same time we don't want to make a
> user pay a price for security if they don't want it, that is what has been
> occupying us on this work for the past couple of months. We don't want to
> hold up the 0.96 release either and see HFileV3 as a way everyone wins.
>
> So we refactored the most recent prototype of tag persistence in file into
> HFileV3 in a way that minimizes changes to core code. We're only changing
> the data block encoder base class, because the design intent of that
> package is to be shared among all file formats. No changes to anything on
> the client or RPC. These changes are being looked at by three committers
> right now.
>
> We abandoned further work on Cell and CellCodec etc. until 0.98, agree, the
> clock has run out there. Plumbing tags through to the client will be part
> of that. For the use cases we have on deck for tags, only server side
> support is needed and clients can set operation attributes to get them to
> the server for now.
>
> On Fri, Jul 19, 2013 at 4:31 PM, Stack <st...@duboce.net> wrote:
>
> > On Fri, Jul 19, 2013 at 3:34 PM, Andrew Purtell <ap...@apache.org>
> > wrote:
> >
> > > On Fri, Jul 19, 2013 at 2:02 PM, Elliott Clark <ec...@apache.org>
> > wrote:
> > >
> > > > That should mean that it's possible to make this change in a later
> > > version
> > > > without holding up 0.96.
> > > >
> > >
> > > Will this hold up 0.96? What would you suggest we do, besides making a
> > > patch available for review along with test results, which is in the
> > works.
> > > As you say this work can co-exist perfectly well with all code in 0.96
> > > without interference with it. (Although it is reasonable to be
> skeptical
> > of
> > > my claims until viewing a patch.)
> > >
> > >
> > 0.96 is way too late already.  I am shooting for end of July as cut-off
> > point when I intend rolling a 0.95.2 RC.  I do not want to take on new
> > features after 0.95.2.
> >
> > What I would suggest you fellows do is given you have enough votes and
> > reviewers between you, then I would drive hard at getting the needed core
> > and any api breaking changes committed over the next week or two so you
> > have these in before the gate comes down.  Perhaps the new file format
> will
> > make it in in time but I do not think we should hold up 0.96 till it is
> > baked; it can come in post 0.96 especially if it is all additions and not
> > core amendments.
> >
> > Is there a design we can be checking out meantime?
> >
> > St.Ack
> >
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Andrew Purtell <an...@gmail.com>.
[ Reposting from a different account - looks like infra borked LDAP
somehow. ]

On Fri, Jul 19, 2013 at 4:31 PM, Stack <st...@duboce.net> wrote:

> 0.96 is way too late already.  I am shooting for end of July as cut-off
> point when I intend rolling a 0.95.2 RC.
>

We are shooting for a HFileV3 in time for inclusion in this RC.

> What I would suggest you fellows do is given you have enough votes
> and reviewers between you, then I would drive hard at getting the
> needed core and any api breaking changes committed over the next
> week or two so you have these in before the gate comes down.

We were planning to do it this way up until it became clear that 0.96
WILL GO OUT at the end of this month. :-) We do not want to change APIs or
make invasive changes in 0.95/0.96 now because it is so close to going out.

We would be unhappy with a release of 0.96 if we would be unable to make
progress with security use cases in a shipping 0.96. I'm not sure how long
we would have to wait for 0.98. At the same time we don't want to make a
user pay a price for security if they don't want it, that is what has been
occupying us on this work for the past couple of months. We don't want to
hold up the 0.96 release either and see HFileV3 as a way everyone wins.

So we refactored the most recent prototype of tag persistence in file into
HFileV3 in a way that minimizes changes to core code. We're only changing
the data block encoder base class, because the design intent of that
package is to be shared among all file formats. No changes to anything on
the client or RPC. These changes are being looked at by three committers
right now.

We abandoned further work on Cell and CellCodec etc. until 0.98, agree, the
clock has run out there. Plumbing tags through to the client will be part
of that. For the use cases we have on deck for tags, only server side
support is needed and clients can set operation attributes to get them to
the server for now.

On Fri, Jul 19, 2013 at 4:31 PM, Stack <st...@duboce.net> wrote:

> On Fri, Jul 19, 2013 at 3:34 PM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> > On Fri, Jul 19, 2013 at 2:02 PM, Elliott Clark <ec...@apache.org>
> wrote:
> >
> > > That should mean that it's possible to make this change in a later
> > version
> > > without holding up 0.96.
> > >
> >
> > Will this hold up 0.96? What would you suggest we do, besides making a
> > patch available for review along with test results, which is in the
> works.
> > As you say this work can co-exist perfectly well with all code in 0.96
> > without interference with it. (Although it is reasonable to be skeptical
> of
> > my claims until viewing a patch.)
> >
> >
> 0.96 is way too late already.  I am shooting for end of July as cut-off
> point when I intend rolling a 0.95.2 RC.  I do not want to take on new
> features after 0.95.2.
>
> What I would suggest you fellows do is given you have enough votes and
> reviewers between you, then I would drive hard at getting the needed core
> and any api breaking changes committed over the next week or two so you
> have these in before the gate comes down.  Perhaps the new file format will
> make it in in time but I do not think we should hold up 0.96 till it is
> baked; it can come in post 0.96 especially if it is all additions and not
> core amendments.
>
> Is there a design we can be checking out meantime?
>
> St.Ack
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Stack <st...@duboce.net>.
On Fri, Jul 19, 2013 at 3:34 PM, Andrew Purtell <ap...@apache.org> wrote:

> On Fri, Jul 19, 2013 at 2:02 PM, Elliott Clark <ec...@apache.org> wrote:
>
> > That should mean that it's possible to make this change in a later
> version
> > without holding up 0.96.
> >
>
> Will this hold up 0.96? What would you suggest we do, besides making a
> patch available for review along with test results, which is in the works.
> As you say this work can co-exist perfectly well with all code in 0.96
> without interference with it. (Although it is reasonable to be skeptical of
> my claims until viewing a patch.)
>
>
0.96 is way too late already.  I am shooting for end of July as cut-off
point when I intend rolling a 0.95.2 RC.  I do not want to take on new
features after 0.95.2.

What I would suggest you fellows do, given you have enough votes and
reviewers between you, is drive hard at getting the needed core
and any API-breaking changes committed over the next week or two so you
have these in before the gate comes down.  Perhaps the new file format will
make it in in time but I do not think we should hold up 0.96 till it is
baked; it can come in post 0.96 especially if it is all additions and not
core amendments.

Is there a design we can be checking out in the meantime?

St.Ack

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Andrew Purtell <ap...@apache.org>.
On Fri, Jul 19, 2013 at 2:02 PM, Elliott Clark <ec...@apache.org> wrote:

> That should mean that it's possible to make this change in a later version
> without holding up 0.96.
>

Will this hold up 0.96? What would you suggest we do, besides making a
patch available for review along with test results, which is in the works?
As you say this work can co-exist perfectly well with all code in 0.96
without interference with it. (Although it is reasonable to be skeptical of
my claims until viewing a patch.)

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Elliott Clark <ec...@apache.org>.
On Fri, Jul 19, 2013 at 11:01 AM, Andrew Purtell <ap...@apache.org> wrote:
> It's not enough to version the file format, we have also found it necessary
> to change the block encoder interfaces to maintain good performance.

That's fine, but that still doesn't necessitate pulling this into 0.96.
The RPC has the ability to specify what Codecs are usable.  That
should allow the creation of new block encoders and new APIs used in
HFileV3 and allow the older versions to co-exist perfectly well.  That
should mean that it's possible to make this change in a later version
without holding up 0.96.

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Andrew Purtell <ap...@apache.org>.
On Fri, Jul 19, 2013 at 10:52 AM, Elliott Clark <ec...@apache.org> wrote:

>  We already have the ability to version hfile.
>

It's not enough to version the file format, we have also found it necessary
to change the block encoder interfaces to maintain good performance. After
several prototypes we arrived at V3 as the best option in our estimation
for doing that without disrupting a lot of really core critical code in use
now (V2). I will let Ram and Anoop elaborate as they've been the ones down
in the guts of HFile mostly.

> We've already all agreed on what features would make the train for 0.96.

Obviously we feel differently, so are raising this for your consideration.

For me, I have something I feel is important (HBASE-6222) ready to go into
0.96, and I would like to see it ship in 0.96, except for the lack of
inline tags support. I can fall back to an implementation which stores
metadata in a shadow column family instead of inline in the cell/KV, but
experiments have shown that to be suboptimal compared to the alternative,
and then I would need to consider migration. So I am +1 for inline tags and
HFile V3 as the least worst way of making that happen.

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Elliott Clark <ec...@apache.org>.
On Fri, Jul 19, 2013 at 10:35 AM, Ted Yu <yu...@gmail.com> wrote:
> If cell tagging goes to 0.96, that would open door to many scenarios.

I don't understand why it has to be in 0.96. The RPC already has the
ability to signal what it can handle.  We have the ability to fall back
(all the way to protobuf KVs if need be).  We already have the
ability to version hfile.  We've already all agreed on what features
would make the train for 0.96.
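
The "signal what it can handle, else fall back" behaviour described above can
be sketched roughly as follows. This is only an illustration under stated
assumptions: the class, the method, and the codec names (including
"protobuf-kv" as the fallback) are hypothetical and are not HBase's actual
RPC API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of codec negotiation with a fallback; not HBase's RPC code. */
final class CodecNegotiationSketch {
    /** Assumed name for an encoding that every peer is taken to understand. */
    static final String FALLBACK = "protobuf-kv";

    /**
     * Returns the first codec in the client's preference order that the
     * server also supports, or the fallback when there is no overlap.
     */
    static String choose(List<String> clientPreferred, List<String> serverSupported) {
        for (String codec : clientPreferred) {
            if (serverSupported.contains(codec)) {
                return codec;
            }
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        // Codec names below are made up for the example.
        List<String> server = Arrays.asList("kv-codec", "tag-aware-codec");
        System.out.println(choose(Arrays.asList("tag-aware-codec", "kv-codec"), server));
        System.out.println(choose(Arrays.asList("some-future-codec"), server));
    }
}
```

Under this assumption an older peer that has never heard of a tag-aware codec
simply ends up on the fallback, which is the co-existence argument made here.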

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Ted Yu <yu...@gmail.com>.
Whether tags should be used depends on the availability of the cell tagging
feature under discussion here.

If cell tagging goes to 0.96, that would open the door to many scenarios.

Cheers

On Fri, Jul 19, 2013 at 10:13 AM, Anoop John <an...@gmail.com> wrote:

> >The reason for my question was that the notion of storing information in
> tag was mentioned in recent past. See links below:
>
> Ted you have plans to make use of the inline tags for these issues?
>
> -Anoop-
>
> On Fri, Jul 19, 2013 at 10:02 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Thanks for the confirmation, Ram, Anoop and Andy.
> >
> > The reason for my question was that the notion of storing information in
> > tag was mentioned in recent past. See links below:
> >
> >
> >
> https://issues.apache.org/jira/browse/HBASE-8701?focusedCommentId=13678434&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13678434
> >
> >
> >
> https://issues.apache.org/jira/browse/HBASE-3787?focusedCommentId=13635960&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13635960
> >
> > On Fri, Jul 19, 2013 at 9:27 AM, Andrew Purtell <ap...@apache.org>
> > wrote:
> >
> > > On Fri, Jul 19, 2013 at 7:18 AM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > Would tags be visible to methods of BaseRegionObserver, other than
> > > > AccessController ? Meaning, would other (non-secure) components of
> > HBase
> > > > be able to use cell
> > > > tagging to store certain information ?
> > > >
> > >
> > > Inline cell/KV tags must be a core feature by definition I think, so
> > > any component of HBase will be able to use them, coprocessor or not.
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet
> Hein
> > > (via Tom White)
> > >
> >
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Anoop John <an...@gmail.com>.
>The reason for my question was that the notion of storing information in
tag was mentioned in recent past. See links below:

Ted, do you have plans to make use of the inline tags for these issues?

-Anoop-

On Fri, Jul 19, 2013 at 10:02 PM, Ted Yu <yu...@gmail.com> wrote:

> Thanks for the confirmation, Ram, Anoop and Andy.
>
> The reason for my question was that the notion of storing information in
> tag was mentioned in recent past. See links below:
>
>
> https://issues.apache.org/jira/browse/HBASE-8701?focusedCommentId=13678434&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13678434
>
>
> https://issues.apache.org/jira/browse/HBASE-3787?focusedCommentId=13635960&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13635960
>
> On Fri, Jul 19, 2013 at 9:27 AM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> > On Fri, Jul 19, 2013 at 7:18 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Would tags be visible to methods of BaseRegionObserver, other than
> > > AccessController ? Meaning, would other (non-secure) components of
> HBase
> > > be able to use cell
> > > tagging to store certain information ?
> > >
> >
> > Inline cell/KV tags must be a core feature by definition I think, so
> > any component of HBase will be able to use them, coprocessor or not.
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Ted Yu <yu...@gmail.com>.
Thanks for the confirmation, Ram, Anoop and Andy.

The reason for my question was that the notion of storing information in
tags was mentioned in the recent past. See links below:

https://issues.apache.org/jira/browse/HBASE-8701?focusedCommentId=13678434&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13678434

https://issues.apache.org/jira/browse/HBASE-3787?focusedCommentId=13635960&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13635960

On Fri, Jul 19, 2013 at 9:27 AM, Andrew Purtell <ap...@apache.org> wrote:

> On Fri, Jul 19, 2013 at 7:18 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > Would tags be visible to methods of BaseRegionObserver, other than
> > AccessController ? Meaning, would other (non-secure) components of HBase
> > be able to use cell
> > tagging to store certain information ?
> >
>
> Inline cell/KV tags must be a core feature by definition I think, so
> any component of HBase will be able to use them, coprocessor or not.
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Andrew Purtell <ap...@apache.org>.
On Fri, Jul 19, 2013 at 7:18 AM, Ted Yu <yu...@gmail.com> wrote:

> Would tags be visible to methods of BaseRegionObserver, other than
> AccessController ? Meaning, would other (non-secure) components of HBase
> be able to use cell
> tagging to store certain information ?
>

Inline cell/KV tags must be a core feature by definition I think, so
any component of HBase will be able to use them, coprocessor or not.


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by ramkrishna vasudevan <ra...@gmail.com>.
>>As Anoop proposed, if there is a way to de-activate the
tags feature when all the KVs in a file are having tag length as zero, then
it's all good!
This can happen after the compaction is done, and we have a provision for it.
We just thought that we need not add it in the initial version.
>>Meaning, would other (non-secure) components of HBase be able to use cell
tagging to store certain information ?

Filters will have access to the KVs that have tags.
Tags can be used to store additional information, but the native code does
not have the capability to understand tags; it would just treat them as
byte arrays.
Also, nothing currently restricts tags to working only with security;
rather, security is the use case that tags currently help with.
Hope I answered your query.
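
To make the "skip tags when every KV in the file has tag length zero" idea
concrete, here is a minimal sketch. It assumes a plain base-128 variable-length
integer for the tag length (not necessarily the VInt encoding HBase or Hadoop
uses) and a per-file "max tags length" value kept in the file info; all class,
method, and field names are hypothetical and the layout is not the actual
HFile V3 format.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch; names and layout are assumptions, not the HFile V3 format. */
final class TagLengthSketch {
    /** Writes a non-negative int as a base-128 varint; values 0..127 take one byte. */
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.write(value); // final byte, continuation bit clear
    }

    /** Serializes one cell's tag section: varint length followed by the raw tag bytes. */
    static byte[] encodeTags(byte[] tags) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, tags.length);
        out.write(tags, 0, tags.length);
        return out.toByteArray();
    }

    /** Tracks the largest tag length seen while a file is being written. */
    static final class Writer {
        private int maxTagsLength = 0;

        void append(byte[] tags) {
            maxTagsLength = Math.max(maxTagsLength, tags.length);
        }

        /** Value that would be stored once per file, for example in the file info. */
        int maxTagsLength() {
            return maxTagsLength;
        }
    }

    /** A reader only needs to decode tags when the per-file maximum is above zero. */
    static boolean readerMustDecodeTags(int maxTagsLengthFromFileInfo) {
        return maxTagsLengthFromFileInfo > 0;
    }

    public static void main(String[] args) {
        Writer writer = new Writer();
        writer.append(new byte[0]);
        writer.append(new byte[0]);
        System.out.println("bytes per untagged cell: " + encodeTags(new byte[0]).length);
        System.out.println("must decode tags: " + readerMustDecodeTags(writer.maxTagsLength()));
    }
}
```

With this shape the cost for an untagged cell is the single length byte, and a
file whose recorded maximum is zero lets the reader step over that byte without
decoding it; dropping the byte altogether would be a compaction-time rewrite.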


On Fri, Jul 19, 2013 at 7:48 PM, Ted Yu <yu...@gmail.com> wrote:

> Would tags be visible to methods of BaseRegionObserver, other than
> AccessController ?
>
> Meaning, would other (non-secure) components of HBase be able to use cell
> tagging to store certain information ?
>
> Please clarify.
>
> Thanks
>
> On Fri, Jul 19, 2013 at 6:09 AM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Thanks Ram and Anoop for those details again. I don't think there is a
> need
> > to be able to revert from V3 to V2. And 1 byte overhead on an HFile is
> not
> > really an overhead. As Anoop proposed, if there is a way to de-activate
> the
> > tags feature when all the KVs in a file are having tag length as zero,
> then
> > it's all good!
> >
> > Looking forward to test that!
> >
> > JM
> >
> > 2013/7/19 ramkrishna vasudevan <ra...@gmail.com>
> >
> > > But I am afraid that once the user switches to V3 with tags he cannot
> > > come back to V2.  If this scenario is possible then we need to see a
> > > workaround for that?
> > > Particularly in the case where the user has written tags and tries to
> > > read them back with V2, it would not work.
> > >
> > > If the user switches to V3 but does not write any tags, and we go with
> > > the option of making tags optional using the FileInfo, then at least
> > > after the compaction is done the HFile could be read with the V2 reader
> > > also.  But I don't think the user would intend to do this given the fact
> > > that he needs tags for his use case.
> > >
> > > Regards
> > > Ram
> > >
> > >
> > > On Fri, Jul 19, 2013 at 5:21 PM, Anoop John <an...@gmail.com>
> > wrote:
> > >
> > > > Jean
> > > >         When V2 is used there won't be any extra bytes and so no
> > > > overhead in write or read paths.
> > > > When V3 is used, and there are no tags present at all, we will have
> > > > extra bytes for writing the tag length.  We are trying to write the
> > > > tag length as a VInt so that this will be 1 byte only.  Then using
> > > > file info we can avoid the overhead.
> > > >
> > > > Say all the KVs in a file have tag length zero (a file trailer
> > > > indicates this); during read we can then avoid the read and decode of
> > > > the tag length.  Just skip the one byte of tag length.
> > > >
> > > > Regarding avoiding the tag length entirely (even the 1 byte), maybe
> > > > during compaction it should be possible.  But I am wondering whether
> > > > it is really needed.
> > > > User can select V3 when there is a need for Tags.
> > > >
> > > > -Anoop-
> > > >
> > > > On Fri, Jul 19, 2013 at 4:53 PM, Jean-Marc Spaggiari <
> > > > jean-marc@spaggiari.org> wrote:
> > > >
> > > > > Thanks Ram.
> > > > >
> > > > > One last thing, space-wise. If I understand correctly, between V2
> > > > > and V3, when tags are de-activated, there will be only a 1 bit
> > > > > difference, so the same storage space used. If tags are activated
> > > > > but empty, is it going to be the same thing? Or are we going to have
> > > > > all the tags overhead? Like, can we have a byte to say "no tags in
> > > > > that file" in addition to "tags are activated for that file"?
> > > > >
> > > > > So 2 questions.
> > > > >
> > > > > 1) What is the overhead on disk space from the tags?
> > > > > 2) Should we have a flag (bit) per file to say no tags even if
> > > > > activated, to limit this overhead and let people activate it for
> > > > > future uses?
> > > > >
> > > > > JMS
> > > > > On 2013-07-19 07:11, "ramkrishna vasudevan" <
> > > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >
> > > > > > >>Based on your details, I think it will be, but very minimal, or
> > > > > > almost invisible, correct?
> > > > > > Yes of course.
> > > > > > Regarding migration, any file written with V2 would still be read
> > > > > > with HFileReaderV2 and the new files will be written with V3.  So
> > > > > > there should not be any problem here.  We are anyway testing these
> > > > > > things to make sure we don't break anything.  Thanks Jean for the
> > > > > > interest.
> > > > > >
> > > > > > @Stack
> > > > > > I will write up the changes foreseen for the Codec to support RPC
> > > > > > and HFileV3.
> > > > > > Discussing with Anoop, we see some benefits when the Tags are
> > > > > > written as a byte array and when tags are in memory.  Anyway, I
> > > > > > will write that up in a separate thread, also considering the
> > > > > > inputs on the way the current patch has been made.
> > > > > >
> > > > > > Regards
> > > > > > Ram
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 19, 2013 at 4:32 PM, Jean-Marc Spaggiari <
> > > > > > jean-marc@spaggiari.org> wrote:
> > > > > >
> > > > > > > Like Ted and St.Ack, I read all of this with great interest and
> > > > > > > everything looked good to me.
> > > > > > >
> > > > > > > My only concern is performance-wise.  Even if tags are disabled,
> > > > > > > do you foresee some performance impact because everything will
> > > > > > > now need to be tag aware? Based on your details, I think it will
> > > > > > > be, but very minimal, or almost invisible, correct?
> > > > > > >
> > > > > > > Also, for migrations from v2 to v3, if v3 is activated, that
> > > > > > > will simply be done when HFiles are written, correct? So not
> > > > > > > really any migration process required?
> > > > > > >
> > > > > > > JM
> > > > > > > On 2013-07-19 01:13, "Stack" <st...@duboce.net> wrote:
> > > > > > >
> > > > > > > > On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
> > > > > > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > > > > > ...
> > > > > > > >
> > > > > > > > >  We can avoid several problems with HFile V2 internals, and
> > > > > backwards
> > > > > > > > > compatibility concerns, and allow for working tags support
> > with
> > > > no
> > > > > > > > > performance impact and low risk to all HBase users who do
> not
> > > > want
> > > > > > tag
> > > > > > > > > support, while still allowing for inline tags capabilities
> > in a
> > > > > > > shipping
> > > > > > > > > version of HBase, by introducing this in a new V3 version
> for
> > > > > HFile.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > This seems like a good tactic to me.  HFileV2 has the current
> > KV
> > > > > format
> > > > > > > > hard-coded all over and trying to 'fix' this would probably
> > take
> > > a
> > > > > > bunch
> > > > > > > of
> > > > > > > > effort and would jeopardize current workings.
> > > > > > > >
> > > > > > > > ....
> > > > > > > >
> > > > > > > > >
> > > > > > > > >  We have been working on this and will have a clean patch
> > with
> > > > good
> > > > > > > > amount
> > > > > > > > > of testing in time for 0.96.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > I'd think that your moving into a green field by doing an
> > hfilev3
> > > > > would
> > > > > > > > make it so your work could run independent of 0.96 timeline;
> > i.e.
> > > > it
> > > > > > > could
> > > > > > > > come in post 0.96?
> > > > > > > >
> > > > > > > > What sort of changes do you foresee necessary in core to
> > support
> > > > cell
> > > > > > > > codecs?  Between rpc and hfilev3?
> > > > > > > >
> > > > > > > > Thanks Ram,
> > > > > > > > St.Ack
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Anoop John <an...@gmail.com>.
It should be, Ted. Tags will be present in the KV (Cell).  So whichever part
deals with KVs (Cells) can use the tags and do something with them: do
some checks in a Filter and filter out KVs, or access them in a CP, etc.

-Anoop-
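
As a rough sketch of "do some checks in a Filter and filter out KVs", the
example below treats tags as opaque byte arrays, as described elsewhere in this
thread. SketchCell and keepWithTag are hypothetical stand-ins written for the
example; they are not HBase's KeyValue, Cell, or Filter classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; SketchCell is a stand-in, not HBase's KeyValue or Cell. */
final class TagFilterSketch {
    /** Minimal stand-in for a cell: a value plus opaque tag bytes (possibly empty). */
    static final class SketchCell {
        final String value;
        final byte[] tags;

        SketchCell(String value, byte[] tags) {
            this.value = value;
            this.tags = tags;
        }
    }

    /** Keeps only the cells whose tag bytes equal the required bytes exactly. */
    static List<SketchCell> keepWithTag(List<SketchCell> cells, byte[] required) {
        List<SketchCell> kept = new ArrayList<>();
        for (SketchCell cell : cells) {
            if (Arrays.equals(cell.tags, required)) {
                kept.add(cell);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SketchCell> cells = Arrays.asList(
            new SketchCell("a", new byte[] {1}),
            new SketchCell("b", new byte[0]),
            new SketchCell("c", new byte[] {1}));
        for (SketchCell cell : keepWithTag(cells, new byte[] {1})) {
            System.out.println(cell.value);
        }
    }
}
```

Any interpretation of the tag bytes (security labels or anything else) would
live in the component doing the check, not in the storage layer.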

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Ted Yu <yu...@gmail.com>.
Would tags be visible to methods of BaseRegionObserver, other than
AccessController ?

Meaning, would other (non-secure) components of HBase be able to use cell
tagging to store certain information ?

Please clarify.

Thanks

On Fri, Jul 19, 2013 at 6:09 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Thanks Ram and Anoop for those details again. I don't think there is a need
> to be able to revert from V3 to V2. And 1 byte overhead on an HFile is not
> really an overhead. As Anoop proposed, if there is a way to de-activate the
> tags feature when all the KVs in a file are having tag length as zero, then
> it's all good!
>
> Looking forward to test that!
>
> JM
>
> 2013/7/19 ramkrishna vasudevan <ra...@gmail.com>
>
> > But am afraid that once the user switches to V3 with tags he cannot come
> > back to V2.  If this scenario is possible then we need to see a work
> around
> > for that?
> > Particularly in the case if the user has written the tags and tries to
> read
> > it back with V2 then it would not work.
> >
> > If user switches to V3 but does not write any tags then if we go with the
> > option of making tags optional using the Fileinfo then atleast after the
> > compaction is done the Hfile could be read with the V2 reader also.  But
> i
> > don't think the user would intend to do this given the fact that he needs
> > tags for his usecase.
> >
> > Regards
> > Ram
> >
> >
> > On Fri, Jul 19, 2013 at 5:21 PM, Anoop John <an...@gmail.com>
> wrote:
> >
> > > Jean
> > >         When V2 will be used there wont any extra bytes and so no
> > overhead
> > > in write or read paths.
> > > When V3 is used, and there are no tags present at all, we will have
> extra
> > > bytes for writing tag length.  Trying to put tag length as VInt so that
> > > this will be 1 byte only.  Then using File infos we can avoid overhead.
> > >
> > > Say when all the KVs in a file are having tag length as zero( a filer
> > > trailer indicate this) , during read we can avoid the read and decode
> of
> > > teh tag length. Just skip one byte of tag length.
> > >
> > > Regarding avoiding the tag length (even the 1 byte fully)  maybe during
> > > compaction it should be possible. But whether really needed I am
> > thinikng.
> > > User can select V3 when there is a need for Tags.
> > >
> > > -Anoop-
> > >
> > > On Fri, Jul 19, 2013 at 4:53 PM, Jean-Marc Spaggiari <
> > > jean-marc@spaggiari.org> wrote:
> > >
> > > > Thanks Ram.
> > > >
> > > > One last. Space wise. If I understand correctly, between V2 and V3,
> > when
> > > > tags are de-activated, there will be only a 1 bit difference, so same
> > > > storage space used. If tags are activated but empty, is it going to
> be
> > > the
> > > > same thing? Or are we going to have all the tags overhead? Like can
> we
> > > have
> > > > a byte to say "no tags in that file" in addition to "tags are
> activated
> > > for
> > > > that file"?
> > > >
> > > > So 2 questions.
> > > >
> > > > 1) what the overhead on disk space from the tags.
> > > > 2) should we have a flag(bit) per file to say no tags even if
> activated
> > > to
> > > > limit this overhead and ket people activate it for futur uses?
> > > >
> > > > JMS
> > > > Le 2013-07-19 07:11, "ramkrishna vasudevan" <
> > > > ramkrishna.s.vasudevan@gmail.com> a écrit :
> > > >
> > > > > >>Based on your details, I think it will be, but very minimal, or
> > > > > almost invisible, correct?
> > > > > Yes of course.
> > > > > Regarding migration, any file written with V2 would still be read
> > with
> > > > > HFileReaderV2 and the new files will be written with V3.  So there
> > > should
> > > > > not be any problem here.  We are anyway testing these things to
>  make
> > > > sure
> > > > > we don't break anywhere.  Thanks Jean for the interest.
> > > > >
> > > > > @Stack
> > > > > I would write up on the changes foreseen for the Codec changes to
> > > support
> > > > > RPC and HFileV3.
> > > > > Discussing with Anoop, we have some benefits when the Tags are
> > written
> > > as
> > > > > the byte array and when tags are in memory.  Anyway that i would
> > write
> > > up
> > > > > in a seperate thread also considering the inputs on the current way
> > the
> > > > > patch has been made.
> > > > >
> > > > > Regards
> > > > > Ram
> > > > >
> > > > >
> > > > > On Fri, Jul 19, 2013 at 4:32 PM, Jean-Marc Spaggiari <
> > > > > jean-marc@spaggiari.org> wrote:
> > > > >
> > > > > > Like Ted and St.Ack, I read all of this with a great interest and
> > > > > > everything looked good to me.
> > > > > >
> > > > > > My only concern will be performance wise.  Even if tags are
> > disabled,
> > > > di
> > > > > > you forsee some performances impacts because everything will now
> > need
> > > > to
> > > > > be
> > > > > > tag aware? Based on your details, I think it will be, but very
> > > minimal,
> > > > > or
> > > > > > almost invisible, correct?
> > > > > >
> > > > > > Also, for migrations from v2 to v3, if v3 is activated, that will
> > be
> > > > > simply
> > > > > > done when HFilea will be written, correct? So not really any
> > > migration
> > > > > > process required?
> > > > > >
> > > > > > JM
> > > > > > Le 2013-07-19 01:13, "Stack" <st...@duboce.net> a écrit :
> > > > > >
> > > > > > > On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
> > > > > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > > > > ...
> > > > > > >
> > > > > > > >  We can avoid several problems with HFile V2 internals, and
> > > > backwards
> > > > > > > > compatibility concerns, and allow for working tags support
> with
> > > no
> > > > > > > > performance impact and low risk to all HBase users who do not
> > > want
> > > > > tag
> > > > > > > > support, while still allowing for inline tags capabilities
> in a
> > > > > > shipping
> > > > > > > > version of HBase, by introducing this in a new V3 version for
> > > > HFile.
> > > > > > > >
> > > > > > > >
> > > > > > > This seems like a good tactic to me.  HFileV2 has the current
> KV
> > > > format
> > > > > > > hard-coded all over and trying to 'fix' this would probably
> take
> > a
> > > > > bunch
> > > > > > of
> > > > > > > effort and would jeopardize current workings.
> > > > > > >
> > > > > > > ....
> > > > > > >
> > > > > > > >
> > > > > > > >  We have been working on this and will have a clean patch
> with
> > > good
> > > > > > > amount
> > > > > > > > of testing in time for 0.96.
> > > > > > > >
> > > > > > > >
> > > > > > > > I'd think that you're moving into a green field by doing an
> > > > > > > > hfilev3 would make it so your work could run independent of
> > > > > > > > the 0.96 timeline; i.e. it could come in post 0.96?
> > > > > > >
> > > > > > > What sort of changes do you foresee necessary in core to
> support
> > > cell
> > > > > > > codecs?  Between rpc and hfilev3?
> > > > > > >
> > > > > > > Thanks Ram,
> > > > > > > St.Ack
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Thanks Ram and Anoop for those details again. I don't think there is a need
to be able to revert from V3 to V2. And 1 byte of overhead on an HFile is
not really an overhead. As Anoop proposed, if there is a way to de-activate
the tags feature when all the KVs in a file have a tag length of zero, then
it's all good!

Looking forward to testing that!

JM

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by ramkrishna vasudevan <ra...@gmail.com>.
But I am afraid that once a user switches to V3 with tags, they cannot come
back to V2.  If this scenario is possible, do we need to find a workaround
for it?
In particular, if the user has written tags and then tries to read the file
back with V2, it will not work.

If the user switches to V3 but does not write any tags, and we go with the
option of making tags optional via the FileInfo, then at least after
compaction the HFile could also be read with the V2 reader.  But I don't
think the user would intend to do this, given that they need tags for their
use case.

Regards
Ram
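
The fallback case described above (V3 selected, no tags ever written) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical and not part of HBase, and it assumes each empty tag-length field costs exactly 1 byte, as discussed elsewhere in this thread.

```java
// Hypothetical sketch: decide at compaction time whether the output file
// needs the V3 layout at all. None of these names exist in HBase.
final class CompactionFormatSketch {

    /** Returns 3 if any cell carries tags, otherwise 2 (no tag-length field needed). */
    static int chooseOutputVersion(int[] tagLengths) {
        for (int len : tagLengths) {
            if (len < 0) {
                throw new IllegalArgumentException("negative tag length: " + len);
            }
            if (len > 0) {
                return 3; // at least one cell has tags, so a V2 reader could not parse it
            }
        }
        return 2; // every tag length is zero, so the V2 layout loses nothing
    }

    /** Bytes saved by dropping the (assumed 1-byte) empty tag-length field from every cell. */
    static long bytesSavedByV2Layout(int[] tagLengths) {
        return chooseOutputVersion(tagLengths) == 2 ? tagLengths.length : 0L;
    }

    public static void main(String[] args) {
        int[] untagged = {0, 0, 0, 0};
        int[] tagged = {0, 12, 0, 0};
        System.out.println("untagged -> V" + chooseOutputVersion(untagged)
                + ", bytes saved: " + bytesSavedByV2Layout(untagged));
        System.out.println("tagged   -> V" + chooseOutputVersion(tagged)
                + ", bytes saved: " + bytesSavedByV2Layout(tagged));
    }
}
```

A single tagged cell is enough to keep the whole file on the V3 layout, which is why going back to V2 only works for files that never stored tags.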


Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Anoop John <an...@gmail.com>.
Jean,
When V2 is used there won't be any extra bytes, and so no overhead in the
write or read paths.
When V3 is used and there are no tags present at all, we will have extra
bytes for writing the tag length.  We are trying to write the tag length as
a VInt so that this will be only 1 byte.  Then, using the FileInfo, we can
avoid the overhead.

Say all the KVs in a file have a tag length of zero (the file trailer
indicates this): during read we can avoid reading and decoding the tag
length and just skip the one byte of tag length.

As for avoiding the tag length entirely (even the 1 byte), that should be
possible during compaction.  But I am wondering whether it is really needed.
A user can select V3 when there is a need for tags.

-Anoop-
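
To make the 1-byte claim above concrete, here is a minimal sketch of a variable-length tag-length field and of the "skip one byte" read path. It uses a generic base-128 varint as a stand-in; the VInt layout actually used by Hadoop/HBase may differ, and the `fileHasNoTags` flag is a hypothetical stand-in for whatever the FileInfo or trailer would record.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a variable-length tag-length field. Not HBase code.
final class TagLengthSketch {

    /** Writes a non-negative int as a base-128 varint (7 payload bits per byte). */
    static void writeVarInt(ByteBuffer out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative length: " + value);
        }
        while ((value & ~0x7F) != 0) {
            out.put((byte) ((value & 0x7F) | 0x80)); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.put((byte) value);
    }

    /** Reads a varint written by writeVarInt. */
    static int readVarInt(ByteBuffer in) {
        int result = 0;
        for (int shift = 0; shift <= 28; shift += 7) {
            byte b = in.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("varint longer than 5 bytes");
    }

    /**
     * Reads the tag length of one cell. If the file-level flag says no cell
     * has tags, the single length byte is skipped without being decoded.
     */
    static int readTagLength(ByteBuffer in, boolean fileHasNoTags) {
        if (fileHasNoTags) {
            in.position(in.position() + 1); // a zero length always occupies one byte
            return 0;
        }
        return readVarInt(in);
    }

    /** Number of bytes the encoded tag length occupies. */
    static int encodedSize(int tagLength) {
        ByteBuffer buf = ByteBuffer.allocate(5);
        writeVarInt(buf, tagLength);
        return buf.position();
    }

    public static void main(String[] args) {
        System.out.println("size of length 0:   " + encodedSize(0));
        System.out.println("size of length 127: " + encodedSize(127));
        System.out.println("size of length 128: " + encodedSize(128));
    }
}
```

With this encoding an untagged cell pays exactly one extra byte, so the per-file cost of V3 without tags is one byte per cell, and the file-level flag only saves the decode step, not the byte itself.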

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Thanks Ram.

One last thing, space-wise. If I understand correctly, between V2 and V3,
when tags are de-activated, there will be only a 1 bit difference, so the
same storage space is used. If tags are activated but empty, is it going to
be the same thing? Or are we going to have all the tags overhead? For
example, can we have a byte to say "no tags in that file" in addition to
"tags are activated for that file"?

So 2 questions.

1) What is the overhead on disk space from the tags?
2) Should we have a flag (bit) per file to say "no tags" even if tags are
activated, to limit this overhead and let people activate it for future use?

JMS

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by ramkrishna vasudevan <ra...@gmail.com>.
>>Based on your details, I think it will be, but very minimal, or
almost invisible, correct?
Yes of course.
Regarding migration, any file written with V2 will still be read with
HFileReaderV2, and new files will be written with V3.  So there should
not be any problem here.  We are testing these things anyway to make sure
we don't break anything.  Thanks Jean for the interest.

@Stack
I will write up the changes foreseen in the Codec to support RPC and
HFileV3.
Discussing with Anoop, we see some benefits when the tags are written as
a byte array and when tags are in memory.  Anyway, I will write that up
in a separate thread, also taking into account the input on the way the
current patch has been made.

Regards
Ram
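
The migration answer above amounts to picking a reader by the version recorded in each file while always writing new files in the new version. A minimal sketch of that dispatch, for illustration only: `HFileReaderV2` is named in this thread, whereas `HFileReaderV3`, the class name, and the string-based dispatch are assumptions made for the example.

```java
// Hypothetical sketch: no migration step is needed because each file is
// opened with the reader that matches the version stored in the file itself.
final class ReaderSelectionSketch {

    /** Version used for all newly written files once V3 is enabled (assumed). */
    static final int WRITER_VERSION = 3;

    /** Returns the name of the reader that would handle a file of the given major version. */
    static String readerFor(int majorVersion) {
        switch (majorVersion) {
            case 2:
                return "HFileReaderV2";
            case 3:
                return "HFileReaderV3"; // assumed name for the new reader
            default:
                // This sketch only models the two versions discussed in the thread.
                throw new IllegalArgumentException("unsupported HFile version: " + majorVersion);
        }
    }

    public static void main(String[] args) {
        int[] filesInStore = {2, 2, 3}; // old V2 files next to a freshly flushed V3 file
        for (int version : filesInStore) {
            System.out.println("v" + version + " file -> " + readerFor(version));
        }
        System.out.println("new files are written as v" + WRITER_VERSION);
    }
}
```

Under this scheme old V2 files would simply age out as compactions rewrite them in the new version.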


Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Like Ted and St.Ack, I read all of this with great interest and
everything looked good to me.

My only concern is performance.  Even if tags are disabled, do you foresee
any performance impact because everything will now need to be tag aware?
Based on your details, I think there will be, but very minimal, or almost
invisible, correct?

Also, for migrations from v2 to v3, if v3 is activated, that will simply be
done when HFiles are written, correct? So not really any migration process
required?

JM

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Andrew Purtell <ap...@apache.org>.
On Thu, Jul 18, 2013 at 10:12 PM, Stack <st...@duboce.net> wrote:

> >  We have been working on this and will have a clean patch with good
> > amount of testing in time for 0.96.
> >
> >
> I'd think that you're moving into a green field by doing an hfilev3 would
> make it so your work could run independent of the 0.96 timeline; i.e. it
> could come in post 0.96?


We have work like HBASE-7662 (HBASE-6222) queued up for a while waiting for
inline tags support. My analysis with prototypes of those indicated that
inline tags storage would be key toward keeping performance variability to
a minimum with those features in effect. I would like to see 0.96 shipping
with an inline tags capability in core so those coprocessor use cases can
move forward and let interested users try them out. 6222 is ready to go but
for the core support.

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: DISCUSS : HFile V3 proposal for tags in 0.96

Posted by Stack <st...@duboce.net>.
On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:
...

>  We can avoid several problems with HFile V2 internals, and backwards
> compatibility concerns, and allow for working tags support with no
> performance impact and low risk to all HBase users who do not want tag
> support, while still allowing for inline tags capabilities in a shipping
> version of HBase, by introducing this in a new V3 version for HFile.
>
>
This seems like a good tactic to me.  HFileV2 has the current KV format
hard-coded all over and trying to 'fix' this would probably take a bunch of
effort and would jeopardize current workings.

....

>
>  We have been working on this and will have a clean patch with good amount
> of testing in time for 0.96.
>
>
I'd think that you're moving into a green field by doing an hfilev3 would
make it so your work could run independent of the 0.96 timeline; i.e. it
could come in post 0.96?

What sort of changes do you foresee necessary in core to support cell
codecs?  Between rpc and hfilev3?

Thanks Ram,
St.Ack