
Posted to dev@pig.apache.org by Nick Dimiduk <nd...@gmail.com> on 2013/03/13 17:42:22 UTC

HBase type support

Hi all,

I'd like to draw your attention to HBASE-8089 [0]. The desire is to add type
support to HBase. There are two primary objectives: make the lives of
developers building on HBase easier, and facilitate better tools on top of
HBase. Please chime in with any feature suggestions you think we've missed
in initial conversations.

Thanks,
-n

[0]: https://issues.apache.org/jira/browse/HBASE-8089

Re: HBase type support

Posted by Nick Dimiduk <nd...@gmail.com>.

On Thu, Mar 14, 2013 at 2:47 PM, Matteo Bertozzi <th...@gmail.com> wrote:

> could you point me to the big picture of this jira?
>

The big picture is documented in the attachment on the ticket. If it's
lacking, let me clarify and improve the document.

> from what I've understood this is something like
> extending the Bytes.toInt() to all the types
> to allow the user to have something more like
>       table.putInt(myKey, 100);
>       int v = table.getInt(myKey)
>

That's only on the surface, but yes, as a final step, we could integrate
these types into the client API. I started some brainstorming about that
on HBASE-7941. As a naive blanket statement, anything supported by Bytes
should be supported natively by the client.
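
A minimal sketch of the putInt/getInt idea quoted above, assuming an in-memory map as a stand-in for an HBase table. The class TypedTableSketch and its methods are hypothetical and are not part of the HBase client API; the 4-byte big-endian layout is an assumption chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: an in-memory map stands in for an HBase table.
class TypedTableSketch {
    // Only byte[] values are stored, as in HBase itself.
    private final Map<String, byte[]> cells = new HashMap<>();

    void putInt(String key, int value) {
        // 4-byte big-endian encoding (ByteBuffer's default byte order).
        cells.put(key, ByteBuffer.allocate(Integer.BYTES).putInt(value).array());
    }

    int getInt(String key) {
        byte[] raw = cells.get(key);
        if (raw == null || raw.length != Integer.BYTES) {
            throw new IllegalStateException("no 4-byte int stored under " + key);
        }
        return ByteBuffer.wrap(raw).getInt();
    }

    public static void main(String[] args) {
        TypedTableSketch table = new TypedTableSketch();
        table.putInt("myKey", 100);
        System.out.println(table.getInt("myKey"));
    }
}
```

The typed methods are only a thin layer here; everything that reaches the store is still a byte[].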

> or is there something more?
>

Improvements to the client API are only a useful side-effect. The real
point here is for HBase to ship with "support" for data types besides
byte[]. Those data types would be defined according to the "HBase
Management System", just like an RDBMS defines types that it supports.
These types are intentionally defined independent of Java; HBase needs
better support in more languages in the future, so being tied further to
Java doesn't help with that. "Support" means providing conversion between
these types and the byte[] HBase uses under the hood. Because HBase orders
keys by comparing their bytes, it is critical that this conversion maintain
the natural ordering of the originating type. This was my original
intention in introducing HBASE-7692. I outlined the motivation for this in
the attached document.
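
To illustrate why ordering matters, here is a small sketch, assuming row keys are compared as unsigned bytes in lexicographic order. It shows that a plain big-endian int does not sort correctly once negative values are involved, and that flipping the sign bit is one common way to fix that. The class and method names are made up for this example, and this is not necessarily the encoding proposed in HBASE-8089 or HBASE-7692.

```java
import java.nio.ByteBuffer;

// Illustrative only; not necessarily the encoding proposed in the tickets.
class OrderPreservingInt {
    // Plain 4-byte big-endian two's complement.
    static byte[] plain(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // Flip the sign bit so negatives sort before positives as unsigned bytes.
    static byte[] ordered(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v ^ Integer.MIN_VALUE).array();
    }

    static int decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // -1 encodes as FF FF FF FF, which sorts after 1 (00 00 00 01).
        System.out.println(compareBytes(plain(-1), plain(1)) > 0);     // true: wrong order
        System.out.println(compareBytes(ordered(-1), ordered(1)) < 0); // true: natural order
    }
}
```

With the sign bit flipped, a scan over the encoded keys visits values in numeric order, which a naive encoding cannot guarantee.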

> like table schemas, entities & co similar to the kiji project?
>

Table schemas are off the table for this. See the "Out of scope" section of
the document. Just as HBase now is BYO-types, HBase after this improvement
will be BYO-schema. Today, the application must manage a schema defining a
map from application entities to HBase Cells. The application also manages
its own serialization details for turning language types into byte[]. This
ticket seeks to alleviate the latter. Entities are a little conflated;
we've discussed providing a "compound type", which you could
consider equivalent to an entity. Without table schema, there's nowhere to
store that entity's definition.
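
As a rough sketch of what a "compound type" could mean in practice, assume a value made of an int id followed by a long timestamp, each stored fixed-width with its sign bit flipped so the packed bytes sort in field order. The class CompoundKeySketch and its layout are hypothetical. Note that nothing in the bytes describes the layout: the reader has to know it out of band, which is the missing-schema point made above.

```java
import java.nio.ByteBuffer;

// Hypothetical compound value: (int id, long timestamp) packed into one byte[].
class CompoundKeySketch {
    static byte[] encode(int id, long timestamp) {
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                .putInt(id ^ Integer.MIN_VALUE)      // sign-bit flip keeps numeric order
                .putLong(timestamp ^ Long.MIN_VALUE) // same trick for the second field
                .array();
    }

    static int decodeId(byte[] key) {
        return ByteBuffer.wrap(key).getInt(0) ^ Integer.MIN_VALUE;
    }

    static long decodeTimestamp(byte[] key) {
        return ByteBuffer.wrap(key).getLong(Integer.BYTES) ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] first = encode(1, 6L);
        byte[] second = encode(2, -3L);
        // Sorted by id first, then by timestamp.
        System.out.println(compareBytes(first, second) < 0);
        System.out.println(decodeId(second) + " " + decodeTimestamp(second));
    }
}
```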

> are you also thinking of some sort of data-awareness server side
> for example encoding/compression based on the data type
> or compaction policies if your key is a date or similar?
>

Not so far. There's an understandably high amount of resistance to baking
this into the server-side. There are indeed a number of things that could
be done with more thorough type awareness, but they are not addressed here.
This is intended only as a client-side improvement for the sake of users of
HBase.

How can the motivation document be improved to make these intentions more
clear?

Thanks,
Nick

Re: HBase type support

Posted by Matteo Bertozzi <th...@gmail.com>.

could you point me to the big picture of this jira?

from what I've understood this is something like
extending the Bytes.toInt() to all the types
to allow the user to have something more like
      table.putInt(myKey, 100);
      int v = table.getInt(myKey)

something like this right?
or is there something more?
like table schemas, entities & co similar to the kiji project?

are you also thinking of some sort of data-awareness server side
for example encoding/compression based on the data type
or compaction policies if your key is a date or similar?

Thanks!

Re: HBase type support

Posted by Nick Dimiduk <nd...@gmail.com>.

On Sat, Mar 16, 2013 at 5:18 AM, Michel Segel <mi...@hotmail.com> wrote:

> Isn't that what you get through add on frameworks like TSDB and Kiji ?
> Maybe not on the client side, but frameworks that extend HBase...
>

Sure. How can these tools interoperate? Right now, they would all
have to agree on serialization and schema representation methods. They
don't; each designs its own schema and type management system. Now the
user is experiencing vendor lock-in.

This proposal puts in place an HBase-sanctioned solution for storing typed
data. The question of schema remains up to the tools, but with this
proposal, at least data can be read interoperably. Those other frameworks
can choose to use it or not, but the ones that do will all agree on how to
read and write, say, an Integer value.

Thanks,
Nick

> On Mar 16, 2013, at 12:45 AM, lars hofhansl <la...@apache.org> wrote:
>
> > I think generally we should keep HBase a byte[] based key value store.
> > What we should add to HBase are tools that would allow client side apps
> > (or libraries) to build functionality on top of plain HBase.
> >
> > Serialization that maintains a correct semantic sort order is important
> > as a building block, so is code that can build up correctly serialized and
> > sortable compound keys, as well as hashing algorithms.
> >
> > Where I would draw the line is adding types to HBase itself. As long as
> > one can write a client, or Filters, or Coprocessors with the tools provided
> > by HBase we're good. Higher level functionality can then be built on top
> > of HBase.
> >
> > For example, maybe we need to add a better access API to the HBase WAL in
> > order to have an external library implement idempotent transactions (which
> > can be used to implement secondary indexes).
> > Maybe some other primitives have to be exposed in order to allow an
> > external library to implement full transactions.
> > Or we might need a statistics framework (such as the one that Jesse is
> > working on).
> >
> > These are all building blocks that do not presume specific access
> > patterns or clients, but can be used to implement them.
> >
> > As usual, just my $0.02.
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> > From: Nick Dimiduk <nd...@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Friday, March 15, 2013 10:57 AM
> > Subject: Re: HBase type support
> >
> > I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
> > in HBASE-7221.
> >
> > On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jtaylor@salesforce.com> wrote:
> >
> >> Hi Nick,
> >> What do you mean by "hashing algorithms"?
> >> Thanks,
> >> James
> >>
> >>
> >> On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
> >>
> >>> Hi David,
> >>>
> >>> Native support for a handful of hashing algorithms has also been
> >>> discussed.
> >>> Do you think these should be supported directly, as opposed to using a
> >>> fixed-length String or fixed-length byte[]?
> >>>
> >>> Thanks,
> >>> Nick
> >>>
> >>> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
> >>> wrote:
> >>>
> >>>> Hi Nick,
> >>>>
> >>>> As an HBase user I would welcome this addition. In addition to the
> >>>> proposed list of datatypes, a UUID/GUID type would also be nice to have.
> >>>>
> >>>> Regards,
> >>>>
> >>>> /David

Re: HBase type support

Posted by Amit Sela <am...@infolinks.com>.

Regarding HBASE-7941 - client API support...

In the past year I wrote a lot of client code for HBase, which led to
writing a helper class for my specific needs, and since it was brought up
here I guess I'm not the only one who did something similar... I personally
like the idea brought up by Nick in the JIRA of using some kind of
<SerializationType> interface and having HBase ship with primitive
support - and anyone who wants can implement it for their needs. Same idea as
Hadoop's Writable interface - shipping with IntWritable, LongWritable, etc.
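
A minimal sketch of that idea, assuming a generic interface with one method per direction. The interface name is borrowed from the mention above, but its methods and the IntType and LongType implementations are assumptions made for illustration; they are not taken from HBASE-7941 or from Hadoop's Writable.

```java
import java.nio.ByteBuffer;

// Small demo of the hypothetical interface defined below.
class SerializationTypeDemo {
    public static void main(String[] args) {
        SerializationType<Long> longs = new LongType();
        System.out.println(longs.fromBytes(longs.toBytes(42L)));
    }
}

// Hypothetical interface: converts between a language type and byte[].
interface SerializationType<T> {
    byte[] toBytes(T value);

    T fromBytes(byte[] bytes);
}

// A primitive implementation that could ship with the library.
class IntType implements SerializationType<Integer> {
    public byte[] toBytes(Integer value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    public Integer fromBytes(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }
}

class LongType implements SerializationType<Long> {
    public byte[] toBytes(Long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public Long fromBytes(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }
}
```

Applications would add their own implementations next to the shipped ones, much as custom Writable classes sit next to IntWritable and LongWritable.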

Anyway I'd be happy to help.

On Sun, Mar 17, 2013 at 7:12 PM, Andrew Purtell <ap...@apache.org> wrote:

> > This then leads to another question... suppose Apache does add encryption
> > to Hadoop. While the Apache organization does have the proper paperwork in
> > place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?
>
> Well, I can't put that question aside, since you've brought it up twice
> now and encryption feature candidates for Apache Hadoop and Apache HBase
> are something I have been working on. It's a valid question, but since, as
> you admit, you don't know what you are talking about, perhaps stating
> uninformed opinions can be avoided. Only the latter is what I object to. I
> think the short answer is that, as an Apache contributor, I'm concerned
> about the Apache product. Downstream repackagers can take whatever action
> is needed, including making changes, since it is open source, or giving
> feedback that it represents a hardship. At this point I have heard nothing
> like that. I work for Intel and can say we are good with it.
>
> On Sunday, March 17, 2013, Michael Segel wrote:
>
> > It's not a question of FUD, but that certain types of encryption/decryption
> > code fall under the munitions act.
> > See: http://www.fas.org/irp/offdocs/eo_crypt_9611_memo.htm
> >
> > Having said that, there is this:
> > http://www.bis.doc.gov/encryption/encfaqs6_17_02.html
> >
> > In short, I don't as a habit export/import encryption technology so I am
> > not up to speed on the current state of the laws.
> > Which is why I have to question the current state of the US encryption
> > laws.
> >
> > This then leads to another question... suppose Apache does add encryption
> > to Hadoop. While the Apache organization does have the proper paperwork in
> > place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?
> >
> > But let's put that question aside.
> >
> > The point I was trying to make was that the core Sun JVM does support MD5
> > and SHA-1 out of the box, so that anyone running Hadoop and using the
> > 1.6_xx or the 1.7_xx versions of the JVM will have these packages.
> >
> > Adding hooks that use these classes is a no-brainer.  However, beyond
> > this... you tell me.
> >
> > -Mike
> >
> > On Mar 16, 2013, at 7:59 AM, Andrew Purtell <ap...@apache.org> wrote:
> >
> > > The ASF avails itself of an exception to crypto export which only
> > > requires a bit of PMC housekeeping at release time. So "is not [ok]" is
> > > FUD. I humbly request we refrain from FUD here. See
> > > http://www.apache.org/dev/crypto.html. To the best of our knowledge we
> > > expect this to continue, though the ASF has not updated this policy yet
> > > for recent regulation updates.
> > >
> > > On Saturday, March 16, 2013, Michel Segel wrote:
> > >
> > >> I also want to add that you could add MD5 and SHA-1, but I'd check on
> > >> US laws... I think these are ok, however other encryption/decryption
> > >> code is not.
> > >>
> > >> They are part of the std sun java libraries ...
> > >>
> > >> Sent from a remote device. Please excuse any typos...
> > >>
> > >> Mike Segel
> > >>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: HBase type support

Posted by Michael Segel <mi...@hotmail.com>.

Thanks for the clarification Doug. 

Back to my point, I was saying that MD5 and SHA-1 are already part of the Java package so if you're running Java 1.6_xx or Java 1.7_xx, you will have MD5 available.  So it could be a good thing. 
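
For reference, a short sketch of using the digests that ship with the JDK through java.security.MessageDigest. The class name JdkDigests and the sample input are made up for this example, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class JdkDigests {
    // Hashes the UTF-8 bytes of the input with the named JDK algorithm.
    static byte[] digest(String algorithm, String input) {
        try {
            return MessageDigest.getInstance(algorithm)
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " not available in this JVM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(digest("MD5", "row-1").length);   // 16 bytes
        System.out.println(digest("SHA-1", "row-1").length); // 20 bytes
    }
}
```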


Murmur is released under MIT... Is there going to be a licensing issue? (Thinking back to the delay in getting Snappy.) Note: I don't know, which is why I am asking; I don't want to be accused of FUD.
:-P

On Mar 18, 2013, at 2:16 PM, Doug Meil <do...@explorysmedical.com> wrote:

> 
> Sorry I'm late to this thread but I was the guy behind HBASE-7221 and the
> algorithms specifically mentioned were MD5 and Murmur (not SHA-1).  An
> implementation of Murmur already exists in HBase, and the MD5
> implementation was the one that ships with Java.
> 
> The intent was to include hashing appropriate for use with key
> distribution of rowkeys in tables as is often suggested on the dist-lists.
> SHA-1 is probably overkill for the rowkey case, but I wouldn't want to
> stop anybody from using SHA-1 if it was appropriate for their needs.

Re: HBase type support

Posted by Doug Meil <do...@explorysmedical.com>.

Sorry I'm late to this thread but I was the guy behind HBASE-7221 and the
algorithms specifically mentioned were MD5 and Murmur (not SHA-1).  An
implementation of Murmur already exists in HBase, and the MD5
implementation was the one that ships with Java.

The intent was to include hashing appropriate for use with key
distribution of rowkeys in tables as is often suggested on the dist-lists.
SHA-1 is probably overkill for the rowkey case, but I wouldn't want to
stop anybody from using SHA-1 if it was appropriate for their needs.
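
One way to picture hashing for rowkey distribution is the sketch below, which prepends the first few bytes of the key's MD5 digest to the key itself. The class HashedRowKey, its methods, and the 4-byte prefix length are assumptions for illustration, not the HBASE-7221 implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not the HBASE-7221 implementation.
class HashedRowKey {
    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available in this JVM", e);
        }
    }

    // Returns the first prefixLen bytes of MD5(rowKey) followed by rowKey itself.
    static byte[] prefixed(byte[] rowKey, int prefixLen) {
        if (prefixLen < 0 || prefixLen > 16) {
            throw new IllegalArgumentException("prefixLen must be between 0 and 16");
        }
        byte[] out = new byte[prefixLen + rowKey.length];
        System.arraycopy(md5(rowKey), 0, out, 0, prefixLen);
        System.arraycopy(rowKey, 0, out, prefixLen, rowKey.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "user-0001".getBytes(StandardCharsets.UTF_8);
        System.out.println(prefixed(key, 4).length); // 4 hash bytes + 9 key bytes
    }
}
```

The prefix is deterministic, so a point lookup can recompute it from the original key, but keys that were adjacent before hashing are no longer adjacent, so range scans over the original key order are given up.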






Re: HBase type support

Posted by Michel Segel <mi...@hotmail.com>.

Andrew, 

I was aware of your employer, which I am pretty sure has already dealt with the issue of exporting encryption software, and probably hardware too.

Neither of us is a lawyer, and from what I know of dealing with government bureaucracies, it's not always as simple as just filing the correct paperwork. (Sometimes it is, sometimes not so much, YMMV...)

Putting in the hooks for encryption is probably a good idea. Shipping the encryption with the release or making it part of the official release, not so much. Sorry, I'm being a bit conservative here.

IMHO fixing other issues would be of a higher priority, but that's just me ;-)

Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 17, 2013, at 12:12 PM, Andrew Purtell <ap...@apache.org> wrote:

>> This then leads to another question... suppose Apache does add encryption
> to Hadoop. While the Apache organization does have the proper paperwork in
> place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?
> 
> Well I can't put that question aside since you've brought it up now
> twice and encryption feature candidates for Apache Hadoop and Apache HBase
> are something I have been working on. Its a valid question but since as you
> admit you don't know what you are talking about, perhaps stating uninformed
> opinions can be avoided. Only the latter is what I object to. I think the
> short answer is as an Apache contributor I'm concerned about the Apache
> product. Downstream repackagers can take whatever action needed including
> changes, since it is open source, or feedback about it representing a
> hardship. At this point I have heard nothing like that. I work for Intel
> and can say we are good with it.
> 
> On Sunday, March 17, 2013, Michael Segel wrote:
> 
>> Its not a question of FUD, but that certain types of encryption/decryption
>> code falls under the munitions act.
>> See: http://www.fas.org/irp/offdocs/eo_crypt_9611_memo.htm
>> 
>> Having said that, there is this:
>> http://www.bis.doc.gov/encryption/encfaqs6_17_02.html
>> 
>> In short, I don't as a habit export/import encryption technology so I am
>> not up to speed on the current state of the laws.
>> Which is why I have to question the current state of the US encryption
>> laws.
>> 
>> This then leads to another question... suppose Apache does add encryption
>> to Hadoop. While the Apache organization does have the proper paperwork in
>> place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?
>> 
>> But lets put that question aside.
>> 
>> The point I was trying to make was that the core Sun JVM does support MD5
>> and SHA-1 out of the box, so that anyone running Hadoop and using the
>> 1.6_xx or the 1.7_xx versions of the JVM will have these packages.
>> 
>> Adding hooks that use these classes are a no brainer.  However, beyond
>> this... you tell me.
>> 
>> -Mike
>> 
>> On Mar 16, 2013, at 7:59 AM, Andrew Purtell <ap...@apache.org> wrote:
>> 
>>> The ASF avails itself of an exception to crypto export which only
>> requires
>>> a bit of PMC housekeeping at release time. So "is not [ok]" is FUD. I
>>> humbly request we refrain from FUD here. See
>>> http://www.apache.org/dev/crypto.html. To the best of our knowledge we
>>> expect this to continue, though the ASF has not updated this policy yet
>> for
>>> recent regulation updates.
>>> 
>>> On Saturday, March 16, 2013, Michel Segel wrote:
>>> 
>>>> I also want to add that you could add MD5 and SHA-1, but I'd check on us
>>>> laws... I think these are ok, however other encryption/decryption code
>> is
>>>> not.
>>>> 
>>>> They are part of the std sun java libraries ...
>>>> 
>>>> Sent from a remote device. Please excuse any typos...
>>>> 
>>>> Mike Segel
>>>> 
>>>> On Mar 16, 2013, at 7:18 AM, Michel Segel <mi...@hotmail.com>
>>>> wrote:
>>>> 
>>>>> Isn't that what you get through add on frameworks like TSDB and Kiji ?
>>>> Maybe not on the client side, but frameworks that extend HBase...
>>>>> 
>>>>> Sent from a remote device. Please excuse any typos...
>>>>> 
>>>>> Mike Segel
>>>>> 
>>>>> On Mar 16, 2013, at 12:45 AM, lars hofhansl <la...@apache.org> wrote:
>>>>> 
>>>>>> I think generally we should keep HBase a byte[] based key value store.
>>>>>> What we should add to HBase are tools that would allow client side
>> apps
>>>> (or libraries) to built functionality on top of plain HBase.
>>>>>> 
>>>>>> Serialization that maintains a correct semantic sort order is
>> important
>>>> as a building block, so is code that can build up correctly serialized
>> and
>>>> sortable compound keys, as well as hashing algorithms.
>>>>>> 
>>>>>> Where I would draw the line is adding types to HBase itself. As long
>> as
>>>> one can write a client, or Filters, or Coprocessors with the tools
>> provided
>>>> by HBase we're good. Higher level functionality can then be built of on
>> top
>>>> of HBase.
>>>>>> 
>>>>>> 
>>>>>> For example, maybe we need to add better access API to the HBase WAL
>> in
>>>> order to have an external library implement idempotent transactions
>> (which
>>>> can be used to implement 2ndary indexes).
>>>>>> Maybe some other primitives have to be exposed in order to allow an
>>>> external library to implement full transactions.
>>>>>> Or we might need a statistics framework (such as the one that Jesse is
>>>> working on).
>>>>>> 
>>>>>> These are all building blocks that do not presume specific access
>>>> patterns or clients, but can be used to implement them.
>>>>>> 
>>>>>> 
>>>>>> As usual, just my $0.02.
>>>>>> 
>>>>>> -- Lars
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ________________________________
>>>>>> From: Nick Dimiduk <nd...@gmail.com>
>>>>>> To: user@hbase.apache.org
>>>>>> Sent: Friday, March 15, 2013 10:57 AM
>>>>>> Subject: Re: HBase type support
>>>>>> 
>>>>>> I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
>>>>>> in HBASE-7221.
>>>>>> 
>>>>>> On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <
>> jtaylor@salesforce.com
>>>>> wrote:
>>>>>> 
>>>>>>> Hi Nick,
>>>>>>> What do you mean by "hashing algorithms"?
>>>>>>> Thanks,
>>>>>>> James
>>>>>>> 
>>>>>>> 
>>>>>>> On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
>>>>>>> 
>>>>>>>> Hi David,
>>>>>>>> 
>>>>>>>> Native support for a handful of hashing algorithms has also been
> 
> 
> 
> -- 
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

Re: HBase type support

Posted by Michel Segel <mi...@hotmail.com>.

If we look at TSDB, Kiji, and AsyncHBase, it looks like extensions to HBase already exist.

I haven't looked at Salesforce.com's SQL interface, but I suspect that they too have some sort of framework where they have to enforce typing.


Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 17, 2013, at 10:01 PM, ramkrishna vasudevan <ra...@gmail.com> wrote:

> HBase shipping a generic framework for different interfaces is needed for
> ease of use for the users.  +1 on the idea.
> Getting out the correct result for float values, positive and negative
> integers had to be taken care by the users or by using some wrappers.
> 
> This will help to solve that problem to a great extent.
> 
> Regards
> Ram

Re: HBase type support

Posted by ramkrishna vasudevan <ra...@gmail.com>.

HBase shipping a generic framework for different interfaces is needed for
ease of use for the users.  +1 on the idea.
Until now, getting correct results for float values and for positive and
negative integers had to be taken care of by the users or by some wrappers.

This will help to solve that problem to a great extent.
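
To make the problem concrete, here is a minimal sketch. It is not HBase
code: the class SortableBytes and all of its method names are made up for
illustration. It assumes keys are compared byte by byte as unsigned values,
and shows why a plain big-endian int sorts negative numbers after positive
ones, plus one common way to encode ints and floats so that byte order
matches numeric order.

```java
// Hypothetical helper for illustration only; not part of HBase.
final class SortableBytes {

    // Plain big-endian bytes of a 32-bit value (two's complement for ints).
    static byte[] bigEndian(int u) {
        return new byte[] {(byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u};
    }

    static int fromBigEndian(byte[] b) {
        return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16) | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
    }

    // Flip the sign bit so that negative ints sort before non-negative ones.
    static byte[] encodeInt(int v) {
        return bigEndian(v ^ 0x80000000);
    }

    static int decodeInt(byte[] b) {
        return fromBigEndian(b) ^ 0x80000000;
    }

    // Negative floats: flip every bit. Non-negative floats: flip only the sign bit.
    static byte[] encodeFloat(float f) {
        int bits = Float.floatToIntBits(f);
        bits ^= (bits >> 31) | 0x80000000;
        return bigEndian(bits);
    }

    static float decodeFloat(byte[] b) {
        int u = fromBigEndian(b);
        // After encoding, a set top bit means the original value was non-negative.
        u ^= (~u >> 31) | 0x80000000;
        return Float.intBitsToFloat(u);
    }

    // Unsigned lexicographic comparison, the ordering assumed for stored keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

With this sketch, compare(encodeInt(-5), encodeInt(3)) is negative, while
compare(bigEndian(-5), bigEndian(3)) is positive, which is the wrong order.
Edge cases such as NaN and -0.0 versus 0.0 get whatever order their bit
patterns give them; a real type library would have to define those
explicitly.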

Regards
Ram

On Sun, Mar 17, 2013 at 10:54 PM, Mohamed Ibrahim <mi...@mibrahim.net> wrote:

> I'm not a lawyer, but I think we're ok as long as it's in source code as
> that is protected under freedom of speech in the US. See here (
> http://en.wikipedia.org/wiki/Cryptography ) under Export Control, the part
> related to Bernstein v. United States . I don't know about binaries like
> deb, but I can tell that we download binaries for browsers every day and
> they use encryption in lots of places. I believe if there's any real issues
> it would have surfaced up by now.
>
> As far as types in HBase, I think it is an excellent idea. I would suggest
> to enable us to add a custom type, just like we can add our custom filters.
> Some types that I had to code myself include CSV. There can be other custom
> types that I need in the future, may be json, so the ability to add a
> custom type might be a good feature.
>
> Thanks,
> Mohamed

Re: HBase type support

Posted by Mohamed Ibrahim <mi...@mibrahim.net>.

I'm not a lawyer, but I think we're ok as long as it's in source code, as
that is protected under freedom of speech in the US. See here (
http://en.wikipedia.org/wiki/Cryptography ) under Export Control, the part
related to Bernstein v. United States. I don't know about binaries like
deb, but I can tell that we download binaries for browsers every day and
they use encryption in lots of places. I believe if there were any real
issues, they would have surfaced by now.

As far as types in HBase, I think it is an excellent idea. I would suggest
letting us add custom types, just as we can add our custom filters.
Some types that I had to code myself include CSV. There may be other custom
types that I need in the future, maybe JSON, so the ability to add a
custom type would be a good feature.
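
As a rough sketch of what I mean (the CustomType interface and the
CsvRecordType class are names I made up; nothing like this exists in HBase
today as far as I know), a custom type could be a pair of encode/decode
methods, with CSV as one implementation. Every field is quoted and embedded
quotes are doubled, so commas inside fields survive the round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical extension point for illustration; not an HBase interface.
interface CustomType<T> {
    byte[] encode(T value);

    T decode(byte[] bytes);
}

// One CSV record per value: every field is quoted, embedded quotes are doubled.
final class CsvRecordType implements CustomType<List<String>> {

    @Override
    public byte[] encode(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(fields.get(i).replace("\"", "\"\"")).append('"');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Assumes input produced by encode(); malformed input is not handled.
    @Override
    public List<String> decode(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        List<String> fields = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            i++; // skip the opening quote
            StringBuilder field = new StringBuilder();
            while (true) {
                char c = s.charAt(i++);
                if (c != '"') {
                    field.append(c);
                } else if (i < s.length() && s.charAt(i) == '"') {
                    field.append('"'); // doubled quote is a literal quote
                    i++;
                } else {
                    break; // closing quote
                }
            }
            fields.add(field.toString());
            i++; // skip the comma separator, if any
        }
        return fields;
    }
}
```

A JSON type would implement the same two methods. Note that a value type
like this makes no promise about sort order, so it would suit cell values
rather than row keys.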

Thanks,
Mohamed


On Sun, Mar 17, 2013 at 1:12 PM, Andrew Purtell <ap...@apache.org> wrote:

> > This then leads to another question... suppose Apache does add encryption
> to Hadoop. While the Apache organization does have the proper paperwork in
> place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?
>
> Well I can't put that question aside since you've brought it up now
> twice and encryption feature candidates for Apache Hadoop and Apache HBase
> are something I have been working on. Its a valid question but since as you
> admit you don't know what you are talking about, perhaps stating uninformed
> opinions can be avoided. Only the latter is what I object to. I think the
> short answer is as an Apache contributor I'm concerned about the Apache
> product. Downstream repackagers can take whatever action needed including
> changes, since it is open source, or feedback about it representing a
> hardship. At this point I have heard nothing like that. I work for Intel
> and can say we are good with it.

Re: HBase type support

Posted by Andrew Purtell <ap...@apache.org>.

> This then leads to another question... suppose Apache does add encryption
> to Hadoop. While the Apache organization does have the proper paperwork in
> place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?

Well, I can't put that question aside since you've brought it up twice now,
and encryption feature candidates for Apache Hadoop and Apache HBase are
something I have been working on. It's a valid question, but since, as you
admit, you don't know what you are talking about, perhaps stating uninformed
opinions can be avoided. Only the latter is what I object to. I think the
short answer is that, as an Apache contributor, I'm concerned about the
Apache product. Downstream repackagers can take whatever action is needed,
including making changes, since it is open source, or giving feedback about
it representing a hardship. At this point I have heard nothing like that. I
work for Intel and can say we are good with it.

On Sunday, March 17, 2013, Michael Segel wrote:

> Its not a question of FUD, but that certain types of encryption/decryption
> code falls under the munitions act.
> See: http://www.fas.org/irp/offdocs/eo_crypt_9611_memo.htm
>
> Having said that, there is this:
> http://www.bis.doc.gov/encryption/encfaqs6_17_02.html
>
> In short, I don't as a habit export/import encryption technology so I am
> not up to speed on the current state of the laws.
> Which is why I have to question the current state of the US encryption
> laws.
>
> This then leads to another question... suppose Apache does add encryption
> to Hadoop. While the Apache organization does have the proper paperwork in
> place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ?
>
> But lets put that question aside.
>
> The point I was trying to make was that the core Sun JVM does support MD5
> and SHA-1 out of the box, so that anyone running Hadoop and using the
> 1.6_xx or the 1.7_xx versions of the JVM will have these packages.
>
> Adding hooks that use these classes are a no brainer.  However, beyond
> this... you tell me.
>
> -Mike



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: HBase type support

Posted by Michael Segel <mi...@hotmail.com>.

It's not a question of FUD, but that certain types of encryption/decryption code fall under the munitions act.
See: http://www.fas.org/irp/offdocs/eo_crypt_9611_memo.htm

Having said that, there is this:
http://www.bis.doc.gov/encryption/encfaqs6_17_02.html

In short, I don't as a habit export/import encryption technology so I am not up to speed on the current state of the laws. 
Which is why I have to question the current state of the US encryption laws. 

This then leads to another question... suppose Apache does add encryption to Hadoop. While the Apache organization does have the proper paperwork in place, what then happens to Cloudera, Hortonworks, EMC, IBM, Intel, etc ? 

But let's put that question aside.

The point I was trying to make was that the core Sun JVM does support MD5 and SHA-1 out of the box, so that anyone running Hadoop and using the 1.6_xx or the 1.7_xx versions of the JVM will have these packages. 

Adding hooks that use these classes is a no-brainer. However, beyond this... you tell me.
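
As a minimal sketch of what such hooks could build on, assuming nothing beyond the JDK's java.security.MessageDigest: the names DigestSketch, digest, and toHex below are hypothetical and are not part of any HBase API. StandardCharsets requires Java 7; on 1.6 the charset name "UTF-8" would have to be passed as a string instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only; not HBase code.
final class DigestSketch {

    /** Digests the input with a JDK algorithm name such as "MD5" or "SHA-1". */
    static byte[] digest(String algorithm, byte[] input) {
        try {
            return MessageDigest.getInstance(algorithm).digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " not available in this JVM", e);
        }
    }

    /** Renders bytes as lowercase hex, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = "row-0001".getBytes(StandardCharsets.UTF_8);
        System.out.println("MD5   = " + toHex(digest("MD5", key)));   // 16-byte digest
        System.out.println("SHA-1 = " + toHex(digest("SHA-1", key))); // 20-byte digest
    }
}
```

Both digests have a fixed length (16 bytes for MD5, 20 for SHA-1), which is why they were discussed earlier in the thread as an alternative to a fixed-length byte[].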

-Mike

On Mar 16, 2013, at 7:59 AM, Andrew Purtell <ap...@apache.org> wrote:

> The ASF avails itself of an exception to crypto export which only requires
> a bit of PMC housekeeping at release time. So "is not [ok]" is FUD. I
> humbly request we refrain from FUD here. See
> http://www.apache.org/dev/crypto.html. To the best of our knowledge we
> expect this to continue, though the ASF has not updated this policy yet for
> recent regulation updates.
> 
> On Saturday, March 16, 2013, Michel Segel wrote:
> 
>> I also want to add that you could add MD5 and SHA-1, but I'd check on us
>> laws... I think these are ok, however other encryption/decryption code is
>> not.
>> 
>> They are part of the std sun java libraries ...
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Mar 16, 2013, at 7:18 AM, Michel Segel <mi...@hotmail.com>
>> wrote:
>> 
>>> Isn't that what you get through add on frameworks like TSDB and Kiji ?
>> Maybe not on the client side, but frameworks that extend HBase...
>>> 
>>> Sent from a remote device. Please excuse any typos...
>>> 
>>> Mike Segel
>>> 
>>> On Mar 16, 2013, at 12:45 AM, lars hofhansl <la...@apache.org> wrote:
>>> 
>>>> I think generally we should keep HBase a byte[] based key value store.
>>>> What we should add to HBase are tools that would allow client side apps
>> (or libraries) to built functionality on top of plain HBase.
>>>> 
>>>> Serialization that maintains a correct semantic sort order is important
>> as a building block, so is code that can build up correctly serialized and
>> sortable compound keys, as well as hashing algorithms.
>>>> 
>>>> Where I would draw the line is adding types to HBase itself. As long as
>> one can write a client, or Filters, or Coprocessors with the tools provided
>> by HBase we're good. Higher level functionality can then be built of on top
>> of HBase.
>>>> 
>>>> 
>>>> For example, maybe we need to add better access API to the HBase WAL in
>> order to have an external library implement idempotent transactions (which
>> can be used to implement 2ndary indexes).
>>>> Maybe some other primitives have to be exposed in order to allow an
>> external library to implement full transactions.
>>>> Or we might need a statistics framework (such as the one that Jesse is
>> working on).
>>>> 
>>>> These are all building blocks that do not presume specific access
>> patterns or clients, but can be used to implement them.
>>>> 
>>>> 
>>>> As usual, just my $0.02.
>>>> 
>>>> -- Lars
>>>> 
>>>> 
>>>> 
>>>> ________________________________
>>>> From: Nick Dimiduk <nd...@gmail.com>
>>>> To: user@hbase.apache.org
>>>> Sent: Friday, March 15, 2013 10:57 AM
>>>> Subject: Re: HBase type support
>>>> 
>>>> I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
>>>> in HBASE-7221.
>>>> 
>>>> On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jtaylor@salesforce.com
>>> wrote:
>>>> 
>>>>> Hi Nick,
>>>>> What do you mean by "hashing algorithms"?
>>>>> Thanks,
>>>>> James
>>>>> 
>>>>> 
>>>>> On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
>>>>> 
>>>>>> Hi David,
>>>>>> 
>>>>>> Native support for a handful of hashing algorithms has also been
>>>>>> discussed.
>>>>>> Do you think these should be supported directly, as opposed to using a
>>>>>> fixed-length String or fixed-length byte[]?
>>>>>> 
>>>>>> Thanks,
>>>>>> Nick
>>>>>> 
>>>>>> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
>>>>>> wrote:
>>>>>> 
>>>>>> Hi Nick,
>>>>>>> 
>>>>>>> As an HBase user I would welcome this addition. In addition to the
>>>>>>> proposed
>>>>>>> list of datatypes A UUID/GUID type would also be nice to have.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> 
>>>>>>> /David
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I'd like to draw your attention to HBASE-8089. The desire is to add
>> type
>>>>>>>> support to HBase. There are two primary objectives: make the lives
>> of
>>>>>>>> developers building on HBase easier, and facilitate better tools on
>> top
>>>>>>>> 
>>>>>>> of
>>>>>>> 
>>>>>>>> HBase. Please chime in with any feature suggestions you think we've
>>>>>>>> 
>> 
> 
> 
> -- 
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

Re: HBase type support

Posted by Andrew Purtell <ap...@apache.org>.

The ASF avails itself of an exception to crypto export which only requires
a bit of PMC housekeeping at release time. So "is not [ok]" is FUD. I
humbly request we refrain from FUD here. See
http://www.apache.org/dev/crypto.html. To the best of our knowledge we
expect this to continue, though the ASF has not updated this policy yet for
recent regulation updates.

On Saturday, March 16, 2013, Michel Segel wrote:

> I also want to add that you could add MD5 and SHA-1, but I'd check on us
> laws... I think these are ok, however other encryption/decryption code is
> not.
>
> They are part of the std sun java libraries ...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Mar 16, 2013, at 7:18 AM, Michel Segel <mi...@hotmail.com>
> wrote:
>
> > Isn't that what you get through add on frameworks like TSDB and Kiji ?
> Maybe not on the client side, but frameworks that extend HBase...
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
> > On Mar 16, 2013, at 12:45 AM, lars hofhansl <la...@apache.org> wrote:
> >
> >> I think generally we should keep HBase a byte[] based key value store.
> >> What we should add to HBase are tools that would allow client side apps
> (or libraries) to built functionality on top of plain HBase.
> >>
> >> Serialization that maintains a correct semantic sort order is important
> as a building block, so is code that can build up correctly serialized and
> sortable compound keys, as well as hashing algorithms.
> >>
> >> Where I would draw the line is adding types to HBase itself. As long as
> one can write a client, or Filters, or Coprocessors with the tools provided
> by HBase we're good. Higher level functionality can then be built of on top
> of HBase.
> >>
> >>
> >> For example, maybe we need to add better access API to the HBase WAL in
> order to have an external library implement idempotent transactions (which
> can be used to implement 2ndary indexes).
> >> Maybe some other primitives have to be exposed in order to allow an
> external library to implement full transactions.
> >> Or we might need a statistics framework (such as the one that Jesse is
> working on).
> >>
> >> These are all building blocks that do not presume specific access
> patterns or clients, but can be used to implement them.
> >>
> >>
> >> As usual, just my $0.02.
> >>
> >> -- Lars
> >>
> >>
> >>
> >> ________________________________
> >> From: Nick Dimiduk <nd...@gmail.com>
> >> To: user@hbase.apache.org
> >> Sent: Friday, March 15, 2013 10:57 AM
> >> Subject: Re: HBase type support
> >>
> >> I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
> >> in HBASE-7221.
> >>
> >> On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jtaylor@salesforce.com
> >wrote:
> >>
> >>> Hi Nick,
> >>> What do you mean by "hashing algorithms"?
> >>> Thanks,
> >>> James
> >>>
> >>>
> >>> On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
> >>>
> >>>> Hi David,
> >>>>
> >>>> Native support for a handful of hashing algorithms has also been
> >>>> discussed.
> >>>> Do you think these should be supported directly, as opposed to using a
> >>>> fixed-length String or fixed-length byte[]?
> >>>>
> >>>> Thanks,
> >>>> Nick
> >>>>
> >>>> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
> >>>> wrote:
> >>>>
> >>>>  Hi Nick,
> >>>>>
> >>>>> As an HBase user I would welcome this addition. In addition to the
> >>>>> proposed
> >>>>> list of datatypes A UUID/GUID type would also be nice to have.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> /David
> >>>>>
> >>>>>
> >>>>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>  Hi all,
> >>>>>>
> >>>>>> I'd like to draw your attention to HBASE-8089. The desire is to add
> type
> >>>>>> support to HBase. There are two primary objectives: make the lives
> of
> >>>>>> developers building on HBase easier, and facilitate better tools on
> top
> >>>>>>
> >>>>> of
> >>>>>
> >>>>>> HBase. Please chime in with any feature suggestions you think we've
> >>>>>>
>


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: HBase type support

Posted by Michel Segel <mi...@hotmail.com>.

I also want to add that you could add MD5 and SHA-1, but I'd check on US laws... I think these are OK; however, other encryption/decryption code is not.

They are part of the std sun java libraries ...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 16, 2013, at 7:18 AM, Michel Segel <mi...@hotmail.com> wrote:

> Isn't that what you get through add on frameworks like TSDB and Kiji ? Maybe not on the client side, but frameworks that extend HBase...
> 
> Sent from a remote device. Please excuse any typos...
> 
> Mike Segel
> 
> On Mar 16, 2013, at 12:45 AM, lars hofhansl <la...@apache.org> wrote:
> 
>> I think generally we should keep HBase a byte[] based key value store.
>> What we should add to HBase are tools that would allow client side apps (or libraries) to built functionality on top of plain HBase.
>> 
>> Serialization that maintains a correct semantic sort order is important as a building block, so is code that can build up correctly serialized and sortable compound keys, as well as hashing algorithms.
>> 
>> Where I would draw the line is adding types to HBase itself. As long as one can write a client, or Filters, or Coprocessors with the tools provided by HBase we're good. Higher level functionality can then be built of on top of HBase.
>> 
>> 
>> For example, maybe we need to add better access API to the HBase WAL in order to have an external library implement idempotent transactions (which can be used to implement 2ndary indexes).
>> Maybe some other primitives have to be exposed in order to allow an external library to implement full transactions.
>> Or we might need a statistics framework (such as the one that Jesse is working on).
>> 
>> These are all building blocks that do not presume specific access patterns or clients, but can be used to implement them.
>> 
>> 
>> As usual, just my $0.02.
>> 
>> -- Lars
>> 
>> 
>> 
>> ________________________________
>> From: Nick Dimiduk <nd...@gmail.com>
>> To: user@hbase.apache.org 
>> Sent: Friday, March 15, 2013 10:57 AM
>> Subject: Re: HBase type support
>> 
>> I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
>> in HBASE-7221.
>> 
>> On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jt...@salesforce.com>wrote:
>> 
>>> Hi Nick,
>>> What do you mean by "hashing algorithms"?
>>> Thanks,
>>> James
>>> 
>>> 
>>> On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
>>> 
>>>> Hi David,
>>>> 
>>>> Native support for a handful of hashing algorithms has also been
>>>> discussed.
>>>> Do you think these should be supported directly, as opposed to using a
>>>> fixed-length String or fixed-length byte[]?
>>>> 
>>>> Thanks,
>>>> Nick
>>>> 
>>>> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
>>>> wrote:
>>>> 
>>>>  Hi Nick,
>>>>> 
>>>>> As an HBase user I would welcome this addition. In addition to the
>>>>> proposed
>>>>> list of datatypes A UUID/GUID type would also be nice to have.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> /David
>>>>> 
>>>>> 
>>>>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>  Hi all,
>>>>>> 
>>>>>> I'd like to draw your attention to HBASE-8089. The desire is to add type
>>>>>> support to HBase. There are two primary objectives: make the lives of
>>>>>> developers building on HBase easier, and facilitate better tools on top
>>>>>> 
>>>>> of
>>>>> 
>>>>>> HBase. Please chime in with any feature suggestions you think we've
>>>>>> 
>>>>> missed
>>>>> 
>>>>>> in initial conversations.
>>>>>> 
>>>>>> Thanks,
>>>>>> -n
>>>>>> 
>>>>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
>>>>>> 
>>>>>> 
>

Re: HBase type support

Posted by Michel Segel <mi...@hotmail.com>.

Isn't that what you get through add-on frameworks like TSDB and Kiji? Maybe not on the client side, but frameworks that extend HBase...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 16, 2013, at 12:45 AM, lars hofhansl <la...@apache.org> wrote:

> I think generally we should keep HBase a byte[] based key value store.
> What we should add to HBase are tools that would allow client side apps (or libraries) to built functionality on top of plain HBase.
> 
> Serialization that maintains a correct semantic sort order is important as a building block, so is code that can build up correctly serialized and sortable compound keys, as well as hashing algorithms.
> 
> Where I would draw the line is adding types to HBase itself. As long as one can write a client, or Filters, or Coprocessors with the tools provided by HBase we're good. Higher level functionality can then be built of on top of HBase.
> 
> 
> For example, maybe we need to add better access API to the HBase WAL in order to have an external library implement idempotent transactions (which can be used to implement 2ndary indexes).
> Maybe some other primitives have to be exposed in order to allow an external library to implement full transactions.
> Or we might need a statistics framework (such as the one that Jesse is working on).
> 
> These are all building blocks that do not presume specific access patterns or clients, but can be used to implement them.
> 
> 
> As usual, just my $0.02.
> 
> -- Lars
> 
> 
> 
> ________________________________
> From: Nick Dimiduk <nd...@gmail.com>
> To: user@hbase.apache.org 
> Sent: Friday, March 15, 2013 10:57 AM
> Subject: Re: HBase type support
> 
> I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
> in HBASE-7221.
> 
> On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jt...@salesforce.com>wrote:
> 
>> Hi Nick,
>> What do you mean by "hashing algorithms"?
>> Thanks,
>> James
>> 
>> 
>> On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
>> 
>>> Hi David,
>>> 
>>> Native support for a handful of hashing algorithms has also been
>>> discussed.
>>> Do you think these should be supported directly, as opposed to using a
>>> fixed-length String or fixed-length byte[]?
>>> 
>>> Thanks,
>>> Nick
>>> 
>>> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
>>> wrote:
>>> 
>>>   Hi Nick,
>>>> 
>>>> As an HBase user I would welcome this addition. In addition to the
>>>> proposed
>>>> list of datatypes A UUID/GUID type would also be nice to have.
>>>> 
>>>> Regards,
>>>> 
>>>> /David
>>>> 
>>>> 
>>>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
>>>> wrote:
>>>> 
>>>>   Hi all,
>>>>> 
>>>>> I'd like to draw your attention to HBASE-8089. The desire is to add type
>>>>> support to HBase. There are two primary objectives: make the lives of
>>>>> developers building on HBase easier, and facilitate better tools on top
>>>>> 
>>>> of
>>>> 
>>>>> HBase. Please chime in with any feature suggestions you think we've
>>>>> 
>>>> missed
>>>> 
>>>>> in initial conversations.
>>>>> 
>>>>> Thanks,
>>>>> -n
>>>>> 
>>>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
>>>>> 
>>>>>

Re: HBase type support

Posted by Nick Dimiduk <nd...@gmail.com>.

On Mon, Mar 18, 2013 at 4:54 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:

> +1.  I really don't want to add typing specific information into hbase core
> -- however, having building blocks, plugins, and extra metadata manage it
> seems quite reasonable to me.
>
> There are many many games that can be played to encode data and enforcing
> typing at the hbase level as opposed to library. (ex: putting in structs
> that have fields with ints as opposed to having tons of cols with ints in
> them, or how opentsdb encodes time stamps, etc..).
>

I'm not proposing deep integration with core. This stuff would exist in the
client module only. The byte[] interfaces don't go away either; OpenTSDB
can continue to perform its data encoding as is. This proposal seeks to
enable less sophisticated data storage approaches and to solve the common
case in a reasonable way.

-n

On Fri, Mar 15, 2013 at 10:45 PM, lars hofhansl <la...@apache.org> wrote:
>
> > I think generally we should keep HBase a byte[] based key value store.
> > What we should add to HBase are tools that would allow client side apps
> > (or libraries) to built functionality on top of plain HBase.
> >
> > Serialization that maintains a correct semantic sort order is important
> as
> > a building block, so is code that can build up correctly serialized and
> > sortable compound keys, as well as hashing algorithms.
> >
> > Where I would draw the line is adding types to HBase itself. As long as
> > one can write a client, or Filters, or Coprocessors with the tools
> provided
> > by HBase we're good. Higher level functionality can then be built of on
> top
> > of HBase.
> >
> >
> > For example, maybe we need to add better access API to the HBase WAL in
> > order to have an external library implement idempotent transactions
> (which
> > can be used to implement 2ndary indexes).
> > Maybe some other primitives have to be exposed in order to allow an
> > external library to implement full transactions.
> > Or we might need a statistics framework (such as the one that Jesse is
> > working on).
> >
> > These are all building blocks that do not presume specific access
> patterns
> > or clients, but can be used to implement them.
> >
> >
> > As usual, just my $0.02.
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: Nick Dimiduk <nd...@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Friday, March 15, 2013 10:57 AM
> > Subject: Re: HBase type support
> >
> > I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
> > in HBASE-7221.
> >
> > On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jtaylor@salesforce.com
> > >wrote:
> >
> > > Hi Nick,
> > > What do you mean by "hashing algorithms"?
> > > Thanks,
> > > James
> > >
> > >
> > > On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
> > >
> > >> Hi David,
> > >>
> > >> Native support for a handful of hashing algorithms has also been
> > >> discussed.
> > >> Do you think these should be supported directly, as opposed to using a
> > >> fixed-length String or fixed-length byte[]?
> > >>
> > >> Thanks,
> > >> Nick
> > >>
> > >> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
> > >> wrote:
> > >>
> > >>  Hi Nick,
> > >>>
> > >>> As an HBase user I would welcome this addition. In addition to the
> > >>> proposed
> > >>> list of datatypes A UUID/GUID type would also be nice to have.
> > >>>
> > >>> Regards,
> > >>>
> > >>> /David
> > >>>
> > >>>
> > >>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>  Hi all,
> > >>>>
> > >>>> I'd like to draw your attention to HBASE-8089. The desire is to add
> > type
> > >>>> support to HBase. There are two primary objectives: make the lives
> of
> > >>>> developers building on HBase easier, and facilitate better tools on
> > top
> > >>>>
> > >>> of
> > >>>
> > >>>> HBase. Please chime in with any feature suggestions you think we've
> > >>>>
> > >>> missed
> > >>>
> > >>>> in initial conversations.
> > >>>>
> > >>>> Thanks,
> > >>>> -n
> > >>>>
> > >>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
> > >>>>
> > >>>>
> > >
> >
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com
>

Re: HBase type support

Posted by Michael Segel <mi...@hotmail.com>.

yup. Why break a good thing? ;-)

On Mar 18, 2013, at 6:54 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:

> +1.  I really don't want to add typing specific information into hbase core
> -- however, having building blocks, plugins, and extra metadata manage it
> seems quite reasonable to me.
> 
> There are many many games that can be played to encode data and enforcing
> typing at the hbase level as opposed to library. (ex: putting in structs
> that have fields with ints as opposed to having tons of cols with ints in
> them, or how opentsdb encodes time stamps, etc..).
> 
> Jon.
> 
> On Fri, Mar 15, 2013 at 10:45 PM, lars hofhansl <la...@apache.org> wrote:
> 
>> I think generally we should keep HBase a byte[] based key value store.
>> What we should add to HBase are tools that would allow client side apps
>> (or libraries) to built functionality on top of plain HBase.
>> 
>> Serialization that maintains a correct semantic sort order is important as
>> a building block, so is code that can build up correctly serialized and
>> sortable compound keys, as well as hashing algorithms.
>> 
>> Where I would draw the line is adding types to HBase itself. As long as
>> one can write a client, or Filters, or Coprocessors with the tools provided
>> by HBase we're good. Higher level functionality can then be built of on top
>> of HBase.
>> 
>> 
>> For example, maybe we need to add better access API to the HBase WAL in
>> order to have an external library implement idempotent transactions (which
>> can be used to implement 2ndary indexes).
>> Maybe some other primitives have to be exposed in order to allow an
>> external library to implement full transactions.
>> Or we might need a statistics framework (such as the one that Jesse is
>> working on).
>> 
>> These are all building blocks that do not presume specific access patterns
>> or clients, but can be used to implement them.
>> 
>> 
>> As usual, just my $0.02.
>> 
>> -- Lars
>> 
>> 
>> 
>> ________________________________
>> From: Nick Dimiduk <nd...@gmail.com>
>> To: user@hbase.apache.org
>> Sent: Friday, March 15, 2013 10:57 AM
>> Subject: Re: HBase type support
>> 
>> I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
>> in HBASE-7221.
>> 
>> On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jtaylor@salesforce.com
>>> wrote:
>> 
>>> Hi Nick,
>>> What do you mean by "hashing algorithms"?
>>> Thanks,
>>> James
>>> 
>>> 
>>> On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
>>> 
>>>> Hi David,
>>>> 
>>>> Native support for a handful of hashing algorithms has also been
>>>> discussed.
>>>> Do you think these should be supported directly, as opposed to using a
>>>> fixed-length String or fixed-length byte[]?
>>>> 
>>>> Thanks,
>>>> Nick
>>>> 
>>>> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
>>>> wrote:
>>>> 
>>>> Hi Nick,
>>>>> 
>>>>> As an HBase user I would welcome this addition. In addition to the
>>>>> proposed
>>>>> list of datatypes A UUID/GUID type would also be nice to have.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> /David
>>>>> 
>>>>> 
>>>>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>> Hi all,
>>>>>> 
>>>>>> I'd like to draw your attention to HBASE-8089. The desire is to add
>> type
>>>>>> support to HBase. There are two primary objectives: make the lives of
>>>>>> developers building on HBase easier, and facilitate better tools on
>> top
>>>>>> 
>>>>> of
>>>>> 
>>>>>> HBase. Please chime in with any feature suggestions you think we've
>>>>>> 
>>>>> missed
>>>>> 
>>>>>> in initial conversations.
>>>>>> 
>>>>>> Thanks,
>>>>>> -n
>>>>>> 
>>>>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
>>>>>> 
>>>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com

Re: HBase type support

Posted by Jonathan Hsieh <jo...@cloudera.com>.

+1.  I really don't want to add typing-specific information into hbase core
-- however, having building blocks, plugins, and extra metadata manage it
seems quite reasonable to me.

There are many many games that can be played to encode data and enforcing
typing at the hbase level as opposed to library. (ex: putting in structs
that have fields with ints as opposed to having tons of cols with ints in
them, or how opentsdb encodes time stamps, etc..).

Jon.

On Fri, Mar 15, 2013 at 10:45 PM, lars hofhansl <la...@apache.org> wrote:

> I think generally we should keep HBase a byte[] based key value store.
> What we should add to HBase are tools that would allow client side apps
> (or libraries) to built functionality on top of plain HBase.
>
> Serialization that maintains a correct semantic sort order is important as
> a building block, so is code that can build up correctly serialized and
> sortable compound keys, as well as hashing algorithms.
>
> Where I would draw the line is adding types to HBase itself. As long as
> one can write a client, or Filters, or Coprocessors with the tools provided
> by HBase we're good. Higher level functionality can then be built of on top
> of HBase.
>
>
> For example, maybe we need to add better access API to the HBase WAL in
> order to have an external library implement idempotent transactions (which
> can be used to implement 2ndary indexes).
> Maybe some other primitives have to be exposed in order to allow an
> external library to implement full transactions.
> Or we might need a statistics framework (such as the one that Jesse is
> working on).
>
> These are all building blocks that do not presume specific access patterns
> or clients, but can be used to implement them.
>
>
> As usual, just my $0.02.
>
> -- Lars
>
>
>
> ________________________________
>  From: Nick Dimiduk <nd...@gmail.com>
> To: user@hbase.apache.org
> Sent: Friday, March 15, 2013 10:57 AM
> Subject: Re: HBase type support
>
> I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
> in HBASE-7221.
>
> On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jtaylor@salesforce.com
> >wrote:
>
> > Hi Nick,
> > What do you mean by "hashing algorithms"?
> > Thanks,
> > James
> >
> >
> > On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
> >
> >> Hi David,
> >>
> >> Native support for a handful of hashing algorithms has also been
> >> discussed.
> >> Do you think these should be supported directly, as opposed to using a
> >> fixed-length String or fixed-length byte[]?
> >>
> >> Thanks,
> >> Nick
> >>
> >> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
> >> wrote:
> >>
> >>  Hi Nick,
> >>>
> >>> As an HBase user I would welcome this addition. In addition to the
> >>> proposed
> >>> list of datatypes A UUID/GUID type would also be nice to have.
> >>>
> >>> Regards,
> >>>
> >>> /David
> >>>
> >>>
> >>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
> >>> wrote:
> >>>
> >>>  Hi all,
> >>>>
> >>>> I'd like to draw your attention to HBASE-8089. The desire is to add
> type
> >>>> support to HBase. There are two primary objectives: make the lives of
> >>>> developers building on HBase easier, and facilitate better tools on
> top
> >>>>
> >>> of
> >>>
> >>>> HBase. Please chime in with any feature suggestions you think we've
> >>>>
> >>> missed
> >>>
> >>>> in initial conversations.
> >>>>
> >>>> Thanks,
> >>>> -n
> >>>>
> >>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
> >>>>
> >>>>
> >
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: HBase type support

Posted by Nick Dimiduk <nd...@gmail.com>.

On Fri, Mar 15, 2013 at 10:45 PM, lars hofhansl <la...@apache.org> wrote:

> I think generally we should keep HBase a byte[] based key value store.
> What we should add to HBase are tools that would allow client side apps
> (or libraries) to build functionality on top of plain HBase.
>

That's precisely it. HBase is not changed in any fundamental way to
acknowledge or enforce types. Instead, the hbase-client module makes type
management easier for user code.

> Serialization that maintains a correct semantic sort order is important as
> a building block, so is code that can build up correctly serialized and
> sortable compound keys, as well as hashing algorithms.
>

Agreed on serialization. Hashing I can do without. Yes it's a common
practice, but IMHO, if you're hashing, you're not taking advantage of the
natural distribution of your data. I think it's a lazy schema designer's
approach. I see no problem with shipping with support for some hashing
strategies if users demand it, but I don't think it's a design approach we
should encourage.
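
For context, one common form of row-key hashing prefixes the natural key with a few bytes of its digest, so that sequential keys spread across regions; the cost is that scans no longer follow the natural key order. A minimal sketch, assuming that is the practice meant here; HashedKeySketch and hashPrefixed are hypothetical names, not HBase API:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only; not HBase code.
final class HashedKeySketch {

    /** Returns the first prefixLen bytes of MD5(naturalKey) followed by naturalKey itself. */
    static byte[] hashPrefixed(byte[] naturalKey, int prefixLen) {
        if (prefixLen < 0 || prefixLen > 16) {
            throw new IllegalArgumentException("prefixLen must be between 0 and 16");
        }
        byte[] md5;
        try {
            md5 = MessageDigest.getInstance("MD5").digest(naturalKey);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available in this JVM", e);
        }
        byte[] out = new byte[prefixLen + naturalKey.length];
        System.arraycopy(md5, 0, out, 0, prefixLen);                         // hash prefix
        System.arraycopy(naturalKey, 0, out, prefixLen, naturalKey.length);  // original key
        return out;
    }

    public static void main(String[] args) {
        byte[] a = hashPrefixed("user-0001".getBytes(StandardCharsets.UTF_8), 2);
        byte[] b = hashPrefixed("user-0002".getBytes(StandardCharsets.UTF_8), 2);
        // Adjacent natural keys generally get unrelated prefixes, so they no longer sort together.
        System.out.println((a[0] & 0xff) + " vs " + (b[0] & 0xff));
    }
}
```

The prefix is deterministic, so point lookups still work, but a range scan over "user-0001".."user-0100" becomes many separate lookups. That trade-off is the reason for preferring the data's natural distribution where it is already even.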

Thanks,
Nick

________________________________
>  From: Nick Dimiduk <nd...@gmail.com>
> To: user@hbase.apache.org
> Sent: Friday, March 15, 2013 10:57 AM
> Subject: Re: HBase type support
>
> I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
> in HBASE-7221.
>
> On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jtaylor@salesforce.com
> >wrote:
>
> > Hi Nick,
> > What do you mean by "hashing algorithms"?
> > Thanks,
> > James
> >
> >
> > On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
> >
> >> Hi David,
> >>
> >> Native support for a handful of hashing algorithms has also been
> >> discussed.
> >> Do you think these should be supported directly, as opposed to using a
> >> fixed-length String or fixed-length byte[]?
> >>
> >> Thanks,
> >> Nick
> >>
> >> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
> >> wrote:
> >>
> >>  Hi Nick,
> >>>
> >>> As an HBase user I would welcome this addition. In addition to the
> >>> proposed
> >>> list of datatypes A UUID/GUID type would also be nice to have.
> >>>
> >>> Regards,
> >>>
> >>> /David
> >>>
> >>>
> >>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
> >>> wrote:
> >>>
> >>>  Hi all,
> >>>>
> >>>> I'd like to draw your attention to HBASE-8089. The desire is to add type
> >>>> support to HBase. There are two primary objectives: make the lives of
> >>>> developers building on HBase easier, and facilitate better tools on top
> >>>> of HBase. Please chime in with any feature suggestions you think we've
> >>>> missed in initial conversations.
> >>>>
> >>>> Thanks,
> >>>> -n
> >>>>
> >>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
> >>>>
> >>>>
> >

Re: HBase type support

Posted by lars hofhansl <la...@apache.org>.

I think generally we should keep HBase a byte[] based key value store.
What we should add to HBase are tools that would allow client side apps (or libraries) to build functionality on top of plain HBase.

Serialization that maintains a correct semantic sort order is important as a building block, so is code that can build up correctly serialized and sortable compound keys, as well as hashing algorithms.
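
To make the sort-order point concrete, here is a minimal sketch of an order-preserving encoding for signed ints and a fixed-width compound key. It is not code from HBase or from the HBASE-8089 proposal; the names OrderedKeys, encodeInt, decodeInt, compound and compareBytes are made up for illustration. The only HBase behavior assumed is that row keys are compared as unsigned bytes, left to right.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class OrderedKeys {

    // Big-endian bytes with the sign bit flipped, so that negative values
    // sort before positive ones under unsigned byte comparison.
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Inverse of encodeInt.
    static int decodeInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Concatenates already-encoded parts. Because every part here has a
    // fixed width, the result sorts by first part, then second, and so on.
    static byte[] compound(byte[]... parts) {
        int len = 0;
        for (byte[] p : parts) {
            len += p.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(len);
        for (byte[] p : parts) {
            buf.put(p);
        }
        return buf.array();
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int[] values = {-5, 100, 0, Integer.MIN_VALUE, Integer.MAX_VALUE, -1};
        byte[][] keys = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            keys[i] = encodeInt(values[i]);
        }
        // Sort the encoded keys by their bytes, then decode them again:
        // the values come back in ascending numeric order.
        Arrays.sort(keys, OrderedKeys::compareBytes);
        for (byte[] k : keys) {
            System.out.println(decodeInt(k));
        }
    }
}
```

Without the sign-bit flip, a plain big-endian int would place every negative value after every positive one, which is the kind of mistake such a building block is meant to prevent. Variable-length parts (strings, for example) need a terminator or length scheme that this sketch does not cover.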

Where I would draw the line is adding types to HBase itself. As long as one can write a client, or Filters, or Coprocessors with the tools provided by HBase we're good. Higher level functionality can then be built on top of HBase.

For example, maybe we need to add a better access API to the HBase WAL in order to have an external library implement idempotent transactions (which can be used to implement secondary indexes).
Maybe some other primitives have to be exposed in order to allow an external library to implement full transactions.
Or we might need a statistics framework (such as the one that Jesse is working on).

These are all building blocks that do not presume specific access patterns or clients, but can be used to implement them.

As usual, just my $0.02.

-- Lars

________________________________
 From: Nick Dimiduk <nd...@gmail.com>
To: user@hbase.apache.org 
Sent: Friday, March 15, 2013 10:57 AM
Subject: Re: HBase type support

I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
in HBASE-7221.

On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jt...@salesforce.com> wrote:

> Hi Nick,
> What do you mean by "hashing algorithms"?
> Thanks,
> James
>
>
> On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
>
>> Hi David,
>>
>> Native support for a handful of hashing algorithms has also been
>> discussed.
>> Do you think these should be supported directly, as opposed to using a
>> fixed-length String or fixed-length byte[]?
>>
>> Thanks,
>> Nick
>>
>> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
>> wrote:
>>
>>  Hi Nick,
>>>
>>> As an HBase user I would welcome this addition. In addition to the
>>> proposed
>>> list of datatypes A UUID/GUID type would also be nice to have.
>>>
>>> Regards,
>>>
>>> /David
>>>
>>>
>>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
>>> wrote:
>>>
>>>  Hi all,
>>>>
>>>> I'd like to draw your attention to HBASE-8089. The desire is to add type
>>>> support to HBase. There are two primary objectives: make the lives of
>>>> developers building on HBase easier, and facilitate better tools on top
>>>>
>>> of
>>>
>>>> HBase. Please chime in with any feature suggestions you think we've
>>>>
>>> missed
>>>
>>>> in initial conversations.
>>>>
>>>> Thanks,
>>>> -n
>>>>
>>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
>>>>
>>>>
>

Re: HBase type support

Posted by Nick Dimiduk <nd...@gmail.com>.

I'm talking about MD5, SHA1, etc. It's something explicitly mentioned
in HBASE-7221.

On Fri, Mar 15, 2013 at 10:55 AM, James Taylor <jt...@salesforce.com> wrote:

> Hi Nick,
> What do you mean by "hashing algorithms"?
> Thanks,
> James
>
>
> On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
>
>> Hi David,
>>
>> Native support for a handful of hashing algorithms has also been
>> discussed.
>> Do you think these should be supported directly, as opposed to using a
>> fixed-length String or fixed-length byte[]?
>>
>> Thanks,
>> Nick
>>
>> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com>
>> wrote:
>>
>>  Hi Nick,
>>>
>>> As an HBase user I would welcome this addition. In addition to the
>>> proposed
>>> list of datatypes A UUID/GUID type would also be nice to have.
>>>
>>> Regards,
>>>
>>> /David
>>>
>>>
>>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
>>> wrote:
>>>
>>>  Hi all,
>>>>
>>>> I'd like to draw your attention to HBASE-8089. The desire is to add type
>>>> support to HBase. There are two primary objectives: make the lives of
>>>> developers building on HBase easier, and facilitate better tools on top
>>>>
>>> of
>>>
>>>> HBase. Please chime in with any feature suggestions you think we've
>>>>
>>> missed
>>>
>>>> in initial conversations.
>>>>
>>>> Thanks,
>>>> -n
>>>>
>>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
>>>>
>>>>
>

Re: HBase type support

Posted by James Taylor <jt...@salesforce.com>.

Hi Nick,
What do you mean by "hashing algorithms"?
Thanks,
James

On 03/15/2013 10:11 AM, Nick Dimiduk wrote:
> Hi David,
>
> Native support for a handful of hashing algorithms has also been discussed.
> Do you think these should be supported directly, as opposed to using a
> fixed-length String or fixed-length byte[]?
>
> Thanks,
> Nick
>
> On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com> wrote:
>
>> Hi Nick,
>>
>> As an HBase user I would welcome this addition. In addition to the proposed
>> list of datatypes A UUID/GUID type would also be nice to have.
>>
>> Regards,
>>
>> /David
>>
>>
>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to draw your attention to HBASE-8089. The desire is to add type
>>> support to HBase. There are two primary objectives: make the lives of
>>> developers building on HBase easier, and facilitate better tools on top
>> of
>>> HBase. Please chime in with any feature suggestions you think we've
>> missed
>>> in initial conversations.
>>>
>>> Thanks,
>>> -n
>>>
>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
>>>

Re: HBase type support

Posted by Nick Dimiduk <nd...@gmail.com>.

Hi David,

Native support for a handful of hashing algorithms has also been discussed.
Do you think these should be supported directly, as opposed to using a
fixed-length String or fixed-length byte[]?
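
For reference, this is roughly what the "fixed-length byte[]" versus "fixed-length String" alternatives look like using only the JDK. It is a sketch, not a proposed HBase API; the DigestKeys class and its method names are hypothetical. MD5 always yields 16 bytes and SHA-1 always yields 20, so the raw digest is a fixed-length byte[] and its hex form is a fixed-length String of twice that size.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
final class DigestKeys {

    // Raw digest of the UTF-8 bytes of the input: a fixed-length byte[].
    static byte[] digest(String algorithm, String input) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            return md.digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 and SHA-1 are required on every Java platform, so this
            // only happens for an unknown algorithm name.
            throw new IllegalArgumentException(e);
        }
    }

    // Lower-case hex form of the digest: a fixed-length String.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] md5 = digest("MD5", "row-key-42");
        byte[] sha1 = digest("SHA-1", "row-key-42");
        System.out.println("MD5 bytes:   " + md5.length + ", hex chars: " + toHex(md5).length());
        System.out.println("SHA-1 bytes: " + sha1.length + ", hex chars: " + toHex(sha1).length());
    }
}
```

Storing the raw bytes takes half the space of the hex string; the hex form is mainly easier to read in a shell or log.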

Thanks,
Nick

On Thu, Mar 14, 2013 at 9:51 AM, David Koch <og...@googlemail.com> wrote:

> Hi Nick,
>
> As an HBase user I would welcome this addition. In addition to the proposed
> list of datatypes A UUID/GUID type would also be nice to have.
>
> Regards,
>
> /David
>
>
> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'd like to draw your attention to HBASE-8089. The desire is to add type
> > support to HBase. There are two primary objectives: make the lives of
> > developers building on HBase easier, and facilitate better tools on top
> of
> > HBase. Please chime in with any feature suggestions you think we've
> missed
> > in initial conversations.
> >
> > Thanks,
> > -n
> >
> > [0]: https://issues.apache.org/jira/browse/HBASE-8089
> >
>

Re: HBase type support

Posted by Nick Dimiduk <nd...@gmail.com>.

On Fri, Mar 15, 2013 at 3:35 PM, Michel Segel <mi...@hotmail.com> wrote:

> So how do you check types in a column when the column isn't defined in the
> Schema?
>

In this proposal, it's not up to HBase to enforce types or schema, just as
it does not do these things today. What we're proposing is a set of
utilities that take the burden of correct serialization off of user code. I
request that you please read the proposal in its entirety before commenting
further.
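
As a rough illustration of what "utilities, not enforcement" could mean, here is a sketch of client-side codecs that convert between Java values and byte[]. The Codecs class, the Codec interface and both implementations are hypothetical names invented for this example; they are not the API in the proposal. Nothing in the sketch touches the server: the store still sees only bytes, and it is the caller who chooses which codec to apply.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side utilities, for illustration only.
final class Codecs {

    // Converts between a value of type T and its byte[] representation.
    interface Codec<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    // Strings as UTF-8 bytes.
    static final class Utf8StringCodec implements Codec<String> {
        public byte[] encode(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }
        public String decode(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Longs as 8 big-endian bytes (not order-preserving for negatives).
    static final class LongCodec implements Codec<Long> {
        public byte[] encode(Long value) {
            return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        }
        public Long decode(byte[] bytes) {
            return ByteBuffer.wrap(bytes).getLong();
        }
    }

    public static void main(String[] args) {
        Codec<Long> longs = new LongCodec();
        Codec<String> strings = new Utf8StringCodec();
        // The byte[] values are what would be handed to a Put or read
        // back from a Result; no schema is involved.
        byte[] count = longs.encode(100L);
        byte[] name = strings.encode("hbase");
        System.out.println(longs.decode(count) + " " + strings.decode(name));
    }
}
```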

Thanks,
Nick

On Mar 15, 2013, at 10:06 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>
> > On Fri, Mar 15, 2013 at 5:25 AM, Michel Segel <michael_segel@hotmail.com
> >wrote:
> >
> >> You do realize that having to worry about one type is easier...
> >
> > For HBase developers, that's true. The side-effect is that those worries
> > are pushed out into users' applications. Think of the application
> developer
> > who's accustomed to all the accoutrements provided by the Management
> System
> > part of an RDBMS. They pick up HBase and have none of that. I think the
> > motivations outlined in the attached document make a good case for
> bringing
> > some of that burden out of users' applications.
> >
> > A bit more freedom...
> >
> > Support for raw byte[] doesn't go away. In this proposal, bytes remain
> the
> > core plumbing of the system.
> >
> > -n
> >
> > On Mar 14, 2013, at 11:51 AM, David Koch <og...@googlemail.com> wrote:
> >>
> >>> Hi Nick,
> >>>
> >>> As an HBase user I would welcome this addition. In addition to the
> >> proposed
> >>> list of datatypes A UUID/GUID type would also be nice to have.
> >>>
> >>> Regards,
> >>>
> >>> /David
> >>>
> >>>
> >>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
> >> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I'd like to draw your attention to HBASE-8089. The desire is to add
> type
> >>>> support to HBase. There are two primary objectives: make the lives of
> >>>> developers building on HBase easier, and facilitate better tools on
> top
> >> of
> >>>> HBase. Please chime in with any feature suggestions you think we've
> >> missed
> >>>> in initial conversations.
> >>>>
> >>>> Thanks,
> >>>> -n
> >>>>
> >>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
> >>
>

Re: HBase type support

Posted by Michel Segel <mi...@hotmail.com>.

Ok..

So how do you check types in a column when the column isn't defined in the Schema?


Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 15, 2013, at 10:06 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> On Fri, Mar 15, 2013 at 5:25 AM, Michel Segel <mi...@hotmail.com> wrote:
> 
>> You do realize that having to worry about one type is easier...
> 
> For HBase developers, that's true. The side-effect is that those worries
> are pushed out into users' applications. Think of the application developer
> who's accustomed to all the accoutrements provided by the Management System
> part of an RDBMS. They pick up HBase and have none of that. I think the
> motivations outlined in the attached document make a good case for bringing
> some of that burden out of users' applications.
> 
> A bit more freedom...
> 
> Support for raw byte[] doesn't go away. In this proposal, bytes remain the
> core plumbing of the system.
> 
> -n
> 
> On Mar 14, 2013, at 11:51 AM, David Koch <og...@googlemail.com> wrote:
>> 
>>> Hi Nick,
>>> 
>>> As an HBase user I would welcome this addition. In addition to the
>> proposed
>>> list of datatypes A UUID/GUID type would also be nice to have.
>>> 
>>> Regards,
>>> 
>>> /David
>>> 
>>> 
>>> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
>> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I'd like to draw your attention to HBASE-8089. The desire is to add type
>>>> support to HBase. There are two primary objectives: make the lives of
>>>> developers building on HBase easier, and facilitate better tools on top
>> of
>>>> HBase. Please chime in with any feature suggestions you think we've
>> missed
>>>> in initial conversations.
>>>> 
>>>> Thanks,
>>>> -n
>>>> 
>>>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
>>

Re: HBase type support

Posted by Nick Dimiduk <nd...@gmail.com>.

On Fri, Mar 15, 2013 at 5:25 AM, Michel Segel <mi...@hotmail.com> wrote:

> You do realize that having to worry about one type is easier...
>

For HBase developers, that's true. The side-effect is that those worries
are pushed out into users' applications. Think of the application developer
who's accustomed to all the accoutrements provided by the Management System
part of an RDBMS. They pick up HBase and have none of that. I think the
motivations outlined in the attached document make a good case for bringing
some of that burden out of users' applications.

> A bit more freedom...
>

Support for raw byte[] doesn't go away. In this proposal, bytes remain the
core plumbing of the system.

-n

On Mar 14, 2013, at 11:51 AM, David Koch <og...@googlemail.com> wrote:
>
> > Hi Nick,
> >
> > As an HBase user I would welcome this addition. In addition to the
> proposed
> > list of datatypes A UUID/GUID type would also be nice to have.
> >
> > Regards,
> >
> > /David
> >
> >
> > On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com>
> wrote:
> >
> >> Hi all,
> >>
> >> I'd like to draw your attention to HBASE-8089. The desire is to add type
> >> support to HBase. There are two primary objectives: make the lives of
> >> developers building on HBase easier, and facilitate better tools on top
> of
> >> HBase. Please chime in with any feature suggestions you think we've
> missed
> >> in initial conversations.
> >>
> >> Thanks,
> >> -n
> >>
> >> [0]: https://issues.apache.org/jira/browse/HBASE-8089
> >>
>

Re: HBase type support

Posted by Michel Segel <mi...@hotmail.com>.

You do realize that having to worry about one type is easier...

A bit more freedom...


Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 14, 2013, at 11:51 AM, David Koch <og...@googlemail.com> wrote:

> Hi Nick,
> 
> As an HBase user I would welcome this addition. In addition to the proposed
> list of datatypes A UUID/GUID type would also be nice to have.
> 
> Regards,
> 
> /David
> 
> 
> On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com> wrote:
> 
>> Hi all,
>> 
>> I'd like to draw your attention to HBASE-8089. The desire is to add type
>> support to HBase. There are two primary objectives: make the lives of
>> developers building on HBase easier, and facilitate better tools on top of
>> HBase. Please chime in with any feature suggestions you think we've missed
>> in initial conversations.
>> 
>> Thanks,
>> -n
>> 
>> [0]: https://issues.apache.org/jira/browse/HBASE-8089
>>

Re: HBase type support

Posted by David Koch <og...@googlemail.com>.

Hi Nick,

As an HBase user I would welcome this addition. In addition to the proposed
list of datatypes, a UUID/GUID type would also be nice to have.
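
For context, this is the kind of conversion users write by hand today. It is a minimal sketch using only the JDK; the UuidBytes class and its method names are hypothetical. A UUID is 128 bits, so it fits in a fixed 16-byte array, most significant bits first.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, for illustration only.
final class UuidBytes {

    // 16 bytes: the most significant 64 bits, then the least significant 64.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Inverse of toBytes; rejects input that is not exactly 16 bytes.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        byte[] encoded = toBytes(id);
        System.out.println(encoded.length + " bytes, round trip ok: "
                + id.equals(fromBytes(encoded)));
    }
}
```

One caveat with this layout: the byte order matches the hex string form of the UUID, but not java.util.UUID.compareTo, which compares the two halves as signed longs.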

Regards,

/David

On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> Hi all,
>
> I'd like to draw your attention to HBASE-8089. The desire is to add type
> support to HBase. There are two primary objectives: make the lives of
> developers building on HBase easier, and facilitate better tools on top of
> HBase. Please chime in with any feature suggestions you think we've missed
> in initial conversations.
>
> Thanks,
> -n
>
> [0]: https://issues.apache.org/jira/browse/HBASE-8089
>