Posted to dev@gora.apache.org by Alfonso Nishikawa <al...@gmail.com> on 2013/02/08 01:02:02 UTC

Updated GORA-174 HBase information - unions

Hi all,

I updated the GORA-174 issue information about the HBase backend at [0]. Any
thoughts? I think it is better expressed now.
If no one thinks it is wrong, I will implement solution-1 and solution-2 (this
may mean quite a lot of work, so do we maintain it? I vote yes).

I had to restore my git server, and this time not everything went right, but
it is up again now at [1].

Best regards,

Alfonso Nishikawa

[0] https://people.apache.org/~alfonsonishikawa/gora-174.html
[1]
http://git.nishilua.com/do/gora_nishikawa.git/shortlog/refs/heads/GORA-174


"Drinking bloody marys all night will make you feel like a corpse in the
morning."

Re: Updated GORA-174 HBase information - unions

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi,


2013/2/10 Alfonso Nishikawa <al...@gmail.com>:
> Hi Renato,
>
>> So what you are proposing is to store an extra index at the beginning
>> of the actual value? or does HBase do this automatically? What about
>> if bytes were being written? couldn't some type of corruption happen
>> and make this unusable?
>
> The extra byte at the beginning of the actual value is part of Avro :)
> Gora-hbase must adhere to the Avro spec, so that is really what the union
> source code update is about.
> In the case of bytes, a 'long' with the length of the bytes is encoded
> first, followed by the bytes data.
> I got all of this from the Avro spec at [2].

Thanks! I overlooked the binary encoding specification ;)
The problem with Cassandra is that not everything is written down as
bytes (well it probably is but deeper down in the code). Please look
at column types [1].
So what would you suggest doing in cases where non-appendable column
types are used, e.g. BooleanType, UUIDType, and others? I mean, in
columns storing integers or decimals, I think we could append a single
value to determine which serializer to use, but I don't know what to
do in those other cases.

>>> think it is better expressed now.
>>> If no one thinks it is wrong, I will implement solution-1 and solution-2 (this
>>> may mean quite a lot of work, so do we maintain it? I vote yes).
>>
>> So does your solution have two parts? or are they two separate
>> possible solutions?
>
> There are two potentially different problems (incompatibilities with
> legacy data), so we can choose to leave both behind, only one, or
> none. Lewis voted for facing both (as did I), so I guess we will
> maintain data compatibility until version 1.0.

This is the part I do not understand very well. You guys are saying
that legacy data is a problem, but why is it a problem if we haven't
been supporting Avro unions in the past? This is a new feature, not an
upgrade. And from what I understand, the second issue was about
marking support for union data types as deprecated. But then
again, if we are able to support union data types, this would be the
first time.
Am I understanding things correctly here? Lewis? Alfonso? Anyone else?

>> You said on another email that HBase could persist Union data types
>> directly without having to modify it (did I get that right? or am I
>> confusing stuff? ) so implementing this would be just to tell HBase to
>> save the union data type but not actually writing this extra byte? I
>> wasn't able to find the avro documentation talking about this, could
>> you please point me to where this is?
>
> Sorry, surely my fault, because I always express myself badly. You need
> to write that index. Solution 1 [3] avoids writing that index, but it is
> an exception that applies only to null-or-one-type unions.

OK, I see. But what about unions with more than one type? Shouldn't we
think about solving this once and for all?
We also have to keep in mind that the same solution might not be
applicable to all data stores, but we should be able to provide the
same features across all the supported data stores.

>>> I had to restore my git server, and this time not everything went right, but
>>> it is up again now at [1].
>>
>> Thanks! and great work documenting this issue! (:


Renato M.

[1] http://www.datastax.com/docs/1.0/ddl/column_family#about-data-types-comparators-and-validators


>>
>> Renato M.
>
> Thank you for your comments and questions! :)
>
> Best regards,
>
> Alfonso Nishikawa
>
> [2] - http://avro.apache.org/docs/current/spec.html#binary_encoding
> [3] - https://people.apache.org/~alfonsonishikawa/gora-174.html

Re: Updated GORA-174 HBase information - unions

Posted by Alfonso Nishikawa <al...@gmail.com>.
Hi Renato,

> So what you are proposing is to store an extra index at the beginning
> of the actual value? or does HBase do this automatically? What about
> if bytes were being written? couldn't some type of corruption happen
> and make this unusable?

The extra byte at the beginning of the actual value is part of Avro :)
Gora-hbase must adhere to the Avro spec, so that is really what the union
source code update is about.
In the case of bytes, a 'long' with the length of the bytes is encoded
first, followed by the bytes data.
I got all of this from the Avro spec at [2].
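
As an illustration of the encoding described above, here is a minimal Python sketch (not Gora code) of how the Avro binary encoding at [2] writes a union value: the zero-based branch index as a zig-zag varint, followed by the value itself. Bytes and strings are written as a length (also a zig-zag varint) followed by the data. The function names are hypothetical and exist only for this example.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer the way Avro encodes int/long values."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes become small unsigned values
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_bytes(data: bytes) -> bytes:
    """Avro 'bytes': the length as a long, followed by the raw data."""
    return zigzag_varint(len(data)) + data


def encode_string(text: str) -> bytes:
    """Avro 'string': the UTF-8 length as a long, followed by the UTF-8 data."""
    return encode_bytes(text.encode("utf-8"))


def encode_union(branch_index: int, encoded_value: bytes) -> bytes:
    """Avro union: the branch position, followed by the already encoded value."""
    return zigzag_varint(branch_index) + encoded_value


# Schema ["null", "string"]: null is branch 0, string is branch 1.
null_cell = encode_union(0, b"")
text_cell = encode_union(1, encode_string("a"))
```

With the schema ["null", "string"], null comes out as the single byte 00 and the string "a" as 02 02 61, which matches the example given in the Avro spec. Note that the index is itself zig-zag encoded, so branch 1 is stored as the byte 02.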

>> think it is better expressed now.
>> If no one thinks it is wrong, I will implement solution-1 and solution-2 (this
>> may mean quite a lot of work, so do we maintain it? I vote yes).
>
> So does your solution have two parts? or are they two separate
> possible solutions?

There are two potentially different problems (incompatibilities with
legacy data), so we can choose to leave both behind, only one, or
none. Lewis voted for facing both (as did I), so I guess we will
maintain data compatibility until version 1.0.

> You said on another email that HBase could persist Union data types
> directly without having to modify it (did I get that right? or am I
> confusing stuff? ) so implementing this would be just to tell HBase to
> save the union data type but not actually writing this extra byte? I
> wasn't able to find the avro documentation talking about this, could
> you please point me to where this is?

Sorry, surely my fault, because I always express myself badly. You need
to write that index. Solution 1 [3] avoids writing that index, but it is
an exception that applies only to null-or-one-type unions.
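
To make the exception concrete, here is a rough Python sketch of the idea, not the actual Gora-hbase implementation. It assumes that a null value is represented by writing no cell at all (the thread does not say how null is stored), and the names `is_null_or_one_type` and `encode_cell` are hypothetical.

```python
from typing import Optional, Sequence


def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def is_null_or_one_type(branches: Sequence[str]) -> bool:
    """True for unions such as ["null", "string"] or ["int", "null"]."""
    return len(branches) == 2 and list(branches).count("null") == 1


def encode_cell(branches: Sequence[str], index: int,
                encoded_value: bytes) -> Optional[bytes]:
    """Return the bytes to store, or None when no cell should be written."""
    if is_null_or_one_type(branches):
        if branches[index] == "null":
            return None           # ASSUMPTION: null means "no cell written"
        return encoded_value      # exception: raw value, no union index
    # General case: union index first, then the value, as in the Avro spec.
    return zigzag_varint(index) + encoded_value
```

Under these assumptions a ["null", "string"] column holds exactly the same bytes as a plain string column would, while any other union still gets the leading index.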

>> I had to restore my git server, and this time not everything went right, but
>> it is up again now at [1].
>
> Thanks! and great work documenting this issue! (:
>
>
> Renato M.

Thank you for your comments and questions! :)

Best regards,

Alfonso Nishikawa

[2] - http://avro.apache.org/docs/current/spec.html#binary_encoding
[3] - https://people.apache.org/~alfonsonishikawa/gora-174.html

Re: Updated GORA-174 HBase information - unions

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi Alfonso,


2013/2/7 Alfonso Nishikawa <al...@gmail.com>:
> Hi all,
>
> I updated the GORA-174 issue information about the HBase backend at [0]. Any thoughts?

So what you are proposing is to store an extra index at the beginning
of the actual value? or does HBase do this automatically? What about
if bytes were being written? couldn't some type of corruption happen
and make this unusable?

 col_name       content:index+value
 ----------     ---------------------------
 fam:mytext     \x01This is the text
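
For illustration only, a reader of such a cell could split it into the index and the remaining value roughly as in the Python sketch below. It assumes the index is written as an Avro zig-zag varint (in which case branch 1 is stored as the byte \x02, so the \x01 in the listing above should be read as schematic), and the function name `read_union_index` is hypothetical. Rejecting an out-of-range index is one simple guard against the kind of corruption asked about above.

```python
from typing import Tuple


def read_union_index(cell: bytes, branch_count: int) -> Tuple[int, bytes]:
    """Split a stored cell into (union branch index, remaining value bytes)."""
    raw = 0
    shift = 0
    for pos, byte in enumerate(cell):
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:                   # last byte of the varint
            index = (raw >> 1) ^ -(raw & 1)   # undo the zig-zag step
            if not 0 <= index < branch_count:
                raise ValueError(f"union index {index} out of range")
            return index, cell[pos + 1:]
        shift += 7
    raise ValueError("cell is empty or ends inside the union index")


# Schema ["null", "string"]: a cell holding branch 1 followed by its value.
index, rest = read_union_index(b"\x02This is the text", branch_count=2)
```

A cell that starts with an index outside the schema's branches raises an error instead of being handed to the wrong deserializer.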


> think it is better expressed now.
> If no one thinks it is wrong, I will implement solution-1 and solution-2 (this
> may mean quite a lot of work, so do we maintain it? I vote yes).

So does your solution have two parts? or are they two separate
possible solutions?
You said on another email that HBase could persist Union data types
directly without having to modify it (did I get that right? or am I
confusing stuff? ) so implementing this would be just to tell HBase to
save the union data type but not actually writing this extra byte? I
wasn't able to find the avro documentation talking about this, could
you please point me to where this is?

> I had to restore my git server, and this time not everything went right, but
> it is up again now at [1].

Thanks! and great work documenting this issue! (:


Renato M.

> Best regards,
>
> Alfonso Nishikawa
>
> [0] https://people.apache.org/~alfonsonishikawa/gora-174.html
> [1]
> http://git.nishilua.com/do/gora_nishikawa.git/shortlog/refs/heads/GORA-174
>
>
> "Drinking bloody marys all night will make you feel like a corpse in the
> morning."