Posted to user@hbase.apache.org by Wilm Schumacher <wi...@gmail.com> on 2015/01/16 22:23:20 UTC

general question about datamodel => empty columns

Hi,

I ran into a problem which I have encountered several times by now, and
perhaps you can help me.

What should I store as the value in tables where only the qualifier is
needed? E.g. when indexing, the index table has to reference the indexed
rows either by columns or by rows. Either way, there is no actual data
to put into the cells.

An example for clarification:
Suppose you want to make an index for another table which indexes
something like "type of entry".

<table>
row1 data:type => type1 , data:data => foo
row2 data:type => type2 , data:data => bar
row3 data:type => type1 , data:data => baz
row4 data:type => type2 , data:data => whatever

indexing 1: indexing by columns

<index>
type1 index:row1 => ??? , index:row3 => ???
type2 index:row2 => ??? , index:row4 => ???

indexing 2: indexing by rows
<index>
type1-row1 ??:?? => ??
type1-row3 ??:?? => ??
type2-row2 ??:?? => ??
type2-row4 ??:?? => ??

This only works if there is some data in a column family to scan. Thus I need data.

Either way, I actually have to put data where it isn't needed.

What should I insert into the columns? So far I have mostly used the
timestamp of creation, which in my opinion is quite stupid, as the cell
already carries a timestamp anyway. This would only waste space. I
could use empty strings (bytes), which will work, but somehow feels wrong.
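
A minimal Python sketch of the two index layouts with empty byte strings as placeholder values. This is a toy in-memory model, not HBase client code: a table is a plain dict, the helper names (build_column_index, build_row_index, scan_prefix) and the "i:" column are hypothetical, and the sorted prefix filter only imitates what a scan over sorted row keys would do.

```python
# Toy model: a table is a dict {row_key: {qualifier: value}}.
# This only imitates HBase's sorted row keys; it is not HBase client code.

EMPTY = b""  # placeholder value: the information lives in the key/qualifier

table = {
    "row1": {"data:type": "type1", "data:data": "foo"},
    "row2": {"data:type": "type2", "data:data": "bar"},
    "row3": {"data:type": "type1", "data:data": "baz"},
    "row4": {"data:type": "type2", "data:data": "whatever"},
}


def build_column_index(table):
    """Indexing 1: one index row per type, one qualifier per source row."""
    index = {}
    for row_key, cells in table.items():
        index.setdefault(cells["data:type"], {})["index:" + row_key] = EMPTY
    return index


def build_row_index(table):
    """Indexing 2: one index row per (type, source row) pair."""
    # "i:" is a hypothetical single column; its cell value stays empty.
    return {
        cells["data:type"] + "-" + row_key: {"i:": EMPTY}
        for row_key, cells in table.items()
    }


def scan_prefix(index, prefix):
    """Return the row keys starting with prefix, in sorted order."""
    return [key for key in sorted(index) if key.startswith(prefix)]


column_index = build_column_index(table)
row_index = build_row_index(table)

# Layout 1: read the qualifiers of one index row.
rows_of_type1 = sorted(q.split(":", 1)[1] for q in column_index["type1"])
# Layout 2: prefix scan, then strip the prefix from each row key.
rows_of_type2 = [k.split("-", 1)[1] for k in scan_prefix(row_index, "type2-")]

print(rows_of_type1)
print(rows_of_type2)
```

In both layouts the lookup never reads a cell value, so the empty placeholder carries no information and exists only so that the cell is present.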

What are you using? Is empty string/useless timestamp common practice?

Best wishes,

Wilm

RE: general question about datamodel => empty columns

Posted by Taeyun Kim <ta...@innowireless.com>.

Hi,

(Warning: I'm kind of a newbie...) 
I would make the tables as follows:

<table1: Single column (named 'c') to save space by avoiding the overhead of the key and of multiple cells>
row1: type1 + foo
row2: type2 + bar
row3: type1 + baz
row4: type2 + whatever

<index1: Again single column, and the data value is duplicated from table1. With this you can just Scan through the index1 to get the values, avoiding Gets to table1.>
type1 + row1: foo
type1 + row3: baz
type2 + row2: bar
type2 + row4: whatever

<index2> Not needed
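
The layout above can be sketched in Python roughly as follows. This is a toy in-memory model and not HBase client code: the "|" separator standing in for the "+" concatenation and the helper names (build_covering_index, scan_values) are assumptions for illustration, and a real design would need a separator or fixed-width fields that cannot occur inside the type or the row key.

```python
SEP = "|"  # assumed separator standing in for "+" in the layout above

# table1: a single cell per row that holds type and data together.
table1 = {
    "row1": "type1" + SEP + "foo",
    "row2": "type2" + SEP + "bar",
    "row3": "type1" + SEP + "baz",
    "row4": "type2" + SEP + "whatever",
}


def build_covering_index(table1):
    """index1: key is type + row key, value is the duplicated data."""
    index1 = {}
    for row_key, cell in table1.items():
        type_, data = cell.split(SEP, 1)
        index1[type_ + SEP + row_key] = data
    return index1


def scan_values(index1, type_):
    """Imitate a prefix scan: return (row_key, data) pairs for one type."""
    prefix = type_ + SEP
    return [
        (key[len(prefix):], index1[key])
        for key in sorted(index1)
        if key.startswith(prefix)
    ]


index1 = build_covering_index(table1)
type1_hits = scan_values(index1, "type1")
print(type1_hits)
```

Because the value is copied into the index, the scan alone yields the data and no follow-up lookup in table1 is needed; the price is storing every value twice and keeping both copies in sync on writes.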

Cheers.
