Posted to user@hbase.apache.org by John Ryan <jo...@gmail.com> on 2008/08/19 17:58:57 UTC

Compactions and deserialization

Hi

I was perusing the code in the HStore.java class. In the function
compact(), which takes an array of CompactionReaders, there seems to be no
deserialization happening when you read the values for the same key from
different readers. What if one of the readers had column C1 and another reader
had column C2 for the same key - wouldn't you want both the values in the
compacted file? From the code it looks like you pick whatever is in the
latest file? Am I right or am I missing something here? How do you resolve
the data so that every piece makes it to the compacted file?

-JRR

RE: Compactions and deserialization

Posted by Jim Kellerman <ji...@powerset.com>.
Comments inline below:
> -----Original Message-----
> From: John Ryan [mailto:john.reliance.ryan@gmail.com]
> Sent: Tuesday, August 19, 2008 8:59 AM
> To: hbase-user@hadoop.apache.org
> Subject: Compactions and deserialization

An HStore manages a single column family. All the keys and data are stored in a Hadoop MapFile (org.apache.hadoop.io.MapFile). The keys are always HStoreKey and the values are always ImmutableBytesWritable.

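As for two files containing the same key, remember that an HStoreKey is the full row/column/timestamp triple, not just the row. A cell in column C1 and a cell in column C2 of the same row are therefore different keys, and both are written to the compacted file.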
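To make the row/column/timestamp key concrete, here is a minimal Java sketch. The class `CellKey`, its fields, and the column names `f:C1`/`f:C2` are hypothetical stand-ins, not the real `HStoreKey` API. The ordering (row ascending, column ascending, newer timestamps first) is an assumption chosen so that the newest version of a cell sorts first; the point is only that two columns of the same row never compare equal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for HStoreKey: a cell is identified by row, column and timestamp. */
final class CellKey implements Comparable<CellKey> {
    final String row;
    final String column;
    final long timestamp;

    CellKey(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    @Override
    public int compareTo(CellKey other) {
        int c = row.compareTo(other.row);
        if (c != 0) return c;
        c = column.compareTo(other.column);
        if (c != 0) return c;
        // Assumed ordering: within one row/column, newer timestamps sort first.
        return Long.compare(other.timestamp, timestamp);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CellKey)) return false;
        CellKey k = (CellKey) o;
        return row.equals(k.row) && column.equals(k.column) && timestamp == k.timestamp;
    }

    @Override
    public int hashCode() {
        return Objects.hash(row, column, timestamp);
    }

    @Override
    public String toString() {
        return row + "/" + column + "/" + timestamp;
    }

    public static void main(String[] args) {
        List<CellKey> keys = new ArrayList<>();
        keys.add(new CellKey("r1", "f:C2", 100L));
        keys.add(new CellKey("r1", "f:C1", 100L));
        keys.add(new CellKey("r1", "f:C1", 200L));
        Collections.sort(keys);
        // Same row, different columns: distinct keys, all three are kept.
        System.out.println(keys); // prints [r1/f:C1/200, r1/f:C1/100, r1/f:C2/100]
    }
}
```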

Only when row, column, and timestamp are all identical do two files hold the same key; in that case only the most recently written value applies.
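A minimal sketch of the merge behaviour described above, assuming each store file can be modelled as an in-memory sorted map and that files are passed oldest first. It is a k-way merge over a priority queue, not the actual `HStore.compact()` code: the names `CompactionSketch`, `Key`, and `compact` are hypothetical, and it ignores anything else the real compaction may do (such as deletes or version limits).

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

final class CompactionSketch {
    /** Hypothetical cell key: row/column/timestamp (not the real HStoreKey API). */
    static final class Key implements Comparable<Key> {
        final String row, column;
        final long timestamp;

        Key(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }

        @Override
        public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            if (c != 0) return c;
            c = column.compareTo(o.column);
            if (c != 0) return c;
            return Long.compare(o.timestamp, timestamp); // assumed: newest first
        }
    }

    /** Current head entry of one input file, its remaining entries, and its age. */
    private static final class Head {
        final Map.Entry<Key, String> entry;
        final Iterator<Map.Entry<Key, String>> rest;
        final int fileIndex; // higher index = more recently written file

        Head(Map.Entry<Key, String> entry, Iterator<Map.Entry<Key, String>> rest, int fileIndex) {
            this.entry = entry;
            this.rest = rest;
            this.fileIndex = fileIndex;
        }
    }

    /** Merges sorted files (oldest first) into one sorted output. */
    static TreeMap<Key, String> compact(List<TreeMap<Key, String>> files) {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> {
            int c = a.entry.getKey().compareTo(b.entry.getKey());
            // On identical keys, the entry from the newest file is polled first.
            return c != 0 ? c : Integer.compare(b.fileIndex, a.fileIndex);
        });
        for (int i = 0; i < files.size(); i++) {
            Iterator<Map.Entry<Key, String>> it = files.get(i).entrySet().iterator();
            if (it.hasNext()) heap.add(new Head(it.next(), it, i));
        }
        TreeMap<Key, String> out = new TreeMap<>();
        Key last = null;
        while (!heap.isEmpty()) {
            Head h = heap.poll();
            Key k = h.entry.getKey();
            // Distinct keys (e.g. different columns) are all written; for a
            // duplicate row/column/timestamp only the newest file's value is kept.
            if (last == null || k.compareTo(last) != 0) {
                out.put(k, h.entry.getValue());
                last = k;
            }
            if (h.rest.hasNext()) heap.add(new Head(h.rest.next(), h.rest, h.fileIndex));
        }
        return out;
    }
}
```

With an older file holding r1/f:C1 and a newer file holding r1/f:C2 plus a rewrite of r1/f:C1 at the same timestamp, the result contains both columns, and f:C1 carries the newer file's value. Because every input is already sorted, the merge only ever compares the current head of each file, which is why a compaction can stream through the files rather than load them into memory.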

