Posted to dev@hbase.apache.org by Fuad Efendi <fe...@outsideiq.com> on 2011/08/03 16:25:27 UTC

Primary Key Design

Hi,


I am starting to use the following scheme for primary keys:
SHA256(URL) + "-RAW" Primary Key Schema
<https://outsideiq.jira.com/browse/CA-107>



RATIONALE:
* User-defined PKs in Lily are prefixed with "USER.", and I can't use a
URI as the key directly: it contains dots, and the dot is a special
character in the current version.
* In addition to the SHA-256-generated PK, Lily will still use a UUID
(which is really unique) for versioning...
* IMPORTANT: we need to randomize PKs; this is a best practice with HBase,
because hashed keys spread the data evenly across the cluster.

I also suggest a similar scheme for OIQ objects:
SHA256(JSON-Object-in-UTF8) + "-OIQ", since all OIQ objects will be stored
denormalized as JSON (string type in Lily). "-OIQ" is a postfix rather
than a prefix so that we keep good "randomization": in HBase, all data is
physically sorted by PK. Note that the JSON will be UTF-8 encoded (I
believe UTF-8 is also part of the ECMA specs).
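
To make the "randomization" argument a bit more concrete, here is a minimal
sketch that hashes a batch of URLs sharing a long common prefix and counts
how many resulting keys start with each hex digit. The class name KeySpread,
the method names and the example.com URLs are made up for illustration; this
is not part of the scheme itself, and it says nothing about how a real
cluster is split into regions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class KeySpread {

    /** Hex-encoded SHA-256 of the UTF-8 bytes of the given text. */
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** Counts how many hashed keys start with each of the 16 hex digits. */
    static int[] firstDigitHistogram(int urlCount) {
        int[] buckets = new int[16];
        for (int i = 0; i < urlCount; i++) {
            // Hypothetical URLs; unhashed, they would all sort next to each other.
            String key = sha256Hex("http://example.com/page/" + i) + "-RAW";
            buckets[Character.digit(key.charAt(0), 16)]++;
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] buckets = firstDigitHistogram(16000);
        for (int d = 0; d < buckets.length; d++) {
            System.out.println(Integer.toHexString(d) + " -> " + buckets[d]);
        }
    }
}
```

The counts should come out roughly equal, which is the property we want:
neighbouring URLs no longer end up as neighbouring keys. Because the suffix
comes after the hash, it does not affect the sort order of the keys.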




import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hex-encoded SHA-256 digests, used to build primary keys.
 *
 * See http://stackoverflow.com/questions/221165/pros-and-cons-of-using-md5-hash-of-uri-as-the-primary-key-in-a-database
 *
 * @author Fuad
 */
public class SHA256 {

    /** Returns the SHA-256 digest of the given bytes as 64 lowercase hex characters. */
    public static final String SHA256(byte[] bytes) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(bytes);
        byte[] mdbytes = md.digest();

        // Convert each byte to two hex digits, left-padding with '0'.
        StringBuilder hexString = new StringBuilder(mdbytes.length * 2);
        for (int i = 0; i < mdbytes.length; i++) {
            String hex = Integer.toHexString(0xff & mdbytes[i]);
            if (hex.length() == 1) {
                hexString.append('0');
            }
            hexString.append(hex);
        }
        return hexString.toString();
    }

    /** Hashes the UTF-8 encoding of the given text. */
    public static final String SHA256(String text) throws NoSuchAlgorithmException {
        return SHA256(text.getBytes(StandardCharsets.UTF_8));
    }
}
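
As a rough illustration of how the two key formats described above could be
assembled, here is a small self-contained sketch. RowKeys, rawKey and oiqKey
are hypothetical names introduced only for this example, and the checked
NoSuchAlgorithmException is wrapped in an unchecked exception purely to keep
the calling code short.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class RowKeys {

    /** Hex-encoded SHA-256 of the UTF-8 bytes of the given text. */
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** Key for a raw HTML record: SHA256(URL) + "-RAW". */
    static String rawKey(String url) {
        return sha256Hex(url) + "-RAW";
    }

    /** Key for an OIQ object: SHA256(JSON in UTF-8) + "-OIQ". */
    static String oiqKey(String json) {
        return sha256Hex(json) + "-OIQ";
    }

    public static void main(String[] args) {
        // Hypothetical inputs, for illustration only.
        System.out.println(rawKey("http://example.com/index.html"));
        System.out.println(oiqKey("{\"name\":\"example\"}"));
    }
}
```

Each key is 64 hex characters plus the 4-character suffix, contains no dots,
and is fully determined by its input, so the same URL always maps to the
same "-RAW" key.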











-- 
Fuad Efendi






Re: Primary Key Design

Posted by Fuad Efendi <fu...@efendi.ca>.
This design is forced on us for RAW: we need to keep the history of each
HTML page under the same ID value. That is why the first candidate for the
ID was the URL, and why we ended up with SHA(URL).

For OIQ, it must be carefully planned. SHA(JSON) has the benefit of an
implicit "equals" implementation: if ID := SHA(JSON) differs, the JSON
objects are not the same.
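
One point that probably belongs to the "carefully planned" part: the hash is
taken over the serialized text, not over the logical object. The sketch
below (JsonIdentity and oiqId are made-up names, and the JSON strings are
invented samples) shows that byte-identical JSON yields the same ID, while
the same fields written in a different order yield a different one. That a
fixed, canonical serialization would be needed is my assumption, not
something stated in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class JsonIdentity {

    /** ID := SHA256(JSON in UTF-8) + "-OIQ", hex-encoded. */
    static String oiqId(String json) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(json.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.append("-OIQ").toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = "{\"name\":\"x\",\"rank\":1}";
        String b = "{\"name\":\"x\",\"rank\":1}"; // byte-identical to a
        String c = "{\"rank\":1,\"name\":\"x\"}"; // same fields, different order

        System.out.println(oiqId(a).equals(oiqId(b))); // prints true
        System.out.println(oiqId(a).equals(oiqId(c))); // prints false
    }
}
```

So the implicit "equals" only behaves like object equality if every writer
serializes a given object to exactly the same bytes (key order, whitespace,
number formatting).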

-Fuad





