Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2015/03/30 13:23:12 UTC

[Nutch Wiki] Update of "NutchHBaseHiveMapping" by talat

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchHBaseHiveMapping" page has been changed by talat:
https://wiki.apache.org/nutch/NutchHBaseHiveMapping

Comment:
Hive mapping query for Nutch 2.x with the HBase datastore

New page:
If you need to query the HBase table that Nutch 2.x uses as its datastore, you can map it to Hive with the query below. Replace each <crawlId> placeholder with your own crawl ID. The resulting table can be used by any tool that reads the Hive metastore, e.g. Impala.

CREATE EXTERNAL TABLE '''''<crawlId>'''''_webpage (
 key string, baseUrl string, status int, prevFetchTime bigint, fetchTime bigint, fetchInterval bigint, retriesSinceFetch int, reprUrl string, content string, contentType string, protocolStatus string, modifiedTime bigint, prevModifiedTime bigint, batchId string, title string, text string, parseStatus int, signature string, prevSignature string, score int, headers map<string,string>, inlinks map<string,string>, outlinks map<string,string>, metadata map<string,string>, markers map<string,string>
) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (
 "hbase.columns.mapping" = ":key,f:bas,f:st,f:pts#b,f:ts#b,f:fi#b,f:rsf,f:rpr,f:cnt,f:typ,f:prot,f:mod#b,f:pmod#b,f:bid,p:t,p:c,p:st,p:sig,p:psig,s:s,h:,il:,ol:,mtdt:,mk:"
) 
TBLPROPERTIES (
 "hbase.table.name" = "'''''<crawlId>'''''_webpage"
);
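
Filling in the <crawlId> placeholder by hand is easy to get wrong, because it appears both in the Hive table name and in the HBase table name. The following is a minimal sketch of one way to do the substitution with a script. The function name build_webpage_ddl, the example crawl ID "mycrawl", and the rule that a crawl ID may contain only letters, digits, and underscores are assumptions made for illustration; they are not part of Nutch or Hive. The wiki emphasis markers around <crawlId> are left out of the template, since they are not part of the query.

```python
import re

# Template copied from the query above, without the wiki emphasis markers.
DDL_TEMPLATE = """CREATE EXTERNAL TABLE <crawlId>_webpage (
 key string, baseUrl string, status int, prevFetchTime bigint, fetchTime bigint, fetchInterval bigint, retriesSinceFetch int, reprUrl string, content string, contentType string, protocolStatus string, modifiedTime bigint, prevModifiedTime bigint, batchId string, title string, text string, parseStatus int, signature string, prevSignature string, score int, headers map<string,string>, inlinks map<string,string>, outlinks map<string,string>, metadata map<string,string>, markers map<string,string>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (
 "hbase.columns.mapping" = ":key,f:bas,f:st,f:pts#b,f:ts#b,f:fi#b,f:rsf,f:rpr,f:cnt,f:typ,f:prot,f:mod#b,f:pmod#b,f:bid,p:t,p:c,p:st,p:sig,p:psig,s:s,h:,il:,ol:,mtdt:,mk:"
)
TBLPROPERTIES (
 "hbase.table.name" = "<crawlId>_webpage"
);"""

# Assumption: restrict crawl IDs to characters that are safe in an
# unquoted Hive identifier. Loosen this if your crawl IDs differ.
CRAWL_ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def build_webpage_ddl(crawl_id: str) -> str:
    """Return the mapping query with every <crawlId> placeholder filled in."""
    if not CRAWL_ID_PATTERN.fullmatch(crawl_id):
        raise ValueError(f"unsupported crawl ID: {crawl_id!r}")
    return DDL_TEMPLATE.replace("<crawlId>", crawl_id)


if __name__ == "__main__":
    # "mycrawl" is a hypothetical crawl ID used only for this example.
    print(build_webpage_ddl("mycrawl"))
```

The printed statement can then be pasted into the Hive shell or saved to a file and run from there.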