Posted to user@hive.apache.org by "Hamilton, Robert (Austin)" <ro...@hp.com> on 2012/02/23 22:51:29 UTC

Still problems with index

I am still running into an issue where a query that uses an index does not return all of my data. This is with Hive 0.8.1. I'm not sure where to go from here and am open to suggestions.

It almost looks as if my upgrade from 0.7.1 to 0.8.1 has some issue, since the autoindex feature also does not seem to work for me.
For the purpose of this test I kept Hive 0.7.1 as is, installed Hive 0.8.1 into a separate directory, and used a different metastore (on MySQL) for it.
The hope was that I could keep the existing installation unchanged and still test the newer version. I set HIVE_HOME to the 0.8.1 directory and put all the jars from its lib directory on the CLASSPATH before invoking hive for the test.

After running the test, I get 533 rows when the index is not used and zero rows when it is. Both queries should return 533 rows.
The corresponding test against an uncompressed table returns 533 rows both with and without the index.

Each of the four steps was run sequentially, but in a separate Hive session.


1. Test table created this way:
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec = com.hadoop.compression.lzo.LzopCodec;

create table omnic as select * from omni;


2. Index created this way:
drop index omni_sess on omnic;
SET hive.exec.compress.output=false;
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create index omni_sess on table omnic(session_id) as 'COMPACT' with deferred rebuild in table omnic_sess;
alter index omni_sess on omnic rebuild;


3. Sample table:

SET hive.exec.compress.output=false;

insert overwrite directory '/user/robert/bobc' select `_bucketname`,`_offsets` from omnic_sess a join sampled b on a.session_id=b.session_id where a.session_id is not null;


4. Finally the test itself:

SET hive.index.compact.file=/user/robert/bobc;
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite directory 'testnox' select /*+ mapjoin(b) */  a.session_id,a.hit_epoc_sec  from omnic a join sampled b on a.session_id=b.session_id where a.session_id is not null;

set hive.optimize.autoindex=true;
set hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
insert overwrite directory 'testidx' select /*+ mapjoin(b) */  a.session_id,a.hit_epoc_sec  from omnic a join sampled b on a.session_id=b.session_id where a.session_id is not null;



[hdfs@txn4pchad05 test]$ hadoop fs -text /user/robert/bobc/*|head
12/02/23 21:29:53 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
12/02/23 21:29:53 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev fatal: Not a git repository (or any of the parent directories): .git]
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000113_0.lzo122167654122183303122173507122165476122180645122170417122175969122178155
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000070_0.lzo217747089
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000142_0.lzo101758512101751307101723208101755283101737621101712562101734346101729300101717005101719070101726197101746274101740611101732296101710463101753412101743471101721157101748344101715031
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000070_0.lzo217784824
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000113_0.lzo122416609
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000125_0.lzo71150312
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000142_0.lzo101955949101965898101989168101863966101914106101831065101960915101859329101943971101842195101980791101870322101861642101918637102004882101837518101975824101875893101958430101898913101890798101839915101946990101834331101986822101866949101873585101910562101953630101924255101968381101894098101854877101846609101935603101932331101930145101973347101857217101901404101963399101883316102001496102007314101998177101937945101885813101927798101983279101994635101896429101888311101848988101921911101970864101852404101978301101941481101904948101950365101916452101844501101878364101991354101908221101828157101880840
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000104_0.lzo174169190
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000104_0.lzo174222731
hdfs://txn4pchad01.usa.hp.com/user/hive/warehouse/omnic/000100_0.lzo99993323
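
In the listing above the file name and the offsets appear run together, presumably because the separators between them are non-printing characters. A minimal sketch for making them visible, assuming the file uses Hive's default text delimiters (^A, octal 001, between columns and ^B, octal 002, between array items); the function name show_index_entries is only illustrative:

```shell
# Assumption: columns are separated by ^A (\001) and array items by ^B (\002).
# Translate them to a tab and a comma so each offset can be read on its own.
show_index_entries() {
  tr '\001\002' '\t,'
}

# Intended use against the index result file from step 3:
#   hadoop fs -text /user/robert/bobc/* | show_index_entries | head
```

If the assumption holds, each line should then show the bucket file name, a tab, and a comma-separated list of offsets, which makes it easier to check whether the entries handed to hive.index.compact.file look sane.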

Robert Hamilton
HP.com IT
512.432.8445 office |  Robert.Hamllton@hp.com<ma...@hp.com>
14231 Tandem Blvd | Austin | TX 78728