Posted to user@hbase.apache.org by Omkar Joshi <Om...@lntinfotech.com> on 2013/04/16 08:31:48 UTC

Data not loaded in table via ImportTSV

Hi,

The background thread is this:

http://mail-archives.apache.org/mod_mbox/hbase-user/201304.mbox/%3CE689A42B73C5A545AD77332A4FC75D8C1EFBD80952@VSHINMSMBX01.vshodc.lntinfotech.com%3E

I'm referring to the HBase doc: http://hbase.apache.org/book/ops_mgt.html#importtsv

Accordingly, my command is:

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar importtsv '-Dimporttsv.separator=;' -Dimporttsv.columns=HBASE_ROW_KEY,CUSTOMER_INFO:NAME,CUSTOMER_INFO:EMAIL,CUSTOMER_INFO:ADDRESS,CUSTOMER_INFO:MOBILE  -Dimporttsv.bulk.output=hdfs://cldx-1139-1033:9000/hbase/storefileoutput CUSTOMERS hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/customer.txt
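
For context, given '-Dimporttsv.separator=;' and the column mapping above, each line of customer.txt carries 5 semicolon-separated fields (row key, name, email, address, mobile); with made-up sample values, a line would look like:

1;John Doe;john.doe@example.com;12 Sample Street;9999900001
2;Jane Roe;jane.roe@example.com;34 Sample Avenue;9999900002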

..../*classpath echoed here*/

13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hduser/hadoop_ecosystem/apache_hadoop/hadoop_installation/hadoop-1.0.4/libexec/../lib/native/Linux-amd64-64
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:os.version=3.2.0-23-generic
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:user.name=hduser
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hduser
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hduser/hadoop_ecosystem/apache_hbase/hbase_installation/hbase-0.94.6.1/bin
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=cldx-1140-1034:2181 sessionTimeout=180000 watcher=hconnection
13/04/16 17:18:43 INFO zookeeper.ClientCnxn: Opening socket connection to server cldx-1140-1034/172.25.6.71:2181. Will not attempt to authenticate using SASL (unknown error)
13/04/16 17:18:43 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 5483@cldx-1139-1033
13/04/16 17:18:43 INFO zookeeper.ClientCnxn: Socket connection established to cldx-1140-1034/172.25.6.71:2181, initiating session
13/04/16 17:18:43 INFO zookeeper.ClientCnxn: Session establishment complete on server cldx-1140-1034/172.25.6.71:2181, sessionid = 0x13def2889530023, negotiated timeout = 180000
13/04/16 17:18:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=cldx-1140-1034:2181 sessionTimeout=180000 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@34d03009
13/04/16 17:18:44 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 5483@cldx-1139-1033
13/04/16 17:18:44 INFO zookeeper.ClientCnxn: Opening socket connection to server cldx-1140-1034/172.25.6.71:2181. Will not attempt to authenticate using SASL (unknown error)
13/04/16 17:18:44 INFO zookeeper.ClientCnxn: Socket connection established to cldx-1140-1034/172.25.6.71:2181, initiating session
13/04/16 17:18:44 INFO zookeeper.ClientCnxn: Session establishment complete on server cldx-1140-1034/172.25.6.71:2181, sessionid = 0x13def2889530024, negotiated timeout = 180000
13/04/16 17:18:44 INFO zookeeper.ZooKeeper: Session: 0x13def2889530024 closed
13/04/16 17:18:44 INFO zookeeper.ClientCnxn: EventThread shut down
13/04/16 17:18:44 INFO mapreduce.HFileOutputFormat: Looking up current regions for table org.apache.hadoop.hbase.client.HTable@238cfdf
13/04/16 17:18:44 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce partitions to match current region count
13/04/16 17:18:44 INFO mapreduce.HFileOutputFormat: Writing partition information to hdfs://cldx-1139-1033:9000/user/hduser/partitions_4159cd24-b8ff-4919-854b-a7d1da5069ad
13/04/16 17:18:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/16 17:18:44 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/04/16 17:18:44 INFO compress.CodecPool: Got brand-new compressor
13/04/16 17:18:44 INFO mapreduce.HFileOutputFormat: Incremental table output configured.
13/04/16 17:18:47 INFO input.FileInputFormat: Total input paths to process : 1
13/04/16 17:18:47 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/16 17:18:47 INFO mapred.JobClient: Running job: job_201304091909_0010
13/04/16 17:18:48 INFO mapred.JobClient:  map 0% reduce 0%
13/04/16 17:19:07 INFO mapred.JobClient:  map 100% reduce 0%
13/04/16 17:19:19 INFO mapred.JobClient:  map 100% reduce 100%
13/04/16 17:19:24 INFO mapred.JobClient: Job complete: job_201304091909_0010
13/04/16 17:19:24 INFO mapred.JobClient: Counters: 30
13/04/16 17:19:24 INFO mapred.JobClient:   Job Counters
13/04/16 17:19:24 INFO mapred.JobClient:     Launched reduce tasks=1
13/04/16 17:19:24 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16567
13/04/16 17:19:24 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/16 17:19:24 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/16 17:19:24 INFO mapred.JobClient:     Launched map tasks=1
13/04/16 17:19:24 INFO mapred.JobClient:     Data-local map tasks=1
13/04/16 17:19:24 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10953
13/04/16 17:19:24 INFO mapred.JobClient:   ImportTsv
13/04/16 17:19:24 INFO mapred.JobClient:     Bad Lines=0
13/04/16 17:19:24 INFO mapred.JobClient:   File Output Format Counters
13/04/16 17:19:24 INFO mapred.JobClient:     Bytes Written=1984
13/04/16 17:19:24 INFO mapred.JobClient:   FileSystemCounters
13/04/16 17:19:24 INFO mapred.JobClient:     FILE_BYTES_READ=1753
13/04/16 17:19:24 INFO mapred.JobClient:     HDFS_BYTES_READ=563
13/04/16 17:19:24 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=74351
13/04/16 17:19:24 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1984
13/04/16 17:19:24 INFO mapred.JobClient:   File Input Format Counters
13/04/16 17:19:24 INFO mapred.JobClient:     Bytes Read=433
13/04/16 17:19:24 INFO mapred.JobClient:   Map-Reduce Framework
13/04/16 17:19:24 INFO mapred.JobClient:     Map output materialized bytes=1600
13/04/16 17:19:24 INFO mapred.JobClient:     Map input records=5
13/04/16 17:19:24 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/04/16 17:19:24 INFO mapred.JobClient:     Spilled Records=10
13/04/16 17:19:24 INFO mapred.JobClient:     Map output bytes=1574
13/04/16 17:19:24 INFO mapred.JobClient:     Total committed heap usage (bytes)=212664320
13/04/16 17:19:24 INFO mapred.JobClient:     CPU time spent (ms)=4780
13/04/16 17:19:24 INFO mapred.JobClient:     Combine input records=0
13/04/16 17:19:24 INFO mapred.JobClient:     SPLIT_RAW_BYTES=130
13/04/16 17:19:24 INFO mapred.JobClient:     Reduce input records=5
13/04/16 17:19:24 INFO mapred.JobClient:     Reduce input groups=5
13/04/16 17:19:24 INFO mapred.JobClient:     Combine output records=0
13/04/16 17:19:24 INFO mapred.JobClient:     Physical memory (bytes) snapshot=279982080
13/04/16 17:19:24 INFO mapred.JobClient:     Reduce output records=20
13/04/16 17:19:24 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2010615808
13/04/16 17:19:24 INFO mapred.JobClient:     Map output records=5

As the counters show, there are no bad lines and the mapper output 5 records (the source text file has 5 rows). The 20 reduce output records are consistent with 4 CUSTOMER_INFO cells per row across those 5 rows.

HDFS shows the following:

hduser@cldx-1139-1033:~/hadoop_ecosystem/apache_hbase/hbase_installation/hbase-0.94.6.1/bin$ hadoop fs -ls /hbase
Warning: $HADOOP_HOME is deprecated.

Found 12 items
drwxr-xr-x   - hduser supergroup          0 2013-04-09 19:47 /hbase/-ROOT-
drwxr-xr-x   - hduser supergroup          0 2013-04-09 19:47 /hbase/.META.
drwxr-xr-x   - hduser supergroup          0 2013-04-16 16:02 /hbase/.archive
drwxr-xr-x   - hduser supergroup          0 2013-04-09 19:47 /hbase/.logs
drwxr-xr-x   - hduser supergroup          0 2013-04-09 19:47 /hbase/.oldlogs
drwxr-xr-x   - hduser supergroup          0 2013-04-16 16:05 /hbase/.tmp
drwxr-xr-x   - hduser supergroup          0 2013-04-16 16:05 /hbase/CUSTOMERS
drwxr-xr-x   - hduser supergroup          0 2013-04-16 17:14 /hbase/copiedFromLocal
-rw-r--r--   4 hduser supergroup         38 2013-04-09 19:47 /hbase/hbase.id
-rw-r--r--   4 hduser supergroup          3 2013-04-09 19:47 /hbase/hbase.version
drwxr-xr-x   - hduser supergroup          0 2013-04-16 17:19 /hbase/storefileoutput
drwxr-xr-x   - hduser supergroup          0 2013-04-09 22:03 /hbase/users
hduser@cldx-1139-1033:~/hadoop_ecosystem/apache_hbase/hbase_installation/hbase-0.94.6.1/bin$ hadoop fs -ls /hbase/storefileoutput
Warning: $HADOOP_HOME is deprecated.

Found 3 items
drwxr-xr-x   - hduser supergroup          0 2013-04-16 17:19 /hbase/storefileoutput/CUSTOMER_INFO
-rw-r--r--   4 hduser supergroup          0 2013-04-16 17:19 /hbase/storefileoutput/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2013-04-16 17:18 /hbase/storefileoutput/_logs
hduser@cldx-1139-1033:~/hadoop_ecosystem/apache_hbase/hbase_installation/hbase-0.94.6.1/bin$
hduser@cldx-1139-1033:~/hadoop_ecosystem/apache_hbase/hbase_installation/hbase-0.94.6.1/bin$ hadoop fs -ls /hbase/storefileoutput/CUSTOMER_INFO
Warning: $HADOOP_HOME is deprecated.

Found 1 items
-rw-r--r--   4 hduser supergroup       1984 2013-04-16 17:19 /hbase/storefileoutput/CUSTOMER_INFO/64a822e4ff82456785740925eccd392f

But no rows were inserted into the CUSTOMERS table:

hduser@cldx-1139-1033:~$ $HBASE_HOME/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.6.1, r1464658, Thu Apr  4 10:58:50 PDT 2013

hbase(main):001:0> scan 'CUSTOMERS'
ROW                              COLUMN+CELL
0 row(s) in 0.8240 seconds

Do I need to execute some additional step (completebulkload?) to push the data into the table? I'm not sure whether this is required.

Regards,
Omkar Joshi


RE: Data not loaded in table via ImportTSV

Posted by Omkar Joshi <Om...@lntinfotech.com>.
Hi Anoop,

Actually, I got confused after reading the doc: I thought the importtsv command alone (which also takes the table name as an argument) would suffice. But as you pointed out, since I specified -Dimporttsv.bulk.output, the job only writes store files, and completebulkload is required to actually load them into the table:

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar completebulkload hdfs://cldx-1139-1033:9000/hbase/storefileoutput CUSTOMERS
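
The doc also mentions the same step can be invoked by explicit class name; if I read it right, an equivalent invocation would be roughly:

${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://cldx-1139-1033:9000/hbase/storefileoutput CUSTOMERS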

Thanks for the help!

Regards,
Omkar Joshi


-----Original Message-----
From: Anoop Sam John [mailto:anoopsj@huawei.com] 
Sent: Tuesday, April 16, 2013 12:26 PM
To: user@hbase.apache.org
Subject: RE: Data not loaded in table via ImportTSV

Hi,

Have you used the LoadIncrementalHFiles tool after the ImportTSV?

-Anoop-

RE: Data not loaded in table via ImportTSV

Posted by Anoop Sam John <an...@huawei.com>.
Hi,

Have you used the LoadIncrementalHFiles tool after the ImportTSV?
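
That tool is what the hbase jar exposes as the completebulkload driver, so the invocation should look roughly like this (generic form from the HBase book; substitute your own output dir and table):

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar completebulkload <hdfs://storefileoutput> <tablename>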

-Anoop-