Posted to user@hbase.apache.org by Omkar Joshi <Om...@lntinfotech.com> on 2013/04/17 06:31:09 UTC

Loading text files from local file system

The background thread is here :

http://mail-archives.apache.org/mod_mbox/hbase-user/201304.mbox/%3CE689A42B73C5A545AD77332A4FC75D8C1EFBE84153@VSHINMSMBX01.vshodc.lntinfotech.com%3E

Following are the commands that I'm using to load files into HBase:

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar importtsv \
  '-Dimporttsv.separator=;' \
  -Dimporttsv.columns=HBASE_ROW_KEY,PRODUCT_INFO:NAME,PRODUCT_INFO:CATEGORY,PRODUCT_INFO:GROUP,PRODUCT_INFO:COMPANY,PRODUCT_INFO:COST,PRODUCT_INFO:COLOR,PRODUCT_INFO:BLANK_COLUMN \
  -Dimporttsv.bulk.output=hdfs://cldx-1139-1033:9000/hbase/storefileoutput_6 \
  PRODUCTS hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/product_6.txt

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar completebulkload \
  hdfs://cldx-1139-1033:9000/hbase/storefileoutput_6 PRODUCTS

As seen, the text files to be loaded into HBase first need to be copied onto HDFS. Given our infrastructure constraints, I'm running into space issues: the text files hold around 20 GB of data, and replication consumes a lot of DFS space.

Is there a way to load a text file directly from the local file system into HBase?

Regards,
Omkar Joshi

________________________________
The contents of this e-mail and any attachment(s) may contain confidential or privileged information for the intended recipient(s). Unintended recipients are prohibited from taking action on the basis of information in this e-mail and using or disseminating the information, and must notify the sender and delete it from their system. L&T Infotech will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in this e-mail.

RE: Loading text files from local file system

Posted by Omkar Joshi <Om...@lntinfotech.com>.
Yeah DFS space is a constraint.

I'll check the options you suggested.

Regards,
Omkar Joshi

>

Re: Loading text files from local file system

Posted by Suraj Varma <sv...@gmail.com>.
Maybe I misunderstood your constraint ... are you saying that your DFS
itself is constrained by the file size and replication? If so, how
about setting dfs.replication to 1 for the job?
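If per-job replication helps, the property can be passed through the job's generic options (ImportTsv accepts -D properties). A sketch reusing the exact paths and hostnames from the original message; whether replication 1 is acceptable for the intermediate HFiles is a trade-off against losing them on a datanode failure:

```shell
# Same bulk-output command as in the thread, with per-job replication of 1
# so the intermediate files are not triplicated on DFS.
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar importtsv \
  -Ddfs.replication=1 \
  '-Dimporttsv.separator=;' \
  -Dimporttsv.columns=HBASE_ROW_KEY,PRODUCT_INFO:NAME,PRODUCT_INFO:CATEGORY,PRODUCT_INFO:GROUP,PRODUCT_INFO:COMPANY,PRODUCT_INFO:COST,PRODUCT_INFO:COLOR,PRODUCT_INFO:BLANK_COLUMN \
  -Dimporttsv.bulk.output=hdfs://cldx-1139-1033:9000/hbase/storefileoutput_6 \
  PRODUCTS hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/product_6.txt
```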

There are other options, like chopping up your file and processing it
piecemeal ... or perhaps customizing LoadIncrementalHFiles to process
compressed input files and so forth ...
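The piecemeal approach can be sketched with plain split(1), so that only one chunk (plus its bulk-load output) occupies DFS at a time. The hadoop/importtsv steps inside the loop are left as comments since they need the cluster; the demo below runs on a tiny stand-in file:

```shell
# Demo: a small stand-in for the ~20 GB input, split into 3-line chunks.
seq 1 10 > product_6.txt
split -l 3 product_6.txt chunk_

loaded=0
for c in chunk_*; do
  # On the real cluster each chunk would be copied to HDFS, bulk-loaded
  # (importtsv + completebulkload, as in the thread), then removed, e.g.:
  #   ${HADOOP_HOME}/bin/hadoop fs -put "${c}" hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/
  echo "would load ${c}"
  rm "${c}"                 # free local space as soon as a chunk is done
  loaded=$((loaded + 1))
done
rm product_6.txt
echo "loaded ${loaded} chunks"
```

With a chunk size tuned so that one chunk fits comfortably in DFS, peak space usage drops from the whole dataset to a single chunk plus its HFiles.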

See if the dfs.replication + hfile.compression option works for you first.
--Suraj




Re: Loading text files from local file system

Posted by Suraj Varma <sv...@gmail.com>.
Have you considered using hfile.compression, perhaps with snappy
compression?
See this thread:
http://grokbase.com/t/hbase/user/10cqrd06pc/hbase-bulk-load-script
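If I read the linked thread correctly, the bulk-load output format in 0.94 picks up the hfile.compression property, so the importtsv step from this thread could be rerun as something like the following (assuming the snappy native libraries are installed on the cluster):

```shell
# Same importtsv invocation, with the generated HFiles compressed with snappy.
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar importtsv \
  -Dhfile.compression=snappy \
  '-Dimporttsv.separator=;' \
  -Dimporttsv.columns=HBASE_ROW_KEY,PRODUCT_INFO:NAME,PRODUCT_INFO:CATEGORY,PRODUCT_INFO:GROUP,PRODUCT_INFO:COMPANY,PRODUCT_INFO:COST,PRODUCT_INFO:COLOR,PRODUCT_INFO:BLANK_COLUMN \
  -Dimporttsv.bulk.output=hdfs://cldx-1139-1033:9000/hbase/storefileoutput_6 \
  PRODUCTS hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/product_6.txt
```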
--Suraj




Re: Loading text files from local file system

Posted by "Surendra , Manchikanti" <su...@gmail.com>.
Hi Joshi,

You can use Flume with the AsyncHBaseSink or HBaseSink to move data from the
local file system to HBase.
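A minimal agent sketch of that idea, using Flume 1.x-style property names (the agent name, local path, and the choice of RegexHbaseEventSerializer are illustrative assumptions, not from the thread): a spooling-directory source reads completed local text files and the HBase sink writes them into the PRODUCTS table.

```
# Hypothetical Flume agent "a1": local spool directory -> HBase sink.
a1.sources  = src1
a1.channels = ch1
a1.sinks    = sink1

a1.sources.src1.type     = spooldir
a1.sources.src1.spoolDir = /local/path/to/product/files
a1.sources.src1.channels = ch1

a1.channels.ch1.type = file

a1.sinks.sink1.type         = hbase
a1.sinks.sink1.table        = PRODUCTS
a1.sinks.sink1.columnFamily = PRODUCT_INFO
a1.sinks.sink1.serializer   = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.sink1.channel      = ch1
```

Note this streams puts through the RegionServers rather than bulk-loading HFiles, so it avoids the HDFS staging copy at the cost of write-path load.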



Thanks,
Surendra M

-- Surendra Manchikanti

