Posted to user@hbase.apache.org by Gautham Acharya <ga...@alleninstitute.org> on 2019/09/18 13:58:52 UTC

ImportTSV command line - too many args

I'm trying to use the ImportTSV utility to generate HFiles and move them into an instance using the CompleteBulkLoad tool.

Right now the error I'm running into is that the argument list is too long - I have over 50,000 columns to specify from my CSV, and the bash shell rejects the command. Is there an easy way around this? Piping the arguments does not seem to work either.
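
For reference, the invocation is shaped roughly like the following (table, paths, and column names are simplified placeholders) - the -Dimporttsv.columns value has to enumerate all 50,000+ column mappings, and that one huge argument is what gets rejected:

    # Rough shape of the command (illustrative names/paths only):
    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,d:col1,d:col2,...,d:col50000 \
      -Dimporttsv.bulk.output=hdfs:///tmp/hfiles \
      mytable hdfs:///data/wide_input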

--gautham


Re: ImportTSV command line - too many args

Posted by Esteban Gutierrez <es...@cloudera.com.INVALID>.
Hi Gautham,

Well, there are a few options to work around that OS limitation. One is, as
you mentioned, to modify ImportTSV to accept a mappings file. The second
option, without changing anything in the HBase code, is to split your CSV
input into multiple files and keep the same key column in all the split
files. As long as you don't duplicate column families across the splits,
all the newly loaded data should be added to the same rows without any
issue, since each split's columns belong to a different CF. I hope that
helps.
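
For illustration, here is a rough, untested sketch of that second approach
(file, table, and column names are placeholders; adjust the cut delimiter
and -Dimporttsv.separator if your input is comma-separated rather than
tab-separated):

    # Keep the key column (field 1) in every chunk; each chunk maps to its own CF.
    # Split into as many chunks as needed to keep each column list well under the limit.
    cut -f1,2-10001     wide.tsv > chunk_cf1.tsv
    cut -f1,10002-20001 wide.tsv > chunk_cf2.tsv
    # ...and so on for the remaining columns

    # One ImportTSV run per chunk; each run only has to list that chunk's columns.
    COLS_CF1="HBASE_ROW_KEY,cf1:c1,cf1:c2,cf1:c3"   # in reality: this chunk's column names
    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
      -Dimporttsv.columns="$COLS_CF1" \
      -Dimporttsv.bulk.output=hdfs:///tmp/hfiles_cf1 \
      mytable hdfs:///data/chunk_cf1.tsv

    # Then bulk-load each output directory into the same table with the
    # completebulkload / LoadIncrementalHFiles tool for your HBase version.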

Thanks,
Esteban.


--
Cloudera, Inc.



On Wed, Sep 18, 2019 at 8:59 AM Gautham Acharya <ga...@alleninstitute.org>
wrote:

> I'm trying to use the ImportTSV utility to generate HFiles and move them
> into an instance using the CompleteBulkLoad tool.
>
> Right now the error I'm running into is that the argument list is too long -
> I have over 50,000 columns to specify from my CSV, and the bash shell rejects
> the command. Is there an easy way around this? Piping the arguments does not
> seem to work either.
>
> --gautham
>
>