Posted to dev@phoenix.apache.org by Siva <sb...@gmail.com> on 2015/02/05 22:28:13 UTC

Line separator option in Bulk loader

We have a table with a NOTE column that contains multiple lines of text
separated by newlines. When I load the data from the .csv file through the
bulk loader, Phoenix fails with an error, and HBase cuts the text off at
the first newline and treats the rest of the NOTE value as a new record.

Is there a way to specify a custom line separator for the HBase or Phoenix
bulk load tools?
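
For illustration only (the data below is made up), a .csv row whose NOTE
field spans two lines looks like this:

    123,"first line of the note
    second line of the note",2015-02-05

A loader that splits records on raw newlines sees two records here, which
matches the behaviour described above.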



With Phoenix:

HADOOP_CLASSPATH=/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-protocol.jar:/usr/hdp/2.2.0.0-2041/hbase/conf \
hadoop jar /usr/hdp/2.2.0.0-2041/phoenix/phoenix-4.2.0.2.2.0.0-2041-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table test_leadwarehouse \
  --input /user/sbhavanari/test_leadwarehouse.csv \
  --zookeeper <zookeeper Ip>:2181:/hbase



With HBase ImportTsv:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  '-Dimporttsv.separator=,' \
  -Dimporttsv.columns=<col_list> \
  test_leadwarehouse /user/data/test_leadwarehouse.csv

Re: Line separator option in Bulk loader

Posted by Siva <sb...@gmail.com>.
Thanks Nick, I will open JIRA requests for both Phoenix and HBase. I will
also chip in and contribute whatever I can :)

Thanks,
Siva.

On Thu, Feb 12, 2015 at 11:10 AM, Nick Dimiduk <nd...@gmail.com> wrote:


Re: Line separator option in Bulk loader

Posted by Nick Dimiduk <nd...@gmail.com>.
A custom line separator is a reasonable request. Please open JIRAs for HBase
and/or Phoenix import tools -- and provide a patch, if you're feeling
generous ;)

On Thu, Feb 12, 2015 at 10:39 AM, Siva <sb...@gmail.com> wrote:


Re: Line separator option in Bulk loader

Posted by Siva <sb...@gmail.com>.
Hi Gabriel,

Using a special character other than (\n) as the line separator does not
work even with HBase ImportTsv. But I found something called RichImportTsv
on GitHub:

https://github.com/kawaa/RichImportTsv

However, it is 3 years old and was implemented with the old APIs. We should
take a step to rewrite it with the new API.

Thanks,
Siva.
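
As a rough sketch of what a rewrite against the new (mapreduce) API could
build on -- not code taken from RichImportTsv -- recent Hadoop releases let
TextInputFormat split records on a custom delimiter via the
textinputformat.record.delimiter property. The class name and the Ctrl-A
delimiter below are made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class CustomDelimiterSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The new-API LineRecordReader honours this property, so records
            // can be separated by a character other than '\n' (here Ctrl-A).
            conf.set("textinputformat.record.delimiter", "\u0001");

            Job job = Job.getInstance(conf, "custom-record-delimiter-sketch");
            job.setJarByClass(CustomDelimiterSketch.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            // Identity mapper: passes each (offset, record) pair through,
            // which is enough to see where record boundaries fall.
            job.setMapperClass(Mapper.class);
            job.setNumReduceTasks(0);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

An ImportTsv-style tool built on this could read records separated by a
character that never occurs inside the NOTE text, sidestepping the embedded
newline problem.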

On Wed, Feb 11, 2015 at 11:40 PM, Gabriel Reid <ga...@gmail.com>
wrote:


Re: Line separator option in Bulk loader

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Siva,

The Bulk CSV Loader (i.e. the MapReduce-based loader) definitely won't
support records that are split over multiple input lines. It could be
that loading via PSQL (as described on
http://phoenix.apache.org/bulk_dataload.html) will allow multi-line
records, as this might be supported by the underlying CSV parsing
library (commons-csv), although I'm not sure. In any case, I can't
really give you any advice on how to make it work there if it isn't
working right now.

I assume this also won't work in HBase's ImportTsv.

- Gabriel
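
To make the commons-csv point concrete, here is a minimal standalone check
(not Phoenix code; the class name and sample row are made up) showing that
commons-csv keeps a quoted field with an embedded newline inside a single
record:

    import java.io.StringReader;

    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVParser;
    import org.apache.commons.csv.CSVRecord;

    public class MultiLineCsvCheck {
        public static void main(String[] args) throws Exception {
            // One logical record; the quoted NOTE field contains a newline.
            String csv =
                "1,\"first line of the note\nsecond line\",2015-02-05\n";

            try (CSVParser parser =
                     CSVFormat.DEFAULT.parse(new StringReader(csv))) {
                for (CSVRecord record : parser) {
                    // Prints a single record with 3 fields; the newline
                    // stays inside field 1.
                    System.out.println(record.size() + " fields; NOTE = "
                        + record.get(1));
                }
            }
        }
    }

If the PSQL loader really does delegate record splitting to commons-csv,
quoting the NOTE values in the .csv may be enough for that path, while the
MapReduce bulk loader would still split on raw newlines.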

