Posted to user@phoenix.apache.org by Radha Krishna G <gr...@yahoo.com> on 2016/08/03 07:53:49 UTC

Invoking org.apache.phoenix.mapreduce.CsvBulkLoadTool from phoenix-4.4.0.2.4.0.0-169-client.jar is not working properly

Hi All,

I am trying to load a file of around 40 GB using "org.apache.phoenix.mapreduce.CsvBulkLoadTool", but it fails with the error message below.
INFO mapreduce.Job: Task Id : attempt_1469663368297_56967_m_000042_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
        at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:176)
        at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
        at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
        at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
        at com.google.common.collect.Iterators.getNext(Iterators.java:890)
        at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
        at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:287)
        at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:148)
        ... 9 more
Caused by: java.io.IOException: (startline 1) EOF reached before encapsulated token finished
        at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:282)
        at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
        at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
        at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
        ... 14 more

Note: I extracted a sample of around 1000 records from the same file and was able to load them using the same approach, but when I provide the full file it fails. Can anyone suggest a solution for this issue?
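A diagnostic sketch (not from the thread, and the file name is a placeholder): the "EOF reached before encapsulated token finished" error from commons-csv usually means a field opened with the quote character (double quote by default) was never closed, so the parser reads to the end of the input looking for the closing quote. That would also explain why a small sample loads fine while the full file fails. Lines containing an odd number of double quotes are the usual suspects:

```shell
# Demo on a tiny FS-delimited file; on the real input you would point
# awk at the 40 GB file (or a local copy of it) instead.
# printf interprets \034 as the octal escape for the ASCII FS byte.
printf 'a\034"ok"\034c\nx\034"unterminated\034y\n' > sample.dat

# -F'"' splits each line on double quotes: an odd quote count yields an
# even field count (NF), flagging the broken record with its line number.
awk -F'"' 'NF % 2 == 0 { print "line " NR ": " $0 }' sample.dat
# flags line 2 (the record whose quote is never closed)
```

This only catches quote imbalances within a single line; a stray quote can also silently swallow many following records into one, so the first flagged line number is the place to start looking.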
Below is the command I used
===========================
HADOOP_CLASSPATH=/usr/hdp/current/phoenix-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/conf hadoop jar phoenix-4.4.0.2.4.0.0-169-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table "Table_Name" --input "HDFS input file path" -d $'\034'


-d $'\034' --> the field separator in the file is the ASCII FS control character (0x1C), so we provided it explicitly.
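A quick sanity check worth running (a sketch, not from the thread): confirm that the shell really expands $'\034' to the single FS byte before it reaches the -d option. $'\034' is bash ANSI-C quoting and does not work in plain sh, so a wrong shell would pass the literal characters instead of the delimiter byte.

```shell
# Should show exactly one byte, 1c (ASCII FS). If you see backslash,
# zero, three, four instead, the shell did not expand the quoting.
printf '%s' $'\034' | od -An -tx1
```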

Regards,
Radha Krishna G

Re: Invoking org.apache.phoenix.mapreduce.CsvBulkLoadTool from phoenix-4.4.0.2.4.0.0-169-client.jar is not working properly

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Radha,

This looks to me as if there is an issue in your data somewhere past
the first 100 records. The bulk loader isn't supposed to fail due to
issues like this. Instead, it's intended to simply report the problem
input lines and continue on, but it appears that this isn't happening.

Could you log an issue in the PHOENIX JIRA
(https://issues.apache.org/jira/browse/PHOENIX) for this problem?

Thanks,

Gabriel


On Wed, Aug 3, 2016 at 9:53 AM, Radha Krishna G <gr...@yahoo.com> wrote:
>
> Hi All,
> i am trying to load around 40 GB file using
> "org.apache.phoenix.mapreduce.CsvBulkLoadTool" but it is showing the below
> error message.
>
> INFO mapreduce.Job: Task Id : attempt_1469663368297_56967_m_000042_0, Status
> : FAILED
> Error: java.lang.RuntimeException: java.lang.RuntimeException:
> java.io.IOException: (startline 1) EOF reached before encapsulated token
> finished
>         at
> org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:176)
>         at
> org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.lang.RuntimeException: java.io.IOException: (startline 1)
> EOF reached before encapsulated token finished
>         at
> org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
>         at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
>         at com.google.common.collect.Iterators.getNext(Iterators.java:890)
>         at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
>         at
> org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:287)
>         at
> org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:148)
>         ... 9 more
> Caused by: java.io.IOException: (startline 1) EOF reached before
> encapsulated token finished
>         at
> org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:282)
>         at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
>         at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
>         at
> org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:395)
>         ... 14 more
>
>
> Note : I collected some sample records around(1000) form the same file and
> able to load using the same approach, but if i provide full file path its
> failing, can any one suggest what is solution for the above issue..
>
> Bellow Command i used
> ==================
>
> HADOOP_CLASSPATH=/usr/hdp/current/phoenix-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/conf
> hadoop jar phoenix-4.4.0.2.4.0.0-169-client.jar
> org.apache.phoenix.mapreduce.CsvBulkLoadTool --table "Table_Name" --input
> "HDFS input file path" -d $'\034'
>
>
> -d $'\034' --> the field separator in the file is FS so we provided the
> explicitly
>
> Regards
> Radha krishna G