Posted to user@phoenix.apache.org by "Cox, Jonathan A" <ja...@sandia.gov> on 2016/03/30 23:35:33 UTC

Problem Bulk Loading CSV with Empty Value at End of Row

I am using the CsvBulkLoadTool to ingest a tab-separated file that can contain empty columns. The problem is that the loader incorrectly interprets an empty last column as a non-existent column (instead of as a null entry).

For example, imagine I have a comma separated CSV with the following format:
key,username,password,gender,position,age,school,favorite_color

Now, let's say my CSV file contains the following row, where the gender field is empty. This row loads correctly:
*#Ssj289,joeblow,sk29ssh, ,CEO,102,MIT,blue<new line>

However, if the empty field happens to be the last one (favorite_color), the loader complains that only 7 of the 8 required columns are present:
*#Ssj289,joeblow,sk29ssh,female ,CEO,102,MIT, <new line>

When this happens, the loader throws an error and the entire CSV file fails to load. Any pointers on how I can modify the source so that Phoenix interprets <delimiter><newline> as an empty/null last column?
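
To make this easy to reproduce outside of MapReduce, here is a minimal standalone probe. As far as I can tell, the loader parses each line with Apache Commons CSV (that is where the CSVRecord handed to CsvUpsertExecutor in the trace below comes from), so this sketch just feeds the failing row to that parser and counts the values. The class name is mine, and I am using the comma-separated example from above rather than tabs:

import java.io.StringReader;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class TrailingEmptyColumnProbe {
    public static void main(String[] args) throws Exception {
        // Same 8-column example as above, with the last column
        // (favorite_color) empty, so the line ends in <delimiter><newline>.
        String line = "*#Ssj289,joeblow,sk29ssh,female,CEO,102,MIT,\n";
        try (CSVParser parser = new CSVParser(new StringReader(line), CSVFormat.DEFAULT)) {
            for (CSVRecord record : parser) {
                // Phoenix needs 8 values for this table; if this prints 7,
                // the empty trailing column was dropped and the row would be
                // rejected by the loader's size check.
                System.out.println("values parsed: " + record.size());
            }
        }
    }
}

Running the same probe with CSVFormat.DEFAULT.withDelimiter('\t') against a tab-separated line should show whether the tab case behaves any differently.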

Thanks,
Jon
(actual error is pasted below)


java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:197)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:72)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
        at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:74)
        at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
        at org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:166)
        ... 10 more
16/03/30 15:01:01 INFO mapreduce.Job: Job job_local1507432235_0
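
P.S. In case it helps anyone point me at the right fix: the message in the trace comes from a record-size check near the top of CsvUpsertExecutor.execute (line 74 above). I haven't dug through the Phoenix source in detail, so the following is only a paraphrase of what I imagine that check looks like, plus the change I have in mind: when a record is exactly one value short, pad it with a null instead of throwing. The class and method names here are made up for illustration and are not Phoenix code:

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVRecord;

class ShortRecordHandling {
    // Hypothetical variant of the size check (paraphrased from the error
    // message in the trace; the real Phoenix code may differ).
    static List<String> toValues(CSVRecord csvRecord, int expectedColumns) {
        List<String> values = new ArrayList<>(expectedColumns);
        for (String value : csvRecord) {
            values.add(value);
        }
        if (values.size() == expectedColumns - 1) {
            // Row ended in <delimiter><newline>: treat the absent trailing
            // column as a null value rather than a missing column.
            values.add(null);
        }
        if (values.size() < expectedColumns) {
            throw new IllegalArgumentException(String.format(
                "CSV record does not have enough values (has %d, but needs %d)",
                values.size(), expectedColumns));
        }
        return values;
    }
}

If patching Phoenix turns out to be the wrong approach, a pre-processing pass that appends an explicit quoted empty field ("") to short rows before loading would be another option.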