Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2015/10/20 20:26:28 UTC

[jira] [Commented] (DRILL-2322) CSV record reader should log which file and which record caused an error in the reader

    [ https://issues.apache.org/jira/browse/DRILL-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965515#comment-14965515 ] 

Khurram Faraaz commented on DRILL-2322:
---------------------------------------

Note that neither the line number nor the file name appears in the error message. I also checked the stack trace in drillbit.log; it does not mention the line number or the file name either.

{code}
0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from `bad_csv`;
Error: SYSTEM ERROR: NumberFormatException: @

Fragment 0:0

[Error Id: 6544865f-c743-4abc-a32c-0a6debe4c9f0 on centos-04.qa.lab:31010] (state=,code=0)
{code}

Stack trace from drillbit.log:

{code}
2015-10-20 18:22:57,828 [29d9797e-19aa-3ee1-276b-e9d9319f41d7:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NumberFormatException: @

Fragment 0:0

[Error Id: 6544865f-c743-4abc-a32c-0a6debe4c9f0 on centos-04.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: NumberFormatException: @

Fragment 0:0

[Error Id: 6544865f-c743-4abc-a32c-0a6debe4c9f0 on centos-04.qa.lab:31010]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_85]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_85]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
Caused by: java.lang.NumberFormatException: @
        at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI(StringFunctionHelpers.java:97) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharToInt(StringFunctionHelpers.java:122) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.test.generated.ProjectorGen3.doEval(ProjectorTemplate.java:62) ~[na:na]
        at org.apache.drill.exec.test.generated.ProjectorGen3.projectRecords(ProjectorTemplate.java:62) ~[na:na]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:172) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_85]
        at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_85]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252) [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        ... 4 common frames omitted
{code}

Here is the content of the CSV file used in the test:

{code}
0: jdbc:drill:schema=dfs.tmp> select * from `badCSVFile.csv`;
+-------------------+
|      columns      |
+-------------------+
| ["1","test","a"]  |
| ["2","test","b"]  |
| ["@","test","c"]  |
| ["4","test","d"]  |
| ["5","blah","e"]  |
+-------------------+
5 rows selected (0.532 seconds)
{code}
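
For illustration, here is a minimal standalone sketch of the kind of context the error could carry. It is not Drill code and does not use any Drill API: the class `CastErrorContext`, the exception `RecordCastException`, and the method `castFirstColumn` are hypothetical names invented for this example, and the file name `badCSVFile.csv` is assumed to be the file behind the failing query. It replays the five records above and reports the file and the 1-based record number of the first value that cannot be cast to int.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Drill.
class CastErrorContext {

    /** Cast failure that carries the file name and the 1-based record number. */
    static class RecordCastException extends RuntimeException {
        final String fileName;
        final int recordNumber;

        RecordCastException(String fileName, int recordNumber, String value, Throwable cause) {
            super(String.format("NumberFormatException: %s (file: %s, record: %d)",
                    value, fileName, recordNumber), cause);
            this.fileName = fileName;
            this.recordNumber = recordNumber;
        }
    }

    /** Casts the first column of every record to int, adding context on failure. */
    static int[] castFirstColumn(String fileName, List<String[]> records) {
        int[] out = new int[records.size()];
        for (int i = 0; i < records.size(); i++) {
            String value = records.get(i)[0];
            try {
                out[i] = Integer.parseInt(value);
            } catch (NumberFormatException e) {
                // Record numbers are reported 1-based, matching how a user counts lines.
                throw new RecordCastException(fileName, i + 1, value, e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same five records as in the query output above.
        List<String[]> records = Arrays.asList(
                new String[] {"1", "test", "a"},
                new String[] {"2", "test", "b"},
                new String[] {"@", "test", "c"},
                new String[] {"4", "test", "d"},
                new String[] {"5", "blah", "e"});
        try {
            castFirstColumn("badCSVFile.csv", records);
        } catch (RecordCastException e) {
            // Prints: NumberFormatException: @ (file: badCSVFile.csv, record: 3)
            System.out.println(e.getMessage());
        }
    }
}
```

In the stack trace above the exception is raised in the projection (StringFunctionHelpers.varCharToInt), not in the text reader itself, so a real fix would presumably need the file and record position to be available at that point. The sketch only shows the message shape that would make the failure easy to locate.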

> CSV record reader should log which file and which record caused an error in the reader
> --------------------------------------------------------------------------------------
>
>                 Key: DRILL-2322
>                 URL: https://issues.apache.org/jira/browse/DRILL-2322
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 0.8.0
>            Reporter: Ramana Inukonda Nagaraj
>            Assignee: Sudheesh Katkam
>             Fix For: 0.9.0
>
>         Attachments: DRILL-2322.1.patch.txt, DRILL-2322.2.patch.txt, DRILL-2322.3.patch.txt
>
>
> I believe the title is self-explanatory.
> If the text reader fails for any reason due to an offending record, Drill should log which file (if there are multiple files) and which line/record the error occurs at. This will improve debugging when dealing with large files or a large number of files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)