Posted to dev@drill.apache.org by Ted Dunning <te...@gmail.com> on 2021/05/20 05:48:23 UTC

known bug in csv header parsing

I have a CSV file that causes an exception when read by Drill. The file is
slightly malformed (but R can read it).

Interestingly, if I don't parse the header line, I don't get the exception
and the problematic embedded quotes are handled well. Deleting the first
data line (which is well-formed) also makes the exception go away, as does
deleting the second data line. Fixing the quoting of the embedded quotes
fixes the problem too. Swapping the two data lines has the same effect as
deleting the first one, but repeating the first line after the second
line still triggers the exception.

The file is this:
-------------------------
desc,name
"foo","x"
"manure called "foo"","y"
-------------
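
For comparison, here is a minimal Python sketch of what a correctly quoted
version of the same data could look like. It assumes that "fixing the
quoting" means doubling the embedded quotes (the RFC 4180 convention); the
post does not say which fix was actually applied, and the variable names
are only illustrative.

```python
import csv
import io

# The intended field values, with the embedded quotes kept as data.
rows = [
    ["desc", "name"],
    ["foo", "x"],
    ['manure called "foo"', "y"],
]

# QUOTE_ALL quotes every field (including the header, unlike the original
# file); embedded quotes are written doubled.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
fixed = buf.getvalue()
print(fixed)

# Round trip: a strict reader accepts the fixed text and recovers the rows.
assert list(csv.reader(io.StringIO(fixed), strict=True)) == rows
```

The last data line comes out as "manure called ""foo""","y", which a
strict CSV reader accepts without complaint.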


The exception is shown below. My thought is that if the CSV file is
considered malformed, we should get an error that points at the offending
line and says something along the lines of "malformed input". Even better
would be to allow such lines to be omitted (up to some sanity limit) or to
parse them correctly (which happens without headers being parsed).
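
As a rough illustration of the "report the line, skip up to a limit" idea,
here is a sketch in Python rather than in Drill's own reader. The function
name parse_with_line_errors and the max_bad_lines parameter are
hypothetical, and parsing one physical line at a time assumes that no
quoted field spans multiple lines.

```python
import csv

def parse_with_line_errors(lines, max_bad_lines=10):
    """Parse CSV lines strictly; skip malformed ones and record their line numbers."""
    rows, bad = [], []
    for line_no, line in enumerate(lines, start=1):
        try:
            # strict=True makes the csv module raise csv.Error on bad
            # quoting instead of silently guessing at the field contents.
            row = next(csv.reader([line], strict=True))
        except csv.Error as err:
            bad.append((line_no, str(err)))
            if len(bad) > max_bad_lines:
                # Sanity limit: give up rather than drop most of the file.
                raise ValueError(
                    f"too many malformed lines; first at line {bad[0][0]}"
                )
            continue
        rows.append(row)
    return rows, bad

sample = ['desc,name', '"foo","x"', '"manure called "foo"","y"']
rows, bad = parse_with_line_errors(sample)
for line_no, message in bad:
    print(f"malformed input at line {line_no}: {message}")
```

On the three-line file above, this keeps the header and the first data
line and flags line 3 as malformed.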

Anybody have any thoughts?

Here is the R behavior (it omits the embedded quotes):

> f = read.csv("v.csv")
> f
               desc name
1               foo    x
2 manure called foo    y


And here is the exception:

org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
NegativeArraySizeException Please, refer to logs for more information.
[Error Id: 7153f837-45eb-43d1-8e19-e3ca0197c61b ]
(java.lang.NegativeArraySizeException) null
org.apache.drill.exec.vector.VarCharVector$Accessor.get():487
org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():514
org.apache.drill.exec.vector.VarCharVector$Accessor.getObject():475
org.apache.drill.exec.server.rest.WebUserConnection.sendData():147
org.apache.drill.exec.ops.AccountingUserConnection.sendData():42
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():120
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1669
org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748