Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2015/11/04 15:48:48 UTC

Help with Troubleshooting dense error message

Hey all,

I am working with JSON that is on the whole fairly clean.  I am trying to
load it into Parquet files. The previous day's worth of data worked just
fine, but today's data has something wrong with it, and I can't figure out
what it is. Unfortunately, I can't post the data, which I know makes this
hard for the community to troubleshoot. Hopefully I can provide some info
here, get some pointers on where to look, and then report back on how
we could potentially improve the error messages.

The error is below.


Given the information reported, I am trying to figure out where I'd look to
troubleshoot this. Obviously the file 02ffc306e877_my_load_1446640931.json
is where I am looking to start.

This file has 3000 lines (records of data), so it's somewhere in between.

The index/length/expected range don't mean anything to me. I could use some
help there, because I am not even sure what I am looking for.
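
For what it's worth, the three numbers read like a buffer bounds check: a
write of `length` bytes starting at offset `index` has to fit inside a buffer
of 8192 bytes, and 9604 + 4 does not. The sketch below only illustrates that
kind of check in Python. `check_index` is a hypothetical helper, not Drill's
actual code, and the message format is simply copied from the error.

```python
def check_index(index: int, length: int, capacity: int) -> None:
    """Raise if the span [index, index + length) does not fit in [0, capacity)."""
    if index < 0 or length < 0 or index + length > capacity:
        raise IndexError(
            f"index: {index}, length: {length} (expected: range(0, {capacity}))"
        )


# Values taken from the reported error: a 4-byte access at offset 9604
# against a buffer that only holds 8192 bytes.
try:
    check_index(9604, 4, 8192)
    message = ""
except IndexError as exc:
    message = str(exc)

print(message)
```

Read this way, the numbers describe the state of an internal buffer rather
than a position in the JSON text, so they would not point at a character or
line in the file.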

The record and/or Fragment values: do those help me dig in?

Since this is one record per line, I went to line 2402, but that record
looks completely normal to me (like all the other ones). Since this is
dense text, I may well be missing something. Is the record number the line
number?

Any other pointers I can use to troubleshoot this?
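
One way to rule out malformed input is to parse the file line by line
outside of Drill and look at the records on either side of the reported one,
since it is not clear from the error whether the record count starts at 0 or
1. A minimal sketch, assuming one JSON record per line; `find_bad_records`
and `records_near` are hypothetical helper names, and the sample list stands
in for the real file:

```python
import json
from typing import Iterable, List, Tuple


def find_bad_records(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, error text) for each line that is not valid JSON."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_number, str(exc)))
    return problems


def records_near(lines: List[str], record: int, radius: int = 1) -> List[Tuple[int, str]]:
    """Return the lines within `radius` of the 1-based `record` number.

    Looking at the neighbours covers both 0-based and 1-based record counting.
    """
    first = max(1, record - radius)
    last = min(len(lines), record + radius)
    return [(n, lines[n - 1]) for n in range(first, last + 1)]


# Tiny in-memory stand-in for the real file; the third record is deliberately broken.
sample = ['{"a": 1}', '{"a": [1, 2]}', '{"a": oops}', '{"a": null}']
bad = find_bad_records(sample)
nearby = records_near(sample, record=3)
print(bad)
print(nearby)
```

For the real file, the list of lines could come from something like
`open(path, encoding="utf-8").read().splitlines()`, with `record=2402`. If
every line parses cleanly, that would suggest the problem is not a syntax
error in the data.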

Thanks!

Error:


Caused by: org.apache.drill.common.exceptions.UserRemoteException:
DATA_READ ERROR: Error parsing JSON - index: 9604, length: 4 (expected:
range(0, 8192))



File
/etl/dev/my-metadata/mysqspull/loads/2015-11-04/02ffc306e877_my_load_1446640931.json

Record  2402

Fragment 1:5

Re: Help with Troubleshooting dense error message

Posted by John Omernik <jo...@omernik.com>.
Also, I just validated that, without changing the data, I went from
initially being able to run the query to, as more data is added, the
queries at some point failing with the IOOB (index out of bounds) error.
As that keeps happening, the drillbits eventually run out of memory and
need to be restarted. At that point, the queries start working again.
(Again, this is without changing the data.)

Apparently something in this process is causing a memory leak that
is not being cleared.  I will happily try the 1.3.0 release once I can get
a MapR package, and hopefully it addresses the issue. If not, we'll
have alternative troubleshooting to do.  (Is there a way to monitor memory?
I would assume it's on the metrics page, but are there particular metrics
I should be paying attention to? There are lots of fields there, and I am
not sure what they all mean, especially since some return negative numbers,
which is confusing.)
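
As a rough way to cut the metrics down to the memory-related ones, the
sketch below filters a metrics snapshot by name. It assumes the snapshot has
been saved as JSON with a top-level "gauges" map whose entries carry a
"value" field (the usual Dropwizard Metrics layout). The gauge names in the
sample are made up for illustration and are not taken from Drill.

```python
from typing import Any, Dict, Sequence


def memory_gauges(
    metrics: Dict[str, Any],
    keywords: Sequence[str] = ("memory", "heap", "direct"),
) -> Dict[str, Any]:
    """Return {gauge name: value} for gauges whose name contains any keyword."""
    gauges = metrics.get("gauges", {})
    return {
        name: gauge.get("value")
        for name, gauge in sorted(gauges.items())
        if any(keyword in name.lower() for keyword in keywords)
    }


# Hypothetical snapshot; real gauge names and values will differ.
sample = {
    "gauges": {
        "heap.used": {"value": 123456},
        "direct.used": {"value": 7890},
        "threads.count": {"value": 42},
    }
}
selected = memory_gauges(sample)
print(selected)
```

Taking a snapshot before and after a run of failing queries and comparing
the filtered values might show whether memory use keeps growing.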

John

On Thu, Nov 5, 2015 at 12:27 PM, John Omernik <jo...@omernik.com> wrote:

> Abdel -
>
> Thank you, I do understand it's a challenge for troubleshooting, and I
> apologize for that. I see you have a @maprtech email; are the binaries in
> the release built with MapRDB support? I need that for my MapR cluster,
> which is why I am waiting for a MapR build of 1.3.0.
>
> On Thu, Nov 5, 2015 at 11:44 AM, Abdel Hakim Deneche <
> adeneche@maprtech.com> wrote:
>
>> Hey John,
>>
>> If you want to, you can download the binaries for 1.3 release candidate
>> from [1] and see if you can reproduce the error. You just need to unzip
>> the folder and run "bin/drill-embedded".
>>
>> Without some data to reproduce the issue, it's really hard to come up with
>> an explanation.
>>
>> Thanks
>>
>> [1] http://people.apache.org/~jacques/apache-drill-1.3.0.rc0/
>>
>> On Thu, Nov 5, 2015 at 5:12 AM, John Omernik <jo...@omernik.com> wrote:
>>
>> > Hey Steven, I will look into that.  Based on your understanding of the
>> > problem, would DRILL-4006 still apply given these conditions?
>> >
>> > 1. When I query a directory of JSON files, it fails, flagging a
>> > specific JSON file as the culprit. When I remove that file, it works, and
>> > when I query only that culprit JSON file, it works as well.
>> > 2. When the error occurs, if I restart my drillbits and run the query
>> > again, it seems to work. (This one baffles me.)
>> >
>> > I will look to try the 1.3 release. I am using the 1.2.1 release from
>> > MapR, so I may have to wait until they roll a package for easy install
>> > (I want to include their MapRDB support).
>> >
>> > MapR Team: If you have a current release with DRILL-4006 incorporated,
>> > and possibly the JDBC Storage Plugin fixes, rolled for testing, I'd love
>> > to give it a shot (non-supported, of course).
>> >
>> >
>> >
>> > On Thu, Nov 5, 2015 at 12:48 AM, Steven Phillips <st...@dremio.com>
>> > wrote:
>> >
>> > > This looks like DRILL-4006, a fix for which just went in.
>> > >
>> > > https://issues.apache.org/jira/browse/DRILL-4006
>> > >
>> > >
>> > > On Wed, Nov 4, 2015 at 12:16 PM, John Omernik <jo...@omernik.com> wrote:
>> > >
>> > > > I am on MapR's 1.2.1 Package.
>> > > >
>> > > > On Wed, Nov 4, 2015 at 2:14 PM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
>> > > >
>> > > > > One last thing, what version of Drill do you have installed?
>> > > > >
>> > > > > On Wed, Nov 4, 2015 at 11:04 AM, John Omernik <jo...@omernik.com> wrote:
>> > > > >
>> > > > > > No, I don't think so.  I am running Drill in Marathon on Mesos, so
>> > > > > > my startup settings are all very static. In addition, the only
>> > > > > > session variable I changed was the JSON-as-text option at the
>> > > > > > session level, and I set it in both the pre-restart and the
>> > > > > > post-restart drillbit sessions (I need that to query the data).
>> > > > > >
>> > > > > > On Wed, Nov 4, 2015 at 12:46 PM, Abdel Hakim Deneche <
>> > > > > > adeneche@maprtech.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > This is strange indeed. The error message you reported earlier
>> > > > > > > doesn't suggest a memory leak issue but rather a bug when
>> > > > > > > reading a specific set of data.
>> > > > > > > Could it be that you changed some session options, and you
>> > > > > > > forgot to set them again after you restarted the drillbits?
>> > > > > > >
>> > > > > > > Thanks
>> > > > > > >
>> > > > > > > On Wed, Nov 4, 2015 at 10:37 AM, John Omernik <
>> john@omernik.com>
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > So I pulled out the files that seemed to be causing this issue
>> > > > > > > > (I was up to two) and loaded my data.  (See my other posts on
>> > > > > > > > how I did that by loading into a folder prefixed by ".")
>> > > > > > > >
>> > > > > > > > Anywho, my Drill cluster became unstable in general, and I was
>> > > > > > > > not able to run any queries until I bounced my drillbits.
>> > > > > > > >
>> > > > > > > > I did that, got my process working again, and went to
>> > > > > > > > troubleshoot this problem again, and everything appears to be
>> > > > > > > > working well now.  I am stumped.  Could a memory leak have
>> > > > > > > > caused that error only on some files?  I am monitoring now to
>> > > > > > > > determine if the problem starts again, but that is REALLY
>> > > > > > > > strange to me. This seems out of character for Drill, both in
>> > > > > > > > my use of it and in how its memory handling has been explained
>> > > > > > > > to me.  If I get the error again, I'll make sure I set that
>> > > > > > > > option to get a full stack trace.
>> > > > > > > >
>> > > > > > > > John
>> > > > > > > >
>> > > > > > > > On Wed, Nov 4, 2015 at 12:13 PM, Abdel Hakim Deneche <
>> > > > > > > > adeneche@maprtech.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > The error message "index: 9604, length: 4 (expected: range(0,
>> > > > > > > > > 8192))" suggests an error happened when Drill tried to access
>> > > > > > > > > a memory buffer (most likely while writing an int or float
>> > > > > > > > > value).
>> > > > > > > > > This may be a bug actually exposed by that particular data
>> > > > > > > > > record.
>> > > > > > > > >
>> > > > > > > > > You can try enabling verbose error logging before running the
>> > > > > > > > > query again:
>> > > > > > > > >
>> > > > > > > > > set `exec.errors.verbose`=true;
>> > > > > > > > >
>> > > > > > > > > This should give us a nice stack trace about this error.
>> > > > > > > > >
>> > > > > > > > > Thanks
>> > > > > > > > >
>> > > > > > > > > On Wed, Nov 4, 2015 at 7:29 AM, John Omernik <
>> > john@omernik.com
>> > > >
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > There are multiple fields in that record, including two lists.
>> > > > > > > > > > Both lists have data in them. (I am now running with JSON text
>> > > > > > > > > > mode because at times the first value is a JSON null, but in
>> > > > > > > > > > these cases that should be turned into "null" as a string, if
>> > > > > > > > > > I am understanding things correctly, and shouldn't be causing
>> > > > > > > > > > a problem.)
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Wed, Nov 4, 2015 at 9:21 AM, Hsuan Yi Chu <hyichu@maprtech.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > What is the data type for that record in line 2402? A list?
>> > > > > > > > > > >
>> > > > > > > > > > > Do you think it could be similar to this issue ?
>> > > > > > > > > > >
>> > > > > > > > > > > https://issues.apache.org/jira/browse/DRILL-4006
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > >
>> > > > > > > > > Abdelhakim Deneche
>> > > > > > > > >
>> > > > > > > > > Software Engineer
>> > > > > > > > >
>> > > > > > > > >   <http://www.mapr.com/>
>> > > > > > > > >
>> > > > > > > > >

Re: Help with Troubleshooting dense error message

Posted by John Omernik <jo...@omernik.com>.
Abdel, so these weren't built with MapRFS support, as I am getting the
"Can't find file system for Scheme MapRFS" error. I'll be eagerly awaiting
the MapR package! Thanks

John


Re: Help with Troubleshooting dense error message

Posted by John Omernik <jo...@omernik.com>.
Abdel -

Thank you, I do understand it's a challenge for troubleshooting, and I
apologize for that. I see you have a @maprtech email; are the binaries in
the release built with MapRDB support? I need that for my MapR cluster,
which is why I am waiting for a MapR build of 1.3.0.

On Thu, Nov 5, 2015 at 11:44 AM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> Hey John,
>
> If you want to, you can download the binaries for 1.3 release candidate
> from [1] and see if you can reproduce the error. You just need to unzip the
> folder and run "bin/drill-embedded".
>
> Without some data to reproduce the issue, it's really hard to come up with
> an explanation.
>
> Thanks
>
> [1] http://people.apache.org/~jacques/apache-drill-1.3.0.rc0/
>
> On Thu, Nov 5, 2015 at 5:12 AM, John Omernik <jo...@omernik.com> wrote:
>
> > Hey Steven, I will look into that.  Based on your understanding of the
> > problem, would DRILL-4006 still apply given these conditions?
> >
> > 1. When I query a directory of JSON files, it fails, flagging a
> > specific JSON file as the culprit. When I remove that file, it works, and
> > when I query only that culprit JSON file, it works as well.
> > 2. When the error occurs, if I restart my drillbits and run the query
> > again, it seems to work. (This one baffles me.)
> >
> > I will look to try the 1.3 release. I am using the 1.2.1 release from
> > MapR, so I may have to wait until they roll a package for easy install
> > (I want to include their MapRDB support).
> >
> > MapR Team: If you have a current release with DRILL-4006 incorporated,
> > and possibly the JDBC Storage Plugin fixes, rolled for testing, I'd love
> > to give it a shot (non-supported, of course).
> >
> >
> >
> > On Thu, Nov 5, 2015 at 12:48 AM, Steven Phillips <st...@dremio.com>
> > wrote:
> >
> > > This looks like DRILL-4006, a fix for which just went in.
> > >
> > > https://issues.apache.org/jira/browse/DRILL-4006
> > >
> > >
> > > On Wed, Nov 4, 2015 at 12:16 PM, John Omernik <jo...@omernik.com>
> wrote:
> > >
> > > > I am on MapR's 1.2.1 Package.
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Nov 4, 2015 at 2:14 PM, Abdel Hakim Deneche <
> > > adeneche@maprtech.com
> > > > >
> > > > wrote:
> > > >
> > > > > One last thing, what version of Drill do you have installed ?
> > > > >
> > > > > On Wed, Nov 4, 2015 at 11:04 AM, John Omernik <jo...@omernik.com>
> > > wrote:
> > > > >
> > > > > > > No, I don't think so.  I am running Drill in Marathon on Mesos, so
> > > > > > > my startup settings are all very static. In addition, the only
> > > > > > > session variable I changed was the JSON-as-text option at the
> > > > > > > session level, and I set it in both the pre-restart and the
> > > > > > > post-restart drillbit sessions (I need that to query the data).
> > > > > >
> > > > > > On Wed, Nov 4, 2015 at 12:46 PM, Abdel Hakim Deneche <
> > > > > > adeneche@maprtech.com>
> > > > > > wrote:
> > > > > >
> > > > > > > This is strange indeed. The error message you reported earlier
> > > > doesn't
> > > > > > > suggest a memory leak issue but rather a bug when reading a
> > > specific
> > > > > set
> > > > > > of
> > > > > > > data.
> > > > > > > Could it be that you changed some session options, and you
> forgot
> > > to
> > > > > set
> > > > > > > them again after you restarted the drillbits ?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Wed, Nov 4, 2015 at 10:37 AM, John Omernik <
> john@omernik.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > So I pulled the (I was up to two) files that seemed to be
> > causing
> > > > > this
> > > > > > > > issue out, and loaded my data.  (see my other posts on how I
> > did
> > > > that
> > > > > > > with
> > > > > > > > loading into a folder prefixed by .)
> > > > > > > >
> > > > > > > > Anywho, my Drill cluster became unstable in general, and I
> was
> > > not
> > > > > able
> > > > > > > to
> > > > > > > > run any queries until I bounced by drill bits.
> > > > > > > >
> > > > > > > > I did that, got my process working again, and went to go try
> > > > > > > > troubleshooting this problem again and everything appears to
> be
> > > > > working
> > > > > > > > well now.  I am stumped.   Could a memory leak have caused
> that
> > > > error
> > > > > > > only
> > > > > > > > on some files?  I am monitoring now to determine if the
> problem
> > > > > starts
> > > > > > > > again, but that is REALLY strange to me. This seems out of
> > > > character
> > > > > > for
> > > > > > > > Drill, both in my use of it, and in how it handles memory has
> > > been
> > > > > > > > explained to me.  If I get the error again, I'll ensure I set
> > > that
> > > > to
> > > > > > > get a
> > > > > > > > full stack trace.
> > > > > > > >
> > > > > > > > John
> > > > > > > >
> > > > > > > > On Wed, Nov 4, 2015 at 12:13 PM, Abdel Hakim Deneche <
> > > > > > > > adeneche@maprtech.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > The error message "index: 9604, length: 4 (expected:
> range(0,
> > > > > 8192))"
> > > > > > > > > suggests an error happened when Drill tried to access a
> > memory
> > > > > buffer
> > > > > > > > (most
> > > > > > > > > likely while writing an int or float value)
> > > > > > > > > This may be a bug actually exposed by that particular data
> > > > record.
> > > > > > > > >
> > > > > > > > > You can try enabling verbose error logging before running
> the
> > > > query
> > > > > > > > again:
> > > > > > > > >
> > > > > > > > > set `exec.errors.verbose`=true;
> > > > > > > > >
> > > > > > > > > This should give us a nice stack trace about this error.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > On Wed, Nov 4, 2015 at 7:29 AM, John Omernik <
> > john@omernik.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > There are multiple fields in that record, including two
> > > lists.
> > > > > Both
> > > > > > > > lists
> > > > > > > > > > have data in them (now I am runnning with json text mode
> > > > because
> > > > > at
> > > > > > > > times
> > > > > > > > > > the first value is a JSON null, but in these cases, that
> > > should
> > > > > be
> > > > > > > > turned
> > > > > > > > > > to "null" as  string.  (If I am understanding things
> > > correctly)
> > > > > and
> > > > > > > > > > shouldn't be causing a problem.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Nov 4, 2015 at 9:21 AM, Hsuan Yi Chu <
> > > > > hyichu@maprtech.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > What is the data type for that record in line 2402? A
> > list?
> > > > > > > > > > >
> > > > > > > > > > > Do you think it could be similar to this issue ?
> > > > > > > > > > >
> > > > > > > > > > > https://issues.apache.org/jira/browse/DRILL-4006
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Nov 4, 2015 at 6:48 AM, John Omernik <
> > > > john@omernik.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey all,
> > > > > > > > > > > >
> > > > > > > > > > > > I am working with JSON that is on the whole fairly
> > clean.
> > > > I
> > > > > am
> > > > > > > > > trying
> > > > > > > > > > to
> > > > > > > > > > > > load into Parquet files, and the previous days worth
> of
> > > > data
> > > > > > > worked
> > > > > > > > > > just
> > > > > > > > > > > > fine, but todays data has something wrong with it
> and I
> > > > Can't
> > > > > > > > figure
> > > > > > > > > > out
> > > > > > > > > > > > what it is. Unfortunately, I can't post the data,
> > which I
> > > > > know
> > > > > > > > makes
> > > > > > > > > > this
> > > > > > > > > > > > hard to troubleshoot for the community. Hopefully I
> can
> > > > > provide
> > > > > > > > some
> > > > > > > > > > info
> > > > > > > > > > > > here, and get some pointers on where to look, and
> then
> > > > report
> > > > > > > back
> > > > > > > > on
> > > > > > > > > > how
> > > > > > > > > > > > we could potentially improve the error messages.
> > > > > > > > > > > >
> > > > > > > > > > > > The error is below.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I am looking to figure out given the information
> > reported
> > > > > where
> > > > > > > I'd
> > > > > > > > > > look
> > > > > > > > > > > to
> > > > > > > > > > > > trouble shoot this. Obviously the file
> > > > > > > > > > > 02ffc306e877_my_load_1446640931.json
> > > > > > > > > > > > is where I am looking to start
> > > > > > > > > > > >
> > > > > > > > > > > > This file has 3000 lines (records of data, so it's
> > > > somewhere
> > > > > in
> > > > > > > > > > between.
> > > > > > > > > > > >
> > > > > > > > > > > > The index/length/expected range don't mean anything
> to
> > > me I
> > > > > > could
> > > > > > > > use
> > > > > > > > > > > some
> > > > > > > > > > > > help there, because I am not even sure what I am
> > looking
> > > > for.
> > > > > > > > > > > >
> > > > > > > > > > > > The record and/or Fragment... do those help me dig
> in?
> > > > > > > > > > > >
> > > > > > > > > > > > Since this is one record per line, I went to line
> 2402
> > > but
> > > > > that
> > > > > > > > > record
> > > > > > > > > > > > looks completely normal to me, (like all the other
> > ones)
> > > > but
> > > > > > > since
> > > > > > > > > this
> > > > > > > > > > > is
> > > > > > > > > > > > dense text, I am obviously missing something, but is
> > the
> > > > > record
> > > > > > > the
> > > > > > > > > > line
> > > > > > > > > > > > number?
> > > > > > > > > > > >
> > > > > > > > > > > > Any other pointers I can use to trouble shoot this?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks!
> > > > > > > > > > > >
> > > > > > > > > > > > Error:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Caused by:
> > > > > > > org.apache.drill.common.exceptions.UserRemoteException:
> > > > > > > > > > > > DATA_READ ERROR: Error parsing JSON - index: 9604,
> > > length:
> > > > 4
> > > > > > > > > (expected:
> > > > > > > > > > > > range(0, 8192))
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > File
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> /etl/dev/my-metadata/mysqspull/loads/2015-11-04/02ffc306e877_my_load_1446640931.json
> > > > > > > > > > > >
> > > > > > > > > > > > Record  2402
> > > > > > > > > > > >
> > > > > > > > > > > > Fragment 1:5
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Abdelhakim Deneche
> > > > > > > > >
> > > > > > > > > Software Engineer
> > > > > > > > >
> > > > > > > > >   <http://www.mapr.com/>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Now Available - Free Hadoop On-Demand Training
> > > > > > > > > <
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Abdelhakim Deneche
> > > > > > >
> > > > > > > Software Engineer
> > > > > > >
> > > > > > >   <http://www.mapr.com/>
> > > > > > >
> > > > > > >
> > > > > > > Now Available - Free Hadoop On-Demand Training
> > > > > > > <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Abdelhakim Deneche
> > > > >
> > > > > Software Engineer
> > > > >
> > > > >   <http://www.mapr.com/>
> > > > >
> > > > >
> > > > > Now Available - Free Hadoop On-Demand Training
> > > > > <
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   <http://www.mapr.com/>
>
>
> Now Available - Free Hadoop On-Demand Training
> <
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >
>

Re: Help with Troubleshooting dense error message

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Hey John,

If you want to, you can download the binaries for the 1.3 release candidate
from [1] and see if you can reproduce the error. You just need to unpack the
archive and run "bin/drill-embedded".

Without some data to reproduce the issue, it's really hard to come up with
an explanation.

Thanks

[1] http://people.apache.org/~jacques/apache-drill-1.3.0.rc0/
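
When the data itself cannot be posted, a type-only skeleton of the suspect
record might still give the list something to work with. The following is a
minimal sketch, assuming one JSON record per line as described earlier in
the thread. The function names are hypothetical and are not part of Drill.
Field names are kept, so they should be reviewed before sharing.

```python
import json


def type_skeleton(value):
    """Replace every leaf value with the name of its JSON type."""
    if isinstance(value, dict):
        return {key: type_skeleton(item) for key, item in value.items()}
    if isinstance(value, list):
        return [type_skeleton(item) for item in value]
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def record_skeleton(path, record_number):
    """Return the skeleton of the record_number-th non-empty line (1-based)."""
    seen = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            seen += 1
            if seen == record_number:
                return type_skeleton(json.loads(line))
    raise IndexError(f"file has only {seen} records")
```

Calling record_skeleton() on the file named in the error with 2402 as the
record number would print the structure of that line. This thread does not
confirm whether the "Record" value in the error is 0-based or 1-based, so
it is probably worth checking the neighbouring records as well.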

On Thu, Nov 5, 2015 at 5:12 AM, John Omernik <jo...@omernik.com> wrote:

> Hey Steven, I will look into that.  Based on your understanding of the
> problem would DRILL-4006 still apply given these conditions
>
> 1. When I query a directory of json files, and it fails signaling a
> specific JSON file as a culprit. When I remove that file, it works, and
> when I do a query only on that culprit JSON file it works as well.
> 2. When the error occurs, if I restart my drill bits, and run the query
> again it seems to work (This one baffles me)
>
> I will look to try the 1.3 release, I am using 1.2.1 release from MapR, so
> I may have to wait until they roll a package for easy install (I want to
> include their MapRDB Support).
>
> MapR Team: If you have a current release with the Drill 4006 incorporated
> and possibly the JDBC Storage Plugin fixes rolled for testing, I'd love to
> give it a shot (non-supported of course)



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: Help with Troubleshooting dense error message

Posted by John Omernik <jo...@omernik.com>.
Hey Steven, I will look into that.  Based on your understanding of the
problem, would DRILL-4006 still apply given these conditions?

1. When I query a directory of JSON files, the query fails and flags a
specific JSON file as the culprit. When I remove that file, the query
works, and when I query only that culprit file, it works as well.
2. When the error occurs, if I restart my drillbits and run the query
again, it seems to work. (This one baffles me.)

I will look to try the 1.3 release. I am using the 1.2.1 release from MapR,
so I may have to wait until they roll a package for easy install (I want to
include their MapRDB support).

MapR Team: If you have a current build with the DRILL-4006 fix incorporated,
and possibly the JDBC Storage Plugin fixes rolled in for testing, I'd love
to give it a shot (unsupported, of course).
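
Condition 1 above suggests the failure may depend on a combination of files
rather than on one file alone. A minimal sketch of one way to narrow that
down is a greedy one-at-a-time reduction of the file list. The callback
`fails` is hypothetical: it stands in for whatever runs the query over a
subset of files and reports whether the error reproduced. Nothing here is
Drill API.

```python
def shrink_failing_set(files, fails):
    """Drop files one at a time while the failure still reproduces.

    `fails(subset)` is a hypothetical callback that runs the query over
    `subset` and returns True when the error shows up.
    """
    current = list(files)
    if not fails(current):
        raise ValueError("the full set does not reproduce the failure")
    changed = True
    while changed:
        changed = False
        for candidate in list(current):
            trial = [name for name in current if name != candidate]
            # Keep the smaller set only if it is non-empty and still fails.
            if trial and fails(trial):
                current = trial
                changed = True
    return current
```

The result is a set from which no single file can be removed without the
error going away. Given condition 2, the failure may not be deterministic,
in which case a reduction like this can give misleading results and each
trial may need to be repeated.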



On Thu, Nov 5, 2015 at 12:48 AM, Steven Phillips <st...@dremio.com> wrote:

> This looks like DRILL-4006, a fix for which just went in.
>
> https://issues.apache.org/jira/browse/DRILL-4006

Re: Help with Troubleshooting dense error message

Posted by Steven Phillips <st...@dremio.com>.
This looks like DRILL-4006, a fix for which just went in.

https://issues.apache.org/jira/browse/DRILL-4006


On Wed, Nov 4, 2015 at 12:16 PM, John Omernik <jo...@omernik.com> wrote:

> I am on MapR's 1.2.1 Package.
>
>
>
>
> On Wed, Nov 4, 2015 at 2:14 PM, Abdel Hakim Deneche <adeneche@maprtech.com>
> wrote:
>
> > One last thing, what version of Drill do you have installed ?
> >
> > On Wed, Nov 4, 2015 at 11:04 AM, John Omernik <jo...@omernik.com> wrote:
> >
> > > No I don't think so.  I am running Drill in Marathon on Mesos, so my
> > > startup settings are all very static. In addition, the only session
> > > variable I was changed was the json as text option at the session level and
> > > I was setting it on both the pre drillbit reboot and the post drillbit
> > > reboot sessions (I need that to query the data).
> > >
> > > On Wed, Nov 4, 2015 at 12:46 PM, Abdel Hakim Deneche <adeneche@maprtech.com>
> > > wrote:
> > >
> > > > This is strange indeed. The error message you reported earlier doesn't
> > > > suggest a memory leak issue but rather a bug when reading a specific set of
> > > > data.
> > > > Could it be that you changed some session options, and you forgot to set
> > > > them again after you restarted the drillbits ?
> > > >
> > > > Thanks
> > > >
> > > > On Wed, Nov 4, 2015 at 10:37 AM, John Omernik <jo...@omernik.com> wrote:
> > > >
> > > > > So I pulled the (I was up to two) files that seemed to be causing
> > this
> > > > > issue out, and loaded my data.  (see my other posts on how I did
> that
> > > > with
> > > > > loading into a folder prefixed by .)
> > > > >
> > > > > Anywho, my Drill cluster became unstable in general, and I was not
> > able
> > > > to
> > > > > run any queries until I bounced by drill bits.
> > > > >
> > > > > I did that, got my process working again, and went to go try
> > > > > troubleshooting this problem again and everything appears to be
> > working
> > > > > well now.  I am stumped.   Could a memory leak have caused that
> error
> > > > only
> > > > > on some files?  I am monitoring now to determine if the problem
> > starts
> > > > > again, but that is REALLY strange to me. This seems out of
> character
> > > for
> > > > > Drill, both in my use of it, and in how it handles memory has been
> > > > > explained to me.  If I get the error again, I'll ensure I set that
> to
> > > > get a
> > > > > full stack trace.
> > > > >
> > > > > John
> > > > >
> > > > > On Wed, Nov 4, 2015 at 12:13 PM, Abdel Hakim Deneche <
> > > > > adeneche@maprtech.com>
> > > > > wrote:
> > > > >
> > > > > > The error message "index: 9604, length: 4 (expected: range(0,
> > 8192))"
> > > > > > suggests an error happened when Drill tried to access a memory
> > buffer
> > > > > (most
> > > > > > likely while writing an int or float value)
> > > > > > This may be a bug actually exposed by that particular data
> record.
> > > > > >
> > > > > > You can try enabling verbose error logging before running the
> query
> > > > > again:
> > > > > >
> > > > > > set `exec.errors.verbose`=true;
> > > > > >
> > > > > > This should give us a nice stack trace about this error.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Wed, Nov 4, 2015 at 7:29 AM, John Omernik <jo...@omernik.com>
> > > wrote:
> > > > > >
> > > > > > > There are multiple fields in that record, including two lists.
> > Both
> > > > > lists
> > > > > > > have data in them (now I am runnning with json text mode
> because
> > at
> > > > > times
> > > > > > > the first value is a JSON null, but in these cases, that should
> > be
> > > > > turned
> > > > > > > to "null" as  string.  (If I am understanding things correctly)
> > and
> > > > > > > shouldn't be causing a problem.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Nov 4, 2015 at 9:21 AM, Hsuan Yi Chu <
> > hyichu@maprtech.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > What is the data type for that record in line 2402? A list?
> > > > > > > >
> > > > > > > > Do you think it could be similar to this issue ?
> > > > > > > >
> > > > > > > > https://issues.apache.org/jira/browse/DRILL-4006
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Nov 4, 2015 at 6:48 AM, John Omernik <
> john@omernik.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey all,
> > > > > > > > >
> > > > > > > > > I am working with JSON that is on the whole fairly clean.
> I
> > am
> > > > > > trying
> > > > > > > to
> > > > > > > > > load into Parquet files, and the previous days worth of
> data
> > > > worked
> > > > > > > just
> > > > > > > > > fine, but todays data has something wrong with it and I
> Can't
> > > > > figure
> > > > > > > out
> > > > > > > > > what it is. Unfortunately, I can't post the data, which I
> > know
> > > > > makes
> > > > > > > this
> > > > > > > > > hard to troubleshoot for the community. Hopefully I can
> > provide
> > > > > some
> > > > > > > info
> > > > > > > > > here, and get some pointers on where to look, and then
> report
> > > > back
> > > > > on
> > > > > > > how
> > > > > > > > > we could potentially improve the error messages.
> > > > > > > > >
> > > > > > > > > The error is below.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I am looking to figure out given the information reported
> > where
> > > > I'd
> > > > > > > look
> > > > > > > > to
> > > > > > > > > trouble shoot this. Obviously the file
> > > > > > > > 02ffc306e877_my_load_1446640931.json
> > > > > > > > > is where I am looking to start
> > > > > > > > >
> > > > > > > > > This file has 3000 lines (records of data, so it's
> somewhere
> > in
> > > > > > > between.
> > > > > > > > >
> > > > > > > > > The index/length/expected range don't mean anything to me I
> > > could
> > > > > use
> > > > > > > > some
> > > > > > > > > help there, because I am not even sure what I am looking
> for.
> > > > > > > > >
> > > > > > > > > The record and/or Fragment... do those help me dig in?
> > > > > > > > >
> > > > > > > > > Since this is one record per line, I went to line 2402 but
> > that
> > > > > > record
> > > > > > > > > looks completely normal to me, (like all the other ones)
> but
> > > > since
> > > > > > this
> > > > > > > > is
> > > > > > > > > dense text, I am obviously missing something, but is the
> > record
> > > > the
> > > > > > > line
> > > > > > > > > number?
> > > > > > > > >
> > > > > > > > > Any other pointers I can use to trouble shoot this?
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > Error:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Caused by:
> > > > org.apache.drill.common.exceptions.UserRemoteException:
> > > > > > > > > DATA_READ ERROR: Error parsing JSON - index: 9604, length:
> 4
> > > > > > (expected:
> > > > > > > > > range(0, 8192))
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > File
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> /etl/dev/my-metadata/mysqspull/loads/2015-11-04/02ffc306e877_my_load_1446640931.json
> > > > > > > > >
> > > > > > > > > Record  2402
> > > > > > > > >
> > > > > > > > > Fragment 1:5
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Abdelhakim Deneche
> > > > > >
> > > > > > Software Engineer
> > > > > >
> > > > > >   <http://www.mapr.com/>
> > > > > >
> > > > > >
> > > > > > Now Available - Free Hadoop On-Demand Training
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Abdelhakim Deneche
> > > >
> > > > Software Engineer
> > > >
> > > >   <http://www.mapr.com/>
> > > >
> > > >
> > > > Now Available - Free Hadoop On-Demand Training
> > > > <
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   <http://www.mapr.com/>
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> >
>

Re: Help with Troubleshooting dense error message

Posted by John Omernik <jo...@omernik.com>.
I am on MapR's 1.2.1 Package.




On Wed, Nov 4, 2015 at 2:14 PM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> One last thing: what version of Drill do you have installed?

Re: Help with Troubleshooting dense error message

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
One last thing: what version of Drill do you have installed?

On Wed, Nov 4, 2015 at 11:04 AM, John Omernik <jo...@omernik.com> wrote:

> No, I don't think so.  I am running Drill in Marathon on Mesos, so my
> startup settings are all very static. In addition, the only session
> variable I changed was the JSON-as-text option at the session level, and
> I set it in both the pre- and post-drillbit-reboot sessions (I need that
> to query the data).



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: Help with Troubleshooting dense error message

Posted by John Omernik <jo...@omernik.com>.
So I started having the issue again, and was able to capture the verbose
stack dump. But in troubleshooting it, I found I have more questions than
before.


Basically, I had another offending file:
/etl/dev/my-metadata/mysqspull/loads/2015-11-04/3d0b961086b7_my_load_1446660723.json


When I removed that file from the directory I am loading from (2015-11-04),
the load worked fine.

I was like WOO this is a file issue.

Then I put it into a new directory named bad and ran the same CREATE TABLE,
but to a different destination and with dir0 = 'bad'.

And it too worked without issue!

This confused me, because it indicates to me that it's not something about
the data that's messing up Drill; maybe it's how much data? Or how many
files I am looking at?
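
One way to separate a data problem from a Drill problem is to check the reported record outside Drill. The sketch below is only an illustration, on the assumption that each line of the file holds one JSON object and that "Record N" in the error corresponds to line N. The helper name `inspect_record` and the sample values are made up; the field names `md5`, `tags` and `total` are borrowed from the query in this message.

```python
import json

def inspect_record(lines, record_number):
    """Parse one record (1-based) from an iterable of JSON lines.

    Returns {field: (type name, size)} where size is the length of
    lists, objects and strings, and None for everything else.
    An open file object can be passed as `lines`.
    """
    for line_number, line in enumerate(lines, start=1):
        if line_number == record_number:
            record = json.loads(line)  # raises ValueError on malformed JSON
            if not isinstance(record, dict):
                raise TypeError("expected a JSON object on each line")
            summary = {}
            for key, value in record.items():
                size = len(value) if isinstance(value, (list, dict, str)) else None
                summary[key] = (type(value).__name__, size)
            return summary
    raise IndexError(f"input has fewer than {record_number} records")

# Made-up sample data standing in for the real file, which cannot be posted.
sample = [
    '{"md5": "abc", "tags": ["a", "b"], "total": 3}',
    '{"md5": "def", "tags": [null, "c"], "total": null}',
]
summary = inspect_record(sample, 2)
print(summary)
```

If the record parses and its lists look like those in neighbouring records, the file itself is probably not the cause, which would match the observation that the same file loads fine from a directory of its own.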


Any thoughts on the stack trace, or anything else you see, would be
helpful.  I am stumped.


Error with Verbose Stack dump

> ALTER SESSION set `store.json.all_text_mode` = true;
+-------+------------------------------------+
|  ok   |              summary               |
+-------+------------------------------------+
| true  | store.json.all_text_mode updated.  |
+-------+------------------------------------+
1 row selected (0.201 seconds)


> CREATE TABLE dfs.dev.`my-metadata/.2015-11-04` as
  (
  select
    cast(total as int) as total,
    sha1, sha256, myhash,
    cast(`timestamp` as bigint) as `timestamp`,
    tags, link,
    cast(positives as int) as positives,
    cast(positives_delta as int) as positives_delta,
    ssdeep,
    cast(size as int) as size,
    type, report,
    cast(first_seen as timestamp) as `first_seen`,
    md5,
    cast(last_seen as timestamp) as `last_seen`,
    name, source_country, source_id
  from dfs.etldev.`my-metadata/mysqspull/loads/`
  where dir0 = '2015-11-04'
  );
Error: DATA_READ ERROR: Error parsing JSON - index: 16320, length: 4 (expected: range(0, 4096))

File  /etl/dev/my-metadata/mysqspull/loads/2015-11-04/3d0b961086b7_my_load_1446660723.json
Record  4081
Fragment 1:10

[Error Id: cf211a0a-2a1d-4860-bc1a-d9ff2657f973 on node2:31010]

  (java.lang.IndexOutOfBoundsException) index: 16320, length: 4 (expected: range(0, 4096))
    io.netty.buffer.DrillBuf.checkIndexD():189
    io.netty.buffer.DrillBuf.chk():211
    io.netty.buffer.DrillBuf.getInt():491
    org.apache.drill.exec.vector.UInt4Vector$Accessor.get():364
    org.apache.drill.exec.vector.complex.BaseRepeatedValueVector$BaseRepeatedMutator.startNewValue():237
    org.apache.drill.exec.vector.complex.impl.RepeatedVarCharWriterImpl.setPosition():157
    org.apache.drill.exec.vector.complex.impl.SingleListWriter.varChar():768
    org.apache.drill.exec.vector.complex.fn.JsonReader.handleString():463
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():554
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():389
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataAllText():393
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeDataSwitch():239
    org.apache.drill.exec.vector.complex.fn.JsonReader.writeToVector():179
    org.apache.drill.exec.vector.complex.fn.JsonReader.write():145
    org.apache.drill.exec.store.easy.json.JSONRecordReader.next():181
    org.apache.drill.exec.physical.impl.ScanBatch.next():183
    org.apache.drill.exec.record.AbstractRecordBatch.next():104
    org.apache.drill.exec.record.AbstractRecordBatch.next():94
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
    org.apache.drill.exec.record.AbstractRecordBatch.next():147
    org.apache.drill.exec.record.AbstractRecordBatch.next():104
    org.apache.drill.exec.record.AbstractRecordBatch.next():94
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
    org.apache.drill.exec.record.AbstractRecordBatch.next():147
    org.apache.drill.exec.record.AbstractRecordBatch.next():104
    org.apache.drill.exec.record.AbstractRecordBatch.next():94
    org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():91
    org.apache.drill.exec.record.AbstractRecordBatch.next():147
    org.apache.drill.exec.physical.impl.BaseRootExec.next():83
    org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
    org.apache.drill.exec.physical.impl.BaseRootExec.next():73
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():258
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():252
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1566
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():252
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
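
The numbers in the message can be read against the trace: the failing call is a 4-byte `getInt` reached from `UInt4Vector$Accessor.get`, so the index looks like a byte position in a vector of 4-byte entries. The Python sketch below just does that arithmetic. The helper `decode_bounds_error` and the 4-byte entry width are assumptions made for illustration; they are not part of Drill.

```python
import re

# Matches the bounds-check text, tolerating the line wrap after "expected:".
BOUNDS_RE = re.compile(
    r"index:\s*(\d+),\s*length:\s*(\d+)\s*"
    r"\(expected:\s*range\((\d+),\s*(\d+)\)\)"
)

def decode_bounds_error(message, entry_width=4):
    """Convert byte positions in the message into entry (slot) numbers.

    entry_width=4 is an assumption based on UInt4Vector (4-byte values)
    appearing in the stack trace.
    """
    match = BOUNDS_RE.search(message)
    if match is None:
        raise ValueError("no bounds-check text found in message")
    index, length, low, high = (int(group) for group in match.groups())
    return {
        "byte_index": index,
        "read_length": length,
        "buffer_bytes": high - low,
        "slot": index // entry_width,
        "slots_available": (high - low) // entry_width,
    }

info = decode_bounds_error("index: 16320, length: 4 (expected: range(0, 4096))")
print(info["slot"], info["slots_available"])

earlier = decode_bounds_error("index: 9604, length: 4 (expected: range(0, 8192))")
print(earlier["slot"], earlier["slots_available"])
```

Under that assumption, the slot is one less than the reported record in both errors from this thread (4080 for Record 4081, 2401 for Record 2402), while the buffers hold only 1024 and 2048 slots. That would be consistent with a vector that was not grown far enough, rather than with anything malformed in the record itself.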



Re: Help with Troubleshooting dense error message

Posted by John Omernik <jo...@omernik.com>.
No, I don't think so.  I am running Drill in Marathon on Mesos, so my
startup settings are all very static. In addition, the only session
variable I changed was the JSON-as-text option at the session level, and
I set it in both the pre-reboot and post-reboot drillbit sessions (I need
it to query the data).


Re: Help with Troubleshooting dense error message

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
This is strange indeed. The error message you reported earlier doesn't
suggest a memory leak issue but rather a bug when reading a specific set of
data.
Could it be that you changed some session options and forgot to set them
again after you restarted the drillbits?

Thanks




-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: Help with Troubleshooting dense error message

Posted by John Omernik <jo...@omernik.com>.
So I pulled out the files that seemed to be causing this issue (I was up
to two) and loaded my data.  (See my other posts on how I did that by
loading into a folder prefixed by ".")

Anywho, my Drill cluster became unstable in general, and I was not able to
run any queries until I bounced my drillbits.

I did that, got my process working again, and went back to troubleshooting
this problem, and everything appears to be working well now.  I am stumped.
Could a memory leak have caused that error only on some files?  I am
monitoring now to see whether the problem starts again, but that is REALLY
strange to me. This seems out of character for Drill, both in my use of it
and in how its memory handling has been explained to me.  If I get the
error again, I'll make sure exec.errors.verbose is set so I get a full
stack trace.

John


Re: Help with Troubleshooting dense error message

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
The error message "index: 9604, length: 4 (expected: range(0, 8192))"
suggests that an error happened when Drill tried to access a memory buffer
(most likely while writing an int or float value): the access at index 9604
falls outside the buffer's valid range of 0 to 8192.
This may actually be a bug exposed by that particular data record.
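
To make the numbers in that message concrete: the wording reads like the
bounds check a buffer library performs before a read or write. The sketch
below is only an illustration in Python, not Drill's actual code; the
function name check_index is hypothetical, and it assumes the index, the
length, and the range are all counted in bytes.

```python
def check_index(index: int, length: int, capacity: int) -> None:
    """Raise IndexError if [index, index + length) does not fit in the buffer."""
    if index < 0 or length < 0 or index + length > capacity:
        raise IndexError(
            f"index: {index}, length: {length} (expected: range(0, {capacity}))"
        )


# Values taken from the reported error: a 4-byte access at offset 9604
# in a buffer assumed to hold 8192 bytes.
try:
    check_index(9604, 4, 8192)
except IndexError as err:
    message = str(err)
    print(message)
```

With these values the access would end at 9604 + 4 = 9608, past the 8192
limit, so the check fails no matter what the 4 bytes contain. That is
consistent with the problem being in how the buffer was sized or indexed
rather than in the JSON text being malformed.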

You can try enabling verbose error logging before running the query again:

set `exec.errors.verbose`=true;

This should give us a nice stack trace about this error.

Thanks





Re: Help with Troubleshooting dense error message

Posted by John Omernik <jo...@omernik.com>.
There are multiple fields in that record, including two lists. Both lists
have data in them. (I am now running with JSON text mode because at times
the first value is a JSON null, but in these cases that should be turned
into the string "null", if I am understanding things correctly, and
shouldn't be causing a problem.)
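
One way to double-check the lists without posting the data is to summarize
which element types each list in a record holds. A minimal sketch in
Python, assuming one JSON object per line; the sample line and its field
names (id, tags, scores) are made up for illustration and are not from the
real file, and only top-level fields are inspected.

```python
import json


def describe_lists(record: dict) -> dict:
    """Map each list-valued field to the sorted type names of its elements."""
    summary = {}
    for key, value in record.items():
        if isinstance(value, list):
            summary[key] = sorted({type(item).__name__ for item in value})
    return summary


# Hypothetical record: one list starts with a JSON null, the other mixes
# integers and floats.
line = '{"id": 7, "tags": [null, "a", "b"], "scores": [1, 2.5]}'
print(describe_lists(json.loads(line)))
```

A list that reports NoneType alongside another type, or more than one
numeric type, is a candidate for closer inspection.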




Re: Help with Troubleshooting dense error message

Posted by Hsuan Yi Chu <hy...@maprtech.com>.
What is the data type for that record in line 2402? A list?

Do you think it could be similar to this issue?

https://issues.apache.org/jira/browse/DRILL-4006
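
To answer that without eyeballing dense text, the record can be pulled out
by line number and its field types listed. A rough Python sketch, assuming
one JSON object per line; load_record and field_types are hypothetical
helper names, not part of Drill.

```python
import json
from itertools import islice


def load_record(path: str, line_number: int) -> dict:
    """Return the JSON object on the given 1-based line of a JSON-lines file."""
    with open(path, encoding="utf-8") as handle:
        line = next(islice(handle, line_number - 1, line_number), None)
    if line is None:
        raise ValueError(f"{path} has fewer than {line_number} lines")
    return json.loads(line)


def field_types(record: dict) -> dict:
    """Map each top-level field name to the Python type name of its value."""
    return {key: type(value).__name__ for key, value in record.items()}
```

Whether "Record 2402" in the error counts from 0 or from 1 is not stated,
so it may be worth calling load_record for both line 2402 and line 2403 of
the reported file and comparing field_types for each against a record that
loads cleanly.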


