Posted to user@drill.apache.org by "Willett, Phil" <Ph...@maritz.com> on 2015/04/11 18:40:09 UTC

Flatten function limit on large nested JSON array

First, congratulations on vastly improving the speed and size of JSON nested array parsing! I have run into a new problem when I try to flatten those large arrays now:
Query Failed: An Error Occurred
org.apache.drill.exec.rpc.RpcException: RemoteRpcException: Failure while running fragment., Incoming batch of org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch has size 69153, which is beyond the limit of 65536 [ 82af4f1d-c185-41dc-a348-5385c54aed04 on quickstart.cloudera:31010 ] [ 82af4f1d-c185-41dc-a348-5385c54aed04 on quickstart.cloudera:31010 ]

Is there a plan to improve the flatten function so that it can handle the large array sizes now allowed?
Confidentiality Warning: This e-mail contains information intended only for the use of the individual or entity named above. If the reader of this e-mail is not the intended recipient or the employee or agent responsible for delivering it to the intended recipient, any dissemination, publication or copying of this e-mail is strictly prohibited. The sender does not accept any responsibility for any loss, disruption or damage to your data or computer system that may occur while using data contained in, or transmitted with, this e-mail. If you have received this e-mail in error, please immediately notify us by return e-mail. Thank you.

Re: Flatten function limit on large nested JSON array

Posted by Jason Altekruse <al...@gmail.com>.
Hello Phil,

Unfortunately, this is a bug that has been in flatten all along; it was
only exposed when we fixed another system-wide issue with supporting large
lists and very wide strings. I have posted a patch for it that is now in
review, and I want to do a little additional cleanup in flatten while I'm
looking at it.
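
To make the limit in the error message concrete, here is a minimal sketch
of the general idea of capping a flattened output at 65536 records per
batch. It is an illustration only, not the code in the patch and not how
Drill implements flatten internally; the function name and the list-based
"batches" are hypothetical, and it assumes the size in the error counts
records.

```python
from typing import Iterable, Iterator, List

# Limit quoted in the error message; treated here as a per-batch record cap.
MAX_BATCH_SIZE = 65536


def flatten_in_batches(
    rows: Iterable[List[int]], max_batch_size: int = MAX_BATCH_SIZE
) -> Iterator[List[int]]:
    """Hypothetical helper: emit one record per array element, starting a
    new batch whenever the current one reaches max_batch_size."""
    batch: List[int] = []
    for row in rows:
        for element in row:
            batch.append(element)
            if len(batch) == max_batch_size:
                yield batch
                batch = []
    if batch:
        # Emit whatever is left over after the last full batch.
        yield batch


# One row holding a 69153-element array, the size reported in the error.
batches = list(flatten_in_batches([list(range(69153))]))
print([len(b) for b in batches])  # [65536, 3617]
```

With the cap applied, the 69153 elements are split into one full batch and
a remainder instead of one oversized batch.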

If you are willing to do a build, the patch attached to the issue linked
below should apply cleanly to the tip of the master branch and should fix
your problem.

If not, the patch should be going in early next week, so you could just
wait until then.

https://issues.apache.org/jira/browse/DRILL-2162

- Jason



On Sat, Apr 11, 2015 at 9:40 AM, Willett, Phil <Ph...@maritz.com>
wrote:

> First, congratulations on vastly improving the speed and size of JSON
> nested array parsing! I have run into a new problem when I try to flatten
> those large arrays now:
> Query Failed: An Error Occurred
> org.apache.drill.exec.rpc.RpcException: RemoteRpcException: Failure while
> running fragment., Incoming batch of
> org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch has size
> 69153, which is beyond the limit of 65536 [
> 82af4f1d-c185-41dc-a348-5385c54aed04 on quickstart.cloudera:31010 ] [
> 82af4f1d-c185-41dc-a348-5385c54aed04 on quickstart.cloudera:31010 ]
>
> Is there a plan to improve the flatten function so that it can handle the
> large array sizes now allowed?