Posted to dev@drill.apache.org by Charles Givre <cg...@gmail.com> on 2022/01/28 14:54:42 UTC

Re: 1.20.0-SNAPSHOT: Sort exceeded memory limit of 104857600 bytes

Daniel, 
Thanks for flagging this.  One thing I noticed in your logs is this:

Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.

What's happening here is that the newer version of Drill pushes the sort operation down to Mongo, which (in theory) should be faster.  In contrast, Drill 1.19 would receive the unsorted data from Mongo and then sort it itself.  I wonder whether you would get better results with Mongo doing the sort if you set your Mongo up so that the `allowDiskUse` parameter is true.
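
For reference, the figure in the error message is a round number in binary units. A minimal Python sketch of the arithmetic (the helper name `to_mib` is only illustrative):

```python
# Byte count quoted in the MongoDB error message above.
LIMIT_BYTES = 104857600


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


print(to_mib(LIMIT_BYTES))  # 100.0
```

So the sort that failed needed more than 100 MiB of memory on the Mongo side.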

-- C



> On Jan 28, 2022, at 9:43 AM, Daniel Clark <cl...@gmail.com> wrote:
> 
> Hi Charles,
> 
> Yes "supportsSortPushdown" is set to true. I left it at the default. I'll
> try setting it to false, and try again. Thanks for the feedback.
> 
> On Fri, Jan 28, 2022 at 9:38 AM Charles Givre <cg...@gmail.com> wrote:
> 
>> Hey Daniel,
>> Did you have the sort pushdown enabled?  This is one change that we added
>> to the mongo pushdown since 1.19 and might be affecting your query.
>> Best,
>> -- C
>> 
>> 
>>> On Jan 28, 2022, at 9:32 AM, Daniel Clark <cl...@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> While evaluating 1.20.0-SNAPSHOT release performance, I ran a mongo
>> query that runs in 15 minutes in the 1.19 release (below).
>>> 
>>> SELECT `Elements_Efforts`.`EffortTypeName` AS `EffortTypeName`,
>>>  `Elements`.`ElementSubTypeName` AS `ElementSubTypeName`,
>>>  `Elements`.`ElementTypeName` AS `ElementTypeName`,
>>>  `Elements`.`PlanID` AS `PlanID`
>>> FROM `mongo.grounds`.`Elements` `Elements`
>>>  INNER JOIN `mongo.grounds`.`Elements_Efforts` `Elements_Efforts` ON
>> (`Elements`.`_id` = `Elements_Efforts`.`_id`)
>>> WHERE (`Elements`.`PlanID` = '1623263140')
>>> GROUP BY `Elements_Efforts`.`EffortTypeName`,
>>>  `Elements`.`ElementSubTypeName`,
>>>  `Elements`.`ElementTypeName`,
>>>  `Elements`.`PlanID`
>>> 
>>> The query runs for 34 minutes before returning this error; "Sort
>> exceeded memory limit of 104857600 bytes, but did not opt in to external
>> sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on server
>> localhost:27017." Any ideas? I realize that it's a mongodb error, but the
>> mongo database doesn't raise this error with the 1.19 release. I was
>> expecting improved performance with the mongo storage plugin in the
>> upcoming 1.20 release. Nothing in my environment has changed. I've attached
>> the full stacktrace.
>>> 
>>> <stacktrace.txt>
>> 
>> 


Re: 1.20.0-SNAPSHOT: Sort exceeded memory limit of 104857600 bytes

Posted by Charles Givre <cg...@gmail.com>.
Good question. I don't know enough about Mongo config to answer that, but let me look into that. 
Best,
-- C

> On Jan 28, 2022, at 10:20 AM, Daniel Clark <cl...@gmail.com> wrote:
> 
> Hi Charles,
> 
> I was under the impression that the allowDiskUse parameter is passed by the
> client making the call to the mongodb server. Is it possible to add this
> parameter to the mongo storage plugin, similar to how you added the
> "batchSize" parameter for the 1.20 release?
> 
> On Fri, Jan 28, 2022 at 9:54 AM Charles Givre <cg...@gmail.com> wrote:
> 
>> Daniel,
>> Thanks for flagging this.  One thing I noticed in your logs is this:
>> 
>> Sort exceeded memory limit of 104857600 bytes, but did not opt in to
>> external sorting. Aborting operation. Pass allowDiskUse:true to opt in.
>> 
>> What's happening here is that in the newer version of Drill, Drill is
>> sending the sort operation to Mongo which (in theory) should be faster.  In
>> contrast, Drill 1.19 would receive the unsorted data from Mongo then sort
>> it.  I wonder if setting your mongo up so that the `allowDiskUse` parameter
>> is true, you might get better results if Mongo sorts the data.
>> 
>> -- C
>> 
>> 
>> 
>>> On Jan 28, 2022, at 9:43 AM, Daniel Clark <cl...@gmail.com> wrote:
>>> 
>>> Hi Charles,
>>> 
>>> Yes "supportsSortPushdown" is set to true. I left it at the default. I'll
>>> try setting it to false, and try again. Thanks for the feedback.
>>> 
>>> On Fri, Jan 28, 2022 at 9:38 AM Charles Givre <cg...@gmail.com> wrote:
>>> 
>>>> Hey Daniel,
>>>> Did you have the sort pushdown enabled?  This is one change that we
>> added
>>>> to the mongo pushdown since 1.19 and might be affecting your query.
>>>> Best,
>>>> -- C
>>>> 
>>>> 
>>>>> On Jan 28, 2022, at 9:32 AM, Daniel Clark <cl...@gmail.com> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> While evaluating 1.20.0-SNAPSHOT release performance, I ran a mongo
>>>> query that runs in 15 minutes in the 1.19 release (below).
>>>>> 
>>>>> SELECT `Elements_Efforts`.`EffortTypeName` AS `EffortTypeName`,
>>>>> `Elements`.`ElementSubTypeName` AS `ElementSubTypeName`,
>>>>> `Elements`.`ElementTypeName` AS `ElementTypeName`,
>>>>> `Elements`.`PlanID` AS `PlanID`
>>>>> FROM `mongo.grounds`.`Elements` `Elements`
>>>>> INNER JOIN `mongo.grounds`.`Elements_Efforts` `Elements_Efforts` ON
>>>> (`Elements`.`_id` = `Elements_Efforts`.`_id`)
>>>>> WHERE (`Elements`.`PlanID` = '1623263140')
>>>>> GROUP BY `Elements_Efforts`.`EffortTypeName`,
>>>>> `Elements`.`ElementSubTypeName`,
>>>>> `Elements`.`ElementTypeName`,
>>>>> `Elements`.`PlanID`
>>>>> 
>>>>> The query runs for 34 minutes before returning this error; "Sort
>>>> exceeded memory limit of 104857600 bytes, but did not opt in to external
>>>> sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on
>> server
>>>> localhost:27017." Any ideas? I realize that it's a mongodb error, but
>> the
>>>> mongo database doesn't raise this error with the 1.19 release. I was
>>>> expecting improved performance with the mongo storage plugin in the
>>>> upcoming 1.20 release. Nothing in my environment has changed. I've
>> attached
>>>> the full stacktrace.
>>>>> 
>>>>> <stacktrace.txt>
>>>> 
>>>> 
>> 
>> 


Re: 1.20.0-SNAPSHOT: Sort exceeded memory limit of 104857600 bytes

Posted by Daniel Clark <cl...@gmail.com>.
Hi Charles,

I was under the impression that the allowDiskUse parameter is passed by the
client making the call to the mongodb server. Is it possible to add this
parameter to the mongo storage plugin, similar to how you added the
"batchSize" parameter for the 1.20 release?
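
To illustrate what "passed by the client" means here, below is a rough Python sketch of the document a client sends for MongoDB's `aggregate` database command, with `allowDiskUse` set per request. The helper `build_aggregate_command` and the pipeline stages are assumptions made up for illustration; this is not the pipeline Drill actually generates, and it says nothing about how the storage plugin would expose the option.

```python
from typing import Any


def build_aggregate_command(collection: str,
                            pipeline: list[dict[str, Any]],
                            allow_disk_use: bool = True) -> dict[str, Any]:
    """Build the document for MongoDB's `aggregate` database command.

    The command name must be the first key; Python dicts keep insertion order.
    """
    return {
        "aggregate": collection,
        "pipeline": pipeline,
        "cursor": {},                    # request a cursor with default batch size
        "allowDiskUse": allow_disk_use,  # per-request opt-in to external sorting
    }


# Hypothetical pipeline for illustration only; NOT what Drill generates.
pipeline = [
    {"$match": {"PlanID": "1623263140"}},
    {"$sort": {"ElementTypeName": 1}},
]

command = build_aggregate_command("Elements", pipeline)
print(command["allowDiskUse"])
```

With a driver such as PyMongo, I believe the same flag is normally passed as `collection.aggregate(pipeline, allowDiskUse=True)`, so the option travels with each request rather than living in the server configuration.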

On Fri, Jan 28, 2022 at 9:54 AM Charles Givre <cg...@gmail.com> wrote:

> Daniel,
> Thanks for flagging this.  One thing I noticed in your logs is this:
>
> Sort exceeded memory limit of 104857600 bytes, but did not opt in to
> external sorting. Aborting operation. Pass allowDiskUse:true to opt in.
>
> What's happening here is that in the newer version of Drill, Drill is
> sending the sort operation to Mongo which (in theory) should be faster.  In
> contrast, Drill 1.19 would receive the unsorted data from Mongo then sort
> it.  I wonder if setting your mongo up so that the `allowDiskUse` parameter
> is true, you might get better results if Mongo sorts the data.
>
> -- C
>
>
>
> > On Jan 28, 2022, at 9:43 AM, Daniel Clark <cl...@gmail.com> wrote:
> >
> > Hi Charles,
> >
> > Yes "supportsSortPushdown" is set to true. I left it at the default. I'll
> > try setting it to false, and try again. Thanks for the feedback.
> >
> > On Fri, Jan 28, 2022 at 9:38 AM Charles Givre <cg...@gmail.com> wrote:
> >
> >> Hey Daniel,
> >> Did you have the sort pushdown enabled?  This is one change that we
> added
> >> to the mongo pushdown since 1.19 and might be affecting your query.
> >> Best,
> >> -- C
> >>
> >>
> >>> On Jan 28, 2022, at 9:32 AM, Daniel Clark <cl...@gmail.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> While evaluating 1.20.0-SNAPSHOT release performance, I ran a mongo
> >> query that runs in 15 minutes in the 1.19 release (below).
> >>>
> >>> SELECT `Elements_Efforts`.`EffortTypeName` AS `EffortTypeName`,
> >>>  `Elements`.`ElementSubTypeName` AS `ElementSubTypeName`,
> >>>  `Elements`.`ElementTypeName` AS `ElementTypeName`,
> >>>  `Elements`.`PlanID` AS `PlanID`
> >>> FROM `mongo.grounds`.`Elements` `Elements`
> >>>  INNER JOIN `mongo.grounds`.`Elements_Efforts` `Elements_Efforts` ON
> >> (`Elements`.`_id` = `Elements_Efforts`.`_id`)
> >>> WHERE (`Elements`.`PlanID` = '1623263140')
> >>> GROUP BY `Elements_Efforts`.`EffortTypeName`,
> >>>  `Elements`.`ElementSubTypeName`,
> >>>  `Elements`.`ElementTypeName`,
> >>>  `Elements`.`PlanID`
> >>>
> >>> The query runs for 34 minutes before returning this error; "Sort
> >> exceeded memory limit of 104857600 bytes, but did not opt in to external
> >> sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on
> server
> >> localhost:27017." Any ideas? I realize that it's a mongodb error, but
> the
> >> mongo database doesn't raise this error with the 1.19 release. I was
> >> expecting improved performance with the mongo storage plugin in the
> >> upcoming 1.20 release. Nothing in my environment has changed. I've
> attached
> >> the full stacktrace.
> >>>
> >>> <stacktrace.txt>
> >>
> >>
>
>
