You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Alejandro Bellogin Kouki <al...@uam.es> on 2013/03/20 04:56:31 UTC

Donuts example bug

Hi all,

this morning I attended the Drill workshop in Amsterdam, and as other 
couple of people, my colleagues and I found a bug regarding the 
simple_plan.json query. Its original output was:
{
  "sales" : 109.71,
  "typeCount" : 2,
  "quantity" : 159,
  "ppu" : 0.55
}
 {
  "sales" : 184.25,
  "typeCount" : 2,
  "quantity" : 335,
  "ppu" : 0.55
}

Notice that both "ppu" values are the same, whereas the value for the 
first should be 0.69 (109.71/159) and for the second 0.55.

So, after digging a little bit (maybe too much, considering the time of 
writing this email) into the source code, I managed to generate the 
desired output. For that, I have changed both the simple_plan.json and 
some code in CollapsingAggregateROP that is incompatible with the 
current description in the Apache Drill Plan Syntax. Mainly because of 
the latter, I preferred to start some discussion here instead of in the 
JIRA ticket, but if you want me to file the JIRA first, I will do it 
(please, take into account I am a complete newbie).

Well, basically my solution involves changing the aggregate operation as 
follows (notice now it has a target):
          op: "collapsingaggregate",
          within: "ppusegment",
          carryovers: [ "donuts.ppu" ],
          target: "donuts.ppu",
          aggregations: [
            { ref: "donuts.typeCount",  expr: "count(1)" },
            { ref: "donuts.quantity",  expr: "sum(quantity)" },
            { ref: "donuts.sales",  expr: "sum(donuts.ppu * quantity)" }
          ]
To make this works, I have had to ignore the fact that "*will draw the 
carryover variables from a record where the target field references has 
a true value*" [ADPS]. When no target is used, the carryover contains a 
pointer to the wrong register in method writeOutputRecord (whereas in 
the method consumeCurrent -- where the condition for target is checked 
-- the instance of the register is the one of the current segment).

I acknowledge that this solution is just a workaround, since it does not 
comply with the ADPS, but I hope at least it serves to give some 
indications about where (and how) a real solution could be found (i.e., 
the carryovers should to be computed when they point to the actual 
register).

Regards,
Alejandro

PS: an alternative solution would be to ignore the carryovers from the 
initial query plan and (somehow) be able to print also the ppu field in 
the projection stage.

-- 
  Alejandro Bellogin Kouki
  http://rincon.uam.es/dir?cw=435275268554687


Re: Donuts example bug

Posted by Jacques Nadeau <ja...@apache.org>.
Hey Alejandro,

Thanks for identifying this problem and presenting a solution.  I
appreciate the solution but think that we should really
update CollapsingAggregateROP to correctly operate against the within
segment key, which sounds like it is the core issue.  Can you file a jira
against this and hopefully someone will pick up the correction of the
behavior.  Clearly we need to add some more tests to validate the full
output of the donuts example as it was, in fact, working correctly at one
point.  (I'm probably the one that broke it.)  If you wanted to add an
additional jira for CollapsingAggregateROP test cases and maybe make one or
two, that would be very helpful.

Thanks again,
Jacques


On Tue, Mar 19, 2013 at 8:56 PM, Alejandro Bellogin Kouki <
alejandro.bellogin@uam.es> wrote:

> Hi all,
>
> this morning I attended the Drill workshop in Amsterdam, and as other
> couple of people, my colleagues and I found a bug regarding the
> simple_plan.json query. Its original output was:
> {
>  "sales" : 109.71,
>  "typeCount" : 2,
>  "quantity" : 159,
>  "ppu" : 0.55
> }
> {
>  "sales" : 184.25,
>  "typeCount" : 2,
>  "quantity" : 335,
>  "ppu" : 0.55
> }
>
> Notice that both "ppu" values are the same, whereas the value for the
> first should be 0.69 (109.71/159) and for the second 0.55.
>
> So, after digging a little bit (maybe too much, considering the time of
> writing this email) into the source code, I managed to generate the desired
> output. For that, I have changed both the simple_plan.json and some code in
> CollapsingAggregateROP that is incompatible with the current description in
> the Apache Drill Plan Syntax. Mainly because of the latter, I preferred to
> start some discussion here instead of in the JIRA ticket, but if you want
> me to file the JIRA first, I will do it (please, take into account I am a
> complete newbie).
>
> Well, basically my solution involves changing the aggregate operation as
> follows (notice now it has a target):
>          op: "collapsingaggregate",
>          within: "ppusegment",
>          carryovers: [ "donuts.ppu" ],
>          target: "donuts.ppu",
>          aggregations: [
>            { ref: "donuts.typeCount",  expr: "count(1)" },
>            { ref: "donuts.quantity",  expr: "sum(quantity)" },
>            { ref: "donuts.sales",  expr: "sum(donuts.ppu * quantity)" }
>          ]
> To make this works, I have had to ignore the fact that "*will draw the
> carryover variables from a record where the target field references has a
> true value*" [ADPS]. When no target is used, the carryover contains a
> pointer to the wrong register in method writeOutputRecord (whereas in the
> method consumeCurrent -- where the condition for target is checked -- the
> instance of the register is the one of the current segment).
>
> I acknowledge that this solution is just a workaround, since it does not
> comply with the ADPS, but I hope at least it serves to give some
> indications about where (and how) a real solution could be found (i.e., the
> carryovers should to be computed when they point to the actual register).
>
> Regards,
> Alejandro
>
> PS: an alternative solution would be to ignore the carryovers from the
> initial query plan and (somehow) be able to print also the ppu field in the
> projection stage.
>
> --
>  Alejandro Bellogin Kouki
>  http://rincon.uam.es/dir?cw=**435275268554687<http://rincon.uam.es/dir?cw=435275268554687>
>
>

Re: Donuts example bug

Posted by Ted Dunning <te...@gmail.com>.
Congratulations, btw, to the drill workshop attendees.

Here are some surprising statistics:

30 signups.  28 or 29 attendees showed up.  (unheard of in my experience)

Essentially all attendees actually prepared for the workshop at a very high
level.  Almost all had drill downloaded and compiled and were ready to rock
and roll.

Two separate groups found this bug and a third group seemed to sense that
something was amiss, but they didn't quite connect the dots.

If this is the average calibre of the CWI community, then I expect some
interesting questions tomorrow at the CWI talk on clustering.

On Wed, Mar 20, 2013 at 4:56 AM, Alejandro Bellogin Kouki <
alejandro.bellogin@uam.es> wrote:

> Hi all,
>
> this morning I attended the Drill workshop in Amsterdam, and as other
> couple of people, my colleagues and I found a bug regarding the
> simple_plan.json query. Its original output was:
> {
>  "sales" : 109.71,
>  "typeCount" : 2,
>  "quantity" : 159,
>  "ppu" : 0.55
> }
> {
>  "sales" : 184.25,
>  "typeCount" : 2,
>  "quantity" : 335,
>  "ppu" : 0.55
> }
>
> Notice that both "ppu" values are the same, whereas the value for the
> first should be 0.69 (109.71/159) and for the second 0.55.
>
> So, after digging a little bit (maybe too much, considering the time of
> writing this email) into the source code, I managed to generate the desired
> output. For that, I have changed both the simple_plan.json and some code in
> CollapsingAggregateROP that is incompatible with the current description in
> the Apache Drill Plan Syntax. Mainly because of the latter, I preferred to
> start some discussion here instead of in the JIRA ticket, but if you want
> me to file the JIRA first, I will do it (please, take into account I am a
> complete newbie).
>
> Well, basically my solution involves changing the aggregate operation as
> follows (notice now it has a target):
>          op: "collapsingaggregate",
>          within: "ppusegment",
>          carryovers: [ "donuts.ppu" ],
>          target: "donuts.ppu",
>          aggregations: [
>            { ref: "donuts.typeCount",  expr: "count(1)" },
>            { ref: "donuts.quantity",  expr: "sum(quantity)" },
>            { ref: "donuts.sales",  expr: "sum(donuts.ppu * quantity)" }
>          ]
> To make this works, I have had to ignore the fact that "*will draw the
> carryover variables from a record where the target field references has a
> true value*" [ADPS]. When no target is used, the carryover contains a
> pointer to the wrong register in method writeOutputRecord (whereas in the
> method consumeCurrent -- where the condition for target is checked -- the
> instance of the register is the one of the current segment).
>
> I acknowledge that this solution is just a workaround, since it does not
> comply with the ADPS, but I hope at least it serves to give some
> indications about where (and how) a real solution could be found (i.e., the
> carryovers should to be computed when they point to the actual register).
>
> Regards,
> Alejandro
>
> PS: an alternative solution would be to ignore the carryovers from the
> initial query plan and (somehow) be able to print also the ppu field in the
> projection stage.
>
> --
>  Alejandro Bellogin Kouki
>  http://rincon.uam.es/dir?cw=**435275268554687<http://rincon.uam.es/dir?cw=435275268554687>
>
>