You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Nitin Pawar <ni...@gmail.com> on 2019/11/05 12:27:04 UTC

Help for DRILL-3609

Hi Devs,

I had sent request for this almost 2.5 years ago. Trying it again now.

Currently Apache drill window functions LEAD and LAG support offset as 1.
In another words in a given window these functions can return either
previous or next row only.


I am trying modify the behavior these function and allow offset >=1 in
query such as
select employee_id, department_id,salary, lag(salary,*4*) over(partition by
department_id order by salary asc) from  cp.`employee.json`;

I have managed to remove the limitation which fails the query can not have
offset > 1 and able to pass the offset to actual function implementation.

Currently I am stuck where the record processor is crossing the window
boundary of department_id and gets row from next/previous window in
lead/lag function

For eg: If you notice in row 2 for department_id=2, it is getting previous
windows of department_id=1

Here is sample output for below query
apache drill> select  employee_id, department_id,salary, lag(salary,4)
over(partition by department_id order by salary asc) from
 cp.`employee.json` where department_id <=3;
+-------------+---------------+---------+----------+
| employee_id | department_id | salary  |  EXPR$3  |
+-------------+---------------+---------+----------+
| 20          | 1             | 30000.0 | null     |
| 5           | 1             | 35000.0 | null     |
| 22          | 1             | 35000.0 | null     |
| 21          | 1             | 35000.0 | null     |
| 2           | 1             | 40000.0 | 30000.0  |
| 4           | 1             | 40000.0 | 35000.0  |
| 1           | 1             | 80000.0 | 35000.0  |
| 37          | 2             | 6700.0  | null     |
| 38          | 2             | 8000.0  | 40000.0  |
| 39          | 2             | 10000.0 | 40000.0  |
| 40          | 2             | 10000.0 | 80000.0  |
| 6           | 2             | 25000.0 | 6700.0   |
| 42          | 3             | 5000.0  | null     |
| 41          | 3             | 8500.0  | 10000.0  |
| 7           | 3             | 15000.0 | 10000.0  |
| 36          | 3             | 45000.0 | 25000.0  |
+-------------+---------------+---------+----------+


Thanks,
Nitin Pawar

Re: Help for DRILL-3609

Posted by Nitin Pawar <ni...@gmail.com>.
Thanks for the links and guidance Paul.
I will  use your advice and see if I can get past the limitation.

Thanks,
Nitin

On Thu, Nov 7, 2019 at 1:00 PM Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi Nitin,
>
> As it turns out, I just had to fix a bug in the windowing operator. I'm
> not an expert on this operator, but perhaps I can offer a suggestion or two.
>
> We have a few existing unit tests for window functions in TestWindowFrame.
> They are a bit hard to follow, however. Take a look at testFix3605(), which
> does:
>
> select
>   col2,
>   lead(col2) over(partition by col2 order by col0) as lead_col2
> from
>   dfs.`window/fewRowsAllData.parquet`
>
> When executed, we use the NoFrameSupportTemplate [1] class to do the work.
> Specifically, processPartition() contains code that handles the lead/lag by
> 1 case.
>
> I found it useful to enable saving of the generated code: [2]. When you
> step into the generated code, in Eclipse, you can set the source path to
> include /tmp/drill/codegen (on Linux/Mac). You can then see the contents of
> the generated copyPrev() and copyNext() functions.
>
> If you step into these generated functions, you can see that the code
> simply takes two indexes: a to and a from, then copies the data from one to
> the other. As a result, I suspect that you do not need to change the
> generated code to achieve your goal.
>
> Instead, you may want to change the processPartition() function. Instead
> of the simple +/-1 logic it currently has, use your lead/lag offset instead.
>
> By the way, a handy way to share work is simply to push your work to your
> private GitHub repo, then link to that code.
>
> Thanks,
> - Paul
>
>
> [1]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/NoFrameSupportTemplate.java#L139
> [2]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java#L366
>
>     On Wednesday, November 6, 2019, 10:04:12 PM PST, Nitin Pawar <
> nitinpawar432@gmail.com> wrote:
>
>  any help on this?
>
>
> On Tue, Nov 5, 2019 at 7:09 PM Nitin Pawar <ni...@gmail.com>
> wrote:
>
> > Ohh ok
> > let me provide a google drive url
> > Here
> > <
> https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing
> >
> > is the link. Can you check if can access it.
> >
> > Thanks,
> > Nitin
> >
> > On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cg...@gmail.com> wrote:
> >
> >> Hi Nitin,
> >> It seems to have been filtered out.
> >>
> >>
> >> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <ni...@gmail.com>
> >> wrote:
> >> >
> >> > Hi Charles,
> >> >
> >> > I have attached git patch.
> >> > I was currently doing for lag function only for testing purposes
> >> >
> >> > Thanks,
> >> > Nitin
> >> >
> >> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com
> <mailto:
> >> cgivre@gmail.com>> wrote:
> >> > Hi Nitin,
> >> > Thanks for your question.  Could you/did you share your code?  If not,
> >> could you please post a draft PR so that we can take a look and offer
> >> suggestions?
> >> > Thanks,
> >> > -- C
> >> >
> >> >
> >> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar432@gmail.com
> >> <ma...@gmail.com>> wrote:
> >> > >
> >> > > Hi Devs,
> >> > >
> >> > > I had sent request for this almost 2.5 years ago. Trying it again
> now.
> >> > >
> >> > > Currently Apache drill window functions LEAD and LAG support offset
> >> as 1.
> >> > > In another words in a given window these functions can return either
> >> > > previous or next row only.
> >> > >
> >> > >
> >> > > I am trying modify the behavior these function and allow offset >=1
> in
> >> > > query such as
> >> > > select employee_id, department_id,salary, lag(salary,*4*)
> >> over(partition by
> >> > > department_id order by salary asc) from  cp.`employee.json`;
> >> > >
> >> > > I have managed to remove the limitation which fails the query can
> not
> >> have
> >> > > offset > 1 and able to pass the offset to actual function
> >> implementation.
> >> > >
> >> > > Currently I am stuck where the record processor is crossing the
> window
> >> > > boundary of department_id and gets row from next/previous window in
> >> > > lead/lag function
> >> > >
> >> > > For eg: If you notice in row 2 for department_id=2, it is getting
> >> previous
> >> > > windows of department_id=1
> >> > >
> >> > > Here is sample output for below query
> >> > > apache drill> select  employee_id, department_id,salary,
> lag(salary,4)
> >> > > over(partition by department_id order by salary asc) from
> >> > > cp.`employee.json` where department_id <=3;
> >> > > +-------------+---------------+---------+----------+
> >> > > | employee_id | department_id | salary  |  EXPR$3  |
> >> > > +-------------+---------------+---------+----------+
> >> > > | 20          | 1            | 30000.0 | null    |
> >> > > | 5          | 1            | 35000.0 | null    |
> >> > > | 22          | 1            | 35000.0 | null    |
> >> > > | 21          | 1            | 35000.0 | null    |
> >> > > | 2          | 1            | 40000.0 | 30000.0  |
> >> > > | 4          | 1            | 40000.0 | 35000.0  |
> >> > > | 1          | 1            | 80000.0 | 35000.0  |
> >> > > | 37          | 2            | 6700.0  | null    |
> >> > > | 38          | 2            | 8000.0  | 40000.0  |
> >> > > | 39          | 2            | 10000.0 | 40000.0  |
> >> > > | 40          | 2            | 10000.0 | 80000.0  |
> >> > > | 6          | 2            | 25000.0 | 6700.0  |
> >> > > | 42          | 3            | 5000.0  | null    |
> >> > > | 41          | 3            | 8500.0  | 10000.0  |
> >> > > | 7          | 3            | 15000.0 | 10000.0  |
> >> > > | 36          | 3            | 45000.0 | 25000.0  |
> >> > > +-------------+---------------+---------+----------+
> >> > >
> >> > >
> >> > > Thanks,
> >> > > Nitin Pawar
> >> >
> >> >
> >> >
> >> > --
> >> > Nitin Pawar
> >>
> >>
> >
> > --
> > Nitin Pawar
> >
>
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar

Re: Help for DRILL-3609

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Nitin,

As it turns out, I just had to fix a bug in the windowing operator. I'm not an expert on this operator, but perhaps I can offer a suggestion or two.

We have a few existing unit tests for window functions in TestWindowFrame. They are a bit hard to follow, however. Take a look at testFix3605(), which does:

select
  col2,
  lead(col2) over(partition by col2 order by col0) as lead_col2
from
  dfs.`window/fewRowsAllData.parquet`

When executed, we use the NoFrameSupportTemplate [1] class to do the work. Specifically, processPartition() contains code that handles the lead/lag by 1 case.

I found it useful to enable saving of the generated code: [2]. When you step into the generated code, in Eclipse, you can set the source path to include /tmp/drill/codegen (on Linux/Mac). You can then see the contents of the generated copyPrev() and copyNext() functions.

If you step into these generated functions, you can see that the code simply takes two indexes: a to and a from, then copies the data from one to the other. As a result, I suspect that you do not need to change the generated code to achieve your goal.

Instead, you may want to change the processPartition() function. Instead of the simple +/-1 logic it currently has, use your lead/lag offset instead.

By the way, a handy way to share work is simply to push your work to your private GitHub repo, then link to that code.

Thanks,
- Paul


[1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/NoFrameSupportTemplate.java#L139
[2] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java#L366 

    On Wednesday, November 6, 2019, 10:04:12 PM PST, Nitin Pawar <ni...@gmail.com> wrote:  
 
 any help on this?


On Tue, Nov 5, 2019 at 7:09 PM Nitin Pawar <ni...@gmail.com> wrote:

> Ohh ok
> let me provide a google drive url
> Here
> <https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing>
> is the link. Can you check if can access it.
>
> Thanks,
> Nitin
>
> On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cg...@gmail.com> wrote:
>
>> Hi Nitin,
>> It seems to have been filtered out.
>>
>>
>> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <ni...@gmail.com>
>> wrote:
>> >
>> > Hi Charles,
>> >
>> > I have attached git patch.
>> > I was currently doing for lag function only for testing purposes
>> >
>> > Thanks,
>> > Nitin
>> >
>> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com <mailto:
>> cgivre@gmail.com>> wrote:
>> > Hi Nitin,
>> > Thanks for your question.  Could you/did you share your code?  If not,
>> could you please post a draft PR so that we can take a look and offer
>> suggestions?
>> > Thanks,
>> > -- C
>> >
>> >
>> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar432@gmail.com
>> <ma...@gmail.com>> wrote:
>> > >
>> > > Hi Devs,
>> > >
>> > > I had sent request for this almost 2.5 years ago. Trying it again now.
>> > >
>> > > Currently Apache drill window functions LEAD and LAG support offset
>> as 1.
>> > > In another words in a given window these functions can return either
>> > > previous or next row only.
>> > >
>> > >
>> > > I am trying modify the behavior these function and allow offset >=1 in
>> > > query such as
>> > > select employee_id, department_id,salary, lag(salary,*4*)
>> over(partition by
>> > > department_id order by salary asc) from  cp.`employee.json`;
>> > >
>> > > I have managed to remove the limitation which fails the query can not
>> have
>> > > offset > 1 and able to pass the offset to actual function
>> implementation.
>> > >
>> > > Currently I am stuck where the record processor is crossing the window
>> > > boundary of department_id and gets row from next/previous window in
>> > > lead/lag function
>> > >
>> > > For eg: If you notice in row 2 for department_id=2, it is getting
>> previous
>> > > windows of department_id=1
>> > >
>> > > Here is sample output for below query
>> > > apache drill> select  employee_id, department_id,salary, lag(salary,4)
>> > > over(partition by department_id order by salary asc) from
>> > > cp.`employee.json` where department_id <=3;
>> > > +-------------+---------------+---------+----------+
>> > > | employee_id | department_id | salary  |  EXPR$3  |
>> > > +-------------+---------------+---------+----------+
>> > > | 20          | 1            | 30000.0 | null    |
>> > > | 5          | 1            | 35000.0 | null    |
>> > > | 22          | 1            | 35000.0 | null    |
>> > > | 21          | 1            | 35000.0 | null    |
>> > > | 2          | 1            | 40000.0 | 30000.0  |
>> > > | 4          | 1            | 40000.0 | 35000.0  |
>> > > | 1          | 1            | 80000.0 | 35000.0  |
>> > > | 37          | 2            | 6700.0  | null    |
>> > > | 38          | 2            | 8000.0  | 40000.0  |
>> > > | 39          | 2            | 10000.0 | 40000.0  |
>> > > | 40          | 2            | 10000.0 | 80000.0  |
>> > > | 6          | 2            | 25000.0 | 6700.0  |
>> > > | 42          | 3            | 5000.0  | null    |
>> > > | 41          | 3            | 8500.0  | 10000.0  |
>> > > | 7          | 3            | 15000.0 | 10000.0  |
>> > > | 36          | 3            | 45000.0 | 25000.0  |
>> > > +-------------+---------------+---------+----------+
>> > >
>> > >
>> > > Thanks,
>> > > Nitin Pawar
>> >
>> >
>> >
>> > --
>> > Nitin Pawar
>>
>>
>
> --
> Nitin Pawar
>


-- 
Nitin Pawar
  

Re: Help for DRILL-3609

Posted by Nitin Pawar <ni...@gmail.com>.
any help on this?


On Tue, Nov 5, 2019 at 7:09 PM Nitin Pawar <ni...@gmail.com> wrote:

> Ohh ok
> let me provide a google drive url
> Here
> <https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing>
> is the link. Can you check if can access it.
>
> Thanks,
> Nitin
>
> On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cg...@gmail.com> wrote:
>
>> Hi Nitin,
>> It seems to have been filtered out.
>>
>>
>> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <ni...@gmail.com>
>> wrote:
>> >
>> > Hi Charles,
>> >
>> > I have attached git patch.
>> > I was currently doing for lag function only for testing purposes
>> >
>> > Thanks,
>> > Nitin
>> >
>> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com <mailto:
>> cgivre@gmail.com>> wrote:
>> > Hi Nitin,
>> > Thanks for your question.  Could you/did you share your code?  If not,
>> could you please post a draft PR so that we can take a look and offer
>> suggestions?
>> > Thanks,
>> > -- C
>> >
>> >
>> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar432@gmail.com
>> <ma...@gmail.com>> wrote:
>> > >
>> > > Hi Devs,
>> > >
>> > > I had sent request for this almost 2.5 years ago. Trying it again now.
>> > >
>> > > Currently Apache drill window functions LEAD and LAG support offset
>> as 1.
>> > > In another words in a given window these functions can return either
>> > > previous or next row only.
>> > >
>> > >
>> > > I am trying modify the behavior these function and allow offset >=1 in
>> > > query such as
>> > > select employee_id, department_id,salary, lag(salary,*4*)
>> over(partition by
>> > > department_id order by salary asc) from  cp.`employee.json`;
>> > >
>> > > I have managed to remove the limitation which fails the query can not
>> have
>> > > offset > 1 and able to pass the offset to actual function
>> implementation.
>> > >
>> > > Currently I am stuck where the record processor is crossing the window
>> > > boundary of department_id and gets row from next/previous window in
>> > > lead/lag function
>> > >
>> > > For eg: If you notice in row 2 for department_id=2, it is getting
>> previous
>> > > windows of department_id=1
>> > >
>> > > Here is sample output for below query
>> > > apache drill> select  employee_id, department_id,salary, lag(salary,4)
>> > > over(partition by department_id order by salary asc) from
>> > > cp.`employee.json` where department_id <=3;
>> > > +-------------+---------------+---------+----------+
>> > > | employee_id | department_id | salary  |  EXPR$3  |
>> > > +-------------+---------------+---------+----------+
>> > > | 20          | 1             | 30000.0 | null     |
>> > > | 5           | 1             | 35000.0 | null     |
>> > > | 22          | 1             | 35000.0 | null     |
>> > > | 21          | 1             | 35000.0 | null     |
>> > > | 2           | 1             | 40000.0 | 30000.0  |
>> > > | 4           | 1             | 40000.0 | 35000.0  |
>> > > | 1           | 1             | 80000.0 | 35000.0  |
>> > > | 37          | 2             | 6700.0  | null     |
>> > > | 38          | 2             | 8000.0  | 40000.0  |
>> > > | 39          | 2             | 10000.0 | 40000.0  |
>> > > | 40          | 2             | 10000.0 | 80000.0  |
>> > > | 6           | 2             | 25000.0 | 6700.0   |
>> > > | 42          | 3             | 5000.0  | null     |
>> > > | 41          | 3             | 8500.0  | 10000.0  |
>> > > | 7           | 3             | 15000.0 | 10000.0  |
>> > > | 36          | 3             | 45000.0 | 25000.0  |
>> > > +-------------+---------------+---------+----------+
>> > >
>> > >
>> > > Thanks,
>> > > Nitin Pawar
>> >
>> >
>> >
>> > --
>> > Nitin Pawar
>>
>>
>
> --
> Nitin Pawar
>


-- 
Nitin Pawar

Re: Help for DRILL-3609

Posted by Nitin Pawar <ni...@gmail.com>.
Ohh ok
let me provide a google drive url
Here
<https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing>
is the link. Can you check if can access it.

Thanks,
Nitin

On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cg...@gmail.com> wrote:

> Hi Nitin,
> It seems to have been filtered out.
>
>
> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <ni...@gmail.com> wrote:
> >
> > Hi Charles,
> >
> > I have attached git patch.
> > I was currently doing for lag function only for testing purposes
> >
> > Thanks,
> > Nitin
> >
> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com <mailto:
> cgivre@gmail.com>> wrote:
> > Hi Nitin,
> > Thanks for your question.  Could you/did you share your code?  If not,
> could you please post a draft PR so that we can take a look and offer
> suggestions?
> > Thanks,
> > -- C
> >
> >
> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar432@gmail.com
> <ma...@gmail.com>> wrote:
> > >
> > > Hi Devs,
> > >
> > > I had sent request for this almost 2.5 years ago. Trying it again now.
> > >
> > > Currently Apache drill window functions LEAD and LAG support offset as
> 1.
> > > In another words in a given window these functions can return either
> > > previous or next row only.
> > >
> > >
> > > I am trying modify the behavior these function and allow offset >=1 in
> > > query such as
> > > select employee_id, department_id,salary, lag(salary,*4*)
> over(partition by
> > > department_id order by salary asc) from  cp.`employee.json`;
> > >
> > > I have managed to remove the limitation which fails the query can not
> have
> > > offset > 1 and able to pass the offset to actual function
> implementation.
> > >
> > > Currently I am stuck where the record processor is crossing the window
> > > boundary of department_id and gets row from next/previous window in
> > > lead/lag function
> > >
> > > For eg: If you notice in row 2 for department_id=2, it is getting
> previous
> > > windows of department_id=1
> > >
> > > Here is sample output for below query
> > > apache drill> select  employee_id, department_id,salary, lag(salary,4)
> > > over(partition by department_id order by salary asc) from
> > > cp.`employee.json` where department_id <=3;
> > > +-------------+---------------+---------+----------+
> > > | employee_id | department_id | salary  |  EXPR$3  |
> > > +-------------+---------------+---------+----------+
> > > | 20          | 1             | 30000.0 | null     |
> > > | 5           | 1             | 35000.0 | null     |
> > > | 22          | 1             | 35000.0 | null     |
> > > | 21          | 1             | 35000.0 | null     |
> > > | 2           | 1             | 40000.0 | 30000.0  |
> > > | 4           | 1             | 40000.0 | 35000.0  |
> > > | 1           | 1             | 80000.0 | 35000.0  |
> > > | 37          | 2             | 6700.0  | null     |
> > > | 38          | 2             | 8000.0  | 40000.0  |
> > > | 39          | 2             | 10000.0 | 40000.0  |
> > > | 40          | 2             | 10000.0 | 80000.0  |
> > > | 6           | 2             | 25000.0 | 6700.0   |
> > > | 42          | 3             | 5000.0  | null     |
> > > | 41          | 3             | 8500.0  | 10000.0  |
> > > | 7           | 3             | 15000.0 | 10000.0  |
> > > | 36          | 3             | 45000.0 | 25000.0  |
> > > +-------------+---------------+---------+----------+
> > >
> > >
> > > Thanks,
> > > Nitin Pawar
> >
> >
> >
> > --
> > Nitin Pawar
>
>

-- 
Nitin Pawar

Re: Help for DRILL-3609

Posted by Charles Givre <cg...@gmail.com>.
Hi Nitin, 
It seems to have been filtered out.  


> On Nov 5, 2019, at 8:29 AM, Nitin Pawar <ni...@gmail.com> wrote:
> 
> Hi Charles,
> 
> I have attached git patch. 
> I was currently doing for lag function only for testing purposes 
> 
> Thanks,
> Nitin 
> 
> On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com <ma...@gmail.com>> wrote:
> Hi Nitin, 
> Thanks for your question.  Could you/did you share your code?  If not, could you please post a draft PR so that we can take a look and offer suggestions?
> Thanks,
> -- C
> 
> 
> > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar432@gmail.com <ma...@gmail.com>> wrote:
> > 
> > Hi Devs,
> > 
> > I had sent request for this almost 2.5 years ago. Trying it again now.
> > 
> > Currently Apache drill window functions LEAD and LAG support offset as 1.
> > In another words in a given window these functions can return either
> > previous or next row only.
> > 
> > 
> > I am trying modify the behavior these function and allow offset >=1 in
> > query such as
> > select employee_id, department_id,salary, lag(salary,*4*) over(partition by
> > department_id order by salary asc) from  cp.`employee.json`;
> > 
> > I have managed to remove the limitation which fails the query can not have
> > offset > 1 and able to pass the offset to actual function implementation.
> > 
> > Currently I am stuck where the record processor is crossing the window
> > boundary of department_id and gets row from next/previous window in
> > lead/lag function
> > 
> > For eg: If you notice in row 2 for department_id=2, it is getting previous
> > windows of department_id=1
> > 
> > Here is sample output for below query
> > apache drill> select  employee_id, department_id,salary, lag(salary,4)
> > over(partition by department_id order by salary asc) from
> > cp.`employee.json` where department_id <=3;
> > +-------------+---------------+---------+----------+
> > | employee_id | department_id | salary  |  EXPR$3  |
> > +-------------+---------------+---------+----------+
> > | 20          | 1             | 30000.0 | null     |
> > | 5           | 1             | 35000.0 | null     |
> > | 22          | 1             | 35000.0 | null     |
> > | 21          | 1             | 35000.0 | null     |
> > | 2           | 1             | 40000.0 | 30000.0  |
> > | 4           | 1             | 40000.0 | 35000.0  |
> > | 1           | 1             | 80000.0 | 35000.0  |
> > | 37          | 2             | 6700.0  | null     |
> > | 38          | 2             | 8000.0  | 40000.0  |
> > | 39          | 2             | 10000.0 | 40000.0  |
> > | 40          | 2             | 10000.0 | 80000.0  |
> > | 6           | 2             | 25000.0 | 6700.0   |
> > | 42          | 3             | 5000.0  | null     |
> > | 41          | 3             | 8500.0  | 10000.0  |
> > | 7           | 3             | 15000.0 | 10000.0  |
> > | 36          | 3             | 45000.0 | 25000.0  |
> > +-------------+---------------+---------+----------+
> > 
> > 
> > Thanks,
> > Nitin Pawar
> 
> 
> 
> -- 
> Nitin Pawar


Re: Help for DRILL-3609

Posted by Nitin Pawar <ni...@gmail.com>.
Hi Charles,

I have attached git patch.
I was currently doing for lag function only for testing purposes

Thanks,
Nitin

On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cg...@gmail.com> wrote:

> Hi Nitin,
> Thanks for your question.  Could you/did you share your code?  If not,
> could you please post a draft PR so that we can take a look and offer
> suggestions?
> Thanks,
> -- C
>
>
> > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <ni...@gmail.com> wrote:
> >
> > Hi Devs,
> >
> > I had sent request for this almost 2.5 years ago. Trying it again now.
> >
> > Currently Apache drill window functions LEAD and LAG support offset as 1.
> > In another words in a given window these functions can return either
> > previous or next row only.
> >
> >
> > I am trying modify the behavior these function and allow offset >=1 in
> > query such as
> > select employee_id, department_id,salary, lag(salary,*4*) over(partition
> by
> > department_id order by salary asc) from  cp.`employee.json`;
> >
> > I have managed to remove the limitation which fails the query can not
> have
> > offset > 1 and able to pass the offset to actual function implementation.
> >
> > Currently I am stuck where the record processor is crossing the window
> > boundary of department_id and gets row from next/previous window in
> > lead/lag function
> >
> > For eg: If you notice in row 2 for department_id=2, it is getting
> previous
> > windows of department_id=1
> >
> > Here is sample output for below query
> > apache drill> select  employee_id, department_id,salary, lag(salary,4)
> > over(partition by department_id order by salary asc) from
> > cp.`employee.json` where department_id <=3;
> > +-------------+---------------+---------+----------+
> > | employee_id | department_id | salary  |  EXPR$3  |
> > +-------------+---------------+---------+----------+
> > | 20          | 1             | 30000.0 | null     |
> > | 5           | 1             | 35000.0 | null     |
> > | 22          | 1             | 35000.0 | null     |
> > | 21          | 1             | 35000.0 | null     |
> > | 2           | 1             | 40000.0 | 30000.0  |
> > | 4           | 1             | 40000.0 | 35000.0  |
> > | 1           | 1             | 80000.0 | 35000.0  |
> > | 37          | 2             | 6700.0  | null     |
> > | 38          | 2             | 8000.0  | 40000.0  |
> > | 39          | 2             | 10000.0 | 40000.0  |
> > | 40          | 2             | 10000.0 | 80000.0  |
> > | 6           | 2             | 25000.0 | 6700.0   |
> > | 42          | 3             | 5000.0  | null     |
> > | 41          | 3             | 8500.0  | 10000.0  |
> > | 7           | 3             | 15000.0 | 10000.0  |
> > | 36          | 3             | 45000.0 | 25000.0  |
> > +-------------+---------------+---------+----------+
> >
> >
> > Thanks,
> > Nitin Pawar
>
>

-- 
Nitin Pawar

Re: Help for DRILL-3609

Posted by Charles Givre <cg...@gmail.com>.
Hi Nitin, 
Thanks for your question.  Could you/did you share your code?  If not, could you please post a draft PR so that we can take a look and offer suggestions?
Thanks,
-- C


> On Nov 5, 2019, at 7:27 AM, Nitin Pawar <ni...@gmail.com> wrote:
> 
> Hi Devs,
> 
> I had sent request for this almost 2.5 years ago. Trying it again now.
> 
> Currently Apache drill window functions LEAD and LAG support offset as 1.
> In another words in a given window these functions can return either
> previous or next row only.
> 
> 
> I am trying modify the behavior these function and allow offset >=1 in
> query such as
> select employee_id, department_id,salary, lag(salary,*4*) over(partition by
> department_id order by salary asc) from  cp.`employee.json`;
> 
> I have managed to remove the limitation which fails the query can not have
> offset > 1 and able to pass the offset to actual function implementation.
> 
> Currently I am stuck where the record processor is crossing the window
> boundary of department_id and gets row from next/previous window in
> lead/lag function
> 
> For eg: If you notice in row 2 for department_id=2, it is getting previous
> windows of department_id=1
> 
> Here is sample output for below query
> apache drill> select  employee_id, department_id,salary, lag(salary,4)
> over(partition by department_id order by salary asc) from
> cp.`employee.json` where department_id <=3;
> +-------------+---------------+---------+----------+
> | employee_id | department_id | salary  |  EXPR$3  |
> +-------------+---------------+---------+----------+
> | 20          | 1             | 30000.0 | null     |
> | 5           | 1             | 35000.0 | null     |
> | 22          | 1             | 35000.0 | null     |
> | 21          | 1             | 35000.0 | null     |
> | 2           | 1             | 40000.0 | 30000.0  |
> | 4           | 1             | 40000.0 | 35000.0  |
> | 1           | 1             | 80000.0 | 35000.0  |
> | 37          | 2             | 6700.0  | null     |
> | 38          | 2             | 8000.0  | 40000.0  |
> | 39          | 2             | 10000.0 | 40000.0  |
> | 40          | 2             | 10000.0 | 80000.0  |
> | 6           | 2             | 25000.0 | 6700.0   |
> | 42          | 3             | 5000.0  | null     |
> | 41          | 3             | 8500.0  | 10000.0  |
> | 7           | 3             | 15000.0 | 10000.0  |
> | 36          | 3             | 45000.0 | 25000.0  |
> +-------------+---------------+---------+----------+
> 
> 
> Thanks,
> Nitin Pawar