You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Nitin Pawar <ni...@gmail.com> on 2019/11/05 12:27:04 UTC
Help for DRILL-3609
Hi Devs,
I had sent request for this almost 2.5 years ago. Trying it again now.
Currently Apache drill window functions LEAD and LAG support offset as 1.
In another words in a given window these functions can return either
previous or next row only.
I am trying modify the behavior these function and allow offset >=1 in
query such as
select employee_id, department_id,salary, lag(salary,*4*) over(partition by
department_id order by salary asc) from cp.`employee.json`;
I have managed to remove the limitation which fails the query can not have
offset > 1 and able to pass the offset to actual function implementation.
Currently I am stuck where the record processor is crossing the window
boundary of department_id and gets row from next/previous window in
lead/lag function
For eg: If you notice in row 2 for department_id=2, it is getting previous
windows of department_id=1
Here is sample output for below query
apache drill> select employee_id, department_id,salary, lag(salary,4)
over(partition by department_id order by salary asc) from
cp.`employee.json` where department_id <=3;
+-------------+---------------+---------+----------+
| employee_id | department_id | salary | EXPR$3 |
+-------------+---------------+---------+----------+
| 20 | 1 | 30000.0 | null |
| 5 | 1 | 35000.0 | null |
| 22 | 1 | 35000.0 | null |
| 21 | 1 | 35000.0 | null |
| 2 | 1 | 40000.0 | 30000.0 |
| 4 | 1 | 40000.0 | 35000.0 |
| 1 | 1 | 80000.0 | 35000.0 |
| 37 | 2 | 6700.0 | null |
| 38 | 2 | 8000.0 | 40000.0 |
| 39 | 2 | 10000.0 | 40000.0 |
| 40 | 2 | 10000.0 | 80000.0 |
| 6 | 2 | 25000.0 | 6700.0 |
| 42 | 3 | 5000.0 | null |
| 41 | 3 | 8500.0 | 10000.0 |
| 7 | 3 | 15000.0 | 10000.0 |
| 36 | 3 | 45000.0 | 25000.0 |
+-------------+---------------+---------+----------+
Thanks,
Nitin Pawar
Re: Help for DRILL-3609
Posted by Nitin Pawar <ni...@gmail.com>.
Thanks for the links and guidance Paul.
I will use your advice and see if I can get past the limitation.
Thanks,
Nitin
On Thu, Nov 7, 2019 at 1:00 PM Paul Rogers <pa...@yahoo.com.invalid>
wrote:
> Hi Nitin,
>
> As it turns out, I just had to fix a bug in the windowing operator. I'm
> not an expert on this operator, but perhaps I can offer a suggestion or two.
>
> We have a few existing unit tests for window functions in TestWindowFrame.
> They are a bit hard to follow, however. Take a look at testFix3605(), which
> does:
>
> select
> col2,
> lead(col2) over(partition by col2 order by col0) as lead_col2
> from
> dfs.`window/fewRowsAllData.parquet`
>
> When executed, we use the NoFrameSupportTemplate [1] class to do the work.
> Specifically, processPartition() contains code that handles the lead/lag by
> 1 case.
>
> I found it useful to enable saving of the generated code: [2]. When you
> step into the generated code, in Eclipse, you can set the source path to
> include /tmp/drill/codegen (on Linux/Mac). You can then see the contents of
> the generated copyPrev() and copyNext() functions.
>
> If you step into these generated functions, you can see that the code
> simply takes two indexes: a to and a from, then copies the data from one to
> the other. As a result, I suspect that you do not need to change the
> generated code to achieve your goal.
>
> Instead, you may want to change the processPartition() function. Instead
> of the simple +/-1 logic it currently has, use your lead/lag offset instead.
>
> By the way, a handy way to share work is simply to push your work to your
> private GitHub repo, then link to that code.
>
> Thanks,
> - Paul
>
>
> [1]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/NoFrameSupportTemplate.java#L139
> [2]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java#L366
>
> On Wednesday, November 6, 2019, 10:04:12 PM PST, Nitin Pawar <
> nitinpawar432@gmail.com> wrote:
>
> any help on this?
>
>
> On Tue, Nov 5, 2019 at 7:09 PM Nitin Pawar <ni...@gmail.com>
> wrote:
>
> > Ohh ok
> > let me provide a google drive url
> > Here
> > <
> https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing
> >
> > is the link. Can you check if can access it.
> >
> > Thanks,
> > Nitin
> >
> > On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cg...@gmail.com> wrote:
> >
> >> Hi Nitin,
> >> It seems to have been filtered out.
> >>
> >>
> >> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <ni...@gmail.com>
> >> wrote:
> >> >
> >> > Hi Charles,
> >> >
> >> > I have attached git patch.
> >> > I was currently doing for lag function only for testing purposes
> >> >
> >> > Thanks,
> >> > Nitin
> >> >
> >> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com
> <mailto:
> >> cgivre@gmail.com>> wrote:
> >> > Hi Nitin,
> >> > Thanks for your question. Could you/did you share your code? If not,
> >> could you please post a draft PR so that we can take a look and offer
> >> suggestions?
> >> > Thanks,
> >> > -- C
> >> >
> >> >
> >> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar432@gmail.com
> >> <ma...@gmail.com>> wrote:
> >> > >
> >> > > Hi Devs,
> >> > >
> >> > > I had sent request for this almost 2.5 years ago. Trying it again
> now.
> >> > >
> >> > > Currently Apache drill window functions LEAD and LAG support offset
> >> as 1.
> >> > > In another words in a given window these functions can return either
> >> > > previous or next row only.
> >> > >
> >> > >
> >> > > I am trying modify the behavior these function and allow offset >=1
> in
> >> > > query such as
> >> > > select employee_id, department_id,salary, lag(salary,*4*)
> >> over(partition by
> >> > > department_id order by salary asc) from cp.`employee.json`;
> >> > >
> >> > > I have managed to remove the limitation which fails the query can
> not
> >> have
> >> > > offset > 1 and able to pass the offset to actual function
> >> implementation.
> >> > >
> >> > > Currently I am stuck where the record processor is crossing the
> window
> >> > > boundary of department_id and gets row from next/previous window in
> >> > > lead/lag function
> >> > >
> >> > > For eg: If you notice in row 2 for department_id=2, it is getting
> >> previous
> >> > > windows of department_id=1
> >> > >
> >> > > Here is sample output for below query
> >> > > apache drill> select employee_id, department_id,salary,
> lag(salary,4)
> >> > > over(partition by department_id order by salary asc) from
> >> > > cp.`employee.json` where department_id <=3;
> >> > > +-------------+---------------+---------+----------+
> >> > > | employee_id | department_id | salary | EXPR$3 |
> >> > > +-------------+---------------+---------+----------+
> >> > > | 20 | 1 | 30000.0 | null |
> >> > > | 5 | 1 | 35000.0 | null |
> >> > > | 22 | 1 | 35000.0 | null |
> >> > > | 21 | 1 | 35000.0 | null |
> >> > > | 2 | 1 | 40000.0 | 30000.0 |
> >> > > | 4 | 1 | 40000.0 | 35000.0 |
> >> > > | 1 | 1 | 80000.0 | 35000.0 |
> >> > > | 37 | 2 | 6700.0 | null |
> >> > > | 38 | 2 | 8000.0 | 40000.0 |
> >> > > | 39 | 2 | 10000.0 | 40000.0 |
> >> > > | 40 | 2 | 10000.0 | 80000.0 |
> >> > > | 6 | 2 | 25000.0 | 6700.0 |
> >> > > | 42 | 3 | 5000.0 | null |
> >> > > | 41 | 3 | 8500.0 | 10000.0 |
> >> > > | 7 | 3 | 15000.0 | 10000.0 |
> >> > > | 36 | 3 | 45000.0 | 25000.0 |
> >> > > +-------------+---------------+---------+----------+
> >> > >
> >> > >
> >> > > Thanks,
> >> > > Nitin Pawar
> >> >
> >> >
> >> >
> >> > --
> >> > Nitin Pawar
> >>
> >>
> >
> > --
> > Nitin Pawar
> >
>
>
> --
> Nitin Pawar
>
--
Nitin Pawar
Re: Help for DRILL-3609
Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Nitin,
As it turns out, I just had to fix a bug in the windowing operator. I'm not an expert on this operator, but perhaps I can offer a suggestion or two.
We have a few existing unit tests for window functions in TestWindowFrame. They are a bit hard to follow, however. Take a look at testFix3605(), which does:
select
col2,
lead(col2) over(partition by col2 order by col0) as lead_col2
from
dfs.`window/fewRowsAllData.parquet`
When executed, we use the NoFrameSupportTemplate [1] class to do the work. Specifically, processPartition() contains code that handles the lead/lag by 1 case.
I found it useful to enable saving of the generated code: [2]. When you step into the generated code, in Eclipse, you can set the source path to include /tmp/drill/codegen (on Linux/Mac). You can then see the contents of the generated copyPrev() and copyNext() functions.
If you step into these generated functions, you can see that the code simply takes two indexes: a to and a from, then copies the data from one to the other. As a result, I suspect that you do not need to change the generated code to achieve your goal.
Instead, you may want to change the processPartition() function. Instead of the simple +/-1 logic it currently has, use your lead/lag offset instead.
By the way, a handy way to share work is simply to push your work to your private GitHub repo, then link to that code.
Thanks,
- Paul
[1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/NoFrameSupportTemplate.java#L139
[2] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java#L366
On Wednesday, November 6, 2019, 10:04:12 PM PST, Nitin Pawar <ni...@gmail.com> wrote:
any help on this?
On Tue, Nov 5, 2019 at 7:09 PM Nitin Pawar <ni...@gmail.com> wrote:
> Ohh ok
> let me provide a google drive url
> Here
> <https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing>
> is the link. Can you check if can access it.
>
> Thanks,
> Nitin
>
> On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cg...@gmail.com> wrote:
>
>> Hi Nitin,
>> It seems to have been filtered out.
>>
>>
>> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <ni...@gmail.com>
>> wrote:
>> >
>> > Hi Charles,
>> >
>> > I have attached git patch.
>> > I was currently doing for lag function only for testing purposes
>> >
>> > Thanks,
>> > Nitin
>> >
>> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com <mailto:
>> cgivre@gmail.com>> wrote:
>> > Hi Nitin,
>> > Thanks for your question. Could you/did you share your code? If not,
>> could you please post a draft PR so that we can take a look and offer
>> suggestions?
>> > Thanks,
>> > -- C
>> >
>> >
>> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar432@gmail.com
>> <ma...@gmail.com>> wrote:
>> > >
>> > > Hi Devs,
>> > >
>> > > I had sent request for this almost 2.5 years ago. Trying it again now.
>> > >
>> > > Currently Apache drill window functions LEAD and LAG support offset
>> as 1.
>> > > In another words in a given window these functions can return either
>> > > previous or next row only.
>> > >
>> > >
>> > > I am trying modify the behavior these function and allow offset >=1 in
>> > > query such as
>> > > select employee_id, department_id,salary, lag(salary,*4*)
>> over(partition by
>> > > department_id order by salary asc) from cp.`employee.json`;
>> > >
>> > > I have managed to remove the limitation which fails the query can not
>> have
>> > > offset > 1 and able to pass the offset to actual function
>> implementation.
>> > >
>> > > Currently I am stuck where the record processor is crossing the window
>> > > boundary of department_id and gets row from next/previous window in
>> > > lead/lag function
>> > >
>> > > For eg: If you notice in row 2 for department_id=2, it is getting
>> previous
>> > > windows of department_id=1
>> > >
>> > > Here is sample output for below query
>> > > apache drill> select employee_id, department_id,salary, lag(salary,4)
>> > > over(partition by department_id order by salary asc) from
>> > > cp.`employee.json` where department_id <=3;
>> > > +-------------+---------------+---------+----------+
>> > > | employee_id | department_id | salary | EXPR$3 |
>> > > +-------------+---------------+---------+----------+
>> > > | 20 | 1 | 30000.0 | null |
>> > > | 5 | 1 | 35000.0 | null |
>> > > | 22 | 1 | 35000.0 | null |
>> > > | 21 | 1 | 35000.0 | null |
>> > > | 2 | 1 | 40000.0 | 30000.0 |
>> > > | 4 | 1 | 40000.0 | 35000.0 |
>> > > | 1 | 1 | 80000.0 | 35000.0 |
>> > > | 37 | 2 | 6700.0 | null |
>> > > | 38 | 2 | 8000.0 | 40000.0 |
>> > > | 39 | 2 | 10000.0 | 40000.0 |
>> > > | 40 | 2 | 10000.0 | 80000.0 |
>> > > | 6 | 2 | 25000.0 | 6700.0 |
>> > > | 42 | 3 | 5000.0 | null |
>> > > | 41 | 3 | 8500.0 | 10000.0 |
>> > > | 7 | 3 | 15000.0 | 10000.0 |
>> > > | 36 | 3 | 45000.0 | 25000.0 |
>> > > +-------------+---------------+---------+----------+
>> > >
>> > >
>> > > Thanks,
>> > > Nitin Pawar
>> >
>> >
>> >
>> > --
>> > Nitin Pawar
>>
>>
>
> --
> Nitin Pawar
>
--
Nitin Pawar
Re: Help for DRILL-3609
Posted by Nitin Pawar <ni...@gmail.com>.
any help on this?
On Tue, Nov 5, 2019 at 7:09 PM Nitin Pawar <ni...@gmail.com> wrote:
> Ohh ok
> let me provide a google drive url
> Here
> <https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing>
> is the link. Can you check if can access it.
>
> Thanks,
> Nitin
>
> On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cg...@gmail.com> wrote:
>
>> Hi Nitin,
>> It seems to have been filtered out.
>>
>>
>> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <ni...@gmail.com>
>> wrote:
>> >
>> > Hi Charles,
>> >
>> > I have attached git patch.
>> > I was currently doing for lag function only for testing purposes
>> >
>> > Thanks,
>> > Nitin
>> >
>> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com <mailto:
>> cgivre@gmail.com>> wrote:
>> > Hi Nitin,
>> > Thanks for your question. Could you/did you share your code? If not,
>> could you please post a draft PR so that we can take a look and offer
>> suggestions?
>> > Thanks,
>> > -- C
>> >
>> >
>> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar432@gmail.com
>> <ma...@gmail.com>> wrote:
>> > >
>> > > Hi Devs,
>> > >
>> > > I had sent request for this almost 2.5 years ago. Trying it again now.
>> > >
>> > > Currently Apache drill window functions LEAD and LAG support offset
>> as 1.
>> > > In another words in a given window these functions can return either
>> > > previous or next row only.
>> > >
>> > >
>> > > I am trying modify the behavior these function and allow offset >=1 in
>> > > query such as
>> > > select employee_id, department_id,salary, lag(salary,*4*)
>> over(partition by
>> > > department_id order by salary asc) from cp.`employee.json`;
>> > >
>> > > I have managed to remove the limitation which fails the query can not
>> have
>> > > offset > 1 and able to pass the offset to actual function
>> implementation.
>> > >
>> > > Currently I am stuck where the record processor is crossing the window
>> > > boundary of department_id and gets row from next/previous window in
>> > > lead/lag function
>> > >
>> > > For eg: If you notice in row 2 for department_id=2, it is getting
>> previous
>> > > windows of department_id=1
>> > >
>> > > Here is sample output for below query
>> > > apache drill> select employee_id, department_id,salary, lag(salary,4)
>> > > over(partition by department_id order by salary asc) from
>> > > cp.`employee.json` where department_id <=3;
>> > > +-------------+---------------+---------+----------+
>> > > | employee_id | department_id | salary | EXPR$3 |
>> > > +-------------+---------------+---------+----------+
>> > > | 20 | 1 | 30000.0 | null |
>> > > | 5 | 1 | 35000.0 | null |
>> > > | 22 | 1 | 35000.0 | null |
>> > > | 21 | 1 | 35000.0 | null |
>> > > | 2 | 1 | 40000.0 | 30000.0 |
>> > > | 4 | 1 | 40000.0 | 35000.0 |
>> > > | 1 | 1 | 80000.0 | 35000.0 |
>> > > | 37 | 2 | 6700.0 | null |
>> > > | 38 | 2 | 8000.0 | 40000.0 |
>> > > | 39 | 2 | 10000.0 | 40000.0 |
>> > > | 40 | 2 | 10000.0 | 80000.0 |
>> > > | 6 | 2 | 25000.0 | 6700.0 |
>> > > | 42 | 3 | 5000.0 | null |
>> > > | 41 | 3 | 8500.0 | 10000.0 |
>> > > | 7 | 3 | 15000.0 | 10000.0 |
>> > > | 36 | 3 | 45000.0 | 25000.0 |
>> > > +-------------+---------------+---------+----------+
>> > >
>> > >
>> > > Thanks,
>> > > Nitin Pawar
>> >
>> >
>> >
>> > --
>> > Nitin Pawar
>>
>>
>
> --
> Nitin Pawar
>
--
Nitin Pawar
Re: Help for DRILL-3609
Posted by Nitin Pawar <ni...@gmail.com>.
Ohh ok
let me provide a google drive url
Here
<https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing>
is the link. Can you check if can access it.
Thanks,
Nitin
On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cg...@gmail.com> wrote:
> Hi Nitin,
> It seems to have been filtered out.
>
>
> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <ni...@gmail.com> wrote:
> >
> > Hi Charles,
> >
> > I have attached git patch.
> > I was currently doing for lag function only for testing purposes
> >
> > Thanks,
> > Nitin
> >
> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com <mailto:
> cgivre@gmail.com>> wrote:
> > Hi Nitin,
> > Thanks for your question. Could you/did you share your code? If not,
> could you please post a draft PR so that we can take a look and offer
> suggestions?
> > Thanks,
> > -- C
> >
> >
> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar432@gmail.com
> <ma...@gmail.com>> wrote:
> > >
> > > Hi Devs,
> > >
> > > I had sent request for this almost 2.5 years ago. Trying it again now.
> > >
> > > Currently Apache drill window functions LEAD and LAG support offset as
> 1.
> > > In another words in a given window these functions can return either
> > > previous or next row only.
> > >
> > >
> > > I am trying modify the behavior these function and allow offset >=1 in
> > > query such as
> > > select employee_id, department_id,salary, lag(salary,*4*)
> over(partition by
> > > department_id order by salary asc) from cp.`employee.json`;
> > >
> > > I have managed to remove the limitation which fails the query can not
> have
> > > offset > 1 and able to pass the offset to actual function
> implementation.
> > >
> > > Currently I am stuck where the record processor is crossing the window
> > > boundary of department_id and gets row from next/previous window in
> > > lead/lag function
> > >
> > > For eg: If you notice in row 2 for department_id=2, it is getting
> previous
> > > windows of department_id=1
> > >
> > > Here is sample output for below query
> > > apache drill> select employee_id, department_id,salary, lag(salary,4)
> > > over(partition by department_id order by salary asc) from
> > > cp.`employee.json` where department_id <=3;
> > > +-------------+---------------+---------+----------+
> > > | employee_id | department_id | salary | EXPR$3 |
> > > +-------------+---------------+---------+----------+
> > > | 20 | 1 | 30000.0 | null |
> > > | 5 | 1 | 35000.0 | null |
> > > | 22 | 1 | 35000.0 | null |
> > > | 21 | 1 | 35000.0 | null |
> > > | 2 | 1 | 40000.0 | 30000.0 |
> > > | 4 | 1 | 40000.0 | 35000.0 |
> > > | 1 | 1 | 80000.0 | 35000.0 |
> > > | 37 | 2 | 6700.0 | null |
> > > | 38 | 2 | 8000.0 | 40000.0 |
> > > | 39 | 2 | 10000.0 | 40000.0 |
> > > | 40 | 2 | 10000.0 | 80000.0 |
> > > | 6 | 2 | 25000.0 | 6700.0 |
> > > | 42 | 3 | 5000.0 | null |
> > > | 41 | 3 | 8500.0 | 10000.0 |
> > > | 7 | 3 | 15000.0 | 10000.0 |
> > > | 36 | 3 | 45000.0 | 25000.0 |
> > > +-------------+---------------+---------+----------+
> > >
> > >
> > > Thanks,
> > > Nitin Pawar
> >
> >
> >
> > --
> > Nitin Pawar
>
>
--
Nitin Pawar
Re: Help for DRILL-3609
Posted by Charles Givre <cg...@gmail.com>.
Hi Nitin,
It seems to have been filtered out.
> On Nov 5, 2019, at 8:29 AM, Nitin Pawar <ni...@gmail.com> wrote:
>
> Hi Charles,
>
> I have attached git patch.
> I was currently doing for lag function only for testing purposes
>
> Thanks,
> Nitin
>
> On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com <ma...@gmail.com>> wrote:
> Hi Nitin,
> Thanks for your question. Could you/did you share your code? If not, could you please post a draft PR so that we can take a look and offer suggestions?
> Thanks,
> -- C
>
>
> > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar432@gmail.com <ma...@gmail.com>> wrote:
> >
> > Hi Devs,
> >
> > I had sent request for this almost 2.5 years ago. Trying it again now.
> >
> > Currently Apache drill window functions LEAD and LAG support offset as 1.
> > In another words in a given window these functions can return either
> > previous or next row only.
> >
> >
> > I am trying modify the behavior these function and allow offset >=1 in
> > query such as
> > select employee_id, department_id,salary, lag(salary,*4*) over(partition by
> > department_id order by salary asc) from cp.`employee.json`;
> >
> > I have managed to remove the limitation which fails the query can not have
> > offset > 1 and able to pass the offset to actual function implementation.
> >
> > Currently I am stuck where the record processor is crossing the window
> > boundary of department_id and gets row from next/previous window in
> > lead/lag function
> >
> > For eg: If you notice in row 2 for department_id=2, it is getting previous
> > windows of department_id=1
> >
> > Here is sample output for below query
> > apache drill> select employee_id, department_id,salary, lag(salary,4)
> > over(partition by department_id order by salary asc) from
> > cp.`employee.json` where department_id <=3;
> > +-------------+---------------+---------+----------+
> > | employee_id | department_id | salary | EXPR$3 |
> > +-------------+---------------+---------+----------+
> > | 20 | 1 | 30000.0 | null |
> > | 5 | 1 | 35000.0 | null |
> > | 22 | 1 | 35000.0 | null |
> > | 21 | 1 | 35000.0 | null |
> > | 2 | 1 | 40000.0 | 30000.0 |
> > | 4 | 1 | 40000.0 | 35000.0 |
> > | 1 | 1 | 80000.0 | 35000.0 |
> > | 37 | 2 | 6700.0 | null |
> > | 38 | 2 | 8000.0 | 40000.0 |
> > | 39 | 2 | 10000.0 | 40000.0 |
> > | 40 | 2 | 10000.0 | 80000.0 |
> > | 6 | 2 | 25000.0 | 6700.0 |
> > | 42 | 3 | 5000.0 | null |
> > | 41 | 3 | 8500.0 | 10000.0 |
> > | 7 | 3 | 15000.0 | 10000.0 |
> > | 36 | 3 | 45000.0 | 25000.0 |
> > +-------------+---------------+---------+----------+
> >
> >
> > Thanks,
> > Nitin Pawar
>
>
>
> --
> Nitin Pawar
Re: Help for DRILL-3609
Posted by Nitin Pawar <ni...@gmail.com>.
Hi Charles,
I have attached git patch.
I was currently doing for lag function only for testing purposes
Thanks,
Nitin
On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cg...@gmail.com> wrote:
> Hi Nitin,
> Thanks for your question. Could you/did you share your code? If not,
> could you please post a draft PR so that we can take a look and offer
> suggestions?
> Thanks,
> -- C
>
>
> > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <ni...@gmail.com> wrote:
> >
> > Hi Devs,
> >
> > I had sent request for this almost 2.5 years ago. Trying it again now.
> >
> > Currently Apache drill window functions LEAD and LAG support offset as 1.
> > In another words in a given window these functions can return either
> > previous or next row only.
> >
> >
> > I am trying modify the behavior these function and allow offset >=1 in
> > query such as
> > select employee_id, department_id,salary, lag(salary,*4*) over(partition
> by
> > department_id order by salary asc) from cp.`employee.json`;
> >
> > I have managed to remove the limitation which fails the query can not
> have
> > offset > 1 and able to pass the offset to actual function implementation.
> >
> > Currently I am stuck where the record processor is crossing the window
> > boundary of department_id and gets row from next/previous window in
> > lead/lag function
> >
> > For eg: If you notice in row 2 for department_id=2, it is getting
> previous
> > windows of department_id=1
> >
> > Here is sample output for below query
> > apache drill> select employee_id, department_id,salary, lag(salary,4)
> > over(partition by department_id order by salary asc) from
> > cp.`employee.json` where department_id <=3;
> > +-------------+---------------+---------+----------+
> > | employee_id | department_id | salary | EXPR$3 |
> > +-------------+---------------+---------+----------+
> > | 20 | 1 | 30000.0 | null |
> > | 5 | 1 | 35000.0 | null |
> > | 22 | 1 | 35000.0 | null |
> > | 21 | 1 | 35000.0 | null |
> > | 2 | 1 | 40000.0 | 30000.0 |
> > | 4 | 1 | 40000.0 | 35000.0 |
> > | 1 | 1 | 80000.0 | 35000.0 |
> > | 37 | 2 | 6700.0 | null |
> > | 38 | 2 | 8000.0 | 40000.0 |
> > | 39 | 2 | 10000.0 | 40000.0 |
> > | 40 | 2 | 10000.0 | 80000.0 |
> > | 6 | 2 | 25000.0 | 6700.0 |
> > | 42 | 3 | 5000.0 | null |
> > | 41 | 3 | 8500.0 | 10000.0 |
> > | 7 | 3 | 15000.0 | 10000.0 |
> > | 36 | 3 | 45000.0 | 25000.0 |
> > +-------------+---------------+---------+----------+
> >
> >
> > Thanks,
> > Nitin Pawar
>
>
--
Nitin Pawar
Re: Help for DRILL-3609
Posted by Charles Givre <cg...@gmail.com>.
Hi Nitin,
Thanks for your question. Could you/did you share your code? If not, could you please post a draft PR so that we can take a look and offer suggestions?
Thanks,
-- C
> On Nov 5, 2019, at 7:27 AM, Nitin Pawar <ni...@gmail.com> wrote:
>
> Hi Devs,
>
> I had sent request for this almost 2.5 years ago. Trying it again now.
>
> Currently Apache drill window functions LEAD and LAG support offset as 1.
> In another words in a given window these functions can return either
> previous or next row only.
>
>
> I am trying modify the behavior these function and allow offset >=1 in
> query such as
> select employee_id, department_id,salary, lag(salary,*4*) over(partition by
> department_id order by salary asc) from cp.`employee.json`;
>
> I have managed to remove the limitation which fails the query can not have
> offset > 1 and able to pass the offset to actual function implementation.
>
> Currently I am stuck where the record processor is crossing the window
> boundary of department_id and gets row from next/previous window in
> lead/lag function
>
> For eg: If you notice in row 2 for department_id=2, it is getting previous
> windows of department_id=1
>
> Here is sample output for below query
> apache drill> select employee_id, department_id,salary, lag(salary,4)
> over(partition by department_id order by salary asc) from
> cp.`employee.json` where department_id <=3;
> +-------------+---------------+---------+----------+
> | employee_id | department_id | salary | EXPR$3 |
> +-------------+---------------+---------+----------+
> | 20 | 1 | 30000.0 | null |
> | 5 | 1 | 35000.0 | null |
> | 22 | 1 | 35000.0 | null |
> | 21 | 1 | 35000.0 | null |
> | 2 | 1 | 40000.0 | 30000.0 |
> | 4 | 1 | 40000.0 | 35000.0 |
> | 1 | 1 | 80000.0 | 35000.0 |
> | 37 | 2 | 6700.0 | null |
> | 38 | 2 | 8000.0 | 40000.0 |
> | 39 | 2 | 10000.0 | 40000.0 |
> | 40 | 2 | 10000.0 | 80000.0 |
> | 6 | 2 | 25000.0 | 6700.0 |
> | 42 | 3 | 5000.0 | null |
> | 41 | 3 | 8500.0 | 10000.0 |
> | 7 | 3 | 15000.0 | 10000.0 |
> | 36 | 3 | 45000.0 | 25000.0 |
> +-------------+---------------+---------+----------+
>
>
> Thanks,
> Nitin Pawar