Posted to issues@hive.apache.org by "Dudu Markovitz (JIRA)" <ji...@apache.org> on 2016/07/30 19:24:20 UTC

[jira] [Comment Edited] (HIVE-10142) Calculating formula based on difference between each row's value and current row's in Windowing function

    [ https://issues.apache.org/jira/browse/HIVE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400487#comment-15400487 ] 

Dudu Markovitz edited comment on HIVE-10142 at 7/30/16 7:23 PM:
----------------------------------------------------------------

Although I can relate to the request, I've never seen it implemented before, probably because it is an O(N^2) operation.

For example:
For every event, I would like to count the number of earlier events that have a higher value.
Assuming we have a new keyword "CURRENT_ROW", the analytic function would look something like this:

count (case when val > CURRENT_ROW.val then 1 end) over (order by ts rows between unbounded preceding and current row)
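
To make the intended semantics concrete, here is a minimal Python sketch of what the hypothetical expression above would compute. The function name count_higher_before and the (ts, val) tuple layout are assumptions made for illustration only; nothing like this exists in Hive.

```python
def count_higher_before(events):
    """For each event, count earlier events (by ts) with a strictly higher val.

    `events` is an iterable of (ts, val) pairs; this layout is hypothetical.
    Returns a list of (ts, val, count) tuples ordered by ts.
    """
    ordered = sorted(events, key=lambda e: e[0])  # order by ts
    result = []
    for i, (ts, val) in enumerate(ordered):
        # Window: unbounded preceding through current row. The current row
        # itself never counts, because val > val is false.
        higher = sum(1 for _, prev_val in ordered[: i + 1] if prev_val > val)
        result.append((ts, val, higher))
    return result


if __name__ == "__main__":
    sample = [(1, 10), (2, 30), (3, 20), (4, 5)]
    for row in count_higher_before(sample):
        print(row)
```

The inner scan over ordered[: i + 1] is the per-row pass over the window that the rest of this comment is concerned with.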

The thing is that in order to implement this we would probably sort the data set by ts (so far so good) and then compare each record against its preceding records, which is an O(N^2) operation.
That means that for a table of 1M (1,000,000) records we are at the scale of 1T (1,000,000,000,000) operations.
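
As a rough check on that estimate: if the window runs from the first row through the current row, then row i (counting from 1) is compared against i rows, for a total of N(N+1)/2 comparisons. The helper below is an illustrative assumption, not anything taken from Hive.

```python
def window_comparisons(n):
    """Comparisons made by a naive scan in which row i (1-based) checks i rows."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    n = 1_000_000
    print(window_comparisons(n))  # 500000500000
```

That is roughly 5 x 10^11 comparisons for 1M rows, the same order of growth as the N^2 figure quoted above.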

I'm not sure we want to go there.




> Calculating formula based on difference between each row's value and current row's in Windowing function
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-10142
>                 URL: https://issues.apache.org/jira/browse/HIVE-10142
>             Project: Hive
>          Issue Type: New Feature
>          Components: PTF-Windowing
>    Affects Versions: 1.0.0
>            Reporter: Yi Zhang
>            Assignee: Aihua Xu
>
> For analytics with windowing functions, the calculation formula sometimes needs to compare each row's value against the current row's value. A decay value is a good example, such as a sum of values weighted by a decay function based on the timestamp difference between each row and the current row.


