Posted to dev@mahout.apache.org by "Ted Dunning (JIRA)" <ji...@apache.org> on 2011/03/24 18:19:05 UTC

[jira] [Created] (MAHOUT-634) Need more online averagers

Need more online averagers
--------------------------

                 Key: MAHOUT-634
                 URL: https://issues.apache.org/jira/browse/MAHOUT-634
             Project: Mahout
          Issue Type: Improvement
            Reporter: Ted Dunning


I am occasionally seeing a need to do exponential averaging of values or rates.

The HBase guys want this as well.

So it is time to do it.  I have a patch that does the averaging of values according to
http://tdunning.blogspot.com/2011/03/exponential-weighted-averages-with.html

I will attach that as a patch now and do the rate averaging as well before committing.
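
For readers without the patch at hand, here is a minimal sketch of one common way to keep an exponentially weighted average over irregularly spaced samples. It is not taken from the attached patch; the class name `ExponentialAverage`, the parameter `alpha` (the time constant) and the fields `s` and `w` are assumptions made for illustration. Each new sample first decays the running sums by exp(-(t - last_t) / alpha) and then adds itself with weight 1. The sketch assumes samples arrive in non-decreasing time order.

```python
import math


class ExponentialAverage:
    """Illustrative sketch only: exponentially weighted mean of irregularly sampled values."""

    def __init__(self, alpha):
        if alpha <= 0:
            raise ValueError("time constant alpha must be positive")
        self.alpha = alpha    # time constant, in the same units as t (hypothetical name)
        self.s = 0.0          # decayed sum of values
        self.w = 0.0          # decayed sum of weights
        self.last_t = None    # time of the most recent sample

    def add(self, t, x):
        """Add value x observed at time t (assumed not earlier than the previous sample)."""
        if self.last_t is None:
            decay = 0.0  # nothing to decay yet; s and w are both zero
        else:
            decay = math.exp(-(t - self.last_t) / self.alpha)
        self.s = x + decay * self.s
        self.w = 1.0 + decay * self.w
        self.last_t = t

    def mean(self):
        """Weighted mean as of the latest sample time."""
        return self.s / self.w
```

With `alpha = 10`, a sample of 1.0 at t = 0 followed by 3.0 at t = 10 gives the older sample a weight of exp(-1), so the mean sits closer to 3.0 than to 1.0.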


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] [Commented] (MAHOUT-634) Need more online averagers

Posted by Ted Dunning <te...@gmail.com>.
The current implementation should allow updates to the past, but it will
only ever give you an average at the latest data point.

On Tue, May 17, 2011 at 11:33 AM, Dmitriy Lyubimov (JIRA)
<ji...@apache.org> wrote:

> I am also using this, with slight modifications to enable use with
> map-reduce. Two suggestions I implemented on the side: updates to the past
> (input unordered w.r.t. time of sampling, albeit potentially less
> numerically stable) and combining, for use with MR:
> http://weatheringthrutechdays.blogspot.com/2011/04/follow-up-for-mean-summarizer-post.html
> No algorithm in Mahout currently uses MR for summarizing inputs, but one
> might. These improvements made it possible to implement Pig functions that
> run those formulas.
>
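
The two ideas in this exchange (updates to the past, and combining partial summaries for map-reduce) can be sketched as follows. This is a hedged illustration, not the code from the patch or from the linked blog post; `DecayedSummary`, `merge` and the field names are hypothetical. The sums are always referenced to the latest sample time seen, so a late-arriving sample is simply added with a weight below 1, and the mean is only available at the latest data point, as noted in the reply above.

```python
import math


class DecayedSummary:
    """Illustrative sketch: decayed sums that accept out-of-order samples and can be merged."""

    def __init__(self, alpha):
        self.alpha = alpha  # time constant (assumed identical for summaries that get merged)
        self.t = None       # reference time: latest sample time seen, None while empty
        self.s = 0.0        # decayed sum of values, as seen from time t
        self.w = 0.0        # decayed sum of weights, as seen from time t

    def _advance(self, t):
        """Move the reference time forward to t, decaying both sums; no-op if t is not newer."""
        if self.t is not None and t > self.t:
            decay = math.exp(-(t - self.t) / self.alpha)
            self.s *= decay
            self.w *= decay
        if self.t is None or t > self.t:
            self.t = t

    def add(self, t, x):
        """Add value x sampled at time t; t may be earlier than samples already seen."""
        self._advance(t)
        # Weight of this sample as seen from the reference time (1.0 if it is the newest).
        weight = math.exp(-(self.t - t) / self.alpha)
        self.s += weight * x
        self.w += weight

    def merge(self, other):
        """Fold another summary into this one, e.g. in a combiner or reducer."""
        if other.t is None:
            return
        self._advance(other.t)
        weight = math.exp(-(self.t - other.t) / self.alpha)
        self.s += weight * other.s
        self.w += weight * other.w

    def mean(self):
        """Weighted mean as of the latest sample time."""
        return self.s / self.w
```

Feeding the same samples in order, in reverse order, or split across two summaries that are then merged gives the same mean up to floating-point rounding. Very old samples get weights that may underflow toward zero, which is one place where the "potentially less numerically stable" caveat in the quoted comment could show up.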

Re: [jira] [Commented] (MAHOUT-634) Need more online averagers

Posted by Ted Dunning <te...@gmail.com>.
This is a cute idea.  Discount old data and revert to the prior.  Should be
very straightforward.  I don't know of a use off-hand, but I will keep an
eye out for it.

On Tue, May 17, 2011 at 11:33 AM, Dmitriy Lyubimov (JIRA)
<ji...@apache.org> wrote:

> Also, I experimented with yet another biased estimator for binomial sums
> (similar to the use of a beta distribution as a conjugate prior for the
> binomial distribution) that converges on a predefined value P_0 (similar to
> the beta distribution mode converging to 0.5 as n goes to 0) under two
> circumstances: 1) there is a lack of history (as in the
> beta-distribution-based estimate); 2) there is a lack of _recent_ history.
>
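
One way to read "discount old data and revert to the prior" is to combine exponentially decayed success and trial counts with a fixed pseudo-count that pulls the estimate toward P_0. The sketch below is an assumption about how such an estimator could look, not the formula from the quoted comment; the class name `PriorRevertingEstimator` and the prior strength `k` are hypothetical. It assumes events arrive in time order and that `estimate` is asked about a time no earlier than the last event.

```python
import math


class PriorRevertingEstimator:
    """Illustrative sketch: decayed binomial estimate that reverts to p0 without recent data."""

    def __init__(self, alpha, p0, k=1.0):
        self.alpha = alpha   # time constant for discounting old trials
        self.p0 = p0         # value to converge on when history is missing or stale
        self.k = k           # prior strength as a pseudo-count (hypothetical parameter)
        self.s = 0.0         # decayed count of successes
        self.w = 0.0         # decayed count of trials
        self.last_t = None   # time of the most recent trial

    def add(self, t, success):
        """Record one trial at time t (assumed not earlier than the previous trial)."""
        decay = 0.0 if self.last_t is None else math.exp(-(t - self.last_t) / self.alpha)
        self.s = (1.0 if success else 0.0) + decay * self.s
        self.w = 1.0 + decay * self.w
        self.last_t = t

    def estimate(self, now):
        """Estimate the success probability as of time now (now >= last trial time)."""
        if self.last_t is None:
            return self.p0  # case 1: no history at all
        decay = math.exp(-(now - self.last_t) / self.alpha)
        # Case 2: as the history ages, decay -> 0 and the estimate tends to p0.
        return (decay * self.s + self.k * self.p0) / (decay * self.w + self.k)
```

With no trials the estimate is exactly `p0`; after a long run of recent successes it moves toward 1; and if no further trials arrive, it drifts back to `p0` as the decayed counts shrink.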

[jira] [Commented] (MAHOUT-634) Need more online averagers

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034875#comment-13034875 ] 

Lance Norskog commented on MAHOUT-634:
--------------------------------------

Is this numerically stable? Or rather, in which range is this numerically stable?

> Need more online averagers
> --------------------------
>
>                 Key: MAHOUT-634
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-634
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.4
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>             Fix For: 0.5
>
>         Attachments: 0001-MAHOUT-634-time-embedded-moving-averages.patch
>
>
> I am occasionally seeing a need to do exponential averaging of values or rates.
> Hbase guys want this as well.
> So it is time to do it.  I have a patch that does the averaging of values according to
> http://tdunning.blogspot.com/2011/03/exponential-weighted-averages-with.html
> I will attach that as a patch now and do the rate averaging as well before committing.


[jira] [Commented] (MAHOUT-634) Need more online averagers

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034946#comment-13034946 ] 

Ted Dunning commented on MAHOUT-634:
------------------------------------

Should be pretty much unconditionally stable for positive time constants.  I should have mentioned this, but since negative time constants don't make sense, I forgot to add the warning.

What kind of scaling factor do you mean?
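
The stability remark above can be illustrated with a small sketch, assuming the running weight is updated as w = 1 + decay * w with decay = exp(-dt / alpha); that recurrence is an assumption for illustration, and `weight_after` is a hypothetical helper, not part of the patch. For a positive time constant the decay factor lies in (0, 1), so with evenly spaced samples the weight converges to 1 / (1 - decay). For a negative time constant the factor exceeds 1 and the sums grow without bound.

```python
import math


def weight_after(n, dt, alpha):
    """Decayed weight sum after n samples spaced dt apart (illustrative recurrence)."""
    decay = math.exp(-dt / alpha)
    w = 0.0
    for _ in range(n):
        w = 1.0 + decay * w
    return w


# Positive time constant: the weight settles at the fixed point 1 / (1 - decay).
stable = weight_after(10000, 1.0, 10.0)
limit = 1.0 / (1.0 - math.exp(-1.0 / 10.0))

# Negative time constant: decay > 1, so the weight grows geometrically.
unstable = weight_after(1000, 1.0, -10.0)
```

Here `stable` agrees with `limit` to within rounding, while `unstable` is astronomically large after only 1000 samples.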




[jira] [Issue Comment Edited] (MAHOUT-634) Need more online averagers

Posted by "Lance Norskog (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034875#comment-13034875 ] 

Lance Norskog edited comment on MAHOUT-634 at 5/17/11 5:12 PM:
---------------------------------------------------------------

Is this numerically stable? Or rather, in which range is this numerically stable? 
Could there be a scaling factor?

      was (Author: lancenorskog):
    Is this numerically stable? Or rather, in which range is this numerically stable?
  

[jira] [Assigned] (MAHOUT-634) Need more online averagers

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning reassigned MAHOUT-634:
----------------------------------

    Assignee: Ted Dunning

> Need more online averagers
> --------------------------
>
>                 Key: MAHOUT-634
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-634
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>
> I am occasionally seeing a need to do exponential averaging of values or rates.
> Hbase guys want this as well.
> So it is time to do it.  I have a patch that does the averaging of values according to
> http://tdunning.blogspot.com/2011/03/exponential-weighted-averages-with.html
> I will attach that as a patch now and do the rate averaging as well before committing.


[jira] [Commented] (MAHOUT-634) Need more online averagers

Posted by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034945#comment-13034945 ] 

Dmitriy Lyubimov commented on MAHOUT-634:
-----------------------------------------

Ted, 

I am also using this, with slight modifications to enable use with map-reduce. Two suggestions I implemented on the side: updates to the past (input unordered w.r.t. time of sampling, albeit potentially less numerically stable) and combining, for use with MR: http://weatheringthrutechdays.blogspot.com/2011/04/follow-up-for-mean-summarizer-post.html No algorithm in Mahout currently uses MR for summarizing inputs, but one might. These improvements made it possible to implement Pig functions that run those formulas.

Also, I experimented with yet another biased estimator for binomial sums (similar to the use of a beta distribution as a conjugate prior for the binomial distribution) that converges on a predefined value P_0 (similar to the beta distribution mode converging to 0.5 as n goes to 0) under two circumstances: 1) there is a lack of history (as in the beta-distribution-based estimate); 2) there is a lack of _recent_ history.

There's probably no immediate use for either in Mahout but both problems seem to be pretty common otherwise.



[jira] [Updated] (MAHOUT-634) Need more online averagers

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-634:
-----------------------------

    Affects Version/s:     (was: 0.5)
                       0.4
        Fix Version/s: 0.5


[jira] [Commented] (MAHOUT-634) Need more online averagers

Posted by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034947#comment-13034947 ] 

Dmitriy Lyubimov commented on MAHOUT-634:
-----------------------------------------

Unfortunately, the LaTeX server seems to be down at the moment, so formulas are not rendering. I don't know if that's a temporary condition or permanent.


[jira] [Updated] (MAHOUT-634) Need more online averagers

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-634:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed.  Didn't wait long for reviews because this is pretty trivial stuff.  We can reopen or open a new issue if somebody has a problem.

> Need more online averagers
> --------------------------
>
>                 Key: MAHOUT-634
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-634
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.5
>            Reporter: Ted Dunning
>            Assignee: Ted Dunning
>         Attachments: 0001-MAHOUT-634-time-embedded-moving-averages.patch
>
>
> I am occasionally seeing a need to do exponential averaging of values or rates.
> Hbase guys want this as well.
> So it is time to do it.  I have a patch that does the averaging of values according to
> http://tdunning.blogspot.com/2011/03/exponential-weighted-averages-with.html
> I will attach that as a patch now and do the rate averaging as well before committing.


[jira] [Issue Comment Edited] (MAHOUT-634) Need more online averagers

Posted by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034945#comment-13034945 ] 

Dmitriy Lyubimov edited comment on MAHOUT-634 at 5/17/11 6:37 PM:
------------------------------------------------------------------

Ted, 

I am also using this, with slight modifications to enable use with map-reduce. Two suggestions I implemented on the side: updates to the past (input unordered w.r.t. time of sampling, albeit potentially less numerically stable) and combining, for use with MR: http://weatheringthrutechdays.blogspot.com/2011/04/follow-up-for-mean-summarizer-post.html No algorithm in Mahout currently uses MR for summarizing inputs, but one might. These improvements made it possible to implement Pig functions that run those formulas.

Also, I experimented with yet another biased estimator for binomial sums (similar to the use of a beta distribution as a conjugate prior for the binomial distribution) that converges on a predefined value P_0 (similar to the beta distribution mode converging to 0.5 as n goes to 0) under two circumstances: 1) there is a lack of history (as in the beta-distribution-based estimate); 2) there is a lack of _recent_ history.

There's probably no immediate use for either in Mahout but both problems seem to be pretty common otherwise.


      was (Author: dlyubimov):
    Ted, 

I am also using this with slight modifications to enable to use with map-reduce. 2 suggestions i implemented on a side: updates to the past  (unordered input w.r.t. to time of sampling, albeit potentially less numerically stable) and combining to use with MR. http://weatheringthrutechdays.blogspot.com/2011/04/follow-up-for-mean-summarizer-post.html. No algorithm in Mahout currently uses MR for summarizing inputs but it might. These improvements allowed to implement Pig functions that run that formulas.

Also i experimented with yet another biased estimator for binomial sums (similar to use of beta distribution as a conjugate prior for binomial distribution) that allows to converge on a predefined value P_0 (similar to beta distribution mode converging to 0.5 with n going to 0) under two circumstances: 1) there's a lack of history (as in beta-distribution-based estimate). 2) there's lack of _recent_ history. 

There's probably no immediate use for either in Mahout but both problems seem to be pretty common otherwise.

  

[jira] [Updated] (MAHOUT-634) Need more online averagers

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-634:
-------------------------------

    Attachment: 0001-MAHOUT-634-time-embedded-moving-averages.patch

[jira] [Issue Comment Edited] (MAHOUT-634) Need more online averagers

Posted by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034947#comment-13034947 ] 

Dmitriy Lyubimov edited comment on MAHOUT-634 at 5/17/11 7:41 PM:
------------------------------------------------------------------

> Unfortunately, the LaTeX server seems to be down at the moment, so formulas are not rendering. I don't know if that's a temporary condition or permanent.

OK, fixed now. I don't know for how long, though; it looks like Google Groups changes the access token every so often to fight attacks or something.

      was (Author: dlyubimov):
    Unfortunately, latex server seems to be down for the moment so formulas are not rendering. I don't know if that's an intermediate condition or permanent.
  

[jira] [Updated] (MAHOUT-634) Need more online averagers

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-634:
-------------------------------

    Affects Version/s: 0.5
               Status: Patch Available  (was: Open)

This implements time averaging and rate averaging with test coverage for both.

I will commit shortly if I don't hear otherwise.
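
As a rough illustration of what rate averaging can mean here, the sketch below divides an exponentially decayed sum of event sizes by an exponentially decayed elapsed time. This is an assumption made for illustration, not a description of the committed code; `ExponentialRate`, `t0` and `elapsed` are hypothetical names. Events are assumed to arrive in time order after the start time `t0`.

```python
import math


class ExponentialRate:
    """Illustrative sketch: exponentially weighted rate of events per unit time."""

    def __init__(self, alpha, t0=0.0):
        if alpha <= 0:
            raise ValueError("time constant alpha must be positive")
        self.alpha = alpha   # time constant, in the same units as t
        self.s = 0.0         # decayed sum of event sizes
        self.elapsed = 0.0   # decayed elapsed time
        self.last_t = t0     # start of observation, then time of the latest event

    def add(self, t, x=1.0):
        """Record an event of size x at time t (assumed not earlier than the previous one)."""
        dt = t - self.last_t
        decay = math.exp(-dt / self.alpha)
        self.s = x + decay * self.s
        self.elapsed = dt + decay * self.elapsed
        self.last_t = t

    def rate(self):
        """Decayed events per unit time as of the latest event; 0.0 before any time has passed."""
        if self.elapsed == 0.0:
            return 0.0
        return self.s / self.elapsed
```

For unit-sized events arriving every 0.5 time units, both sums decay by the same factor at each step, so the ratio stays at 2 events per unit time regardless of `alpha`.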
