Posted to dev@mahout.apache.org by myn <my...@163.com> on 2011/12/05 02:47:20 UTC

Time series analysis

Does Mahout contain this method?
Or is there any other open source project about this?

Re: Time series analysis

Posted by Ted Dunning <te...@gmail.com>.
Classification and clustering are also common tasks in time series analysis.
Furthermore, not all time series have samples that are expressed as simple
continuous values. Think about click streams or financial transactions.
Neither can be expressed as a simple number.

On Sun, Dec 4, 2011 at 7:29 PM, Peyman Mohajerian <mo...@gmail.com>wrote:

> Any time you have data collected over time, you have time series data.
> For example, data from the trajectory of hand movement in biomechanics, or
> the movement of a given stock on a given day; the x-axis is time. FFT
> (frequency analysis of the data) is an example of time series analysis.
> In general, regression is more applicable to time series and, from what
> I have read, Mahout does not deal with regression; the 3 C's (clustering,
> classification and CF) are covered in Mahout.
>
> On Sun, Dec 4, 2011 at 7:22 PM, Ted Dunning <te...@gmail.com> wrote:
> > 2011/12/4 myn <my...@163.com>
> >
> >> does mahout contain this method?
> >>
> >
> > Which method?
> >
> > Time series analysis is not a method.
>

Re: Time series analysis

Posted by Peyman Mohajerian <mo...@gmail.com>.
Any time you have data collected over time, you have time series data.
For example, data from the trajectory of hand movement in biomechanics, or
the movement of a given stock on a given day; the x-axis is time. FFT
(frequency analysis of the data) is an example of time series analysis.
In general, regression is more applicable to time series and, from what
I have read, Mahout does not deal with regression; the 3 C's (clustering,
classification and CF) are covered in Mahout.

On Sun, Dec 4, 2011 at 7:22 PM, Ted Dunning <te...@gmail.com> wrote:
> 2011/12/4 myn <my...@163.com>
>
>> does mahout contain this method?
>>
>
> Which method?
>
> Time series analysis is not a method.

Re: Time series analysis

Posted by Ted Dunning <te...@gmail.com>.
2011/12/4 myn <my...@163.com>

> does mahout contain this method?
>

Which method?

Time series analysis is not a method.

Re:Re: Time series analysis

Posted by myn <my...@163.com>.
Thank you. I have looked at the source code.
It only computes the average of the last N points and does not consider the seasonal factor.

The code is as follows:

point_sum = 0;
for (int x = 0; x < oWindow.size(); x++) {
    // accumulate the value of each point in the current window
    point_sum += oWindow.get(x).fValue;
}
// unweighted mean over the window
moving_avg = point_sum / oWindow.size();
out_val.set("Moving Average: " + moving_avg);
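
The snippet above is a plain moving average, so a repeating seasonal pattern passes straight through it. One simple way to account for a season, sketched here under the assumption that the season length is known in advance, is to estimate an additive index for each position in the season, subtract it, and only then smooth. The class name, method names and the `period` parameter are made up for this illustration; none of this comes from the Cloudera article or from Mahout.

```java
import java.util.Arrays;

final class SeasonalMovingAverage {

    // Trailing simple moving average; the first points use a shorter window.
    static double[] movingAverage(double[] series, int window) {
        double[] out = new double[series.length];
        double sum = 0.0;
        for (int i = 0; i < series.length; i++) {
            sum += series[i];
            if (i >= window) {
                sum -= series[i - window]; // drop the point that left the window
            }
            out[i] = sum / Math.min(i + 1, window);
        }
        return out;
    }

    // Additive seasonal index per phase: mean of the points at that phase
    // minus the overall mean. `period` is the assumed season length.
    static double[] seasonalIndices(double[] series, int period) {
        double[] sums = new double[period];
        int[] counts = new int[period];
        double total = 0.0;
        for (int i = 0; i < series.length; i++) {
            sums[i % period] += series[i];
            counts[i % period]++;
            total += series[i];
        }
        double mean = total / series.length;
        double[] indices = new double[period];
        for (int p = 0; p < period; p++) {
            indices[p] = counts[p] == 0 ? 0.0 : sums[p] / counts[p] - mean;
        }
        return indices;
    }

    // Subtracts the seasonal index of each point's phase from the point.
    static double[] deseasonalize(double[] series, int period) {
        double[] indices = seasonalIndices(series, period);
        double[] out = new double[series.length];
        for (int i = 0; i < series.length; i++) {
            out[i] = series[i] - indices[i % period];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] series = {10, 20, 10, 20, 10, 20}; // toy data with a season of length 2
        System.out.println(Arrays.toString(movingAverage(series, 3)));
        System.out.println(Arrays.toString(movingAverage(deseasonalize(series, 2), 3)));
    }
}
```

On the toy data the raw moving average still oscillates, while the deseasonalized one is flat. Real data would usually also need a trend estimate before the seasonal indices are computed.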




At 2011-12-07 06:52:39,"Raphael Cendrillon" <ce...@gmail.com> wrote:
>If the data series is large it might be interesting to further split the job over time using overlap/add or overlap/save, or even an FFT suitably partitioned. 
>
>On Dec 6, 2011, at 1:48 PM, Josh Patterson <jo...@cloudera.com> wrote:
>
>> Mahout currently does not have, afaik, much/any time series specific
>> code for it. If I were to point someone at some good resources I'd
>> start with:
>> 
>> - Box and Jenkins book
>> - Dr Keogh's line of research on time series pattern matching
>> 
>> And then beyond that it begins to become "what are you specifically
>> looking for?". R is typically the "go to" resource for a lot of time
>> series work, but there has been some very successful work with Hadoop
>> and large scale time series data. Below I link to a few articles where
>> time series techniques are demonstrated with Hadoop. Specifically here
>> is a blog article on general time series processing with  Hadoop:
>> 
>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
>> http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/
>> 
>> Beyond that you could take a look at how we applied these concepts to
>> the US powergrid PMU / smartgrid data back in 2009:
>> 
>> http://openpdc.codeplex.com
>> http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard
>> 
>> Hope that gets you going,
>> 
>> Josh
>> 
>> 2011/12/4 myn <my...@163.com>:
>>> does mahout contain this method?
>>> or is there any other open source project about this?
>> 
>> 
>> 
>> -- 
>> Twitter: @jpatanooga
>> Solution Architect @ Cloudera
>> hadoop: http://www.cloudera.com

Re: Time series analysis

Posted by Raphael Cendrillon <ce...@gmail.com>.
Thanks for the link, it looks pretty interesting. 

How long are your filters typically? I guess there is no need for frequency domain processing if the filters are fairly short. 

Out of interest do you see any need for 2D filtering?

On Dec 7, 2011, at 9:52 AM, Josh Patterson <jo...@cloudera.com> wrote:

> We did that with the openPDC classifications system where we broke up
> high resolution PMU/sensor data into "blocks of time + sensor id"
> buckets, with some overlap.
> 
> code at: http://openpdc.codeplex.com
> 
> The Cloudera article is just a basic example illustrating the
> secondary sort mechanic, which is key for time series on hadoop (sort
> for free).
> 
> The openPDC has one MR job that scans time series for fuzzy patterns
> using Keogh's SAX/iSAX technique and a 1NN classifier based on a
> BallTree.
> 
> Josh
> 
> On Tue, Dec 6, 2011 at 5:52 PM, Raphael Cendrillon
> <ce...@gmail.com> wrote:
>> If the data series is large it might be interesting to further split the job over time using overlap/add or overlap/save, or even an FFT suitably partitioned.
>> 
>> On Dec 6, 2011, at 1:48 PM, Josh Patterson <jo...@cloudera.com> wrote:
>> 
>>> Mahout currently does not have, afaik, much/any time series specific
>>> code for it. If I were to point someone at some good resources I'd
>>> start with:
>>> 
>>> - Box and Jenkins book
>>> - Dr Keogh's line of research on time series pattern matching
>>> 
>>> And then beyond that it begins to become "what are you specifically
>>> looking for?". R is typically the "go to" resource for a lot of time
>>> series work, but there has been some very successful work with Hadoop
>>> and large scale time series data. Below I link to a few articles where
>>> time series techniques are demonstrated with Hadoop. Specifically here
>>> is a blog article on general time series processing with  Hadoop:
>>> 
>>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
>>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
>>> http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/
>>> 
>>> Beyond that you could take a look at how we applied these concepts to
>>> the US powergrid PMU / smartgrid data back in 2009:
>>> 
>>> http://openpdc.codeplex.com
>>> http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard
>>> 
>>> Hope that gets you going,
>>> 
>>> Josh
>>> 
>>> 2011/12/4 myn <my...@163.com>:
>>>> does mahout contain this method?
>>>> or is there any other open source project about this?
>>> 
>>> 
>>> 
>>> --
>>> Twitter: @jpatanooga
>>> Solution Architect @ Cloudera
>>> hadoop: http://www.cloudera.com
> 
> 
> 
> -- 
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com

Re: Time series analysis

Posted by Josh Patterson <jo...@cloudera.com>.
We did that with the openPDC classification system where we broke up
high resolution PMU/sensor data into "blocks of time + sensor id"
buckets, with some overlap.

code at: http://openpdc.codeplex.com

The Cloudera article is just a basic example illustrating the
secondary sort mechanic, which is key for time series on Hadoop (you get
the sort for free).

openPDC has one MapReduce job that scans time series for fuzzy patterns
using Keogh's SAX/iSAX technique and a 1NN classifier based on a
BallTree.
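
For readers who have not met SAX, here is a rough sketch of the basic symbolization step only: z-normalize the series, reduce it with piecewise aggregate approximation (PAA), then map each segment mean to a letter using breakpoints that split a standard normal distribution into equally likely regions. This is not the openPDC code; the class and method names are invented for the example, the alphabet size is fixed at 4, and iSAX indexing and the 1NN/BallTree classifier are not shown.

```java
final class SaxSketch {

    // Breakpoints splitting N(0,1) into 4 equiprobable regions (alphabet a..d).
    static final double[] BREAKPOINTS = {-0.6745, 0.0, 0.6745};

    // Rescales the series to zero mean and unit standard deviation.
    static double[] zNormalize(double[] x) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double var = 0.0;
        for (double v : x) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / x.length);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = std == 0.0 ? 0.0 : (x[i] - mean) / std; // constant series map to 0
        }
        return out;
    }

    // PAA: mean of each of `segments` equal-length frames.
    // Assumes x.length is a multiple of segments; leftover points are ignored.
    static double[] paa(double[] x, int segments) {
        int frame = x.length / segments;
        double[] out = new double[segments];
        for (int s = 0; s < segments; s++) {
            double sum = 0.0;
            for (int i = s * frame; i < (s + 1) * frame; i++) sum += x[i];
            out[s] = sum / frame;
        }
        return out;
    }

    // Converts a series into a SAX word of length `segments`.
    static String toSax(double[] x, int segments) {
        StringBuilder word = new StringBuilder();
        for (double v : paa(zNormalize(x), segments)) {
            int symbol = 0;
            while (symbol < BREAKPOINTS.length && v > BREAKPOINTS[symbol]) symbol++;
            word.append((char) ('a' + symbol));
        }
        return word.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSax(new double[] {1, 1, 2, 2, 3, 3, 4, 4}, 4)); // prints "abcd"
    }
}
```

Two series can then be compared cheaply on their words before any exact distance is computed, which is what makes the representation attractive for scanning large volumes of sensor data.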

Josh

On Tue, Dec 6, 2011 at 5:52 PM, Raphael Cendrillon
<ce...@gmail.com> wrote:
> If the data series is large it might be interesting to further split the job over time using overlap/add or overlap/save, or even an FFT suitably partitioned.
>
> On Dec 6, 2011, at 1:48 PM, Josh Patterson <jo...@cloudera.com> wrote:
>
>> Mahout currently does not have, afaik, much/any time series specific
>> code for it. If I were to point someone at some good resources I'd
>> start with:
>>
>> - Box and Jenkins book
>> - Dr Keogh's line of research on time series pattern matching
>>
>> And then beyond that it begins to become "what are you specifically
>> looking for?". R is typically the "go to" resource for a lot of time
>> series work, but there has been some very successful work with Hadoop
>> and large scale time series data. Below I link to a few articles where
>> time series techniques are demonstrated with Hadoop. Specifically here
>> is a blog article on general time series processing with  Hadoop:
>>
>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
>> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
>> http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/
>>
>> Beyond that you could take a look at how we applied these concepts to
>> the US powergrid PMU / smartgrid data back in 2009:
>>
>> http://openpdc.codeplex.com
>> http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard
>>
>> Hope that gets you going,
>>
>> Josh
>>
>> 2011/12/4 myn <my...@163.com>:
>>> does mahout contain this method?
>>> or is there any other open source project about this?
>>
>>
>>
>> --
>> Twitter: @jpatanooga
>> Solution Architect @ Cloudera
>> hadoop: http://www.cloudera.com



-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com

Re: Time series analysis

Posted by Raphael Cendrillon <ce...@gmail.com>.
If the data series is large it might be interesting to further split the job over time using overlap/add or overlap/save, or even an FFT suitably partitioned. 
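
As a small illustration of the overlap/add idea, the sketch below filters a long signal block by block and adds the overlapping tails back together; the result matches filtering the whole signal at once. It uses direct convolution per block for brevity (an FFT could replace it), and in a distributed job each block could in principle go to a separate task, which is an assumption here rather than something tested. The class and method names are hypothetical, and both arrays are assumed to be non-empty.

```java
import java.util.Arrays;

final class OverlapAdd {

    // Direct linear convolution; output length is x.length + h.length - 1.
    static double[] convolve(double[] x, double[] h) {
        double[] y = new double[x.length + h.length - 1];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < h.length; j++) {
                y[i + j] += x[i] * h[j];
            }
        }
        return y;
    }

    // Overlap-add: convolve each block independently, then sum the partial
    // results at the block's offset so the tails overlap into the next block.
    static double[] overlapAdd(double[] x, double[] h, int blockSize) {
        double[] y = new double[x.length + h.length - 1];
        for (int start = 0; start < x.length; start += blockSize) {
            int end = Math.min(start + blockSize, x.length);
            double[] part = convolve(Arrays.copyOfRange(x, start, end), h);
            for (int k = 0; k < part.length; k++) {
                y[start + k] += part[k];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] h = {1, 1, 1}; // 3-point moving sum as the filter
        System.out.println(Arrays.toString(overlapAdd(x, h, 2)));
        System.out.println(Arrays.toString(convolve(x, h)));
    }
}
```

Each block only needs the filter and its own samples, so the only coordination required is the final addition of the overlapping regions.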

On Dec 6, 2011, at 1:48 PM, Josh Patterson <jo...@cloudera.com> wrote:

> Mahout currently does not have, afaik, much/any time series specific
> code for it. If I were to point someone at some good resources I'd
> start with:
> 
> - Box and Jenkins book
> - Dr Keogh's line of research on time series pattern matching
> 
> And then beyond that it begins to become "what are you specifically
> looking for?". R is typically the "go to" resource for a lot of time
> series work, but there has been some very successful work with Hadoop
> and large scale time series data. Below I link to a few articles where
> time series techniques are demonstrated with Hadoop. Specifically here
> is a blog article on general time series processing with  Hadoop:
> 
> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
> http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
> http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/
> 
> Beyond that you could take a look at how we applied these concepts to
> the US powergrid PMU / smartgrid data back in 2009:
> 
> http://openpdc.codeplex.com
> http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard
> 
> Hope that gets you going,
> 
> Josh
> 
> 2011/12/4 myn <my...@163.com>:
>> does mahout contain this method?
>> or is there any other open source project about this?
> 
> 
> 
> -- 
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com

Re: Time series analysis

Posted by Josh Patterson <jo...@cloudera.com>.
Mahout currently does not have, afaik, much (if any) time-series-specific
code. If I were to point someone at some good resources, I'd
start with:

- Box and Jenkins book
- Dr Keogh's line of research on time series pattern matching

And then beyond that it begins to become "what are you specifically
looking for?". R is typically the "go to" resource for a lot of time
series work, but there has been some very successful work with Hadoop
and large scale time series data. Below I link to a few articles where
time series techniques are demonstrated with Hadoop. Specifically, here
is a blog series on general time series processing with Hadoop:

http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/
http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/
http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/

Beyond that you could take a look at how we applied these concepts to
the US powergrid PMU / smartgrid data back in 2009:

http://openpdc.codeplex.com
http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard

Hope that gets you going,

Josh

2011/12/4 myn <my...@163.com>:
> does mahout contain this method?
> or is there any other open source project about this?



-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com

Re: Time series analysis

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Hi myn,
as far as I know, not many time series analysis algorithms are implemented in Mahout.

To be concrete:

There are no ARIMA algorithms for forecasting:
http://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average
(you could use R: http://cran.r-project.org/web/packages/forecast/index.html)

There is no fast Fourier transform:
http://en.wikipedia.org/wiki/Fast_Fourier_transform
(you could use R too: http://ugrad.stat.ubc.ca/R/library/stats/html/fft.html)
(you could use Apache Commons Math: https://issues.apache.org/jira/browse/MATH-216)

There are, however, hidden Markov models:
https://issues.apache.org/jira/browse/MAHOUT-396
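
To make the Fourier item in the list above concrete, here is a minimal sketch of frequency analysis using a naive O(n^2) discrete Fourier transform. It is deliberately not an FFT and not the Apache Commons Math API; the class and method names are invented for the example, and for real workloads one of the libraries linked above would be the better choice.

```java
final class NaiveDft {

    // Returns the magnitude |X[k]| for k = 0..n-1 of the DFT of a real signal.
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    public static void main(String[] args) {
        int n = 8;
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            x[t] = Math.cos(2 * Math.PI * t / n); // one full cycle over the window
        }
        double[] mag = magnitudes(x);
        for (int k = 0; k < n; k++) {
            System.out.printf("bin %d: %.3f%n", k, mag[k]);
        }
    }
}
```

For the single-cycle cosine in `main`, the energy lands in bins 1 and n-1 and the remaining bins are close to zero, which is the kind of periodicity a moving average alone would not reveal.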

On 05.12.2011, at 02:47, myn wrote:

> does mahout contain this method?
> or is there any other open source project about this?

You are invited to port these algorithms to the Mahout platform, especially in a distributed manner. Keep in mind that parallelizing time series algorithms is still ongoing research:
http://www.r-bloggers.com/functional-and-parallel-time-series-cross-validation/
http://www.fftw.org/parallel/parallel-fftw.html

/Manuel

-- 
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B