Posted to general@hadoop.apache.org by Roger Smith <ro...@gmail.com> on 2011/03/05 06:05:19 UTC

Digital Signal Processing Library + Hadoop

All -
I wonder if any of you have integrated a DSP library with Hadoop.
We are considering using Hadoop to process time series data, but don't
want to write standard DSP functions.

Roger.

Re: Digital Signal Processing Library + Hadoop

Posted by Josh Patterson <jo...@cloudera.com>.
Roger,
A basic time series construct is the "sliding" window used in conjunction
with sorted time/value data. A sample implementation is on my GitHub:

https://github.com/jpatanooga/Caduceus/tree/master/src/tv/floe/caduceus/hadoop/movingaverage

There are two jobs in there, one that uses the shuffle and one that
does not, to illustrate the difference. I have a blog post in draft
that accompanies this code; I'll follow up and send you a copy of the
draft.
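
To make the sliding-window idea concrete, here is a minimal sketch of a
simple moving average over time-sorted points. It is plain Java with no
Hadoop dependency and is not taken from the Caduceus repository; the
class, field, and method names (SlidingWindowAverage, Point,
movingAverage) are made up for illustration. In a MapReduce job, logic
like this would typically sit wherever the values for one series arrive
together, for example in a reducer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration class; not part of any library.
final class SlidingWindowAverage {

    /** One observation: a timestamp and the value measured at that time. */
    static final class Point {
        final long timestamp;
        final double value;

        Point(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Simple moving average over the most recent windowSize points.
     * Emits one output point per full window, stamped with the
     * timestamp of the newest point in that window.
     */
    static List<Point> movingAverage(List<Point> input, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        // The window only makes sense on time-ordered data, so sort a copy.
        List<Point> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingLong((Point p) -> p.timestamp));

        Deque<Double> window = new ArrayDeque<>();
        double sum = 0.0;
        List<Point> out = new ArrayList<>();
        for (Point p : sorted) {
            window.addLast(p.value);
            sum += p.value;
            if (window.size() > windowSize) {
                sum -= window.removeFirst(); // slide: drop the oldest value
            }
            if (window.size() == windowSize) {
                out.add(new Point(p.timestamp, sum / windowSize));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Deliberately out of order to show that sorting happens first.
        List<Point> points = Arrays.asList(
                new Point(3L, 30.0), new Point(1L, 10.0),
                new Point(2L, 20.0), new Point(4L, 40.0));
        for (Point p : movingAverage(points, 3)) {
            System.out.println(p.timestamp + " -> " + p.value);
        }
    }
}
```

The running sum keeps each step O(1) regardless of window size. The
sketch sorts in memory, which assumes one series fits on a single
machine; at scale the sort would more likely be delegated to the
framework.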

From that code you should be able to build out a more complex time
series / DSP process (using it as base code), something along the
lines of a 1NN classifier:

https://openpdc.svn.codeplex.com/svn/Hadoop/Current%20Version/
https://openpdc.svn.codeplex.com/svn/Hadoop/Current%20Version/docs/openPDC%20Datamining%20Tools%20Guide.pdf
https://openpdc.svn.codeplex.com/svn/Hadoop/Current%20Version/src/TVA/Hadoop/MapReduce/Datamining/SAX/SlidingTSClassifier_kNN.java
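
As a rough idea of what a 1NN classifier over windows involves, the
sketch below labels a query window with the label of its closest
training window. It is an assumption-laden illustration, not the openPDC
code: it compares raw values with squared Euclidean distance, whereas
the path of the linked file suggests a SAX representation, which is not
implemented here. All names (NearestNeighborWindowClassifier, Labeled,
classify) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of openPDC or any library.
final class NearestNeighborWindowClassifier {

    /** A training example: one fixed-length window of values and its label. */
    static final class Labeled {
        final String label;
        final double[] window;

        Labeled(String label, double[] window) {
            this.label = label;
            this.window = window;
        }
    }

    /** Squared Euclidean distance between two equal-length windows. */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("windows must have equal length");
        }
        double total = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            total += diff * diff;
        }
        return total;
    }

    /** Returns the label of the training window nearest to the query (1NN). */
    static String classify(double[] query, List<Labeled> training) {
        if (training.isEmpty()) {
            throw new IllegalArgumentException("training set must not be empty");
        }
        Labeled best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Labeled candidate : training) {
            double d = squaredDistance(query, candidate.window);
            if (best == null || d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best.label;
    }

    public static void main(String[] args) {
        List<Labeled> training = Arrays.asList(
                new Labeled("flat", new double[] {1.0, 1.0, 1.0}),
                new Labeled("ramp", new double[] {1.0, 2.0, 3.0}));
        System.out.println(classify(new double[] {1.0, 2.0, 2.5}, training));
    }
}
```

Squared distance is enough for picking the nearest neighbour because it
preserves the ordering of the true distances. Sliding the query window
along a series and calling classify at each position gives one label per
window.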

I'm in the process of updating that older openPDC code to be more
modern and modular for general data sources.

Josh




On Sat, Mar 5, 2011 at 12:05 AM, Roger Smith <ro...@gmail.com> wrote:
> All -
> I wonder if any of you have integrated a DSP library with Hadoop.
> We are considering using Hadoop to process time series data, but don't
> want to write standard DSP functions.
>
> Roger.
>



-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
blog: http://jpatterson.floe.tv

Re: Digital Signal Processing Library + Hadoop

Posted by Ted Dunning <td...@maprtech.com>.
Come on over to the Apache Mahout mailing list for a warm welcome at least.

We don't have a lot of time series stuff but would be very interested in
hearing more about what you need and would like to see if there are some
common issues that we might work on together.

On Fri, Mar 4, 2011 at 9:05 PM, Roger Smith <ro...@gmail.com> wrote:

> All -
> I wonder if any of you have integrated a DSP library with Hadoop.
> We are considering using Hadoop to process time series data, but don't
> want to write standard DSP functions.
>
> Roger.
>
