Posted to user@mahout.apache.org by Dan Brickley <da...@danbri.org> on 2011/03/26 19:59:17 UTC

Any visualization scripts for graphing DataModel stats?

Hi

Cutting across from M.I.A. forum -
http://www.manning-sandbox.com/thread.jspa?threadID=42476&tstart=0

I've loaded a pile of ratings into Mahout and started tweaking a dozen or so
flavours of Recommender with different components, settings. This is great,
I'm getting somewhere and Mahout works.

However this is a new dataset for me and I've not yet got a good feel for
"what's in there". Since Mahout's datamodel CSV format is simple and
regular, I suspect various other folk on this list already have utilities
that consume it, and -being lazy- I thought I'd ask before blundering in and
making my own. The kinds of question I have in mind are fairly pedestrian
for now --- what the spread of rating values looks like, how many of the
items have, say, 5 or more ratings; how many are super-popular and so on.

I started toying with [learning] R for this, but before digging further --
am I retreading known ground? Are there any scripts shared already? (I
didn't manage to find much by searching). Does it make sense to have shared
utilities for poking around inside a FileDataModel?

Thanks for suggestions, pointers etc

cheers,

Dan

ps. started learning R ->

> ratings <- read.csv('2010ratingtests-datamodel.csv', sep=',')
> names(ratings) <-c("userid","itemid","pref")
> summary(ratings$pref)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   7.000   8.000   8.022  10.000  10.000
> library(lattice)
> histogram(ratings$pref)
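
For the item-level questions above (how many items have 5 or more ratings,
which ones are super-popular), a minimal sketch in base R might look like
this. It assumes the file has exactly three columns and no header row, which
is why header = FALSE is passed; the inline sample data and the min_ratings
name are made up for illustration.

```r
# Tiny inline sample in the userID,itemID,preference layout (no header row).
# The values are made up for illustration only.
sample_csv <- "1,101,8
1,102,10
2,101,7
2,103,9
3,101,10
3,102,6
4,101,8
5,101,9"

ratings <- read.csv(text = sample_csv, header = FALSE,
                    col.names = c("userid", "itemid", "pref"))

# Number of ratings each item received, most-rated first.
item_counts <- sort(table(ratings$itemid), decreasing = TRUE)
print(item_counts)

# How many items have at least `min_ratings` ratings?
min_ratings <- 5
n_popular <- sum(item_counts >= min_ratings)
cat("items with >=", min_ratings, "ratings:", n_popular, "\n")

# Spread of rating values as a frequency table.
print(table(ratings$pref))
```

To run it against a real file, replace text = sample_csv with the file name,
e.g. '2010ratingtests-datamodel.csv'.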

Re: Any visualization scripts for graphing DataModel stats?

Posted by Brian Clsrk <br...@btinternet.com>.
On 28/03/2011 06:51, Jeremy Lewi wrote:
> Another option is Python+Matplotlib+NumPy. For MATLAB users, Matplotlib
> provides equivalent plotting routines with nearly identical syntax.
>
> One of the reasons I spent time looking into JPype+Python+Mahout was so
> that I could visualize/inspect the output generated by Mahout (e.g.
> Vectors stored in sequence files) without having to convert to an
> intermediary format such as csv.
>
> J
>
For those of us who still like a good book here are a couple of 
suggestions  for R (even if it's just for under the covers with a torch 
at night).

As was said, there are many books on R out there.  I've looked at most 
(OK, own - I should get out more) and my favourite introduction is

A First Course in Statistical Programming with R, by Braun and Murdoch

It's clearly written with lots of examples and packs a lot into 160-odd
pages.

On a grander scale there is

The R Book, by Michael Crawley

At 942 pages this is billed as a comprehensive reference manual for R.  
Well-written with lots of examples.  Not cheap but I got a lot of use 
from it when I was starting out.

Regards,

Brian

Re: Any visualization scripts for graphing DataModel stats?

Posted by Jeremy Lewi <je...@lewi.us>.
Another option is Python+Matplotlib+NumPy. For MATLAB users, Matplotlib
provides equivalent plotting routines with nearly identical syntax.

One of the reasons I spent time looking into JPype+Python+Mahout was so
that I could visualize/inspect the output generated by Mahout (e.g.
Vectors stored in sequence files) without having to convert to an
intermediary format such as csv.

J
On Sun, 2011-03-27 at 21:37 -0700, Dmitriy Lyubimov wrote:
> R is good.
> 
> RapidMiner has tons of visualizations and presumably might have less of
> a learning curve than R, but it would work for modest datasets or subsamples.
> 


Re: Any visualization scripts for graphing DataModel stats?

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
R is good.

RapidMiner has tons of visualizations and presumably might have less of
a learning curve than R, but it would work for modest datasets or subsamples.

On Sat, Mar 26, 2011 at 11:59 AM, Dan Brickley <da...@danbri.org> wrote:
> Hi
>
> Cutting across from M.I.A. forum -
> http://www.manning-sandbox.com/thread.jspa?threadID=42476&tstart=0
>
> I've loaded a pile of ratings into Mahout and started tweaking a dozen or so
> flavours of Recommender with different components, settings. This is great,
> I'm getting somewhere and Mahout works.
>
> However this is a new dataset for me and I've not yet got a good feel for
> "what's in there". Since Mahout's datamodel CSV format is simple and
> regular, I suspect various other folk on this list already have utilities
> that consume it, and -being lazy- I thought I'd ask before blundering in and
> making my own. The kinds of question I have in mind are fairly pedestrian
> for now --- what the spread of rating values looks like, how many of the
> items have, say, 5 or more ratings; how many are super-popular and so on.
>
> I started toying with [learning] R for this, but before digging further --
> am I retreading known ground? Are there any scripts shared already? (I
> didn't manage to find much by searching). Does it make sense to have shared
> utilities for poking around inside a FileDataModel?
>
> Thanks for suggestions, pointers etc
>
> cheers,
>
> Dan
>
> ps. started learning R ->
>
>> ratings <- read.csv('2010ratingtests-datamodel.csv', sep=',')
>> names(ratings) <-c("userid","itemid","pref")
>> summary(ratings$pref)
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  1.000   7.000   8.000   8.022  10.000  10.000
>> library(lattice)
>> histogram(ratings$pref)
>

Re: Any visualization scripts for graphing DataModel stats?

Posted by Lance Norskog <go...@gmail.com>.
There's an R plug-in for KNime. You write 50 lines of metadata and
three lines of R, and you get a visual programming block to move
around.

As to GPUs: the core problem with co-processors, from the beginning of
computer time (c.f. Konrad Zuse) has been getting the data in and out
of the co-processor memory. Only algorithms that do a lot of math work
out, or algorithms like SVM where you park a model on the GPU and
consult it. Also they're memory-limited. Last I heard the NVidia chips
are limited to 4G.

These chips are designed to put a huge pile of 3D model on a chip, and
push incremental updates. For that use the bandwidth problem works
out.

On Mon, Mar 28, 2011 at 9:40 AM, Ted Dunning <te...@gmail.com> wrote:
> Dan,
>
> Great links.  Especially processing.js and the R graphics book.
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Any visualization scripts for graphing DataModel stats?

Posted by Ted Dunning <te...@gmail.com>.
Dan,

Great links.  Especially processing.js and the R graphics book.

On Mon, Mar 28, 2011 at 9:24 AM, Dan Brickley <da...@danbri.org> wrote:

>
>
> On 28 March 2011 17:12, Ted Dunning <te...@gmail.com> wrote:
>
>> The standard tool in the visualization community (besides R) is
>> processing.
>>
>> I should have mentioned it before.  I haven't used it, but the gallery and
>> simplicity seem to make for an easy on-ramp.
>>
>> http://processing.org/
>
>
> As a coding language, Processing is essentially Java repackaged so as not
> to scare off arty people. Plus a bunch of nice APIs of course.
>
> Rather impressively, this has been ported to run via Javascript in the
> browser: http://processingjs.org/  ...and there are already efforts at
> hooking that version up with browser-based OpenGL aka WebGL.
> http://asalga.wordpress.com/2010/05/24/webgl-browser-stress-tests-using-processing-js/
>
> Random aside: see http://learningwebgl.com/blog/?p=1828 for experiments in
> offloading matrix processing to the graphics card, and
> http://learningwebgl.com/blog/?p=3396 for more serious news in this
> direction; according to
> http://www.khronos.org/news/press/releases/khronos-releases-final-webgl-1.0-specification
> there will be a WebCL Javascript API to expose OpenCL functionality: "Khronos
> is also today announcing the formation of the WebCL™ working group to
> explore defining a JavaScript binding to the Khronos OpenCL™ standard for
> heterogeneous parallel computing.  WebCL creates the potential to harness
> GPU and multi-core CPU parallel processing from a Web browser, enabling
> significant acceleration of applications such as image and video processing
> and advanced physics for WebGL games. ".
>
> Other js dataviz: http://www.jeromecukier.net/?p=623 (protovis howto),
> http://thejit.org/ http://www.highcharts.com/ ... and, aw... there's new
> stuff every day. Oh and while I'm closing browser tabs, here's a book on R
> and graphics:
> http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html
>
> Anyhow back to my original question. There seems to be some consensus that R is
> amongst the most natural tools to use as a utility alongside Mahout. So my
> little plea was to see whether anyone has some basic general R scripts to
> contribute to the Mahout community for poking around Mahout Taste datamodel
> files. Not rocket science, but a pity for everyone to reinvent those same
> wheels...
>
> cheers,
>
> Dan
>

Re: Any visualization scripts for graphing DataModel stats?

Posted by Dan Brickley <da...@danbri.org>.
On 28 March 2011 17:12, Ted Dunning <te...@gmail.com> wrote:

> The standard tool in the visualization community (besides R) is processing.
>
> I should have mentioned it before.  I haven't used it, but the gallery and
> simplicity seem to make for an easy on-ramp.
>
> http://processing.org/


As a coding language, Processing is essentially Java repackaged so as not to
scare off arty people. Plus a bunch of nice APIs of course.

Rather impressively, this has been ported to run via Javascript in the
browser: http://processingjs.org/  ...and there are already efforts at
hooking that version up with browser-based OpenGL aka WebGL.
http://asalga.wordpress.com/2010/05/24/webgl-browser-stress-tests-using-processing-js/

Random aside: see http://learningwebgl.com/blog/?p=1828 for experiments in
offloading matrix processing to the graphics card, and
http://learningwebgl.com/blog/?p=3396 for more serious news in this
direction; according to
http://www.khronos.org/news/press/releases/khronos-releases-final-webgl-1.0-specification
there will be a WebCL Javascript API to expose OpenCL functionality: "Khronos
is also today announcing the formation of the WebCL™ working group to
explore defining a JavaScript binding to the Khronos OpenCL™ standard for
heterogeneous parallel computing.  WebCL creates the potential to harness
GPU and multi-core CPU parallel processing from a Web browser, enabling
significant acceleration of applications such as image and video processing
and advanced physics for WebGL games."

Other js dataviz: http://www.jeromecukier.net/?p=623 (protovis howto),
http://thejit.org/ http://www.highcharts.com/ ... and, aw... there's new
stuff every day. Oh and while I'm closing browser tabs, here's a book on R
and graphics: http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html

Anyhow back to my original question. There seems to be some consensus that R is
amongst the most natural tools to use as a utility alongside Mahout. So my
little plea was to see whether anyone has some basic general R scripts to
contribute to the Mahout community for poking around Mahout Taste datamodel
files. Not rocket science, but a pity for everyone to reinvent those same
wheels...

cheers,

Dan
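
As a starting point for such a shared script, here is a rough sketch of a
small helper in base R. The function name summarize_datamodel and the demo
data are hypothetical, not something shipped with Mahout; it assumes a
three-column userID,itemID,preference file with no header row and at most one
rating per user/item pair.

```r
# Hypothetical helper (not part of Mahout): basic statistics for a
# userID,itemID,preference CSV with no header row.
summarize_datamodel <- function(path) {
  r <- read.csv(path, header = FALSE,
                col.names = c("userid", "itemid", "pref"))
  n_users <- length(unique(r$userid))
  n_items <- length(unique(r$itemid))
  list(
    n_prefs = nrow(r),
    n_users = n_users,
    n_items = n_items,
    # Fraction of the user x item matrix that is actually filled in.
    density = nrow(r) / (n_users * n_items),
    # Five-number summaries of how many ratings each user / item has.
    prefs_per_user = summary(as.vector(table(r$userid))),
    prefs_per_item = summary(as.vector(table(r$itemid)))
  )
}

# Demo on a throwaway file with made-up ratings.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,101,8", "1,102,10", "2,101,7", "3,103,9"), tmp)
stats <- summarize_datamodel(tmp)
print(stats)
unlink(tmp)
```

Files with a fourth timestamp column, or boolean files with only two columns,
would need the col.names vector adjusted accordingly.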

Re: Any visualization scripts for graphing DataModel stats?

Posted by Ted Dunning <te...@gmail.com>.
The standard tool in the visualization community (besides R) is processing.

I should have mentioned it before.  I haven't used it, but the gallery and
simplicity seem to make for an easy on-ramp.

http://processing.org/

On Sun, Mar 27, 2011 at 11:58 PM, Dawid Weiss
<da...@cs.put.poznan.pl> wrote:

> Oh, if you're a hard-core visualization freak and are looking for the
> corresponding assembly-level visualization language, I recommend
> trying coding directly in postscript. It will be super  unproductive,
> but boy... what retro-style-fun :)
>

Re: Any visualization scripts for graphing DataModel stats?

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
This comment made me laugh so hard (because I know the pain :)

It's true: gnuplot is very old-fashioned, the default documentation
sucks and the programming interface is, hmm... let's just say it's very
specific. I still use it from time to time, especially for large data
sets. If you decide to use gnuplot, I recommend Philipp Janert's
"Gnuplot in Action": it gives you ready-to-use recipes and puts some
structure to using gnuplot in a way the default documentation does not.
Note: I am biased because I was a reviewer of this book. Note 2: R and
js-based libraries may be a better choice for a more modern look and
feel :)

Oh, if you're a hard-core visualization freak and are looking for the
corresponding assembly-level visualization language, I recommend
trying coding directly in postscript. It will be super  unproductive,
but boy... what retro-style-fun :)

Dawid

On Mon, Mar 28, 2011 at 7:42 AM, Ted Dunning <te...@gmail.com> wrote:
> Gnuplot is sooo 20th century.
>
> Try modern stuff instead:
> http://www.1stwebdesigner.com/css/top-jquery-chart-libraries-interactive-charts/
> if you need web accessible charts.
>
> If you need interactive charting, use R.
>
>

Re: Any visualization scripts for graphing DataModel stats?

Posted by Ted Dunning <te...@gmail.com>.
Gnuplot is sooo 20th century.

Try modern stuff instead:
http://www.1stwebdesigner.com/css/top-jquery-chart-libraries-interactive-charts/
if you need web accessible charts.

If you need interactive charting, use R.

On Sun, Mar 27, 2011 at 8:54 PM, Lance Norskog <go...@gmail.com> wrote:

> Things I haven't used
> gnuplot
> http://zimg.sourceforge.net/
>
> If you want to script image generation and park them in a web page,
> these two will probably get you where you want to go.
>
>

Re: Any visualization scripts for graphing DataModel stats?

Posted by Lance Norskog <go...@gmail.com>.
Things I haven't used
gnuplot
http://zimg.sourceforge.net/

If you want to script image generation and park them in a web page,
these two will probably get you where you want to go.
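
Scripted image generation can also be done from R itself. The sketch below
writes a two-panel chart to a PDF file without opening a window; the ratings
are synthetic and the output file name is an arbitrary choice. Where the
png() or svg() devices are available, they can be swapped in for pdf() to get
something a web page can embed directly (note that png() takes its sizes in
pixels by default).

```r
# Synthetic ratings, made up for illustration (userID, itemID, preference).
ratings <- data.frame(
  userid = c(1, 1, 2, 2, 3, 3, 4, 5),
  itemid = c(101, 102, 101, 103, 101, 102, 101, 101),
  pref   = c(8, 10, 7, 9, 10, 6, 8, 9)
)

# Ratings per item, sorted so the most popular items come first.
item_counts <- sort(as.vector(table(ratings$itemid)), decreasing = TRUE)

# Write the chart to a file in the session's temporary directory.
out <- file.path(tempdir(), "datamodel-overview.pdf")
pdf(out, width = 8, height = 4)
par(mfrow = c(1, 2))

# Left panel: how often each rating value occurs.
barplot(table(ratings$pref), xlab = "preference value", ylab = "count",
        main = "Rating values")

# Right panel: item rank vs. number of ratings; log axes expose a long tail.
plot(seq_along(item_counts), item_counts, log = "xy", type = "b",
     xlab = "item rank", ylab = "ratings per item", main = "Item popularity")

invisible(dev.off())
cat("wrote", out, "\n")
```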

On Sat, Mar 26, 2011 at 3:48 PM, Lance Norskog <go...@gmail.com> wrote:
> If you are coming at this with no data analysis background, there is
> nothing easier than KNime:
> www.knime.org
>
> It is a visual programming tool for stringing together numerical
> processing tools. KNime.com sells chemistry and various bio analysis
> tools for it. It also has full Weka integration. It has the same
> problem as all visual programming environments: screen eating. But it
> does have a sub-graph templating tool that somewhat mitigates this.
>
> Processing is parallel. I set up three different benchmark reporting
> servlets on the same running Solr instance, read all three
> simultaneously, and cross-correlated the numbers. This took a couple of
> hours on the servlets and maybe 5 minutes in KNime.
>
> A testimonial, dear friends.
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Any visualization scripts for graphing DataModel stats?

Posted by Lance Norskog <go...@gmail.com>.
If you are coming at this with no data analysis background, there is
nothing easier than KNime:
www.knime.org

It is a visual programming tool for stringing together numerical
processing tools. KNime.com sells chemistry and various bio analysis
tools for it. It also has full Weka integration. It has the same
problem as all visual programming environments: screen eating. But it
does have a sub-graph templating tool that somewhat mitigates this.

Processing is parallel. I set up three different benchmark reporting
servlets on the same running Solr instance, read all three
simultaneously, and cross-correlated the numbers. This took a couple of
hours on the servlets and maybe 5 minutes in KNime.

A testimonial, dear friends.

On Sat, Mar 26, 2011 at 1:20 PM, Ted Dunning <te...@gmail.com> wrote:
> I use R exclusively for this kind of task.  The only commercial shop using
> Mahout in anger that I have been strongly connected with also uses R for this.
>
>
> Wheels.
>
> Reinvention.
>
> And so on.
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Any visualization scripts for graphing DataModel stats?

Posted by Ted Dunning <te...@gmail.com>.
I use R exclusively for this kind of task.  The only commercial shop using
Mahout in anger that I have been strongly connected with also uses R for this.


Wheels.

Reinvention.

And so on.

On Sat, Mar 26, 2011 at 11:59 AM, Dan Brickley <da...@danbri.org> wrote:

> I started toying with [learning] R for this, but before digging further --
> am I retreading known ground? Are there any scripts shared already? (I
> didn't manage to find much by searching). Does it make sense to have shared
> utilities for poking around inside a FileDataModel?
>
> Thanks for suggestions, pointers etc
>
> cheers,
>
> Dan
>
> ps. started learning R ->
>
> > ratings <- read.csv('2010ratingtests-datamodel.csv', sep=',')
> > names(ratings) <-c("userid","itemid","pref")
> > summary(ratings$pref)
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  1.000   7.000   8.000   8.022  10.000  10.000
> > library(lattice)
> > histogram(ratings$pref)
>