Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2014/04/04 15:39:12 UTC

Re: Mail and IRC parsing

We (LucidWorks) have full indexing and search of the Mahout mail archives at http://find.searchub.org.  We could probably add IRC pretty easily if you want.

-Grant

On Mar 22, 2014, at 2:06 AM, Andrew Musselman <an...@gmail.com> wrote:

> I put up a parser for the IRC history logs here
> https://github.com/andrewmusselman/util/blob/master/irc-parser.sh
> 
> I'd like to write one for the user list too, to figure out the most common
> problems and questions so we can focus effort on fixing bugs and docs.
> 
> But the mail archives at
> https://mail-archives.apache.org/mod_mbox/mahout-user/ are dynamic, loaded
> in through JavaScript, so parsing them isn't that straightforward.
> 
> Is it possible to get the mbox files directly?

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com

Re: Mail and IRC parsing

Posted by Andrew Musselman <an...@gmail.com>.
Could be useful; in the meantime I found the mail files on people.apache.org and can use those.
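
For reference, a minimal sketch of how mbox files like those could be tallied by subject to surface the most common topics. It uses only Python's standard-library mailbox module. The function names (normalize_subject, top_subjects) and the file name in the usage note are hypothetical and do not come from this thread, and counting subjects is only a rough stand-in for finding the most common problems or questions.

```python
import mailbox
import re
from collections import Counter

# Leading reply/forward markers such as "Re:" or "Fwd:", possibly repeated.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Drop reply prefixes, collapse whitespace, and lowercase the subject."""
    text = "" if subject is None else str(subject)
    text = _PREFIX.sub("", text)
    return " ".join(text.split()).lower()


def top_subjects(mbox_path, n=10):
    """Return the n most frequent normalized subjects in one mbox file."""
    counts = Counter()
    # create=False: raise an error instead of creating an empty file
    # when the path does not exist.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            subject = normalize_subject(message["subject"])
            if subject:
                counts[subject] += 1
    finally:
        box.close()
    return counts.most_common(n)
```

Calling top_subjects("201403.mbox", 20) on a downloaded monthly file (the file name here is only an example) would return a list of (subject, count) pairs, with replies folded into the thread they belong to.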

> On Apr 4, 2014, at 8:39 AM, Grant Ingersoll <gs...@apache.org> wrote:
> 
> We (LucidWorks) have full indexing and search of the Mahout mail archives at http://find.searchub.org.  We could probably add IRC pretty easily if you want.
> 
> -Grant
> 
>> On Mar 22, 2014, at 2:06 AM, Andrew Musselman <an...@gmail.com> wrote:
>> 
>> I put up a parser for the IRC history logs here
>> https://github.com/andrewmusselman/util/blob/master/irc-parser.sh
>> 
>> I'd like to write one for the user list too, to figure out the most common
>> problems and questions so we can focus effort on fixing bugs and docs.
>> 
>> But the mail archives at
>> https://mail-archives.apache.org/mod_mbox/mahout-user/ are dynamic, loaded
>> in through JavaScript, so parsing them isn't that straightforward.
>> 
>> Is it possible to get the mbox files directly?
> 
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com