Posted to dev@mahout.apache.org by Suneel Marthi <su...@gmail.com> on 2015/03/27 03:07:38 UTC

Mahout 0.10.0 Bug bash

Ok here's the bug bash as of today

Andrew Palumbo
--------------------------
M-1648: Update CMS for Mahout 0.10.0
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Exception when providing classification Labels
M-1493: Port Naive Bayes to Spark DSL
M-1559: Documentation and cleanup for Naive Bayes Example
M-1609: NullPointerException
M-1607: Spark-shell DAG scheduler

Andrew Musselman
-----------------------------
M-1655: Refactor module dependencies
M-1563: cleanup Warnings during Build
M-1470: LDA Topic dump

Dmitriy Lyubimov
--------------------------
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
---------------------
M-1649: Lucene 5 upgrade

Pat Ferrel
-----------------
M-1589: mahout.cmd has duplicated content
M-1618: co-occurrence recommender example

Suneel Marthi
---------------------
M-1586: Collections downloads must have hash signatures
M-1647: Release build
M-1652: Java 7 update
M-1512: Hadoop 2 compatibility
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1443: Update "How to Release" page
M-1585: Javadocs not hosted by Mahout-Quality
M-1612: NPE during JSON outputformatter for clusterdump

Stevo Slavic
--------------------
M-1650: upgrade 3rd party jars
M-1602: Euclidean Distance Similarity Math
M-1278: Improve inheritance of apache parent pom

Shannon Quinn
-----------------------
M-1540: Reuters Example spectral clustering
Also online docs for Spectral clustering

Ted Dunning
-------------------
M-1636: Class dependencies for Spark module are put in job.jar, which is
inefficient

Re: Mahout 0.10.0 Bug bash

Posted by Suneel Marthi <su...@gmail.com>.
If no one volunteers in the next hour, I will just close it and get it
out of the way :)

On Fri, Mar 27, 2015 at 11:33 AM, Suneel Marthi <su...@gmail.com>
wrote:

> It's not a blocker; I would just close it and move on until the next
> Windows guy creates a new Jira :)
>
> On Fri, Mar 27, 2015 at 11:29 AM, Pat Ferrel <pa...@occamsmachete.com>
> wrote:
>
>> Not sure what to do about the Windows mahout.cmd script. I don’t even own
>> a Windows VM so there is no way I can look into this except for asking for
>> help, which I have done. What happens if no one volunteers? Is this a
>> blocker? M-1589
>>
>> I took M-1636, should be resolved. Need a final test on a cluster, which
>> I am trying today.
>>
>> Aren’t M-1655 and M-1646 the same? Dmitriy is not committing code so any
>> work must be reassigned if it needs to be done.
>>
>

Re: Mahout 0.10.0 Bug bash

Posted by Suneel Marthi <su...@gmail.com>.
It's not a blocker; I would just close it and move on until the next Windows
guy creates a new Jira :)

On Fri, Mar 27, 2015 at 11:29 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Not sure what to do about the Windows mahout.cmd script. I don’t even own
> a Windows VM so there is no way I can look into this except for asking for
> help, which I have done. What happens if no one volunteers? Is this a
> blocker? M-1589
>
> I took M-1636, should be resolved. Need a final test on a cluster, which I
> am trying today.
>
> Aren’t M-1655 and M-1646 the same? Dmitriy is not committing code so any
> work must be reassigned if it needs to be done.
>

Re: Mahout 0.10.0 Bug bash

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Not sure what to do about the Windows mahout.cmd script. I don’t even own a Windows VM so there is no way I can look into this except for asking for help, which I have done. What happens if no one volunteers? Is this a blocker? M-1589

I took M-1636, should be resolved. Need a final test on a cluster, which I am trying today.

Aren’t M-1655 and M-1646 the same? Dmitriy is not committing code so any work must be reassigned if it needs to be done.



Re: Mahout 0.10.0 Bug bash

Posted by Shannon Quinn <sq...@gatech.edu>.
Yes--removing the Lanczos solver from spectral clustering.

On 3/27/15 10:29 AM, Suneel Marthi wrote:
> and this is for 0.10.0 ???
>
> On Fri, Mar 27, 2015 at 10:27 AM, Shannon Quinn <sq...@gatech.edu> wrote:
>
>> Created M-1659 and assigned it to myself to reflect current work.
>>
>> Shannon
>>


Re: Mahout 0.10.0 Bug bash

Posted by Suneel Marthi <su...@gmail.com>.
and this is for 0.10.0 ???

On Fri, Mar 27, 2015 at 10:27 AM, Shannon Quinn <sq...@gatech.edu> wrote:

> Created M-1659 and assigned it to myself to reflect current work.
>
> Shannon
>
>

Re: Mahout 0.10.0 Bug bash

Posted by Shannon Quinn <sq...@gatech.edu>.
Created M-1659 and assigned it to myself to reflect current work.

Shannon



Re: Mahout 0.10.0 Bug bash

Posted by Shannon Quinn <sq...@gatech.edu>.
Ah no worries, just got a bit panicked when I saw that. 

Summer will be better for me but for now these tickets have about maxed me out; 3 months into the new tenure-track shtick is grueling. 

iPhone'd

> On Mar 28, 2015, at 14:27, Andrew Musselman <an...@gmail.com> wrote:
> 
> Okay, go ahead and move it; I was just moving things from 1.0 to 0.10.0
> almost indiscriminately.
> 
>> On Sat, Mar 28, 2015 at 11:22 AM, Shannon Quinn <sq...@gatech.edu> wrote:
>> 
>> Wait, I thought all DSL work on spectral clustering was waiting until
>> 0.10.1?
>> 
>> iPhone'd
>> 
>>>> On Mar 28, 2015, at 13:49, Suneel Marthi <su...@gmail.com>
>>> wrote:
>>> 
>>> Seems like we are stretched pretty thin given the work load, not to mention
>>> that Mahout work is completely orthogonal to our paychecks.
>>> 
>>> Ted, Grant, Shannon - possible you guys could take some of the load??
>>> 
>>> On Sat, Mar 28, 2015 at 1:25 PM, Andrew Musselman <
>>> andrew.musselman@gmail.com> wrote:
>>> 
>>>> Today's:
>>>> 
>>>> Andrew Palumbo
>>>> --------------------------
>>>> M-1648: Update CMS for Mahout 0.10.0
>>>> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
>>>> M-1477: Clean up website on Logistic Regression
>>>> M-1564: Naive Bayes classifier for new Text Documents
>>>> M-1635: Getting an exception when I provide classification labels manually
>>>> for Naive Bayes
>>>> M-1493: Port Naive Bayes to Spark DSL    (Patch available)
>>>> M-1559: Documentation and cleanup for Naive Bayes Example
>>>> M-1609: NullPointerException
>>>> M-1607: Spark-shell DAG scheduler
>>>> 
>>>> Andrew Musselman
>>>> -----------------------------
>>>> M-1655: Refactor module dependencies
>>>> M-1522: Handle logging levels via log4j.xml
>>>> M-1563: cleanup Warnings during Build
>>>> M-1470: LDA Topic dump
>>>> M-1462: Cleaning up Random Forests documentation on Mahout website
>>>> 
>>>> Dmitriy Lyubimov
>>>> --------------------------
>>>> M-1646: Refactor out all legacy MR dependencies from scala code
>>>> 
>>>> Frank Scholten
>>>> ---------------------
>>>> M-1649: Lucene 5 upgrade
>>>> M-1625: lucene2seq: failure to convert a document that does not contain a
>>>> field (the field is not required)
>>>> 
>>>> Pat Ferrel
>>>> -----------------
>>>> M-1589: mahout.cmd has duplicated content    (Patch available)
>>>> M-1618: co-occurrence recommender example
>>>> 
>>>> Suneel Marthi
>>>> ---------------------
>>>> M-1586: Collections downloads must have hash signatures
>>>> M-1647: The release build is incomplete
>>>> M-1652: Java 7 update
>>>> M-1512: Hadoop 2 compatibility
>>>> M-1469: Streaming KMeans fails when executed in MR mode and
>>>> REDUCE_STREAMING_KMEANS set to true
>>>> M-1443: Update "How to Release" page    (Tagged 0.10.1)
>>>> M-1585: Javadocs not hosted by Mahout-Quality
>>>> M-1612: NPE during JSON outputformatter for clusterdump
>>>> M-1656: Change SNAPSHOT version from 1.0 to 0.10
>>>> M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
>>>> M-1619: HighDFWordsPruner overwrites cache files
>>>> 
>>>> Stevo Slavic
>>>> --------------------
>>>> M-1650: upgrade 3rd party jars
>>>> M-1602: Euclidean Distance Similarity Math
>>>> M-1278: Improve inheritance of apache parent pom
>>>> M-1562: Publish Scaladocs
>>>> M-1277: Lose dependency on custom commons-cli
>>>> 
>>>> Shannon Quinn
>>>> -----------------------
>>>> M-1538: Port spectral clustering to Mahout DSL
>>>> M-1593: Implement affinity matrix computation in Mahout DSL
>>>> M-1540: Reuters Example spectral clustering Also online docs for Spectral
>>>> clustering
>>>> M-1659: Remove deprecated Lanczos solver from spectral clustering in
>>>> mr-legacy
>>>> 
>>>> Ted Dunning
>>>> -------------------
>>>> M-1636: Class dependencies for Spark module are put in job.jar, which is
>>>> inefficient
>>>> 
>>>> Sebastian Schelter
>>>> --------------------------
>>>> M-1584: Create a detailed example of how to index an arbitrary dataset and
>>>> run LDA on it    (Patch available)
>>>> 
>>>> Gokhan Capan
>>>> ----------------------
>>>> M-1626: Support for required quasi-algebraic operations and starting with
>>>> aggregating rows/blocks
>>>> 
>>>> Unassigned
>>>> ------------------
>>>> M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
>>>> available)
>>>> M-1593: cluster-reuters.sh does not work complaining
>>>> java.lang.IllegalStateException    (Patch available)
>>>> M-1557: Add support for sparse training vectors in MLP    (Patch available)
>>>> M-1516: run classify-20newsgroups.sh failed cause by
>>>> /tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
>>>> available)
>>>> M-1643: CLI arguments are not being processed in spark-shell
>>>> M-1637: RecommenderJob of ALS fails in the mapper because it uses the
>>>> instance of other class
>>>> M-1634: ALS don't work when it adds new files in Distributed Cache
>>>> (Patch available)
>>>> M-1633: Failure to execute query when solr index contains documents with
>>>> different fields
>>>> M-1551: Add document to describe how to use mlp with command line
>>>> (Patch available)
>>>> 
>> 

Re: Mahout 0.10.0 Bug bash

Posted by Andrew Musselman <an...@gmail.com>.
Okay, go ahead and move it; I was just moving things from 1.0 to 0.10.0
almost indiscriminately.

On Sat, Mar 28, 2015 at 11:22 AM, Shannon Quinn <sq...@gatech.edu> wrote:

> Wait, I thought all DSL work on spectral clustering was waiting until
> 0.10.1?
>
> iPhone'd
>
> > On Mar 28, 2015, at 13:49, Suneel Marthi <su...@gmail.com>
> wrote:
> >
> > Seems like we are stretched pretty thin given the work load, not to mention
> > that Mahout work is completely orthogonal to our paychecks.
> >
> > Ted, Grant, Shannon - possible you guys could take some of the load??
> >
>

Re: Mahout 0.10.0 Bug bash

Posted by Suneel Marthi <su...@gmail.com>.
That's right, feel free to edit your Jiras to reflect that.

On Sat, Mar 28, 2015 at 2:22 PM, Shannon Quinn <sq...@gatech.edu> wrote:

> Wait, I thought all DSL work on spectral clustering was waiting until
> 0.10.1?
>
> iPhone'd
>
> > On Mar 28, 2015, at 13:49, Suneel Marthi <su...@gmail.com>
> wrote:
> >
> > Seems like we are stretched pretty thin given the work load, not to mention
> > that Mahout work is completely orthogonal to our paychecks.
> >
> > Ted, Grant, Shannon - possible you guys could take some of the load??
> >
> > On Sat, Mar 28, 2015 at 1:25 PM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> >> Today's:
> >>
> >> Andrew Palumbo
> >> --------------------------
> >> M-1648: Update CMS for Mahout 0.10.0
> >> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
> >> M-1477: Clean up website on Logistic Regression
> >> M-1564: Naive Bayes classifier for new Text Documents
> >> M-1635: Getting an exception when I provide classification labels manually
> >> for Naive Bayes
> >> M-1493: Port Naive Bayes to Spark DSL    (Patch available)
> >> M-1559: Documentation and cleanup for Naive Bayes Example
> >> M-1609: NullPointerException
> >> M-1607: Spark-shell DAG scheduler
> >>
> >> Andrew Musselman
> >> -----------------------------
> >> M-1655: Refactor module dependencies
> >> M-1522: Handle logging levels via log4j.xml
> >> M-1563: cleanup Warnings during Build
> >> M-1470: LDA Topic dump
> >> M-1462: Cleaning up Random Forests documentation on Mahout website
> >>
> >> Dmitriy Lyubimov
> >> --------------------------
> >> M-1646: Refactor out all legacy MR dependencies from scala code
> >>
> >> Frank Scholten
> >> ---------------------
> >> M-1649: Lucene 5 upgrade
> >> M-1625: lucene2seq: failure to convert a document that does not contain
> a
> >> field (the field is not required)
> >>
> >> Pat Ferrel
> >> -----------------
> >> M-1589: mahout.cmd has duplicated content    (Patch available)
> >> M-1618: co-occurence recommender example
> >>
> >> Suneel Marthi
> >> ---------------------
> >> M-1586: Collections downloads must have hash signatures
> >> M-1647: The release build is incomplete
> >> M-1652: Java 7 update
> >> M-1512: Hadoop 2 compatibility
> >> M-1469: Streaming KMeans fails when executed in MR mode and
> >> REDUCE_STREAMING_KMEANS
> >> set to true
> >> M-1443: Update "How to Release" page    (Tagged 0.10.1)
> >> M-1585: Javadocs not hosted by Mahout-Quality
> >> M-1612: NPE during JSON outputformatter for clusterdump
> >> M-1656: Change SNAPSHOT version from 1.0 to 0.10
> >> M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
> >> M-1619: HighDFWordsPruner overwrites cache files
> >>
> >> Stevo Slavic
> >> --------------------
> >> M-1650: upgrade 3rd party jars
> >> M-1602: Euclidean Distance Similarity Math
> >> M-1278: Improve inheritance of apache parent pom
> >> M-1562: Publish Scaladocs
> >> M-1277: Lose dependency on custom commons-cli
> >>
> >> Shannon Quinn
> >> -----------------------
> >> M-1538: Port spectral clustering to Mahout DSL
> >> M-1593: Implement affinity matrix computation in Mahout DSL
> >> M-1540: Reuters Example spectral clustering Also online docs for
> Spectral
> >> clustering
> >> M-1659: Remove deprecated Lanczos solver from spectral clustering in
> >> mr-legacy
> >>
> >> Ted Dunning
> >> -------------------
> >> M-1636: Class dependencies for Spark module are put in job.jar, which is
> >> inefficient
> >>
> >> Sebastian Schelter
> >> --------------------------
> >> M-1584: Create a detailed example of how to index an arbitrary dataset
> and
> >> run LDA on it    (Patch available)
> >>
> >> Gokhan Capan
> >> ----------------------
> >> M-1626: Support for required quasi-algebraic operations and starting
> with
> >> aggregating rows/blocks
> >>
> >> Unassigned
> >> ------------------
> >> M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
> >> available)
> >> M-1593: cluster-reuters.sh does not work complaining
> >> java.lang.IllegalStateException    (Patch available)
> >> M-1557: Add support for sparse training vectors in MLP    (Patch
> available)
> >> M-1516: run classify-20newsgroups.sh failed cause by
> >> /tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
> >> available)
> >> M-1643: CLI arguments are not being processed in spark-shell
> >> M-1637: RecommenderJob of ALS fails in the mapper because it uses the
> >> instance of other class
> >> M-1634: ALS don't work when it adds new files in Distributed Cache
> >> (Patch available)
> >> M-1633: Failure to execute query when solr index contains documents with
> >> different fields
> >> M-1551: Add document to describe how to use mlp with command line
> (Patch
> >> available)
> >>
> >> On Thu, Mar 26, 2015 at 7:07 PM, Suneel Marthi <suneel.marthi@gmail.com
> >
> >> wrote:
> >>
> >>> Ok here's the bug bash as of today
> >>>
> >>> Andrew Palumbo
> >>> --------------------------
> >>> M-1648: Update CMS for Mahout 0.10.0
> >>> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
> >>> M-1564: Naive Bayes classifier for new Text Documents
> >>> M-1635: Exception when providing classification Labels
> >>> M-1493: Port Naive Bayes to Spark DSL
> >>> M-1559: Documentation and cleanup for Naive Bayes Example
> >>> M-1609: NullPointerException
> >>> M-1607: Spark-shell DAG scheduler
> >>>
> >>> Andrew Musselman
> >>> -----------------------------
> >>> M-1655: Refactor module dependencies
> >>> M-1563: cleanup Warnings during Build
> >>> M-1470: LDA Topic dump
> >>>
> >>> Dmitriy Lyubimov
> >>> --------------------------
> >>> M-1646: Refactor out all legacy MR dependencies from scala code
> >>>
> >>> Frank Scholten
> >>> ---------------------
> >>> M-1649: Lucene 5 upgrade
> >>>
> >>> Pat Ferrel
> >>> -----------------
> >>> M-1589: mahout.cmd has duplicated content
> >>> M-1618: co-occurence recommender example
> >>>
> >>> Suneel Marthi
> >>> ---------------------
> >>> M-1586: Collections downloads must have hash signatures
> >>> M-1647: Release build
> >>> M-1652: Java 7 update
> >>> M-1512: Hadoop 2 compatibility
> >>> M-1469: Streaming KMeans fails when executed in MR mode and
> >>> REDUCE_STREAMING_KMEANS set to true
> >>> M-1443: Update "How to Release" page
> >>> M-1585: Javadocs not hosted by Mahout-Quality
> >>> M-1612: NPE during JSON outputformatter for clusterdump
> >>>
> >>> Stevo Slavic
> >>> --------------------
> >>> M-1650: upgrade 3rd party jars
> >>> M-1602: Euclidean Distance Similarity Math
> >>> M-1278: Improve inheritance of apache parent pom
> >>>
> >>> Shannon Quinn
> >>> -----------------------
> >>> M-1540: Reuters Example spectral clustering
> >>> Also online docs for Spectral clustering
> >>>
> >>> Ted Dunning
> >>> -------------------
> >>> M-1636: Class dependencies for Spark module are put in job.jar, which
> is
> >>> inefficient
> >>
>

Re: Mahout 0.10.0 Bug bash

Posted by Shannon Quinn <sq...@gatech.edu>.
Wait, I thought all DSL work on spectral clustering was waiting until 0.10.1?

iPhone'd

> On Mar 28, 2015, at 13:49, Suneel Marthi <su...@gmail.com> wrote:
> 
> Seems like we are stretched pretty thin given the work load, not to mention
> that Mahout work is completely orthogonal to our paychecks.
> 
> Ted, Grant, Shannon - possible you guys could take some of the load??
> 
> On Sat, Mar 28, 2015 at 1:25 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
> 
>> Today's:
>> 
>> Andrew Palumbo
>> --------------------------
>> M-1648: Update CMS for Mahout 0.10.0
>> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
>> M-1477: Clean up website on Logistic Regression
>> M-1564: Naive Bayes classifier for new Text Documents
>> M-1635: Getting an exception when I provide classification labels manually
>> for Naive Bayes
>> M-1493: Port Naive Bayes to Spark DSL    (Patch available)
>> M-1559: Documentation and cleanup for Naive Bayes Example
>> M-1609: NullPointerException
>> M-1607: Spark-shell DAG scheduler
>> 
>> Andrew Musselman
>> -----------------------------
>> M-1655: Refactor module dependencies
>> M-1522: Handle logging levels via log4j.xml
>> M-1563: cleanup Warnings during Build
>> M-1470: LDA Topic dump
>> M-1462: Cleaning up Random Forests documentation on Mahout website
>> 
>> Dmitriy Lyubimov
>> --------------------------
>> M-1646: Refactor out all legacy MR dependencies from scala code
>> 
>> Frank Scholten
>> ---------------------
>> M-1649: Lucene 5 upgrade
>> M-1625: lucene2seq: failure to convert a document that does not contain a
>> field (the field is not required)
>> 
>> Pat Ferrel
>> -----------------
>> M-1589: mahout.cmd has duplicated content    (Patch available)
>> M-1618: co-occurence recommender example
>> 
>> Suneel Marthi
>> ---------------------
>> M-1586: Collections downloads must have hash signatures
>> M-1647: The release build is incomplete
>> M-1652: Java 7 update
>> M-1512: Hadoop 2 compatibility
>> M-1469: Streaming KMeans fails when executed in MR mode and
>> REDUCE_STREAMING_KMEANS
>> set to true
>> M-1443: Update "How to Release" page    (Tagged 0.10.1)
>> M-1585: Javadocs not hosted by Mahout-Quality
>> M-1612: NPE during JSON outputformatter for clusterdump
>> M-1656: Change SNAPSHOT version from 1.0 to 0.10
>> M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
>> M-1619: HighDFWordsPruner overwrites cache files
>> 
>> Stevo Slavic
>> --------------------
>> M-1650: upgrade 3rd party jars
>> M-1602: Euclidean Distance Similarity Math
>> M-1278: Improve inheritance of apache parent pom
>> M-1562: Publish Scaladocs
>> M-1277: Lose dependency on custom commons-cli
>> 
>> Shannon Quinn
>> -----------------------
>> M-1538: Port spectral clustering to Mahout DSL
>> M-1593: Implement affinity matrix computation in Mahout DSL
>> M-1540: Reuters Example spectral clustering Also online docs for Spectral
>> clustering
>> M-1659: Remove deprecated Lanczos solver from spectral clustering in
>> mr-legacy
>> 
>> Ted Dunning
>> -------------------
>> M-1636: Class dependencies for Spark module are put in job.jar, which is
>> inefficient
>> 
>> Sebastian Schelter
>> --------------------------
>> M-1584: Create a detailed example of how to index an arbitrary dataset and
>> run LDA on it    (Patch available)
>> 
>> Gokhan Capan
>> ----------------------
>> M-1626: Support for required quasi-algebraic operations and starting with
>> aggregating rows/blocks
>> 
>> Unassigned
>> ------------------
>> M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
>> available)
>> M-1593: cluster-reuters.sh does not work complaining
>> java.lang.IllegalStateException    (Patch available)
>> M-1557: Add support for sparse training vectors in MLP    (Patch available)
>> M-1516: run classify-20newsgroups.sh failed cause by
>> /tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
>> available)
>> M-1643: CLI arguments are not being processed in spark-shell
>> M-1637: RecommenderJob of ALS fails in the mapper because it uses the
>> instance of other class
>> M-1634: ALS don't work when it adds new files in Distributed Cache
>> (Patch available)
>> M-1633: Failure to execute query when solr index contains documents with
>> different fields
>> M-1551: Add document to describe how to use mlp with command line    (Patch
>> available)
>> 
>> On Thu, Mar 26, 2015 at 7:07 PM, Suneel Marthi <su...@gmail.com>
>> wrote:
>> 
>>> Ok here's the bug bash as of today
>>> 
>>> Andrew Palumbo
>>> --------------------------
>>> M-1648: Update CMS for Mahout 0.10.0
>>> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
>>> M-1564: Naive Bayes classifier for new Text Documents
>>> M-1635: Exception when providing classification Labels
>>> M-1493: Port Naive Bayes to Spark DSL
>>> M-1559: Documentation and cleanup for Naive Bayes Example
>>> M-1609: NullPointerException
>>> M-1607: Spark-shell DAG scheduler
>>> 
>>> Andrew Musselman
>>> -----------------------------
>>> M-1655: Refactor module dependencies
>>> M-1563: cleanup Warnings during Build
>>> M-1470: LDA Topic dump
>>> 
>>> Dmitriy Lyubimov
>>> --------------------------
>>> M-1646: Refactor out all legacy MR dependencies from scala code
>>> 
>>> Frank Scholten
>>> ---------------------
>>> M-1649: Lucene 5 upgrade
>>> 
>>> Pat Ferrel
>>> -----------------
>>> M-1589: mahout.cmd has duplicated content
>>> M-1618: co-occurence recommender example
>>> 
>>> Suneel Marthi
>>> ---------------------
>>> M-1586: Collections downloads must have hash signatures
>>> M-1647: Release build
>>> M-1652: Java 7 update
>>> M-1512: Hadoop 2 compatibility
>>> M-1469: Streaming KMeans fails when executed in MR mode and
>>> REDUCE_STREAMING_KMEANS set to true
>>> M-1443: Update "How to Release" page
>>> M-1585: Javadocs not hosted by Mahout-Quality
>>> M-1612: NPE during JSON outputformatter for clusterdump
>>> 
>>> Stevo Slavic
>>> --------------------
>>> M-1650: upgrade 3rd party jars
>>> M-1602: Euclidean Distance Similarity Math
>>> M-1278: Improve inheritance of apache parent pom
>>> 
>>> Shannon Quinn
>>> -----------------------
>>> M-1540: Reuters Example spectral clustering
>>> Also online docs for Spectral clustering
>>> 
>>> Ted Dunning
>>> -------------------
>>> M-1636: Class dependencies for Spark module are put in job.jar, which is
>>> inefficient
>> 

RE: Mahout 0.10.0 Bug bash

Posted by Saikat Kanjilal <sx...@hotmail.com>.
Suneel et al, I can take on some documentation-related work. I offered to help last week but didn't hear back. @AndrewP, any ideas on where I can start?

> Date: Sat, 28 Mar 2015 13:49:19 -0400
> Subject: Re: Mahout 0.10.0 Bug bash
> From: suneel.marthi@gmail.com
> To: dev@mahout.apache.org
> 
> Seems like we are stretched pretty thin given the work load, not to mention
> that Mahout work is completely orthogonal to our paychecks.
> 
> Ted, Grant, Shannon - possible you guys could take some of the load??
> 
> On Sat, Mar 28, 2015 at 1:25 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
> 
> > Today's:
> >
> > Andrew Palumbo
> > --------------------------
> > M-1648: Update CMS for Mahout 0.10.0
> > M-1638: H2O bindings fail at drmParallelizeWithRowLabels
> > M-1477: Clean up website on Logistic Regression
> > M-1564: Naive Bayes classifier for new Text Documents
> > M-1635: Getting an exception when I provide classification labels manually
> > for Naive Bayes
> > M-1493: Port Naive Bayes to Spark DSL    (Patch available)
> > M-1559: Documentation and cleanup for Naive Bayes Example
> > M-1609: NullPointerException
> > M-1607: Spark-shell DAG scheduler
> >
> > Andrew Musselman
> > -----------------------------
> > M-1655: Refactor module dependencies
> > M-1522: Handle logging levels via log4j.xml
> > M-1563: cleanup Warnings during Build
> > M-1470: LDA Topic dump
> > M-1462: Cleaning up Random Forests documentation on Mahout website
> >
> > Dmitriy Lyubimov
> > --------------------------
> > M-1646: Refactor out all legacy MR dependencies from scala code
> >
> > Frank Scholten
> > ---------------------
> > M-1649: Lucene 5 upgrade
> > M-1625: lucene2seq: failure to convert a document that does not contain a
> > field (the field is not required)
> >
> > Pat Ferrel
> > -----------------
> > M-1589: mahout.cmd has duplicated content    (Patch available)
> > M-1618: co-occurence recommender example
> >
> > Suneel Marthi
> > ---------------------
> > M-1586: Collections downloads must have hash signatures
> > M-1647: The release build is incomplete
> > M-1652: Java 7 update
> > M-1512: Hadoop 2 compatibility
> > M-1469: Streaming KMeans fails when executed in MR mode and
> > REDUCE_STREAMING_KMEANS
> > set to true
> > M-1443: Update "How to Release" page    (Tagged 0.10.1)
> > M-1585: Javadocs not hosted by Mahout-Quality
> > M-1612: NPE during JSON outputformatter for clusterdump
> > M-1656: Change SNAPSHOT version from 1.0 to 0.10
> > M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
> > M-1619: HighDFWordsPruner overwrites cache files
> >
> > Stevo Slavic
> > --------------------
> > M-1650: upgrade 3rd party jars
> > M-1602: Euclidean Distance Similarity Math
> > M-1278: Improve inheritance of apache parent pom
> > M-1562: Publish Scaladocs
> > M-1277: Lose dependency on custom commons-cli
> >
> > Shannon Quinn
> > -----------------------
> > M-1538: Port spectral clustering to Mahout DSL
> > M-1593: Implement affinity matrix computation in Mahout DSL
> > M-1540: Reuters Example spectral clustering Also online docs for Spectral
> > clustering
> > M-1659: Remove deprecated Lanczos solver from spectral clustering in
> > mr-legacy
> >
> > Ted Dunning
> > -------------------
> > M-1636: Class dependencies for Spark module are put in job.jar, which is
> > inefficient
> >
> > Sebastian Schelter
> > --------------------------
> > M-1584: Create a detailed example of how to index an arbitrary dataset and
> > run LDA on it    (Patch available)
> >
> > Gokhan Capan
> > ----------------------
> > M-1626: Support for required quasi-algebraic operations and starting with
> > aggregating rows/blocks
> >
> > Unassigned
> > ------------------
> > M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
> > available)
> > M-1593: cluster-reuters.sh does not work complaining
> > java.lang.IllegalStateException    (Patch available)
> > M-1557: Add support for sparse training vectors in MLP    (Patch available)
> > M-1516: run classify-20newsgroups.sh failed cause by
> > /tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
> > available)
> > M-1643: CLI arguments are not being processed in spark-shell
> > M-1637: RecommenderJob of ALS fails in the mapper because it uses the
> > instance of other class
> > M-1634: ALS don't work when it adds new files in Distributed Cache
> >  (Patch available)
> > M-1633: Failure to execute query when solr index contains documents with
> > different fields
> > M-1551: Add document to describe how to use mlp with command line    (Patch
> > available)
> >
> > On Thu, Mar 26, 2015 at 7:07 PM, Suneel Marthi <su...@gmail.com>
> > wrote:
> >
> > > Ok here's the bug bash as of today
> > >
> > > Andrew Palumbo
> > > --------------------------
> > > M-1648: Update CMS for Mahout 0.10.0
> > > M-1638: H2O bindings fail at drmParallelizeWithRowLabels
> > > M-1564: Naive Bayes classifier for new Text Documents
> > > M-1635: Exception when providing classification Labels
> > > M-1493: Port Naive Bayes to Spark DSL
> > > M-1559: Documentation and cleanup for Naive Bayes Example
> > > M-1609: NullPointerException
> > > M-1607: Spark-shell DAG scheduler
> > >
> > > Andrew Musselman
> > > -----------------------------
> > > M-1655: Refactor module dependencies
> > > M-1563: cleanup Warnings during Build
> > > M-1470: LDA Topic dump
> > >
> > > Dmitriy Lyubimov
> > > --------------------------
> > > M-1646: Refactor out all legacy MR dependencies from scala code
> > >
> > > Frank Scholten
> > > ---------------------
> > > M-1649: Lucene 5 upgrade
> > >
> > > Pat Ferrel
> > > -----------------
> > > M-1589: mahout.cmd has duplicated content
> > > M-1618: co-occurence recommender example
> > >
> > > Suneel Marthi
> > > ---------------------
> > > M-1586: Collections downloads must have hash signatures
> > > M-1647: Release build
> > > M-1652: Java 7 update
> > > M-1512: Hadoop 2 compatibility
> > > M-1469: Streaming KMeans fails when executed in MR mode and
> > > REDUCE_STREAMING_KMEANS set to true
> > > M-1443: Update "How to Release" page
> > > M-1585: Javadocs not hosted by Mahout-Quality
> > > M-1612: NPE during JSON outputformatter for clusterdump
> > >
> > > Stevo Slavic
> > > --------------------
> > > M-1650: upgrade 3rd party jars
> > > M-1602: Euclidean Distance Similarity Math
> > > M-1278: Improve inheritance of apache parent pom
> > >
> > > Shannon Quinn
> > > -----------------------
> > > M-1540: Reuters Example spectral clustering
> > > Also online docs for Spectral clustering
> > >
> > > Ted Dunning
> > > -------------------
> > > M-1636: Class dependencies for Spark module are put in job.jar, which is
> > > inefficient
> > >
> >
 		 	   		  

Re: Mahout 0.10.0 Bug bash

Posted by Suneel Marthi <su...@gmail.com>.
Seems like we are stretched pretty thin given the work load, not to mention
that Mahout work is completely orthogonal to our paychecks.

Ted, Grant, Shannon - possible you guys could take some of the load??

On Sat, Mar 28, 2015 at 1:25 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> Today's:
>
> Andrew Palumbo
> --------------------------
> M-1648: Update CMS for Mahout 0.10.0
> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
> M-1477: Clean up website on Logistic Regression
> M-1564: Naive Bayes classifier for new Text Documents
> M-1635: Getting an exception when I provide classification labels manually
> for Naive Bayes
> M-1493: Port Naive Bayes to Spark DSL    (Patch available)
> M-1559: Documentation and cleanup for Naive Bayes Example
> M-1609: NullPointerException
> M-1607: Spark-shell DAG scheduler
>
> Andrew Musselman
> -----------------------------
> M-1655: Refactor module dependencies
> M-1522: Handle logging levels via log4j.xml
> M-1563: cleanup Warnings during Build
> M-1470: LDA Topic dump
> M-1462: Cleaning up Random Forests documentation on Mahout website
>
> Dmitriy Lyubimov
> --------------------------
> M-1646: Refactor out all legacy MR dependencies from scala code
>
> Frank Scholten
> ---------------------
> M-1649: Lucene 5 upgrade
> M-1625: lucene2seq: failure to convert a document that does not contain a
> field (the field is not required)
>
> Pat Ferrel
> -----------------
> M-1589: mahout.cmd has duplicated content    (Patch available)
> M-1618: co-occurence recommender example
>
> Suneel Marthi
> ---------------------
> M-1586: Collections downloads must have hash signatures
> M-1647: The release build is incomplete
> M-1652: Java 7 update
> M-1512: Hadoop 2 compatibility
> M-1469: Streaming KMeans fails when executed in MR mode and
> REDUCE_STREAMING_KMEANS
> set to true
> M-1443: Update "How to Release" page    (Tagged 0.10.1)
> M-1585: Javadocs not hosted by Mahout-Quality
> M-1612: NPE during JSON outputformatter for clusterdump
> M-1656: Change SNAPSHOT version from 1.0 to 0.10
> M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
> M-1619: HighDFWordsPruner overwrites cache files
>
> Stevo Slavic
> --------------------
> M-1650: upgrade 3rd party jars
> M-1602: Euclidean Distance Similarity Math
> M-1278: Improve inheritance of apache parent pom
> M-1562: Publish Scaladocs
> M-1277: Lose dependency on custom commons-cli
>
> Shannon Quinn
> -----------------------
> M-1538: Port spectral clustering to Mahout DSL
> M-1593: Implement affinity matrix computation in Mahout DSL
> M-1540: Reuters Example spectral clustering Also online docs for Spectral
> clustering
> M-1659: Remove deprecated Lanczos solver from spectral clustering in
> mr-legacy
>
> Ted Dunning
> -------------------
> M-1636: Class dependencies for Spark module are put in job.jar, which is
> inefficient
>
> Sebastian Schelter
> --------------------------
> M-1584: Create a detailed example of how to index an arbitrary dataset and
> run LDA on it    (Patch available)
>
> Gokhan Capan
> ----------------------
> M-1626: Support for required quasi-algebraic operations and starting with
> aggregating rows/blocks
>
> Unassigned
> ------------------
> M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
> available)
> M-1593: cluster-reuters.sh does not work complaining
> java.lang.IllegalStateException    (Patch available)
> M-1557: Add support for sparse training vectors in MLP    (Patch available)
> M-1516: run classify-20newsgroups.sh failed cause by
> /tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
> available)
> M-1643: CLI arguments are not being processed in spark-shell
> M-1637: RecommenderJob of ALS fails in the mapper because it uses the
> instance of other class
> M-1634: ALS don't work when it adds new files in Distributed Cache
>  (Patch available)
> M-1633: Failure to execute query when solr index contains documents with
> different fields
> M-1551: Add document to describe how to use mlp with command line    (Patch
> available)
>
> On Thu, Mar 26, 2015 at 7:07 PM, Suneel Marthi <su...@gmail.com>
> wrote:
>
> > Ok here's the bug bash as of today
> >
> > Andrew Palumbo
> > --------------------------
> > M-1648: Update CMS for Mahout 0.10.0
> > M-1638: H2O bindings fail at drmParallelizeWithRowLabels
> > M-1564: Naive Bayes classifier for new Text Documents
> > M-1635: Exception when providing classification Labels
> > M-1493: Port Naive Bayes to Spark DSL
> > M-1559: Documentation and cleanup for Naive Bayes Example
> > M-1609: NullPointerException
> > M-1607: Spark-shell DAG scheduler
> >
> > Andrew Musselman
> > -----------------------------
> > M-1655: Refactor module dependencies
> > M-1563: cleanup Warnings during Build
> > M-1470: LDA Topic dump
> >
> > Dmitriy Lyubimov
> > --------------------------
> > M-1646: Refactor out all legacy MR dependencies from scala code
> >
> > Frank Scholten
> > ---------------------
> > M-1649: Lucene 5 upgrade
> >
> > Pat Ferrel
> > -----------------
> > M-1589: mahout.cmd has duplicated content
> > M-1618: co-occurence recommender example
> >
> > Suneel Marthi
> > ---------------------
> > M-1586: Collections downloads must have hash signatures
> > M-1647: Release build
> > M-1652: Java 7 update
> > M-1512: Hadoop 2 compatibility
> > M-1469: Streaming KMeans fails when executed in MR mode and
> > REDUCE_STREAMING_KMEANS set to true
> > M-1443: Update "How to Release" page
> > M-1585: Javadocs not hosted by Mahout-Quality
> > M-1612: NPE during JSON outputformatter for clusterdump
> >
> > Stevo Slavic
> > --------------------
> > M-1650: upgrade 3rd party jars
> > M-1602: Euclidean Distance Similarity Math
> > M-1278: Improve inheritance of apache parent pom
> >
> > Shannon Quinn
> > -----------------------
> > M-1540: Reuters Example spectral clustering
> > Also online docs for Spectral clustering
> >
> > Ted Dunning
> > -------------------
> > M-1636: Class dependencies for Spark module are put in job.jar, which is
> > inefficient
> >
>

Re: Mahout 0.10.0 Bug bash

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Things like that question make me more suspicious. 

We really need to get a handle on the Hadoop version question.

I have run:

spark-itemsimilarity on Hadoop 1.2.1 and on 2.6.0 (fails). Andy ran it successfully on 2.2, and a user runs it on 2.4-MapR.
On 2.6.0, these lines seem to find the local file system:
  val conf = new Configuration()
  val fs = FileSystem.get(conf)
On the earlier versions of Hadoop, they find the cluster or pseudo-cluster HDFS.

I’ve run Andy’s 20 newsgroups classifier test script on Hadoop 1.2.1 and got a classdef mismatch error, which probably means I built wrong. I’ll be testing that again Monday.

I’m building a 2.2.0 pseudo-cluster and will run 20 newsgroups and spark-itemsimilarity Monday.

I guess the big question is still 2.5 or 2.6. Does anyone know why the two lines above would cause a problem in recent Hadoop versions? Does someone have a known good 2.6 cluster that they can try a couple of tests on?
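
One thing that may be worth ruling out before blaming the Hadoop version: FileSystem.get(conf) chooses the filesystem from the fs.defaultFS setting (fs.default.name on Hadoop 1), and that setting falls back to file:/// when no core-site.xml overriding it is visible on the classpath. The sketch below is only a diagnostic, not a fix. It prints what the two lines above actually resolve to, and compares that with what core-site.xml says when loaded explicitly. The object name DefaultFsCheck is made up for illustration, and reading HADOOP_CONF_DIR from the environment is an assumption about how the cluster is set up.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical diagnostic, not part of Mahout.
object DefaultFsCheck {
  def main(args: Array[String]): Unit = {
    // Same two lines as in the thread: conf is built from whatever
    // core-default.xml / core-site.xml the classpath provides.
    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    println("fs.defaultFS      = " + conf.get("fs.defaultFS"))
    println("default URI       = " + FileSystem.getDefaultUri(conf))
    println("resolved FS       = " + fs.getUri + " (" + fs.getClass.getName + ")")

    // Assumption: HADOOP_CONF_DIR points at the directory holding core-site.xml.
    // Loading it explicitly shows what the cluster configuration asks for,
    // independent of what happened to be on the classpath.
    sys.env.get("HADOOP_CONF_DIR").foreach { dir =>
      val explicit = new Configuration()
      explicit.addResource(new Path(dir, "core-site.xml"))
      println("core-site default = " + FileSystem.getDefaultUri(explicit))
    }
  }
}
```

If the resolved FS is a file: URI while the explicitly loaded core-site.xml names an hdfs: URI, the Hadoop configuration directory is probably not reaching the classpath of the job, which would point at launch scripts or the build rather than at the two lines themselves.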


On Apr 5, 2015, at 9:52 AM, Andrew Musselman <an...@gmail.com> wrote:

I wonder if that HDFS/FS issue is the same problem I have with
cluster-reuters.sh.

On Sunday, April 5, 2015, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Very few of these are on the “official” ticket list here:
> 
> https://issues.apache.org/jira/browse/MAHOUT-1648?filter=-4&jql=project%20%3D%20MAHOUT%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20BY%20createdDate%20DESC
> 
> M-1674
> M-1665
> M-1648
> 
> The next time this is published it would be great to get versions of
> Hadoop people are using and what has actually been run on a cluster or
> pseudo cluster, under yarn etc. I’m increasingly suspicious that we don’t
> run uniformly on Hadoop 2.5-2.6 but have no hard evidence. I’ve failed on
> H2.6.0 but may not have an airtight configuration. If anyone has this
> config working I can supply a very simple test.
> 
> The failure happens when an HDFS path gets applied to the raw local
> filesystem, even though hadoop 2.6 HDFS is running and MAHOUT-LOCAL is
> unset. The root of the error I’ve seen is in getting the FileSystem, which
> always returns the local one.
> 
> 
> M-1674 is new and was found on Friday. Dmitriy already has a private fix
> but can’t commit it so I think we need a workaround.
> 
> On Apr 4, 2015, at 8:46 PM, Suneel Marthi <suneel.marthi@gmail.com> wrote:
> 
> Saturday(2 days before code freeze). The code freeze's gonna be on Monday -
> April 6.  Please address ur assigned JIRAs on time.
> 
> Anand Avati
> -------------------------
> 
> M-1622: Multithreaded batch Item similarities output incorrect similarities
> M-1605: Make Visualizer test locale independent
> 
> Andrew Palumbo
> --------------------------
> M-1559: Add documentation for Wikipedia example
> M-1648: Update CMS for Mahout 0.10.0
> 
> Andrew Musselman
> -----------------------------
> M-1462: Cleaning up Random Forests documentation on Mahout website
> M-1470: LDA Topic dump
> M-1655: Refactor module dependencies
> M-1658: KMeans fails when run on Hadoop clusters
> 
> Frank Scholten
> ---------------------
> M-1625: lucene2seq: failure to convert a document that does not contain a
> field (the field is not required)
> M-1633: Failure to execute query when solr index contains documents with
> different fields
> M-1649: Lucene 5 upgrade
> 
> Pat Ferrel
> -----------------
> M-1507: Support input and output using user defined ID wherever possible
> M-1588: Multiple Input path support in Recommenders
> 
> Stevo Slavic
> --------------------
> M-1277: Lose dependency on custom commons-cli
> M-1278: Improve inheritance of apache parent pom
> M-1562: Publish Scaladocs
> M-1585: Javadocs are not hosted By Mahout Quality
> M-1650: upgrade 3rd party jars
> 
> Suneel Marthi
> ---------------------
> M-1469: Streaming KMeans fails when executed in MR mode and
> REDUCE_STREAMING_KMEANS set to true
> M-1512: Hadoop 2 compatibility
> M-1652: Java 7 update
> M-1630: Incorrect SparseMatrix.numColSlices() causes IllegalStateException
> 
> Ted Dunning
> -------------------
> 
> M-1672: TDigest update to 3.1 in OnlineSummarizers
> 
> Unassigned
> ------------------
> M-1551: Add document to describe how to use mlp with command line    (Patch
> available)
> M-1637: RecommenderJob of ALS fails in the mapper because it uses the
> instance of other class
> 
> 


Re: Mahout 0.10.0 Bug bash

Posted by Andrew Musselman <an...@gmail.com>.
I wonder if that HDFS/FS issue is the same problem I have with
cluster-reuters.sh.

On Sunday, April 5, 2015, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Very few of these are on the “official” ticket list here:
>
> https://issues.apache.org/jira/browse/MAHOUT-1648?filter=-4&jql=project%20%3D%20MAHOUT%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20BY%20createdDate%20DESC
>
> M-1674
> M-1665
> M-1648
>
> The next time this is published it would be great to get versions of
> Hadoop people are using and what has actually been run on a cluster or
> pseudo cluster, under yarn etc. I’m increasingly suspicious that we don’t
> run uniformly on Hadoop 2.5-2.6 but have no hard evidence. I’ve failed on
> H2.6.0 but may not have an airtight configuration. If anyone has this
> config working I can supply a very simple test.
>
> The failure happens when an HDFS path gets applied to the raw local
> filesystem, even though hadoop 2.6 HDFS is running and MAHOUT-LOCAL is
> unset. The root of the error I’ve seen is in getting the FileSystem, which
> always returns the local one.
>
>
> M-1674 is new and was found on Friday. Dmitriy already has a private fix
> but can’t commit it so I think we need a workaround.
>
> On Apr 4, 2015, at 8:46 PM, Suneel Marthi <suneel.marthi@gmail.com> wrote:
>
> Saturday(2 days before code freeze). The code freeze's gonna be on Monday -
> April 6.  Please address ur assigned JIRAs on time.
>
> Anand Avati
> -------------------------
>
> M-1622: Multithreaded batch Item similarities output incorrect similarities
> M-1605: Make Visualizer test locale independent
>
> Andrew Palumbo
> --------------------------
> M-1559: Add documentation for Wikipedia example
> M-1648: Update CMS for Mahout 0.10.0
>
> Andrew Musselman
> -----------------------------
> M-1462: Cleaning up Random Forests documentation on Mahout website
> M-1470: LDA Topic dump
> M-1655: Refactor module dependencies
> M-1658: KMeans fails when run on Hadoop clusters
>
> Frank Scholten
> ---------------------
> M-1625: lucene2seq: failure to convert a document that does not contain a
> field (the field is not required)
> M-1633: Failure to execute query when solr index contains documents with
> different fields
> M-1649: Lucene 5 upgrade
>
> Pat Ferrel
> -----------------
> M-1507: Support input and output using user defined ID wherever possible
> M-1588: Multiple Input path support in Recommenders
>
> Stevo Slavic
> --------------------
> M-1277: Lose dependency on custom commons-cli
> M-1278: Improve inheritance of apache parent pom
> M-1562: Publish Scaladocs
> M-1585: Javadocs are not hosted By Mahout Quality
> M-1650: upgrade 3rd party jars
>
> Suneel Marthi
> ---------------------
> M-1469: Streaming KMeans fails when executed in MR mode and
> REDUCE_STREAMING_KMEANS set to true
> M-1512: Hadoop 2 compatibility
> M-1652: Java 7 update
> M-1630: Incorrect SparseMatrix.numColSlices() causes IllegalStateException
>
> Ted Dunning
> -------------------
>
> M-1672: TDigest update to 3.1 in OnlineSummarizers
>
> Unassigned
> ------------------
> M-1551: Add document to describe how to use mlp with command line    (Patch
> available)
> M-1637: RecommenderJob of ALS fails in the mapper because it uses the
> instance of other class
>
>

Re: Mahout 0.10.0 Bug bash

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Very few of these are on the “official” ticket list here:
https://issues.apache.org/jira/browse/MAHOUT-1648?filter=-4&jql=project%20%3D%20MAHOUT%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20BY%20createdDate%20DESC

M-1674
M-1665
M-1648

The next time this is published it would be great to get the versions of Hadoop people are using and what has actually been run on a cluster or pseudo-cluster, under YARN, etc. I’m increasingly suspicious that we don’t run uniformly on Hadoop 2.5-2.6 but have no hard evidence. I’ve failed on Hadoop 2.6.0 but may not have an airtight configuration. If anyone has this config working I can supply a very simple test.

The failure happens when an HDFS path gets applied to the raw local filesystem, even though Hadoop 2.6 HDFS is running and MAHOUT_LOCAL is unset. The root of the error I’ve seen is in getting the FileSystem, which always returns the local one.


M-1674 is new and was found on Friday. Dmitriy already has a private fix but can’t commit it so I think we need a workaround.

On Apr 4, 2015, at 8:46 PM, Suneel Marthi <su...@gmail.com> wrote:

Saturday (2 days before code freeze). The code freeze will be on Monday,
April 6. Please address your assigned JIRAs on time.

Anand Avati
-------------------------

M-1622: Multithreaded batch Item similarities output incorrect similarities
M-1605: Make Visualizer test locale independent

Andrew Palumbo
--------------------------
M-1559: Add documentation for Wikipedia example
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-----------------------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1655: Refactor module dependencies
M-1658: KMeans fails when run on Hadoop clusters

Frank Scholten
---------------------
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Pat Ferrel
-----------------
M-1507: Support input and output using user defined ID wherever possible
M-1588: Multiple Input path support in Recommenders

Stevo Slavic
--------------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
---------------------
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1652: Java 7 update
M-1630: Incorrect SparseMatrix.numColSlices() causes IllegalStateException

Ted Dunning
-------------------

M-1672: TDigest update to 3.1 in OnlineSummarizers

Unassigned
------------------
M-1551: Add document to describe how to use mlp with command line    (Patch
available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class


Re: Mahout 0.10.0 Bug bash

Posted by Suneel Marthi <su...@gmail.com>.
Saturday (2 days before code freeze). The code freeze will be on Monday,
April 6. Please address your assigned JIRAs on time.

Anand Avati
-------------------------

M-1622: Multithreaded batch Item similarities output incorrect similarities
M-1605: Make Visualizer test locale independent

Andrew Palumbo
--------------------------
M-1559: Add documentation for Wikipedia example
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-----------------------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1655: Refactor module dependencies
M-1658: KMeans fails when run on Hadoop clusters

Frank Scholten
---------------------
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Pat Ferrel
-----------------
M-1507: Support input and output using user defined ID wherever possible
M-1588: Multiple Input path support in Recommenders

Stevo Slavic
--------------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
---------------------
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1652: Java 7 update
M-1630: Incorrect SparseMatrix.numColSlices() causes IllegalStateException

Ted Dunning
-------------------

M-1672: TDigest update to 3.1 in OnlineSummarizers

Unassigned
------------------
M-1551: Add document to describe how to use mlp with command line    (Patch
available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class

Re: Mahout 0.10.0 Bug bash

Posted by Andrew Musselman <an...@gmail.com>.
Wednesday(*four days from code freeze Sunday*); some progress:

Andrew Palumbo
--------------------------
M-1493: Port Naive Bayes to Spark DSL    (Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1617: 404 error on link in cluster-dumper tutorial page
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-----------------------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1655: Refactor module dependencies

Dmitriy Lyubimov
--------------------------
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
---------------------
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Gokhan Capan
----------------------
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Pat Ferrel
-----------------
M-1507: Support input and output using user defined ID wherever possible

Sebastian Schelter
--------------------------
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it    (Patch available)

Shannon Quinn
-----------------------
M-1661: Remove Lanczos from the code base
M-1662: Potential Path bug in SequenceFileVaultIterator breaks
DisplaySpectralKMeans

Stevo Slavic
--------------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
---------------------
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1586: Collections downloads must have hash signatures
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10

Unassigned
------------------
M-1551: Add document to describe how to use mlp with command line    (Patch
available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class

Re: Mahout 0.10.0 Bug bash

Posted by Suneel Marthi <su...@gmail.com>.
Tuesday (5 days from code freeze Sunday)

Andrew Palumbo
--------------------------
M-1493: Port Naive Bayes to Spark DSL    (Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-----------------------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1522: Handle logging levels via log4j.xml
M-1655: Refactor module dependencies

Dmitriy Lyubimov
--------------------------
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
---------------------
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Gokhan Capan
----------------------
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Pat Ferrel
-----------------
M-1507: Support input and output using user defined ID wherever possible
M-1589: mahout.cmd has duplicated content    (Patch available)

>>> Pat, can't we just close this as "Will Not Fix"?

Sebastian Schelter
--------------------------
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it    (Patch available)

Shannon Quinn
-----------------------
M-1661: Deprecate Lanczos from the code base (Patch Available)
M-1662: Potential Path bug in SequenceFileVaultIterator breaks
DisplaySpectralKMeans

Stevo Slavic
--------------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
---------------------
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1586: Collections downloads must have hash signatures
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1593: cluster-reuters.sh does not work complaining
java.lang.IllegalStateException    (Patch available)

Unassigned
------------------
M-1551: Add document to describe how to use mlp with command line    (Patch
available)
M-1557: Add support for sparse training vectors in MLP    (Patch available)
M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
available)
M-1634: ALS don't work when it adds new files in Distributed Cache
 (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class

Re: Mahout 0.10.0 Bug bash

Posted by Andrew Musselman <an...@gmail.com>.
Monday(six days from code freeze Sunday)

Andrew Palumbo
--------------------------
M-1493: Port Naive Bayes to Spark DSL    (Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-----------------------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1522: Handle logging levels via log4j.xml
M-1655: Refactor module dependencies

Dmitriy Lyubimov
--------------------------
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
---------------------
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Gokhan Capan
----------------------
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Pat Ferrel
-----------------
M-1507: Support input and output using user defined ID wherever possible
M-1589: mahout.cmd has duplicated content    (Patch available)

Sebastian Schelter
--------------------------
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it    (Patch available)

Shannon Quinn
-----------------------
M-1661: Remove Lanczos from the code base
M-1662: Potential Path bug in SequenceFileVaultIterator breaks
DisplaySpectralKMeans

Stevo Slavic
--------------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1602: Euclidean Distance Similarity Math
M-1650: upgrade 3rd party jars

Suneel Marthi
---------------------
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1586: Collections downloads must have hash signatures
M-1619: HighDFWordsPruner overwrites cache files
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

Unassigned
------------------
M-1551: Add document to describe how to use mlp with command line    (Patch
available)
M-1557: Add support for sparse training vectors in MLP    (Patch available)
M-1593: cluster-reuters.sh does not work complaining
java.lang.IllegalStateException    (Patch available)
M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
available)
M-1634: ALS don't work when it adds new files in Distributed Cache
 (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class

Re: Mahout 0.10.0 Bug bash

Posted by Andrew Palumbo <ap...@outlook.com>.
Sometimes it comes up and sometimes it doesn't, but it is resolved.

On 03/29/2015 01:57 PM, Suneel Marthi wrote:
> Yeah, I noticed the weirdness with M-1609 too. Well, let's keep that out of
> the daily bug bash.
>
> On Sun, Mar 29, 2015 at 1:55 PM, Andrew Palumbo <ap...@outlook.com> wrote:
>
>> yeah there's something weird going on with  M-1609, but I closed it on
>> Friday.
>>
>>
>> On 03/29/2015 12:36 PM, Andrew Musselman wrote:
>>
>>> Sunday's:
>>>
>>> Andrew Palumbo
>>> --------------------------
>>> M-1477: Clean up website on Logistic Regression
>>> M-1493: Port Naive Bayes to Spark DSL    (Patch available)
>>> M-1559: Documentation and cleanup for Naive Bayes Example
>>> M-1564: Naive Bayes classifier for new Text Documents
>>> M-1609: NullPointerException    (This bug is not showing up aside from its
>>> title)
>>> M-1635: Getting an exception when I provide classification labels manually
>>> for Naive Bayes
>>> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
>>> M-1648: Update CMS for Mahout 0.10.0
>>>
>>> Andrew Musselman
>>> -----------------------------
>>> M-1462: Cleaning up Random Forests documentation on Mahout website
>>> M-1470: LDA Topic dump
>>> M-1522: Handle logging levels via log4j.xml
>>> M-1563: cleanup Warnings during Build
>>> M-1655: Refactor module dependencies
>>>
>>> Dmitriy Lyubimov
>>> --------------------------
>>> M-1646: Refactor out all legacy MR dependencies from scala code
>>>
>>> Frank Scholten
>>> ---------------------
>>> M-1625: lucene2seq: failure to convert a document that does not contain a
>>> field (the field is not required)
>>> M-1649: Lucene 5 upgrade
>>>
>>> Pat Ferrel
>>> -----------------
>>> M-1589: mahout.cmd has duplicated content    (Patch available)
>>>
>>> Suneel Marthi
>>> ---------------------
>>> M-1469: Streaming KMeans fails when executed in MR mode and
>>> REDUCE_STREAMING_KMEANS set to true
>>> M-1512: Hadoop 2 compatibility
>>> M-1585: Javadocs not hosted by Mahout-Quality
>>> M-1586: Collections downloads must have hash signatures
>>> M-1619: HighDFWordsPruner overwrites cache files
>>> M-1647: The release build is incomplete
>>> M-1652: Java 7 update
>>> M-1656: Change SNAPSHOT version from 1.0 to 0.10
>>> M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
>>>
>>> Stevo Slavic
>>> --------------------
>>> M-1277: Lose dependency on custom commons-cli
>>> M-1278: Improve inheritance of apache parent pom
>>> M-1562: Publish Scaladocs
>>> M-1602: Euclidean Distance Similarity Math
>>> M-1650: upgrade 3rd party jars
>>>
>>> Shannon Quinn
>>> -----------------------
>>> M-1538: Port spectral clustering to Mahout DSL
>>> M-1539: Implement affinity matrix computation in Mahout DSL
>>> M-1659: Remove deprecated Lanczos solver from spectral clustering in
>>> mr-legacy
>>>
>>> Sebastian Schelter
>>> --------------------------
>>> M-1584: Create a detailed example of how to index an arbitrary dataset and
>>> run LDA on it    (Patch available)
>>>
>>> Gokhan Capan
>>> ----------------------
>>> M-1626: Support for required quasi-algebraic operations and starting with
>>> aggregating rows/blocks
>>>
>>> Unassigned
>>> ------------------
>>> M-1516: run classify-20newsgroups.sh failed cause by
>>> /tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
>>> available)
>>> M-1551: Add document to describe how to use mlp with command line
>>> (Patch
>>> available)
>>> M-1557: Add support for sparse training vectors in MLP    (Patch
>>> available)
>>> M-1593: cluster-reuters.sh does not work complaining
>>> java.lang.IllegalStateException    (Patch available)
>>> M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
>>> available)
>>> M-1633: Failure to execute query when solr index contains documents with
>>> different fields
>>> M-1634: ALS don't work when it adds new files in Distributed Cache
>>>    (Patch available)
>>> M-1637: RecommenderJob of ALS fails in the mapper because it uses the
>>> instance of other class
>>>
>>>


Re: Mahout 0.10.0 Bug bash

Posted by Suneel Marthi <su...@gmail.com>.
Yeah, I noticed the weirdness with M-1609 too. Well, let's keep that out of
the daily bug bash.

On Sun, Mar 29, 2015 at 1:55 PM, Andrew Palumbo <ap...@outlook.com> wrote:

> yeah there's something weird going on with  M-1609, but I closed it on
> Friday.
>
>
> On 03/29/2015 12:36 PM, Andrew Musselman wrote:
>
>> Sunday's:
>>
>> Andrew Palumbo
>> --------------------------
>> M-1477: Clean up website on Logistic Regression
>> M-1493: Port Naive Bayes to Spark DSL    (Patch available)
>> M-1559: Documentation and cleanup for Naive Bayes Example
>> M-1564: Naive Bayes classifier for new Text Documents
>> M-1609: NullPointerException    (This bug is not showing up aside from its
>> title)
>> M-1635: Getting an exception when I provide classification labels manually
>> for Naive Bayes
>> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
>> M-1648: Update CMS for Mahout 0.10.0
>>
>> Andrew Musselman
>> -----------------------------
>> M-1462: Cleaning up Random Forests documentation on Mahout website
>> M-1470: LDA Topic dump
>> M-1522: Handle logging levels via log4j.xml
>> M-1563: cleanup Warnings during Build
>> M-1655: Refactor module dependencies
>>
>> Dmitriy Lyubimov
>> --------------------------
>> M-1646: Refactor out all legacy MR dependencies from scala code
>>
>> Frank Scholten
>> ---------------------
>> M-1625: lucene2seq: failure to convert a document that does not contain a
>> field (the field is not required)
>> M-1649: Lucene 5 upgrade
>>
>> Pat Ferrel
>> -----------------
>> M-1589: mahout.cmd has duplicated content    (Patch available)
>>
>> Suneel Marthi
>> ---------------------
>> M-1469: Streaming KMeans fails when executed in MR mode and
>> REDUCE_STREAMING_KMEANS set to true
>> M-1512: Hadoop 2 compatibility
>> M-1585: Javadocs not hosted by Mahout-Quality
>> M-1586: Collections downloads must have hash signatures
>> M-1619: HighDFWordsPruner overwrites cache files
>> M-1647: The release build is incomplete
>> M-1652: Java 7 update
>> M-1656: Change SNAPSHOT version from 1.0 to 0.10
>> M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
>>
>> Stevo Slavic
>> --------------------
>> M-1277: Lose dependency on custom commons-cli
>> M-1278: Improve inheritance of apache parent pom
>> M-1562: Publish Scaladocs
>> M-1602: Euclidean Distance Similarity Math
>> M-1650: upgrade 3rd party jars
>>
>> Shannon Quinn
>> -----------------------
>> M-1538: Port spectral clustering to Mahout DSL
>> M-1539: Implement affinity matrix computation in Mahout DSL
>> M-1659: Remove deprecated Lanczos solver from spectral clustering in
>> mr-legacy
>>
>> Sebastian Schelter
>> --------------------------
>> M-1584: Create a detailed example of how to index an arbitrary dataset and
>> run LDA on it    (Patch available)
>>
>> Gokhan Capan
>> ----------------------
>> M-1626: Support for required quasi-algebraic operations and starting with
>> aggregating rows/blocks
>>
>> Unassigned
>> ------------------
>> M-1516: run classify-20newsgroups.sh failed cause by
>> /tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
>> available)
>> M-1551: Add document to describe how to use mlp with command line
>> (Patch
>> available)
>> M-1557: Add support for sparse training vectors in MLP    (Patch
>> available)
>> M-1593: cluster-reuters.sh does not work complaining
>> java.lang.IllegalStateException    (Patch available)
>> M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
>> available)
>> M-1633: Failure to execute query when solr index contains documents with
>> different fields
>> M-1634: ALS don't work when it adds new files in Distributed Cache
>>   (Patch available)
>> M-1637: RecommenderJob of ALS fails in the mapper because it uses the
>> instance of other class
>>
>>
>

Re: Mahout 0.10.0 Bug bash

Posted by Andrew Palumbo <ap...@outlook.com>.
Yeah, there's something weird going on with M-1609, but I closed it on
Friday.

On 03/29/2015 12:36 PM, Andrew Musselman wrote:
> Sunday's:
>
> Andrew Palumbo
> --------------------------
> M-1477: Clean up website on Logistic Regression
> M-1493: Port Naive Bayes to Spark DSL    (Patch available)
> M-1559: Documentation and cleanup for Naive Bayes Example
> M-1564: Naive Bayes classifier for new Text Documents
> M-1609: NullPointerException    (This bug is not showing up aside from its
> title)
> M-1635: Getting an exception when I provide classification labels manually
> for Naive Bayes
> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
> M-1648: Update CMS for Mahout 0.10.0
>
> Andrew Musselman
> -----------------------------
> M-1462: Cleaning up Random Forests documentation on Mahout website
> M-1470: LDA Topic dump
> M-1522: Handle logging levels via log4j.xml
> M-1563: cleanup Warnings during Build
> M-1655: Refactor module dependencies
>
> Dmitriy Lyubimov
> --------------------------
> M-1646: Refactor out all legacy MR dependencies from scala code
>
> Frank Scholten
> ---------------------
> M-1625: lucene2seq: failure to convert a document that does not contain a
> field (the field is not required)
> M-1649: Lucene 5 upgrade
>
> Pat Ferrel
> -----------------
> M-1589: mahout.cmd has duplicated content    (Patch available)
>
> Suneel Marthi
> ---------------------
> M-1469: Streaming KMeans fails when executed in MR mode and
> REDUCE_STREAMING_KMEANS set to true
> M-1512: Hadoop 2 compatibility
> M-1585: Javadocs not hosted by Mahout-Quality
> M-1586: Collections downloads must have hash signatures
> M-1619: HighDFWordsPruner overwrites cache files
> M-1647: The release build is incomplete
> M-1652: Java 7 update
> M-1656: Change SNAPSHOT version from 1.0 to 0.10
> M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
>
> Stevo Slavic
> --------------------
> M-1277: Lose dependency on custom commons-cli
> M-1278: Improve inheritance of apache parent pom
> M-1562: Publish Scaladocs
> M-1602: Euclidean Distance Similarity Math
> M-1650: upgrade 3rd party jars
>
> Shannon Quinn
> -----------------------
> M-1538: Port spectral clustering to Mahout DSL
> M-1539: Implement affinity matrix computation in Mahout DSL
> M-1659: Remove deprecated Lanczos solver from spectral clustering in
> mr-legacy
>
> Sebastian Schelter
> --------------------------
> M-1584: Create a detailed example of how to index an arbitrary dataset and
> run LDA on it    (Patch available)
>
> Gokhan Capan
> ----------------------
> M-1626: Support for required quasi-algebraic operations and starting with
> aggregating rows/blocks
>
> Unassigned
> ------------------
> M-1516: run classify-20newsgroups.sh failed cause by
> /tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
> available)
> M-1551: Add document to describe how to use mlp with command line    (Patch
> available)
> M-1557: Add support for sparse training vectors in MLP    (Patch available)
> M-1593: cluster-reuters.sh does not work complaining
> java.lang.IllegalStateException    (Patch available)
> M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
> available)
> M-1633: Failure to execute query when solr index contains documents with
> different fields
> M-1634: ALS don't work when it adds new files in Distributed Cache
>   (Patch available)
> M-1637: RecommenderJob of ALS fails in the mapper because it uses the
> instance of other class
>


Re: Mahout 0.10.0 Bug bash

Posted by Andrew Musselman <an...@gmail.com>.
Yes, reminder we want to freeze/slush next Sunday.

If you won't be able to finish your bugs let's do some more triage and
split up work.

On Sunday, March 29, 2015, Suneel Marthi <su...@gmail.com> wrote:

> A daily "politely harsh" reminder of the April 5 code freeze date with the
> daily bug bash would be helpful.
>
> On Sun, Mar 29, 2015 at 12:36 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
> > Sunday's:
> >
> > Andrew Palumbo
> > --------------------------
> > M-1477: Clean up website on Logistic Regression
> > M-1493: Port Naive Bayes to Spark DSL    (Patch available)
> > M-1559: Documentation and cleanup for Naive Bayes Example
> > M-1564: Naive Bayes classifier for new Text Documents
> > M-1609: NullPointerException    (This bug is not showing up aside from its
> > title)
> > M-1635: Getting an exception when I provide classification labels manually
> > for Naive Bayes
> > M-1638: H2O bindings fail at drmParallelizeWithRowLabels
> > M-1648: Update CMS for Mahout 0.10.0
> >
> > Andrew Musselman
> > -----------------------------
> > M-1462: Cleaning up Random Forests documentation on Mahout website
> > M-1470: LDA Topic dump
> > M-1522: Handle logging levels via log4j.xml
> > M-1563: cleanup Warnings during Build
> > M-1655: Refactor module dependencies
> >
> > Dmitriy Lyubimov
> > --------------------------
> > M-1646: Refactor out all legacy MR dependencies from scala code
> >
> > Frank Scholten
> > ---------------------
> > M-1625: lucene2seq: failure to convert a document that does not contain a
> > field (the field is not required)
> > M-1649: Lucene 5 upgrade
> >
> > Pat Ferrel
> > -----------------
> > M-1589: mahout.cmd has duplicated content    (Patch available)
> >
> > Suneel Marthi
> > ---------------------
> > M-1469: Streaming KMeans fails when executed in MR mode and
> > REDUCE_STREAMING_KMEANS set to true
> > M-1512: Hadoop 2 compatibility
> > M-1585: Javadocs not hosted by Mahout-Quality
> > M-1586: Collections downloads must have hash signatures
> > M-1619: HighDFWordsPruner overwrites cache files
> > M-1647: The release build is incomplete
> > M-1652: Java 7 update
> > M-1656: Change SNAPSHOT version from 1.0 to 0.10
> > M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
> >
> > Stevo Slavic
> > --------------------
> > M-1277: Lose dependency on custom commons-cli
> > M-1278: Improve inheritance of apache parent pom
> > M-1562: Publish Scaladocs
> > M-1602: Euclidean Distance Similarity Math
> > M-1650: upgrade 3rd party jars
> >
> > Shannon Quinn
> > -----------------------
> > M-1538: Port spectral clustering to Mahout DSL
> > M-1539: Implement affinity matrix computation in Mahout DSL
> > M-1659: Remove deprecated Lanczos solver from spectral clustering in
> > mr-legacy
> >
> > Sebastian Schelter
> > --------------------------
> > M-1584: Create a detailed example of how to index an arbitrary dataset and
> > run LDA on it    (Patch available)
> >
> > Gokhan Capan
> > ----------------------
> > M-1626: Support for required quasi-algebraic operations and starting with
> > aggregating rows/blocks
> >
> > Unassigned
> > ------------------
> > M-1516: run classify-20newsgroups.sh failed cause by
> > /tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
> > available)
> > M-1551: Add document to describe how to use mlp with command line    (Patch
> > available)
> > M-1557: Add support for sparse training vectors in MLP    (Patch available)
> > M-1593: cluster-reuters.sh does not work complaining
> > java.lang.IllegalStateException    (Patch available)
> > M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
> > available)
> > M-1633: Failure to execute query when solr index contains documents with
> > different fields
> > M-1634: ALS don't work when it adds new files in Distributed Cache
> >  (Patch available)
> > M-1637: RecommenderJob of ALS fails in the mapper because it uses the
> > instance of other class
> >
>

Re: Mahout 0.10.0 Bug bash

Posted by Suneel Marthi <su...@gmail.com>.
A daily "politely harsh" reminder of the April 5 code freeze date with the
daily bug bash would be helpful.

On Sun, Mar 29, 2015 at 12:36 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> Sunday's:
>
> Andrew Palumbo
> --------------------------
> M-1477: Clean up website on Logistic Regression
> M-1493: Port Naive Bayes to Spark DSL    (Patch available)
> M-1559: Documentation and cleanup for Naive Bayes Example
> M-1564: Naive Bayes classifier for new Text Documents
> M-1609: NullPointerException    (This bug is not showing up aside from its
> title)
> M-1635: Getting an exception when I provide classification labels manually
> for Naive Bayes
> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
> M-1648: Update CMS for Mahout 0.10.0
>
> Andrew Musselman
> -----------------------------
> M-1462: Cleaning up Random Forests documentation on Mahout website
> M-1470: LDA Topic dump
> M-1522: Handle logging levels via log4j.xml
> M-1563: cleanup Warnings during Build
> M-1655: Refactor module dependencies
>
> Dmitriy Lyubimov
> --------------------------
> M-1646: Refactor out all legacy MR dependencies from scala code
>
> Frank Scholten
> ---------------------
> M-1625: lucene2seq: failure to convert a document that does not contain a
> field (the field is not required)
> M-1649: Lucene 5 upgrade
>
> Pat Ferrel
> -----------------
> M-1589: mahout.cmd has duplicated content    (Patch available)
>
> Suneel Marthi
> ---------------------
> M-1469: Streaming KMeans fails when executed in MR mode and
> REDUCE_STREAMING_KMEANS set to true
> M-1512: Hadoop 2 compatibility
> M-1585: Javadocs not hosted by Mahout-Quality
> M-1586: Collections downloads must have hash signatures
> M-1619: HighDFWordsPruner overwrites cache files
> M-1647: The release build is incomplete
> M-1652: Java 7 update
> M-1656: Change SNAPSHOT version from 1.0 to 0.10
> M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
>
> Stevo Slavic
> --------------------
> M-1277: Lose dependency on custom commons-cli
> M-1278: Improve inheritance of apache parent pom
> M-1562: Publish Scaladocs
> M-1602: Euclidean Distance Similarity Math
> M-1650: upgrade 3rd party jars
>
> Shannon Quinn
> -----------------------
> M-1538: Port spectral clustering to Mahout DSL
> M-1539: Implement affinity matrix computation in Mahout DSL
> M-1659: Remove deprecated Lanczos solver from spectral clustering in
> mr-legacy
>
> Sebastian Schelter
> --------------------------
> M-1584: Create a detailed example of how to index an arbitrary dataset and
> run LDA on it    (Patch available)
>
> Gokhan Capan
> ----------------------
> M-1626: Support for required quasi-algebraic operations and starting with
> aggregating rows/blocks
>
> Unassigned
> ------------------
> M-1516: run classify-20newsgroups.sh failed cause by
> /tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
> available)
> M-1551: Add document to describe how to use mlp with command line    (Patch
> available)
> M-1557: Add support for sparse training vectors in MLP    (Patch available)
> M-1593: cluster-reuters.sh does not work complaining
> java.lang.IllegalStateException    (Patch available)
> M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
> available)
> M-1633: Failure to execute query when solr index contains documents with
> different fields
> M-1634: ALS don't work when it adds new files in Distributed Cache
>  (Patch available)
> M-1637: RecommenderJob of ALS fails in the mapper because it uses the
> instance of other class
>

Re: Mahout 0.10.0 Bug bash

Posted by Andrew Musselman <an...@gmail.com>.
Sunday's:

Andrew Palumbo
--------------------------
M-1477: Clean up website on Logistic Regression
M-1493: Port Naive Bayes to Spark DSL    (Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1609: NullPointerException    (This bug is not showing up aside from its
title)
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-----------------------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1522: Handle logging levels via log4j.xml
M-1563: cleanup Warnings during Build
M-1655: Refactor module dependencies

Dmitriy Lyubimov
--------------------------
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
---------------------
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1649: Lucene 5 upgrade

Pat Ferrel
-----------------
M-1589: mahout.cmd has duplicated content    (Patch available)

Suneel Marthi
---------------------
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1585: Javadocs not hosted by Mahout-Quality
M-1586: Collections downloads must have hash signatures
M-1619: HighDFWordsPruner overwrites cache files
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

Stevo Slavic
--------------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1602: Euclidean Distance Similarity Math
M-1650: upgrade 3rd party jars

Shannon Quinn
-----------------------
M-1538: Port spectral clustering to Mahout DSL
M-1539: Implement affinity matrix computation in Mahout DSL
M-1659: Remove deprecated Lanczos solver from spectral clustering in
mr-legacy

Sebastian Schelter
--------------------------
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it    (Patch available)

Gokhan Capan
----------------------
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Unassigned
------------------
M-1516: run classify-20newsgroups.sh failed cause by
/tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
available)
M-1551: Add document to describe how to use mlp with command line    (Patch
available)
M-1557: Add support for sparse training vectors in MLP    (Patch available)
M-1593: cluster-reuters.sh does not work complaining
java.lang.IllegalStateException    (Patch available)
M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
available)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1634: ALS don't work when it adds new files in Distributed Cache
 (Patch available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class

Re: Mahout 0.10.0 Bug bash

Posted by Andrew Musselman <an...@gmail.com>.
Today's:

Andrew Palumbo
--------------------------
M-1648: Update CMS for Mahout 0.10.0
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1477: Clean up website on Logistic Regression
M-1564: Naive Bayes classifier for new Text Documents
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1493: Port Naive Bayes to Spark DSL    (Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1609: NullPointerException
M-1607: Spark-shell DAG scheduler

Andrew Musselman
-----------------------------
M-1655: Refactor module dependencies
M-1522: Handle logging levels via log4j.xml
M-1563: cleanup Warnings during Build
M-1470: LDA Topic dump
M-1462: Cleaning up Random Forests documentation on Mahout website

Dmitriy Lyubimov
--------------------------
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
---------------------
M-1649: Lucene 5 upgrade
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)

Pat Ferrel
-----------------
M-1589: mahout.cmd has duplicated content    (Patch available)
M-1618: co-occurrence recommender example

Suneel Marthi
---------------------
M-1586: Collections downloads must have hash signatures
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1512: Hadoop 2 compatibility
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1443: Update "How to Release" page    (Tagged 0.10.1)
M-1585: Javadocs not hosted by Mahout-Quality
M-1612: NPE during JSON outputformatter for clusterdump
M-1656: Change SNAPSHOT version from 1.0 to 0.10
M-1660: Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf
M-1619: HighDFWordsPruner overwrites cache files

Stevo Slavic
--------------------
M-1650: upgrade 3rd party jars
M-1602: Euclidean Distance Similarity Math
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1277: Lose dependency on custom commons-cli

Shannon Quinn
-----------------------
M-1538: Port spectral clustering to Mahout DSL
M-1539: Implement affinity matrix computation in Mahout DSL
M-1540: Reuters Example spectral clustering
Also online docs for Spectral clustering
M-1659: Remove deprecated Lanczos solver from spectral clustering in
mr-legacy

Ted Dunning
-------------------
M-1636: Class dependencies for Spark module are put in job.jar, which is
inefficient

Sebastian Schelter
--------------------------
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it    (Patch available)

Gokhan Capan
----------------------
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Unassigned
------------------
M-1594: Example factorize-movielens-1M.sh does not use HDFS    (Patch
available)
M-1593: cluster-reuters.sh does not work complaining
java.lang.IllegalStateException    (Patch available)
M-1557: Add support for sparse training vectors in MLP    (Patch available)
M-1516: run classify-20newsgroups.sh failed cause by
/tmp/mahout-work-jpan/20news-all does not exists in hdfs.    (Patch
available)
M-1643: CLI arguments are not being processed in spark-shell
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class
M-1634: ALS don't work when it adds new files in Distributed Cache
 (Patch available)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1551: Add document to describe how to use mlp with command line    (Patch
available)

On Thu, Mar 26, 2015 at 7:07 PM, Suneel Marthi <su...@gmail.com>
wrote:

> Ok here's the bug bash as of today
>
> Andrew Palumbo
> --------------------------
> M-1648: Update CMS for Mahout 0.10.0
> M-1638: H2O bindings fail at drmParallelizeWithRowLabels
> M-1564: Naive Bayes classifier for new Text Documents
> M-1635: Exception when providing classification Labels
> M-1493: Port Naive Bayes to Spark DSL
> M-1559: Documentation and cleanup for Naive Bayes Example
> M-1609: NullPointerException
> M-1607: Spark-shell DAG scheduler
>
> Andrew Musselman
> -----------------------------
> M-1655: Refactor module dependencies
> M-1563: cleanup Warnings during Build
> M-1470: LDA Topic dump
>
> Dmitriy Lyubimov
> --------------------------
> M-1646: Refactor out all legacy MR dependencies from scala code
>
> Frank Scholten
> ---------------------
> M-1649: Lucene 5 upgrade
>
> Pat Ferrel
> -----------------
> M-1589: mahout.cmd has duplicated content
> M-1618: co-occurence recommender example
>
> Suneel Marthi
> ---------------------
> M-1586: Collections downloads must have hash signatures
> M-1647: Release build
> M-1652: Java 7 update
> M-1512: Hadoop 2 compatibility
> M-1469: Streaming KMeans fails when executed in MR mode and
> REDUCE_STREAMING_KMEANS set to true
> M-1443: Update "How to Release" page
> M-1585: Javadocs not hosted by Mahout-Quality
> M-1612: NPE during JSON outputformatter for clusterdump
>
> Stevo Slavic
> --------------------
> M-1650: upgrade 3rd party jars
> M-1602: Euclidean Distance Similarity Math
> M-1278: Improve inheritance of apache parent pom
>
> Shannon Quinn
> -----------------------
> M-1540: Reuters Example spectral clustering
> Also online docs for Spectral clustering
>
> Ted Dunning
> -------------------
> M-1636: Class dependencies for Spark module are put in job.jar, which is
> inefficient
>