Posted to dev@mahout.apache.org by Andrew Musselman <an...@gmail.com> on 2015/04/01 20:04:17 UTC

Re: Mahout 0.10.0 Bug bash

Wednesday (*four days from code freeze Sunday*); some progress:

Andrew Palumbo
--------------------------
M-1493: Port Naive Bayes to Spark DSL    (Patch available)
M-1559: Documentation and cleanup for Naive Bayes Example
M-1564: Naive Bayes classifier for new Text Documents
M-1617: 404 error on link in cluster-dumper tutorial page
M-1635: Getting an exception when I provide classification labels manually
for Naive Bayes
M-1638: H2O bindings fail at drmParallelizeWithRowLabels
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-----------------------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1655: Refactor module dependencies

Dmitriy Lyubimov
--------------------------
M-1646: Refactor out all legacy MR dependencies from scala code

Frank Scholten
---------------------
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Gokhan Capan
----------------------
M-1626: Support for required quasi-algebraic operations and starting with
aggregating rows/blocks

Pat Ferrel
-----------------
M-1507: Support input and output using user defined ID wherever possible

Sebastian Schelter
--------------------------
M-1584: Create a detailed example of how to index an arbitrary dataset and
run LDA on it    (Patch available)

Shannon Quinn
-----------------------
M-1661: Remove Lanczos from the code base
M-1662: Potential Path bug in SequenceFileVaultIterator breaks
DisplaySpectralKMeans

Stevo Slavic
--------------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
---------------------
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1586: Collections downloads must have hash signatures
M-1647: The release build is incomplete
M-1652: Java 7 update
M-1656: Change SNAPSHOT version from 1.0 to 0.10

Unassigned
------------------
M-1551: Add document to describe how to use mlp with command line    (Patch
available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class

Re: Mahout 0.10.0 Bug bash

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Things like that question make me more suspicious. 

We really need to get a handle on the Hadoop version question.

I have run spark-itemsimilarity on Hadoop 1.2.1 and on 2.6.0 (fails on 2.6.0). Andy ran it successfully on 2.2, and a user runs it on 2.4-MapR.

On 2.6.0 these lines seem to find the local file system:
  val conf = new Configuration()
  val fs = FileSystem.get(conf)
On the earlier versions of Hadoop, they find the cluster or pseudo-cluster HDFS.

I’ve run Andy’s 20 newsgroups classifier test script on Hadoop 1.2.1 and got a classdef mismatch error, which probably means I built it wrong. I’ll be testing that again Monday.

I’m building a 2.2.0 pseudo cluster and will run 20 newsgroups and spark-itemsimilarity on Monday.

I guess the big question is still 2.5 or 2.6. Does anyone know why the two lines above would cause a problem in recent Hadoop versions? Does someone have a known-good 2.6 cluster that they can try a couple of tests on?
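
A possible way to narrow this down (a sketch, not a confirmed diagnosis): FileSystem.get(conf) returns the file system named by fs.defaultFS (fs.default.name on Hadoop 1.x), and that setting falls back to file:/// when no core-site.xml is found on the classpath. The object FsCheck and its helpers below are made-up names for illustration. Run it with the same classpath the failing job uses and pass the HDFS path as an argument.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not part of Mahout or Hadoop.
object FsCheck {
  // Scheme of the file system that FileSystem.get(conf) resolves to.
  def defaultScheme(conf: Configuration): String =
    FileSystem.get(conf).getUri.getScheme

  // Scheme of the file system that owns the given path. A fully
  // qualified path (hdfs://...) is resolved from its own URI; an
  // unqualified path falls back to the default file system.
  def schemeFor(pathStr: String, conf: Configuration): String =
    new Path(pathStr).getFileSystem(conf).getUri.getScheme

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // May print null on Hadoop 1.x, where the key is fs.default.name.
    println(s"fs.defaultFS   = ${conf.get("fs.defaultFS")}")
    println(s"default scheme = ${defaultScheme(conf)}")
    args.foreach(p => println(s"$p -> ${schemeFor(p, conf)}"))
  }
}
```

If the default scheme prints as file while a fully qualified hdfs:// path resolves to hdfs, the cluster configuration is probably not being picked up, rather than the two lines themselves being wrong.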


On Apr 5, 2015, at 9:52 AM, Andrew Musselman <an...@gmail.com> wrote:

I wonder if that HDFS/FS issue is the same problem I have with
cluster-reuters.sh.



Re: Mahout 0.10.0 Bug bash

Posted by Andrew Musselman <an...@gmail.com>.
I wonder if that HDFS/FS issue is the same problem I have with
cluster-reuters.sh.

On Sunday, April 5, 2015, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Very few of these are on the “official” ticket list here:
>
> https://issues.apache.org/jira/browse/MAHOUT-1648?filter=-4&jql=project%20%3D%20MAHOUT%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20BY%20createdDate%20DESC
>
> M-1674
> M-1665
> M-1648
>
> The next time this is published it would be great to get the versions of
> Hadoop people are using and what has actually been run on a cluster or
> pseudo cluster, under YARN etc. I’m increasingly suspicious that we don’t
> run uniformly on Hadoop 2.5-2.6 but have no hard evidence. I’ve failed on
> Hadoop 2.6.0 but may not have an airtight configuration. If anyone has this
> config working I can supply a very simple test.
>
> The failure happens when an HDFS path gets applied to the raw local
> filesystem, even though Hadoop 2.6 HDFS is running and MAHOUT_LOCAL is
> unset. The root of the error I’ve seen is in getting the FileSystem, which
> always returns the local one.
>
>
> M-1674 is new and was found on Friday. Dmitriy already has a private fix
> but can’t commit it so I think we need a workaround.

Re: Mahout 0.10.0 Bug bash

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Very few of these are on the “official” ticket list here:
https://issues.apache.org/jira/browse/MAHOUT-1648?filter=-4&jql=project%20%3D%20MAHOUT%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20BY%20createdDate%20DESC

M-1674
M-1665
M-1648

The next time this is published it would be great to get the versions of Hadoop people are using and what has actually been run on a cluster or pseudo cluster, under YARN etc. I’m increasingly suspicious that we don’t run uniformly on Hadoop 2.5-2.6 but have no hard evidence. I’ve failed on Hadoop 2.6.0 but may not have an airtight configuration. If anyone has this config working I can supply a very simple test.

The failure happens when an HDFS path gets applied to the raw local filesystem, even though Hadoop 2.6 HDFS is running and MAHOUT_LOCAL is unset. The root of the error I’ve seen is in getting the FileSystem, which always returns the local one.
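
If the cause turns out to be that core-site.xml is not on the classpath (an assumption; nothing in this thread confirms it), one workaround sketch is to add the site files to the Configuration explicitly before asking for the FileSystem. ClusterConf and load are hypothetical names; HADOOP_CONF_DIR is the usual Hadoop environment variable for the configuration directory.

```scala
import java.io.File
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper, not part of Mahout or Hadoop.
object ClusterConf {
  // Build a Configuration that also reads core-site.xml and
  // hdfs-site.xml from confDir, when that directory is given and
  // the files exist. Missing files are skipped silently.
  def load(confDir: Option[String]): Configuration = {
    val conf = new Configuration()
    confDir.foreach { dir =>
      Seq("core-site.xml", "hdfs-site.xml")
        .map(name => new File(dir, name))
        .filter(_.isFile)
        .foreach(f => conf.addResource(new Path(f.getAbsolutePath)))
    }
    conf
  }
}
```

A caller could then write FileSystem.get(ClusterConf.load(sys.env.get("HADOOP_CONF_DIR"))) in place of FileSystem.get(new Configuration()) and check whether the HDFS file system comes back.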


M-1674 is new and was found on Friday. Dmitriy already has a private fix but can’t commit it so I think we need a workaround.

On Apr 4, 2015, at 8:46 PM, Suneel Marthi <su...@gmail.com> wrote:

Saturday(2 days before code freeze). The code freeze's gonna be on Monday -
April 6.  Please address ur assigned JIRAs on time.



Re: Mahout 0.10.0 Bug bash

Posted by Suneel Marthi <su...@gmail.com>.
Saturday(2 days before code freeze). The code freeze's gonna be on Monday -
April 6.  Please address ur assigned JIRAs on time.

Anand Avati
-------------------------

M-1622: Multithreaded batch Item similarities output incorrect similarities
M-1605: Make Visualizer test locale independent

Andrew Palumbo
--------------------------
M-1559: Add documentation for Wikipedia example
M-1648: Update CMS for Mahout 0.10.0

Andrew Musselman
-----------------------------
M-1462: Cleaning up Random Forests documentation on Mahout website
M-1470: LDA Topic dump
M-1655: Refactor module dependencies
M-1658: KMeans fails when run on Hadoop clusters

Frank Scholten
---------------------
M-1625: lucene2seq: failure to convert a document that does not contain a
field (the field is not required)
M-1633: Failure to execute query when solr index contains documents with
different fields
M-1649: Lucene 5 upgrade

Pat Ferrel
-----------------
M-1507: Support input and output using user defined ID wherever possible
M-1588: Multiple Input path support in Recommenders

Stevo Slavic
--------------------
M-1277: Lose dependency on custom commons-cli
M-1278: Improve inheritance of apache parent pom
M-1562: Publish Scaladocs
M-1585: Javadocs are not hosted By Mahout Quality
M-1650: upgrade 3rd party jars

Suneel Marthi
---------------------
M-1469: Streaming KMeans fails when executed in MR mode and
REDUCE_STREAMING_KMEANS set to true
M-1512: Hadoop 2 compatibility
M-1652: Java 7 update
M-1630: Incorrect SparseMatrix.numColSlices() causes IllegalStateException

Ted Dunning
-------------------

M-1672: TDigest update to 3.1 in OnlineSummarizers

Unassigned
------------------
M-1551: Add document to describe how to use mlp with command line    (Patch
available)
M-1637: RecommenderJob of ALS fails in the mapper because it uses the
instance of other class