Posted to dev@mahout.apache.org by "Antonio Molins (Created) (JIRA)" <ji...@apache.org> on 2012/02/03 23:37:53 UTC

[jira] [Created] (MAHOUT-971) kmeans does not work in S3

kmeans does not work in S3
--------------------------

                 Key: MAHOUT-971
                 URL: https://issues.apache.org/jira/browse/MAHOUT-971
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.6
         Environment: amazon S3
            Reporter: Antonio Molins


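s3n:// URIs will not work in kmeans because a couple of calls use FileSystem.get(conf), which takes no URI and therefore returns the default filesystem from the configuration rather than the filesystem the s3n:// path belongs to.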
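The difference between the two lookups can be sketched with a toy model in plain Java. This is not Hadoop's implementation and has no Hadoop dependency. It assumes a configuration modeled as a map holding fs.default.name (the Hadoop 1.x key for the default filesystem) and identifies a filesystem only by scheme plus authority. The class and method names (FsResolution, defaultFs, fsFor) are hypothetical, and hdfs://namenode:8020 and s3n://my-bucket are made-up addresses.

```java
import java.net.URI;
import java.util.Map;
import java.util.Objects;

/**
 * Toy model, NOT Hadoop's implementation, of how a filesystem is picked for a path.
 * The configuration is a plain map; "fs.default.name" is the Hadoop 1.x key for the
 * default filesystem. A filesystem is identified here only by "scheme://authority".
 */
final class FsResolution {
  static final String DEFAULT_FS_KEY = "fs.default.name";

  private FsResolution() {}

  /** Like FileSystem.get(conf): always the configured default, whatever path is used later. */
  static String defaultFs(Map<String, String> conf) {
    return conf.get(DEFAULT_FS_KEY);
  }

  /** Like FileSystem.get(uri, conf): the filesystem named by the path's own scheme and authority. */
  static String fsFor(URI path, Map<String, String> conf) {
    if (path.getScheme() == null) {
      // Simplification: a scheme-less path such as "output/clusters-0" falls back to the default.
      return defaultFs(conf);
    }
    return path.getScheme() + "://" + Objects.toString(path.getAuthority(), "");
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of(DEFAULT_FS_KEY, "hdfs://namenode:8020");
    URI clusters = URI.create("s3n://my-bucket/kmeans/clusters-0");
    // The URI-less lookup ignores the path entirely and answers with the HDFS default.
    System.out.println("FileSystem.get(conf)      -> " + defaultFs(conf));
    // The URI-aware lookup follows the path's s3n scheme and bucket.
    System.out.println("FileSystem.get(uri, conf) -> " + fsFor(clusters, conf));
  }
}
```

Running main prints hdfs://namenode:8020 for the first lookup and s3n://my-bucket for the second, which is the mismatch described in the report.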

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-971) kmeans does not work in S3

Posted by "Antonio Molins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200118#comment-13200118 ] 

Antonio Molins commented on MAHOUT-971:
---------------------------------------

I am new and haven't figured out how to make commits in SVN, but I was able to fix this by modifying the offending lines

core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java:66
core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java:89

to

FileSystem fs = FileSystem.get(path.toUri(), conf);

and

core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:298
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:322

to

FileSystem.get(output.toUri(), conf).rename(new Path(output, AbstractCluster.CLUSTERS_DIR + (iteration-1)), finalClustersIn);
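Why the rename has to go through the output path's filesystem can be illustrated with a second toy model, again not Hadoop code. ToyFs is a hypothetical in-memory filesystem that, much as Hadoop filesystems typically do, rejects a path belonging to a different scheme or authority with a "Wrong FS" error. It assumes every path is fully qualified; the directory names clusters-9 and clusters-final and both filesystem addresses are made up for illustration.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/**
 * Toy in-memory filesystem, NOT Hadoop code. It rejects paths that belong to another
 * scheme or authority with a "Wrong FS" error. All paths are assumed fully qualified.
 */
final class ToyFs {
  private final String fsUri;                     // e.g. "s3n://my-bucket"
  private final Set<String> paths = new HashSet<>();

  ToyFs(String fsUri) {
    this.fsUri = fsUri;
  }

  private void checkPath(URI p) {
    String owner = p.getScheme() + "://" + Objects.toString(p.getAuthority(), "");
    if (!owner.equals(fsUri)) {
      throw new IllegalArgumentException("Wrong FS: " + p + ", expected: " + fsUri);
    }
  }

  void create(URI p) {
    checkPath(p);
    paths.add(p.getPath());
  }

  boolean exists(URI p) {
    checkPath(p);
    return paths.contains(p.getPath());
  }

  /** Moves src to dst; returns false if src does not exist. */
  boolean rename(URI src, URI dst) {
    checkPath(src);
    checkPath(dst);
    if (!paths.remove(src.getPath())) {
      return false;
    }
    paths.add(dst.getPath());
    return true;
  }

  public static void main(String[] args) {
    ToyFs defaultFs = new ToyFs("hdfs://namenode:8020"); // stands in for FileSystem.get(conf)
    ToyFs outputFs = new ToyFs("s3n://my-bucket");       // stands in for FileSystem.get(output.toUri(), conf)
    URI last = URI.create("s3n://my-bucket/output/clusters-9");
    URI fin = URI.create("s3n://my-bucket/output/clusters-final");
    outputFs.create(last);
    try {
      defaultFs.rename(last, fin);
    } catch (IllegalArgumentException e) {
      System.out.println("default FS: " + e.getMessage());
    }
    System.out.println("output FS rename succeeded: " + outputFs.rename(last, fin));
  }
}
```

In this model the first rename fails because the default (HDFS) filesystem does not own the s3n path, while the second succeeds; that is the behaviour the FileSystem.get(output.toUri(), conf) change is meant to produce.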


                
> kmeans does not work in S3
> --------------------------
>
>                 Key: MAHOUT-971
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-971
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: amazon S3
>            Reporter: Antonio Molins
>              Labels: hadoop
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> S3n:// URIs will not work in kmeans because of a couple of calls to FileSystem.get(conf) with no URI. 


        

[jira] [Issue Comment Edited] (MAHOUT-971) kmeans does not work in S3

Posted by "Antonio Molins (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200118#comment-13200118 ] 

Antonio Molins edited comment on MAHOUT-971 at 2/3/12 10:57 PM:
----------------------------------------------------------------

I am new and haven't figured out how to make commits in SVN, but I was able to fix this by modifying the offending lines

core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java:66

to

FileSystem fs = FileSystem.get(path.toUri(), conf);

and

core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:298
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:322

to

FileSystem.get(output.toUri(), conf).rename(new Path(output, AbstractCluster.CLUSTERS_DIR + (iteration-1)), finalClustersIn);


                


        

[jira] [Resolved] (MAHOUT-971) kmeans does not work in S3

Posted by "Sean Owen (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-971.
------------------------------

       Resolution: Fixed
    Fix Version/s: 0.7
         Assignee: Sean Owen

I made these changes, and similar ones across the board. Yes, I think we need to tell FileSystem.get() to use the scheme of the Path it's going to be processing.
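Making similar changes across the board means finding every remaining call that omits the URI. A rough way to list candidates is sketched below. NoUriFileSystemGetFinder is a hypothetical helper, not part of Mahout or Hadoop, and it is a text heuristic rather than a Java parser: it only judges calls that open and close on one line, and it will also flag matches inside comments or on other classes whose name ends in FileSystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical audit helper: reports lines that call FileSystem.get(...) with a
 * single argument, i.e. without a URI. A text heuristic, not a Java parser.
 */
final class NoUriFileSystemGetFinder {
  private static final String CALL = "FileSystem.get(";

  private NoUriFileSystemGetFinder() {}

  /** Returns 1-based numbers of lines containing a one-argument FileSystem.get call. */
  static List<Integer> findNoUriCalls(List<String> lines) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < lines.size(); i++) {
      String line = lines.get(i);
      int at = line.indexOf(CALL);
      while (at >= 0) {
        if (hasSingleArgument(line, at + CALL.length())) {
          hits.add(i + 1);
          break;
        }
        at = line.indexOf(CALL, at + CALL.length());
      }
    }
    return hits;
  }

  /** Scans from just after "(" to the matching ")"; true if no comma appears at the top level. */
  private static boolean hasSingleArgument(String line, int start) {
    int depth = 0;
    for (int j = start; j < line.length(); j++) {
      char c = line.charAt(j);
      if (c == '(') {
        depth++;
      } else if (c == ')') {
        if (depth == 0) {
          return true;
        }
        depth--;
      } else if (c == ',' && depth == 0) {
        return false;
      }
    }
    return false; // the call continues on another line; this heuristic does not judge it
  }

  public static void main(String[] args) throws IOException {
    // Prints file:line for each suspicious call in the source files given as arguments.
    for (String file : args) {
      for (int n : findNoUriCalls(Files.readAllLines(Paths.get(file)))) {
        System.out.println(file + ":" + n);
      }
    }
  }
}
```

Given source files as arguments, it prints file:line pairs in the same form the original report uses to point at the offending lines; each hit still needs a manual look to decide which Path's URI should be passed.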


        

[jira] [Commented] (MAHOUT-971) kmeans does not work in S3

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203149#comment-13203149 ] 

Hudson commented on MAHOUT-971:
-------------------------------

Integrated in Mahout-Quality #1340 (See [https://builds.apache.org/job/Mahout-Quality/1340/])
    MAHOUT-971 Use FileSystem.get(URI, Configuration) across the board to make it (more likely to) work with S3

srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241649
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyReducer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorCache.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorMatrixMultiplicationJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowth.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/ga/watchmaker/MahoutEvaluator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/TimesSquaredJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/TransposeJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/decomposer/DistributedLanczosSolver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/decomposer/EigenVerificationJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/TestClusterClassifier.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/minhash/TestMinHashClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/spectral/common/TestVectorCache.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthTest2.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stats/BasicStatsTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/stats/entropy/ConditionalEntropyTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/stats/entropy/EntropyTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/stats/entropy/InformationGainRatioTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/stats/entropy/InformationGainTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/DictionaryVectorizerTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/DocumentProcessorTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/EncodedVectorsFromSequenceFilesTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/HighDFWordsPrunerTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/SparseVectorsFromSequenceFilesTest.java

                
