Posted to dev@mahout.apache.org by "Antonio Molins (Created) (JIRA)" <ji...@apache.org> on 2012/02/03 23:37:53 UTC
[jira] [Created] (MAHOUT-971) kmeans does not work in S3
kmeans does not work in S3
--------------------------
Key: MAHOUT-971
URL: https://issues.apache.org/jira/browse/MAHOUT-971
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.6
Environment: amazon S3
Reporter: Antonio Molins
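s3n:// URIs do not work in kmeans because a couple of calls use FileSystem.get(conf) with no URI, which returns the default file system from the configuration instead of the one matching the path's scheme.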
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-971) kmeans does not work in S3
Posted by "Antonio Molins (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200118#comment-13200118 ]
Antonio Molins commented on MAHOUT-971:
---------------------------------------
I am new and haven't figured out how to make commits in SVN, but I was able to fix this by changing the offending lines
core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java:66
core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java:89
to
FileSystem fs = FileSystem.get(path.toUri(), conf);
and
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:298
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:322
to
FileSystem.get(output.toUri(), conf).rename(new Path(output, AbstractCluster.CLUSTERS_DIR + (iteration-1)), finalClustersIn);
> kmeans does not work in S3
> --------------------------
>
> Key: MAHOUT-971
> URL: https://issues.apache.org/jira/browse/MAHOUT-971
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.6
> Environment: amazon S3
> Reporter: Antonio Molins
> Labels: hadoop
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> S3n:// URIs will not work in kmeans because of a couple of calls to FileSystem.get(conf) with no URI.
[jira] [Issue Comment Edited] (MAHOUT-971) kmeans does not work in S3
Posted by "Antonio Molins (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200118#comment-13200118 ]
Antonio Molins edited comment on MAHOUT-971 at 2/3/12 10:57 PM:
----------------------------------------------------------------
I am new and haven't figured out how to make commits in SVN, but I was able to fix this by changing the offending lines
core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java:66
to
FileSystem fs = FileSystem.get(path.toUri(), conf);
and
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:298
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:322
to
FileSystem.get(output.toUri(), conf).rename(new Path(output, AbstractCluster.CLUSTERS_DIR + (iteration-1)), finalClustersIn);
was (Author: amolins):
I am new and haven't figured out how to make commits in SVN, but I was able to fix this by changing the offending lines
core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java:66
core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java:89
to
FileSystem fs = FileSystem.get(path.toUri(), conf);
and
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:298
core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:322
to
FileSystem.get(output.toUri(), conf).rename(new Path(output, AbstractCluster.CLUSTERS_DIR + (iteration-1)), finalClustersIn);
[jira] [Resolved] (MAHOUT-971) kmeans does not work in S3
Posted by "Sean Owen (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-971.
------------------------------
Resolution: Fixed
Fix Version/s: 0.7
Assignee: Sean Owen
I made these changes, and similar changes across the board. Yes, I think we need to tell FileSystem.get() to use the scheme of the Path it is going to process.
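
The scheme issue discussed in this comment can be illustrated with a toy model. The sketch below deliberately does not use the Hadoop API: the class name, the registry, and the file system names are all hypothetical. It only mimics the general idea that a lookup given no URI can consult nothing but the configured default, while a lookup given the path's URI can honor that path's scheme. The fallback for a scheme-less path is an assumption of this model.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Toy model only: this is NOT the Hadoop API. All names here are hypothetical.
class SchemeLookupSketch {

    // Hypothetical registry mapping a URI scheme to a file system name.
    static final Map<String, String> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("hdfs", "DistributedFS");
        REGISTRY.put("s3n", "NativeS3FS");
        REGISTRY.put("file", "LocalFS");
    }

    // Models a lookup with no URI: only the configured default is consulted.
    static String lookup(URI defaultUri) {
        return REGISTRY.get(defaultUri.getScheme());
    }

    // Models a lookup with a URI: the path's own scheme wins.
    // Assumption of this model: a path without a scheme falls back to the default.
    static String lookup(URI pathUri, URI defaultUri) {
        String scheme = pathUri.getScheme();
        return scheme == null ? lookup(defaultUri) : REGISTRY.get(scheme);
    }

    public static void main(String[] args) {
        URI defaultUri = URI.create("hdfs://namenode:8020/");    // hypothetical default
        URI output = URI.create("s3n://my-bucket/kmeans/output"); // hypothetical output path

        // No URI given: the default file system is chosen, which is wrong for an s3n path.
        System.out.println(lookup(defaultUri));                  // DistributedFS
        // URI given: the path's scheme selects the file system.
        System.out.println(lookup(output, defaultUri));          // NativeS3FS
        // Scheme-less path: falls back to the default.
        System.out.println(lookup(URI.create("/tmp/clusters"), defaultUri)); // DistributedFS
    }
}
```

In this model, the first call corresponds to the failing pattern reported in the issue and the second to the pattern used in the fix, where the URI of the path being processed is passed along with the configuration.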
[jira] [Commented] (MAHOUT-971) kmeans does not work in S3
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203149#comment-13203149 ]
Hudson commented on MAHOUT-971:
-------------------------------
Integrated in Mahout-Quality #1340 (See [https://builds.apache.org/job/Mahout-Quality/1340/])
MAHOUT-971 Use FileSystem.get(URI, Configuration) across the board to make it (more likely to) work with S3
srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241649
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/LDADriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/lda/cvb/CVB0Driver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/meanshift/MeanShiftCanopyReducer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorCache.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/common/VectorMatrixMultiplicationJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowth.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/ga/watchmaker/MahoutEvaluator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/TimesSquaredJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/TransposeJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/decomposer/DistributedLanczosSolver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/decomposer/EigenVerificationJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/TestClusterClassifier.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/minhash/TestMinHashClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/spectral/common/TestVectorCache.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthTest2.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stats/BasicStatsTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/stats/entropy/ConditionalEntropyTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/stats/entropy/EntropyTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/stats/entropy/InformationGainRatioTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/stats/entropy/InformationGainTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/DictionaryVectorizerTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/DocumentProcessorTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/EncodedVectorsFromSequenceFilesTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/HighDFWordsPrunerTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/vectorizer/SparseVectorsFromSequenceFilesTest.java