Posted to issues@mahout.apache.org by "Andrew Palumbo (Jira)" <ji...@apache.org> on 2020/03/01 09:00:00 UTC

[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

    [ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048499#comment-17048499 ] 

Andrew Palumbo edited comment on MAHOUT-2093 at 3/1/20 8:59 AM:
----------------------------------------------------------------

This is an issue with the Scopt 3.3.0 CLI interface. In the current master, for 0.14.1, we've upgraded to Scopt 3.7.1, which has solved the problem.

The Mahout Spark Shell is actually handled differently in the call to `/bin/mahout`: it is a pass-through to Spark's Scala shell [1] with the Mahout jars added, so it does not use the Scopt CLI drivers [1][2][3], which is why it works without issue in that release.
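The distinction above can be sketched roughly as follows. This is an illustration only, not the contents of the real `bin/mahout` script: the function names `mahout_entry_point` and `uses_scopt` are hypothetical, the driver class names are taken from the error messages in the issue description, and it is an assumption here that the row-similarity driver goes through the same Scopt-based parser as the item-similarity one.

```shell
# Illustrative sketch only; this is NOT the real bin/mahout script.
# mahout_entry_point and uses_scopt are hypothetical helper names.

# Print the entry point that a given mahout subcommand maps to.
mahout_entry_point() {
  case "$1" in
    spark-shell)
      # Pass-through to Spark's own Scala shell with the Mahout jars added.
      echo "spark-shell"
      ;;
    spark-itemsimilarity)
      echo "org.apache.mahout.drivers.ItemSimilarityDriver"
      ;;
    spark-rowsimilarity)
      # Assumed to use the same Scopt-based option parser.
      echo "org.apache.mahout.drivers.RowSimilarityDriver"
      ;;
    *)
      echo "unknown command: $1" >&2
      return 1
      ;;
  esac
}

# Succeed only when the subcommand runs a Mahout CLI driver class,
# which is where the Scopt option parser comes into play.
uses_scopt() {
  case "$(mahout_entry_point "$1" 2>/dev/null)" in
    org.apache.mahout.drivers.*) return 0 ;;
    *) return 1 ;;
  esac
}

for cmd in spark-shell spark-itemsimilarity spark-rowsimilarity; do
  if uses_scopt "$cmd"; then
    echo "$cmd: Scopt CLI driver"
  else
    echo "$cmd: pass-through, no Scopt"
  fi
done
```

Only the two driver subcommands reach the Scopt parser in this sketch, which matches the observation that the shell keeps working while the drivers fail.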

0.14.1 is a huge refactor of the codebase, and we're still working out some of the kinks.

I would suggest the last RC, but I believe a module was missing from its source distribution, which was the reason we scrapped it.

CLI drivers should be working in the current {{github/master}}: [https://github.com/apache/mahout.git], which is currently (mostly) stable.

[1] [https://github.com/apache/mahout/blob/branch-0.14.0/bin/mahout#L299-L314]
[2] [https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala#L44]
[3] [https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala#L30]


> Mahout Source Broken
> --------------------
>
>                 Key: MAHOUT-2093
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-2093
>             Project: Mahout
>          Issue Type: Bug
>          Components: Algorithms, Collaborative Filtering, Documentation
>    Affects Versions: 0.14.0, 0.13.2
>            Reporter: Stefan Goldener
>            Priority: Blocker
>
> It seems that newer versions of Mahout have problems with the Spark bindings: for example, mahout spark-itemsimilarity and mahout spark-rowsimilarity do not work because of class-not-found errors.
> {code:java}
> Error: Could not find or load main class org.apache.mahout.drivers.RowSimilarityDriver
> {code}
> {code:java}
> Error: Could not find or load main class org.apache.mahout.drivers.ItemSimilarityDriver
> {code}
> whereas *mahout spark-shell* works flawlessly.
> Here is a short Dockerfile to show the issue:
> {code:yaml}
> FROM openjdk:8-alpine
> ENV spark_uid=185
> ENV SCALA_MAJOR=2.11
> ENV SCALA_MAJOR_MINOR=2.11.12
> ENV HADOOP_MAJOR=2.7
> ENV SPARK_MAJOR_MINOR=2.4.5
> ENV MAHOUT_MAJOR_MINOR=0.14.0
> ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
> ENV MAHOUT_BASE=/opt/mahout
> ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
> ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
> ENV SPARK_BASE=/opt/spark
> ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
> ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
> ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
> ENV MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip"
> ENV ZINC_PORT=3030
> ### build spark
> RUN set -ex && \
>     apk upgrade --no-cache && \
>     ln -s /lib /lib64 && \
>     apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 krb5-libs nss curl openssl git maven && \
>     pip install setuptools && \
>     mkdir -p ${MAHOUT_HOME} && \
>     mkdir -p ${SPARK_BASE} && \
>     curl  -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz  && \
>     tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
>     rm ${SPARK_HOME}.tgz && \
>     export PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
>     bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
>     bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \
>             -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver -Pscala-${SCALA_MAJOR}
>     
> ### build mahout
> RUN curl -LfsS $MAHOUT_SRC_URL -o ${MAHOUT_BASE}.zip  && \
>     unzip ${MAHOUT_BASE}.zip -d ${MAHOUT_BASE} && \
>     rm ${MAHOUT_BASE}.zip && \
>     cd ${MAHOUT_HOME} && \
>     mvn -Dspark.version=${SPARK_MAJOR_MINOR} -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} -DskipTests -Dmaven.javadoc.skip=true clean package 
> {code}
> docker build . -t mahout-test
> docker run -it mahout-test /bin/bash



--
This message was sent by Atlassian Jira
(v8.3.4#803005)