Posted to dev@zeppelin.apache.org by rawkintrevo <gi...@git.apache.org> on 2016/07/13 20:36:05 UTC

[GitHub] zeppelin issue #928: [ZEPPELIN-116][WIP] Add Mahout Support for Spark Interp...

Github user rawkintrevo commented on the issue:

    https://github.com/apache/zeppelin/pull/928
  
    @bzz, I can't recreate the build failure.
    
    I can say:
    - Spark, pySpark, and Mahout notebooks and paragraphs run as expected.
    - Spark and pySpark tests pass, and the integration tests in `zeppelin-server` pass. The only thing that fails is the Spark Cluster test.
    - The part of the Spark Cluster test that fails is Python not being found when testing via the REST API.
    - I can also confirm that the failing tests work as expected against a built Zeppelin (see the following Python script to recreate them).
    
    
    ``` python
    
    # build zeppelin like this:
    #
    # mvn clean package -DskipTests -Psparkr -Ppyspark -Pspark-1.6
    
    from requests import post, get, delete
    from json import dumps
    
    ZEPPELIN_SERVER = "localhost"
    ZEPPELIN_PORT = 8080
    base_url = "http://%s:%i" % (ZEPPELIN_SERVER, ZEPPELIN_PORT)
    
    
    
    def create_notebook(name_of_new_notebook):
        payload = {"name": name_of_new_notebook}
        notebook_url = base_url + "/api/notebook"
        r = post(notebook_url, dumps(payload))
        return r.json()
    
    def delete_notebook(notebook_id):
        target_url = base_url + "/api/notebook/%s" % notebook_id
        r = delete(target_url)
        return r
    
    
    def create_paragraph(code, notebook_id, title=""):
        target_url = base_url + "/api/notebook/%s/paragraph" % notebook_id
        payload = { "title": title, "text": code }
        r = post(target_url, dumps(payload))
        return r.json()["body"]
    
    
    notebook_id = create_notebook("test1")["body"]
    
    # paragraph snippets mirroring the tests that fail in the Spark Cluster test
    test_codes = [
        "%spark print(sc.parallelize(1 to 10).reduce(_ + _))",
        "%r localDF <- data.frame(name=c(\"a\", \"b\", \"c\"), age=c(19, 23, 18))\n" +
        "df <- createDataFrame(sqlContext, localDF)\n" +
        "count(df)",
        "%pyspark print(sc.parallelize(range(1, 11)).reduce(lambda a, b: a + b))",
        "%pyspark print(sc.parallelize(range(1, 11)).reduce(lambda a, b: a + b))",
        "%pyspark\nfrom pyspark.sql.functions import *\n"
        + "print(sqlContext.range(0, 10).withColumn('uniform', rand(seed=10) * 3.14).count())",
        "%spark z.run(1)"
    ]
    
    para_ids = [create_paragraph(c, notebook_id) for c in test_codes]
    
    # run all paragraphs:
    post(base_url + "/api/notebook/job/%s" % notebook_id)
    
    #delete_notebook(notebook_id)
    ```
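    
    To check whether those paragraphs actually finished, the job status endpoint can be polled after the run-all call. This is a rough sketch reusing `base_url` and `notebook_id` from the script above; it assumes Zeppelin's `GET /api/notebook/job/{noteId}` endpoint reports per-paragraph statuses (check the REST API docs for your version):
    
    ``` python
    from time import sleep
    from requests import get
    
    def poll_job_status(notebook_id, tries=30, delay=2):
        # Assumption: GET /api/notebook/job/{noteId} returns a list of
        # {id, status, ...} entries, one per paragraph in the note.
        status_url = base_url + "/api/notebook/job/%s" % notebook_id
        statuses = []
        for _ in range(tries):
            statuses = get(status_url).json().get("body", [])
            if statuses and all(p.get("status") == "FINISHED" for p in statuses):
                break
            sleep(delay)
        return statuses
    
    # e.g. after the run-all POST above:
    # print(poll_job_status(notebook_id))
    ```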
    
    After two weeks of chasing dead ends and my tail, I'd call this an issue with the testing environment, not the Mahout interpreter.

