Posted to dev@zeppelin.apache.org by fantazic <gi...@git.apache.org> on 2015/07/16 03:09:31 UTC

[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

GitHub user fantazic opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/158

    add hadoop 2.7 profile in spark/pom.xml

    If you build Zeppelin against Hadoop 2.7 and use the Spark interpreter, a version conflict can occur because protobuf.version is not set for Hadoop 2.7.
    This PR therefore adds a hadoop-2.7 profile.
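The failure mode described here can be checked mechanically. Below is a minimal Python sketch that lists the requested -P profile ids a pom does not define. It assumes a simplified, hypothetical pom.xml fragment: the profile contents and the protobuf.version value are placeholders, not Zeppelin's actual spark/pom.xml. Maven generally only warns about a requested profile that does not exist instead of failing, so any property that profile would have set, such as protobuf.version, is simply never set by it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified pom.xml fragment; the real spark/pom.xml
# has more profiles, and these property values are placeholders.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles>
    <profile>
      <id>hadoop-2.6</id>
      <properties>
        <hadoop.version>2.6.0</hadoop.version>
        <protobuf.version>2.5.0</protobuf.version>
      </properties>
    </profile>
  </profiles>
</project>"""

NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def defined_profiles(pom_xml):
    """Map each profile id in the pom to the properties it sets."""
    root = ET.fromstring(pom_xml)
    profiles = {}
    for profile in root.findall("m:profiles/m:profile", NS):
        values = {}
        props = profile.find("m:properties", NS)
        if props is not None:
            for child in props:
                # ElementTree tags look like "{namespace}name"; keep the name.
                values[child.tag.split("}", 1)[1]] = (child.text or "").strip()
        profiles[profile.findtext("m:id", namespaces=NS)] = values
    return profiles


def undefined_requested(pom_xml, requested):
    """Return the requested -P profile ids that the pom does not define."""
    known = defined_profiles(pom_xml)
    return [p for p in requested if p not in known]


print(undefined_requested(POM, ["hadoop-2.6", "hadoop-2.7"]))
```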

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fantazic/incubator-zeppelin master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #158
    
----
commit cdd83ba7b12503e786ef00cbd7f9f827ca928b23
Author: george.jw <ge...@daumkakao.com>
Date:   2015-07-16T00:57:20Z

    add hadoop 2.7 profile

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by fantazic <gi...@git.apache.org>.

Github user fantazic commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121829375
  
    I build with this script.
    ```
    mvn clean install -DskipTests -Pspark-1.4 -Dspark.version=1.4.0 -Phadoop-2.7 -Pyarn
    ```
    
    The main reason I use the -D option is that I need to build Zeppelin on my local machine. My company's server environments have a lot of restrictions, so it's hard to build on the servers.


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by fantazic <gi...@git.apache.org>.

Github user fantazic commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121869744
  
    @jongyoul It was just a mistake. :-)
    Which document should I update for Hadoop 2.7?


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121843308
  
    @fantazic Your new profile has no dependencies that differ between hadoop-2.6 and hadoop-2.7; only hadoop.version is different. So if your profile is correct, `-Phadoop-2.6 -Dhadoop.version=2.7.0` should have no side effects. Also, please tell me more about "using the Spark 1.3.1 option with Spark 1.4.0". You can use Spark 1.3.1 with `-Pspark-1.3` and Spark 1.4.0 with `-Pspark-1.4`.
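jongyoul's argument can be made concrete by diffing the properties two profiles set: if the only difference is hadoop.version, a -D override covers it and a new profile adds nothing. A minimal sketch, where both property maps are hypothetical placeholders rather than values read from Zeppelin's pom:

```python
# Hypothetical property sets for two profiles; the values are placeholders,
# not copied from Zeppelin's spark/pom.xml.
HADOOP_26 = {"hadoop.version": "2.6.0", "protobuf.version": "2.5.0"}
HADOOP_27 = {"hadoop.version": "2.7.0", "protobuf.version": "2.5.0"}


def differing_keys(a, b):
    """Return the sorted property names whose values differ between profiles."""
    keys = set(a) | set(b)
    return sorted(k for k in keys if a.get(k) != b.get(k))


diff = differing_keys(HADOOP_26, HADOOP_27)
if diff == ["hadoop.version"]:
    print("Only hadoop.version differs; -Dhadoop.version is enough.")
else:
    print("Profiles differ in:", diff)
```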


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by fantazic <gi...@git.apache.org>.

Github user fantazic commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121852727
  
    @jongyoul FYI
    
    I built with this script.
    
    ```
    mvn clean install -DskipTests -Pspark-1.3 -Dspark.version=1.3.1 -Phadoop-2.6 -Dhadoop.version=2.7.0 -Pyarn
    ```
    The installed Spark version is 1.4.0, so I got this error message:
    ```
    Py4JError: An error occurred while calling None.org.apache.spark.api.python.PythonRDD. Trace:
    py4j.Py4JException: Constructor org.apache.spark.api.python.PythonRDD([class org.apache.spark.rdd.MapPartitionsRDD, class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.Boolean, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.Accumulator]) does not exist
    	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:184)
    	at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:202)
    	at py4j.Gateway.invoke(Gateway.java:213)
    	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    	at py4j.GatewayConnection.run(GatewayConnection.java:207)
    	at java.lang.Thread.run(Thread.java:745)
    ```
    
    This is the notebook code:
    ```
    %pyspark
    
    bankText = sc.textFile("/user/hanadmin/zeppelin/tutorial/bank-full.csv")
    
    head = bankText.filter(lambda l: l[:5] != '"age"').take(5)
    for l in head:
        print(l)
    ```
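The mismatch in this report is between the Spark line Zeppelin was built against (1.3.1) and the one installed on the cluster (1.4.0). A minimal sketch of a pre-flight check, assuming (conservatively) that only builds within the same major.minor line are compatible, and that a -Pspark-<major>.<minor> profile exists, as -Pspark-1.3 and -Pspark-1.4 do in this thread. `spark_build_hint` is a hypothetical helper, not part of Zeppelin:

```python
def major_minor(version):
    """Return the (major, minor) pair of a version string like "1.4.0"."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def spark_build_hint(built_against, installed):
    """Return None if both versions share a major.minor line, else a hint.

    Assumption: only builds within the same major.minor line are treated
    as compatible; this is a conservative rule of thumb, not a guarantee.
    """
    if major_minor(built_against) == major_minor(installed):
        return None
    major, minor = major_minor(installed)
    # Assumes a matching -Pspark-<major>.<minor> profile exists.
    return (f"Zeppelin was built for Spark {built_against} but the cluster "
            f"runs {installed}; rebuild with -Pspark-{major}.{minor} "
            f"-Dspark.version={installed}")


print(spark_build_hint("1.3.1", "1.4.0"))  # the mismatch from this thread
```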


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121825920
  
    @fantazic I think you can just use Maven's -D option to build Zeppelin with Hadoop 2.7. I'm just asking: could you tell me your reason for doing this?


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121829813
  
    If other packages had changed, a new profile would make sense, but if only hadoop.version changes, I think you'd better use -Dhadoop.version.
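The -D approach works because, in Maven, a user property passed on the command line normally takes precedence over the same property declared in the pom or an active profile. A minimal sketch of that merge, where the hadoop-2.6 values are hypothetical placeholders and `effective_properties` is an illustrative helper, not a Maven API:

```python
def effective_properties(profile_props, cli_props):
    """Merge an active profile's properties with -D command-line properties.

    Models the usual Maven precedence: a user property given with -D wins
    over the same property declared in the pom or an active profile.
    """
    merged = dict(profile_props)
    merged.update(cli_props)  # -D values override profile values
    return merged


# Hypothetical placeholder values for a hadoop-2.6 profile.
HADOOP_26 = {"hadoop.version": "2.6.0", "protobuf.version": "2.5.0"}

# Equivalent of: -Phadoop-2.6 -Dhadoop.version=2.7.0
props = effective_properties(HADOOP_26, {"hadoop.version": "2.7.0"})
print(props)
```

Everything the profile sets apart from hadoop.version (here, protobuf.version) is kept, which is why no extra profile is needed.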


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-zeppelin/pull/158


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121843857
  
    Also, Zeppelin follows Spark's convention for creating profiles, to avoid confusion. I hope there are no issues building Zeppelin with Hadoop 2.7 using `-Phadoop-2.6`.
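Under this convention, a Hadoop release without its own profile is built with the newest hadoop-X.Y profile that is not newer than it, and the exact release is pinned with -Dhadoop.version. A minimal sketch of that selection rule; the profile list is an assumption for illustration, so check spark/pom.xml for the real set:

```python
# Assumed list of hadoop-* profile ids; check spark/pom.xml for the real set.
HADOOP_PROFILES = ["hadoop-2.2", "hadoop-2.3", "hadoop-2.4", "hadoop-2.6"]
PREFIX = "hadoop-"


def parse_version(v):
    """Turn "2.7.0" into (2, 7, 0) so versions compare numerically."""
    return tuple(int(x) for x in v.split("."))


def profile_for(hadoop_version, profiles=HADOOP_PROFILES):
    """Pick the newest hadoop-X.Y profile not newer than hadoop_version."""
    target = parse_version(hadoop_version)[:2]
    candidates = [p for p in profiles
                  if parse_version(p[len(PREFIX):]) <= target]
    if not candidates:
        raise ValueError(f"no profile covers Hadoop {hadoop_version}")
    return max(candidates, key=lambda p: parse_version(p[len(PREFIX):]))


hv = "2.7.0"
print(f"-P{profile_for(hv)} -Dhadoop.version={hv}")
```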


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by Leemoonsoo <gi...@git.apache.org>.

Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-122335979
  
    Thanks for the contribution! LGTM.


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by fantazic <gi...@git.apache.org>.

Github user fantazic commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121831123
  
    As I don't know Zeppelin's internals in detail, I thought a different hadoop.version could cause side effects. When I used the Spark 1.3.1 option with Spark 1.4.0, an exception occurred while reflectively looking up a constructor. So I think it's better to provide the proper versions.
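The exception described here is Py4J failing to find a Java constructor whose parameters match the arguments sent from Python: the Python side and the JVM side came from different Spark versions, so the argument list pyspark sent did not match any PythonRDD constructor on the classpath. A toy Python model of that lookup; the requested types come from the Py4JException in the earlier comment, while the available signature is a hypothetical placeholder and the matching rule is simplified to exact equality:

```python
# Parameter types the Python side requested, copied from the
# Py4JException in the earlier comment (package names dropped).
REQUESTED = ("MapPartitionsRDD", "byte[]", "HashMap", "ArrayList", "Boolean",
             "String", "String", "ArrayList", "Accumulator")

# Hypothetical constructor list for PythonRDD on the JVM classpath; NOT
# Spark's real 1.4.0 signature, just "one parameter fewer" for illustration.
AVAILABLE = [REQUESTED[:-1]]


def find_constructor(available, requested):
    """Return the matching signature, or None if there is no exact match.

    Simplification: Py4J's real lookup is more lenient (it can convert some
    argument types); this sketch only checks for an exact match.
    """
    for signature in available:
        if tuple(signature) == tuple(requested):
            return signature
    return None


if find_constructor(AVAILABLE, REQUESTED) is None:
    print("Constructor PythonRDD(%s) does not exist" % ", ".join(REQUESTED))
```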


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by fantazic <gi...@git.apache.org>.

Github user fantazic commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121852068
  
    @jongyoul Got it. If that's Spark's convention, then it's okay. However, Hadoop 2.7 users may use the -Phadoop-2.7 option as I did and be confused when they hit protobuf version conflicts.


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121853567
  
    @fantazic If you have a Spark 1.4 cluster, why don't you build Zeppelin with Spark 1.4?
    
    ```
    mvn clean install -DskipTests -Pspark-1.4 -Dspark.version=1.4.0 -Phadoop-2.6 -Dhadoop.version=2.7.0 -Pyarn
    ```
    
    It will fix the pyspark problem. If your cluster runs on YARN and you want to use Spark 1.3.x, please see issue #151 for the PYTHONPATH fix.


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121829702
  
    @fantazic 
    
    ```
    mvn clean install -DskipTests -Pspark-1.4 -Dspark.version=1.4.0 -Phadoop-2.6 -Dhadoop.version=2.7.0 -Pyarn
    ```
    
    will work as you want, won't it?
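The recommended command generalizes to other version pairs. A minimal sketch that assembles the same argument list; `zeppelin_build_args` is a hypothetical helper, and it assumes a -Pspark-<major>.<minor> profile exists for the chosen Spark line and that hadoop-2.6 is the right profile for Hadoop 2.6 and later:

```python
import shlex


def zeppelin_build_args(spark_version, hadoop_version, yarn=True):
    """Assemble the mvn arguments in the form recommended in this thread.

    Assumptions: a -Pspark-<major>.<minor> profile exists for the Spark
    line, and hadoop-2.6 is the profile for Hadoop 2.6 and later.
    """
    spark_line = ".".join(spark_version.split(".")[:2])
    args = [
        "mvn", "clean", "install", "-DskipTests",
        f"-Pspark-{spark_line}", f"-Dspark.version={spark_version}",
        "-Phadoop-2.6", f"-Dhadoop.version={hadoop_version}",
    ]
    if yarn:
        args.append("-Pyarn")
    return args


# Print a shell-safe command line for the versions discussed here.
print(" ".join(shlex.quote(a) for a in zeppelin_build_args("1.4.0", "2.7.0")))
```

The returned list can also be passed to `subprocess.run` as-is, which avoids shell quoting issues entirely.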


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121852933
  
    @fantazic I think that mistake will happen. Could you please add documentation for building Zeppelin with Hadoop 2.7? It will help reduce those mistakes.


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-122512885
  
    LGTM


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121870411
  
    @fantazic https://github.com/apache/incubator-zeppelin/blob/master/README.md is the most appropriate place for now. There are no docs on the official website yet.


[GitHub] incubator-zeppelin pull request: add hadoop 2.7 profile in spark/p...

Posted by fantazic <gi...@git.apache.org>.

Github user fantazic commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/158#issuecomment-121877386
  
    @jongyoul So I removed the hadoop-2.7 profile and added build examples to README.md.

