Posted to reviews@spark.apache.org by bbossy <gi...@git.apache.org> on 2014/08/14 15:55:04 UTC

[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

GitHub user bbossy opened a pull request:

    https://github.com/apache/spark/pull/1945

    SPARK-3039: Allow spark to be built using avro-mapred for hadoop2

    SPARK-3039: Adds the Maven property "avro.mapred.classifier" so that spark-assembly can be built with the avro-mapred artifact that supports the new Hadoop API. Sets this property to hadoop2 for the Hadoop 2 profiles.
    
    I am not very familiar with Maven, nor do I know whether this might break something in the Hive part of Spark. There might be a more elegant way of doing this.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bbossy/spark SPARK-3039

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1945.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1945
    
----
commit c32ce59757ef954e4cbf30d8708af698c3ee3710
Author: Bertrand Bossy <be...@gmail.com>
Date:   2014-08-14T13:45:57Z

    SPARK-3039: Allow spark to be built using avro-mapred for hadoop2

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-55466591
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/80/consoleFull) for PR 1945 at commit [`c32ce59`](https://github.com/apache/spark/commit/c32ce59757ef954e4cbf30d8708af698c3ee3710).
     * This patch merges cleanly.


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-52187372
  
    I've looked at this part of the build a lot and can say LGTM


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by bbossy <gi...@git.apache.org>.
Github user bbossy commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-52209453
  
    Yeah, you're right about YARN being orthogonal to the Hadoop version.
    
    Apart from the Maven/sbt question, there is another issue: the `Cloudera CDH 4.2.0 with MapReduce v2` case from the README is not covered by a Hadoop profile right now. I would need to change it to
    `sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -Pyarn assembly` or the mvn equivalent.
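
For reference, the hypothetical shell function below (it is not part of Spark's build) sketches what "the mvn equivalent" might look like. It only prints a command line and runs nothing. It assumes, as discussed in this thread, that `-Davro.mapred.classifier=hadoop2` is needed whenever the build targets the new (MapReduce v2) Hadoop API, and it treats `-Pyarn` as a separate, orthogonal choice. The trailing `-DskipTests clean package` goals are an assumption based on the usual Maven build of Spark.

```shell
#!/bin/sh
# Hypothetical helper, not part of Spark: prints (does not run) a Maven
# command line for building Spark against a given Hadoop version.
#   $1     value for -Dhadoop.version
#   $2     "mr2" for the new MapReduce API (needs avro-mapred's hadoop2
#          classifier); anything else keeps the default avro-mapred artifact
#   $3...  extra profiles such as yarn, which are orthogonal to the classifier
spark_mvn_cmd() {
  hadoop_version="$1"
  mapreduce_api="$2"
  shift 2
  cmd="mvn"
  # Each remaining argument becomes a -P<profile> flag.
  for profile in "$@"; do
    cmd="$cmd -P$profile"
  done
  cmd="$cmd -Dhadoop.version=$hadoop_version"
  # Only the MapReduce v2 case selects the hadoop2 build of avro-mapred.
  if [ "$mapreduce_api" = "mr2" ]; then
    cmd="$cmd -Davro.mapred.classifier=hadoop2"
  fi
  echo "$cmd -DskipTests clean package"
}
```

For example, `spark_mvn_cmd 2.0.0-cdh4.2.0 mr2 yarn` prints `mvn -Pyarn -Dhadoop.version=2.0.0-cdh4.2.0 -Davro.mapred.classifier=hadoop2 -DskipTests clean package`. The classifier decision is taken from an explicit argument rather than guessed from the version string, because the version alone does not say which MapReduce API a distribution ships.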



[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by bbossy <gi...@git.apache.org>.
Github user bbossy commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-52201887
  
    The problem I see is that if you build according to the README:
    ```
    # Apache Hadoop 2.2.X and newer
    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
    ```
    `avro.mapred.classifier` will not be set to `hadoop2`.
    
    Either the README should be changed to account for this, or the property should be added to the yarn and yarn-alpha profiles (not the mapr one, I think).
    
    Or is there a way to fix this with Maven?
    



[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by bbossy <gi...@git.apache.org>.
Github user bbossy commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-52290012
  
    Created the issue: https://issues.apache.org/jira/browse/SPARK-3069 (Build instructions in README are outdated)
    
    @srowen: Thank you for your input!
    



[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-55473914
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/80/consoleFull) for PR 1945 at commit [`c32ce59`](https://github.com/apache/spark/commit/c32ce59757ef954e4cbf30d8708af698c3ee3710).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `      throw new IllegalStateException("The main method in the given main class must be static")`



[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-52287539
  
    I think it works with the invocation you describe. Honestly, this version isn't a big priority, but it would be nice to get it right. Want to open a JIRA to track updating or deleting the info in README.md? I think it needs to be fixed one way or the other.


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-54694472
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-52202237
  
    Yeah, that's out of date, I believe. For example, `-Phadoop-2.3` has to be specified with `-Dhadoop.version=2.3.0`. And I think `mvn` is the primary build now. I imagine you could correct this in the PR here. I wonder whether the README should just point to the website rather than duplicate this info? The web docs are up to date.


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-62773928
  
    Hey @pwendell @srowen @bbossy, this is actually causing problems for SBT applications that use the `spark-hive_2.10` module. More details can be found here: https://issues.apache.org/jira/browse/SPARK-4359. For now, I have reverted this in branch-1.1 to prepare for the Spark 1.1.1 release. It may need to be reverted in other branches as well. Just a heads up.


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-55466291
  
    Yeah - LGTM pending tests.


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-52186236
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1945


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-52200382
  
    You have to specify a Hadoop profile already, and you added the classifier to all of them, so that's fine. Building with YARN is orthogonal, so I don't think the property belongs anywhere else.


[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

Posted by bbossy <gi...@git.apache.org>.
Github user bbossy commented on the pull request:

    https://github.com/apache/spark/pull/1945#issuecomment-52199845
  
    Should I also add the `avro.mapred.classifier` property to the yarn profile? Maybe even to yarn-alpha and mapr?
    
    As things stand, building according to the README requires running: `sbt/sbt -Dhadoop.version=2.2.0 -Pyarn -Davro.mapred.classifier=hadoop2 assembly`


