
Posted to dev@spark.apache.org by Saikat Kanjilal <sx...@hotmail.com> on 2016/11/30 04:14:01 UTC

Spark-9487, Need some insight

Hello Spark dev community,

I took on the following JIRA item (https://github.com/apache/spark/pull/15848) and am looking for some general pointers. Things work when I develop locally on my MacBook Pro but fail on Jenkins with a multitude of errors. For example, in this build output report: https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69297/ you will see DataFrameStatSuite. Locally I am running these individual tests with this command: ./build/mvn test -P... -DwildcardSuites=none -Dtest=org.apache.spark.sql.DataFrameStatSuite. It seems that I need to emulate a Jenkins-like environment locally, which feels like an untenable hurdle. Granted, my changes involve changing the total number of workers in the SparkContext; if that matters, should I be testing my changes in an environment that more closely resembles Jenkins? I really want to complete this PR, but I keep getting hamstrung by a dev environment that is not equivalent to our CI environment.



I'm guessing (and hoping) I'm not the first one to run into this, so any insights or pointers to get past it would be much appreciated. I would love to keep contributing and hope this hurdle can be overcome with some tweaks to my dev environment.



Thanks in advance.

Re: Spark-9487, Need some insight

Posted by Saikat Kanjilal <sx...@hotmail.com>.

Well, other than making the code consistent, what's the high-level goal in doing this, and why does it matter so much how many workers we have in different scenarios (PySpark versus different components of Spark)? To be honest, I'm OK with not making the change and working on something else, but spending hours troubleshooting issues in a local dev environment that doesn't resemble Jenkins closely enough is not a productive use of time. I would love to get input on next logical steps.

________________________________
From: Reynold Xin <rx...@databricks.com>
Sent: Monday, December 5, 2016 6:44 PM
To: Saikat Kanjilal
Cc: dev@spark.apache.org
Subject: Re: Spark-9487, Need some insight

Honestly it is pretty difficult. Given the difficulty, would it still make sense to do that change? (the one that sets the same number of workers/parallelism across different languages in testing)


Re: Spark-9487, Need some insight

Posted by Reynold Xin <rx...@databricks.com>.

Honestly it is pretty difficult. Given the difficulty, would it still make
sense to do that change? (the one that sets the same number of
workers/parallelism across different languages in testing)


On Mon, Dec 5, 2016 at 3:33 PM, Saikat Kanjilal <sx...@hotmail.com> wrote:

> Hello again dev community,
>
> Ping on this. Apologies for reviving this thread, but I never heard from
> anyone. Based on this link:
> https://wiki.jenkins-ci.org/display/JENKINS/Installing+Jenkins
> I can try to install Jenkins locally, but is that really needed?
>
>
> Thanks in advance.
>
>

Re: Spark-9487, Need some insight

Posted by Saikat Kanjilal <sx...@hotmail.com>.

Hello again dev community,

Ping on this. Apologies for reviving this thread, but I never heard from anyone. Based on this link: https://wiki.jenkins-ci.org/display/JENKINS/Installing+Jenkins I can try to install Jenkins locally, but is that really needed?


Thanks in advance.



Re: Spark-9487, Need some insight

Posted by Steve Loughran <st...@hortonworks.com>.

Jenkins uses SBT, so you need to do the test run there. Maven and SBT are different, and in particular have different test runners.
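
A minimal sketch of what running the same suite through SBT might look like, for comparison with the Maven command quoted in this thread. The project name "sql" and the "testOnly" task spelling are assumptions not stated in the thread and may differ by Spark/SBT version; the helper sbt_suite_cmd is hypothetical and only prints the command rather than running it.

```shell
# Hypothetical helper: print (not run) the sbt invocation for one test suite.
# Assumptions: $1 is the sbt project name, $2 the fully qualified suite class,
# and the checkout provides the ./build/sbt wrapper next to ./build/mvn.
sbt_suite_cmd() {
  printf './build/sbt "%s/testOnly %s"\n' "$1" "$2"
}

# Example: the suite named in the thread, under the assumed "sql" project.
sbt_suite_cmd sql org.apache.spark.sql.DataFrameStatSuite
```

Running the printed command from the Spark source root would then exercise the suite under the SBT test runner instead of the Maven one.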

On 30 Nov 2016, at 04:14, Saikat Kanjilal <sx...@hotmail.com> wrote:

> Hello Spark dev community,
> I took on the following JIRA item (https://github.com/apache/spark/pull/15848) and am looking for some general pointers. Things work when I develop locally on my MacBook Pro but fail on Jenkins with a multitude of errors. For example, in this build output report: https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69297/ you will see DataFrameStatSuite. Locally I am running these individual tests with this command: ./build/mvn test -P... -DwildcardSuites=none -Dtest=org.apache.spark.sql.DataFrameStatSuite. It seems that I need to emulate a Jenkins-like environment locally, which feels like an untenable hurdle. Granted, my changes involve changing the total number of workers in the SparkContext; if that matters, should I be testing my changes in an environment that more closely resembles Jenkins? I really want to complete this PR, but I keep getting hamstrung by a dev environment that is not equivalent to our CI environment.

There's always the option of creating a Linux VM/container with Jenkins in it; there's a nice trick where you have it watch a git branch and kick off a run whenever you push to it. That way, you have your own personal Jenkins to do the full regression tests, while you yourself work on a small bit.

> I'm guessing (and hoping) I'm not the first one to run into this, so any insights or pointers to get past it would be much appreciated. I would love to keep contributing and hope this hurdle can be overcome with some tweaks to my dev environment.
>
> Thanks in advance.