Posted to dev@spark.apache.org by Emre Sevinc <em...@gmail.com> on 2015/04/20 10:58:54 UTC

How to use Spark Streaming .jar file that I've built using a different branch than master?

Hello,

I'm building a modified version of Spark Streaming (from a branch other
than master) to use in my application for testing purposes, but
spark-submit seems to ignore my newly built Spark Streaming .jar and
use an older version instead.

Here's some context:

I'm on a different branch:

$ git branch
* SPARK-3276
  master

Then I build the Spark Streaming module that I've changed:

✔ ~/code/spark [SPARK-3276 L|✚ 1]
$ mvn --projects streaming/ -DskipTests install

It builds without problems, and when I check my local Maven
repository, I see the newly generated Spark Streaming jars:

$ ls -lh
~/.m2/repository/org/apache/spark/spark-streaming_2.10/1.4.0-SNAPSHOT/
total 3.3M
-rw-rw-r-- 1 emre emre 1.6K Apr 20 10:43 maven-metadata-local.xml
-rw-rw-r-- 1 emre emre  421 Apr 20 10:43 _remote.repositories
-rw-rw-r-- 1 emre emre 1.3M Apr 20 10:42
spark-streaming_2.10-1.4.0-SNAPSHOT.jar
-rw-rw-r-- 1 emre emre 622K Apr 20 10:43
spark-streaming_2.10-1.4.0-SNAPSHOT-javadoc.jar
-rw-rw-r-- 1 emre emre 6.7K Apr 20 10:42
spark-streaming_2.10-1.4.0-SNAPSHOT.pom
-rw-rw-r-- 1 emre emre 181K Apr 20 10:42
spark-streaming_2.10-1.4.0-SNAPSHOT-sources.jar
-rw-rw-r-- 1 emre emre 1.2M Apr 20 10:42
spark-streaming_2.10-1.4.0-SNAPSHOT-tests.jar
-rw-rw-r-- 1 emre emre  82K Apr 20 10:42
spark-streaming_2.10-1.4.0-SNAPSHOT-test-sources.jar

Then I build and run an application (in Java) that uses Spark Streaming. In
that test project's pom.xml I have

...
 <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.4.0</hadoop.version>
    <spark.version>1.4.0-SNAPSHOT</spark.version>
  </properties>
...
 <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>${spark.version}</version>
      <scope>provided</scope>
    </dependency>


And then I use

  ~/code/spark/bin/spark-submit

to submit my application. It starts fine and runs against my local
filesystem, but when I check the log messages on the console, I don't see
the changes I have made, and I *did* make changes, e.g. to some
logging messages. It is as if the submitted application is not using
the Spark Streaming from *branch SPARK-3276* but the one from the master branch.

Any ideas what might be causing this? Is there some form of caching? Or is
spark-submit using a different .jar for streaming? (Where?)

How can I see the effects of the changes I made to Spark Streaming in my
SPARK-3276 branch?

-- 
Emre Sevinç

Re: How to use Spark Streaming .jar file that I've built using a different branch than master?

Posted by Emre Sevinc <em...@gmail.com>.
I thought spark-submit was the one configuring and arranging everything
related to the classpath (am I wrong?); that's how I've used Spark so far.
Is there a way to do it using spark-submit?

--
Emre

On Mon, Apr 20, 2015 at 11:06 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> I think you can override the SPARK_CLASSPATH with your newly built jar.
>
> Thanks
> Best Regards
>
> On Mon, Apr 20, 2015 at 2:28 PM, Emre Sevinc <em...@gmail.com>
> wrote:
>
>> [...]


-- 
Emre Sevinc

Re: How to use Spark Streaming .jar file that I've built using a different branch than master?

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
I think you can override the SPARK_CLASSPATH with your newly built jar.
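
A minimal sketch of that suggestion, using the jar path from the local
Maven repository listing earlier in the thread (the application class
and jar names below are hypothetical placeholders, not from the
original post):

```shell
# Hypothetical sketch: put the locally built streaming jar on the classpath
# before submitting. The jar path comes from ~/.m2 as listed in the thread;
# com.example.MyApp and my-app.jar are placeholder names.
export SPARK_CLASSPATH=~/.m2/repository/org/apache/spark/spark-streaming_2.10/1.4.0-SNAPSHOT/spark-streaming_2.10-1.4.0-SNAPSHOT.jar
~/code/spark/bin/spark-submit --class com.example.MyApp target/my-app.jar
```

Note that SPARK_CLASSPATH was deprecated around Spark 1.0 in favor of
options such as --driver-class-path, so on newer builds it may only
produce a deprecation warning.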

Thanks
Best Regards

On Mon, Apr 20, 2015 at 2:28 PM, Emre Sevinc <em...@gmail.com> wrote:

> [...]

Re: How to use Spark Streaming .jar file that I've built using a different branch than master?

Posted by Emre Sevinc <em...@gmail.com>.
Apparently, after building *only* Spark Streaming, I also have to run:

   mvn --projects assembly/ -DskipTests clean install

so that the assembly jar used by spark-submit also contains the new version.
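
Put together, the rebuild cycle would look something like this (paths
taken from earlier in the thread; exact flags may differ on other
branches):

```shell
# Rebuild the modified module, then the assembly jar that spark-submit
# actually puts on the application's classpath.
cd ~/code/spark
mvn --projects streaming/ -DskipTests install
mvn --projects assembly/ -DskipTests clean install
```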

--
Emre Sevinç


On Mon, Apr 20, 2015 at 10:58 AM, Emre Sevinc <em...@gmail.com> wrote:

> [...]



-- 
Emre Sevinc