Posted to user@spark.apache.org by Shivani Rao <ra...@gmail.com> on 2014/04/29 21:32:26 UTC

Spark: issues with running a sbt fat jar due to akka dependencies

Hello folks,

I was going to post this question to the Spark user group as well. If you
have any leads on how to solve this issue, please let me know:

I am building a basic Spark project (Spark depends on Akka) and creating a
fat jar using sbt assembly. The goal is to run the fat jar from the command
line as follows:
 java -cp "path to my spark fatjar" mainclassname
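
For context, my setup is roughly the following (a simplified sketch; the
sbt-assembly plugin version below is shown only for illustration):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"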

During sbt assembly, I encountered deduplication errors between the following
Akka jars:
akka-remote_2.10-2.2.3.jar with akka-remote_2.10-2.2.3-shaded-protobuf.jar
akka-actor_2.10-2.2.3.jar with akka-actor_2.10-2.2.3-shaded-protobuf.jar

I resolved them by using MergeStrategy.first, which let the sbt assembly
command complete successfully. But at runtime, one Akka configuration
parameter or another kept failing with the following message:

"Exception in thread "main" com.typesafe.config.ConfigException$Missing: No
configuration setting found for key"

I then used MergeStrategy.concat for "reference.conf", and I started getting
this error repeatedly:

Exception in thread "main" com.typesafe.config.ConfigException$Missing: No
configuration setting found for key 'akka.version'.

I noticed that akka.version appears only in the akka-actor jars, not in the
akka-remote jars. The resulting reference.conf (in my final fat jar) does not
contain akka.version either, so the strategy is not working.
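
For concreteness, the assembly settings described above look roughly like
this (a minimal sketch in the sbt-assembly 0.11.x style; key names differ
across plugin versions):

import sbtassembly.Plugin._
import AssemblyKeys._

assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
  {
    // concatenate every reference.conf into a single file in the fat jar
    case "reference.conf" => MergeStrategy.concat
    // pick one copy of the classes duplicated by the shaded-protobuf jars
    case x if x.endsWith(".class") => MergeStrategy.first
    // fall back to the plugin defaults (e.g. for META-INF)
    case x => old(x)
  }
}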

There are several things I could try:

a) Use the sbt-proguard plugin: https://github.com/sbt/sbt-proguard
b) Write a Build.scala that handles the merging of reference.conf, as in:

https://spark-project.atlassian.net/browse/SPARK-395
http://letitcrash.com/post/21025950392/howto-sbt-assembly-vs-reference-conf

c) Create a reference.conf by merging all Akka configurations and pass it on
the java command line, as shown below:

java -cp <jar-name> -Dconfig.file=<config> <mainclassname>
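
One hypothetical way to produce such a merged file is to let the Typesafe
Config library do the merging and dump the result (DumpConfig and the output
file name are illustrative, not part of my actual setup):

import java.io.PrintWriter
import com.typesafe.config.{ConfigFactory, ConfigRenderOptions}

object DumpConfig extends App {
  // run with all the Akka jars on the classpath; defaultReference() merges
  // every reference.conf visible to the class loader
  val merged = ConfigFactory.defaultReference()
  val out = new PrintWriter("merged-reference.conf")
  out.write(merged.root.render(ConfigRenderOptions.concise()))
  out.close()
}

and then run the application with
java -cp <jar-name> -Dconfig.file=merged-reference.conf <mainclassname>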

The main issue is that when I run the project with "sbt run", there are no
errors accessing any of the Akka configuration parameters. It is only when I
run it from the command line (java -cp <jar-name> classname) that I encounter
the error.

Which of these is a long-term fix to the Akka issues? For now, I removed the
Akka dependencies and that solved the problem, but I know that is not a
long-term solution.

Regards,
Shivani

-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA

Re: Spark: issues with running a sbt fat jar due to akka dependencies

Posted by Stephen Boesch <ja...@gmail.com>.
Hi Shivani,
    Your work would be helpful to others (well, at least to me ;). Would you
be willing to share your resulting sbt build files?




Re: Spark: issues with running a sbt fat jar due to akka dependencies

Posted by Shivani Rao <ra...@gmail.com>.
Hello Stephen,

My goal was to run Spark on a cluster that already had Spark and Hadoop
installed, so the right thing to do was to remove those dependencies from my
Spark build. I wrote a blog post about it in case it helps:
http://myresearchdiaries.blogspot.com/2014/05/building-apache-spark-jars.html

Here is the set of lines that changed my life:

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.4.0" % "provided"

libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "2.0.0-mr1-cdh4.4.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "0.9.0-incubating" % "provided"

HTH,
Shivani
PS: @Koert: I think what happened was that the Akka that comes packaged with
Spark was overriding the Akka configuration parameters set by my explicit
Akka dependencies. Since Spark ships with Akka, one should probably not
declare Akka dependencies explicitly.
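
(For anyone wondering why "provided" helps: it keeps those jars on the
compile classpath but leaves them out of the assembly, so at runtime the
Spark and Hadoop jars already installed on the cluster supply the classes,
including Spark's own Akka and its reference.conf.)

If some other dependency drags in its own Akka, here is a sketch of one way
to keep only Spark's copy (the "com.example" coordinates below are purely
illustrative):

// build.sbt: strip com.typesafe.akka artifacts pulled in transitively
libraryDependencies += ("com.example" %% "some-library" % "1.0")
  .excludeAll(ExclusionRule(organization = "com.typesafe.akka"))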



-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA

Re: Spark: issues with running a sbt fat jar due to akka dependencies

Posted by Koert Kuipers <ko...@tresata.com>.
Not sure why applying concat to reference.conf didn't work for you. Since it
simply concatenates the files, the key akka.version should be preserved. We
had the same situation for a while without issues.
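
A quick way to check what actually ended up in the assembled jar is to load
the config and probe the key (a minimal sketch; CheckConfig is illustrative,
and it assumes the Typesafe Config library is on the classpath, which it is
once Akka is bundled):

import com.typesafe.config.ConfigFactory

object CheckConfig extends App {
  // load() reads the merged reference.conf from the classpath; this throws
  // ConfigException.Missing if the merge dropped akka.version
  println(ConfigFactory.load().getString("akka.version"))
}

Run it with java -cp <jar-name> CheckConfig against the fat jar.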

Re: Spark: issues with running a sbt fat jar due to akka dependencies

Posted by Shivani Rao <ra...@gmail.com>.
Hello Koert,

That did not work. I mentioned it in my email already. But I found a way
around it by excluding the Akka dependencies.

Shivani



-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA

Re: Spark: issues with running a sbt fat jar due to akka dependencies

Posted by Koert Kuipers <ko...@tresata.com>.
You need to merge the reference.conf files, and then it's no longer an issue.

See the build file for Spark itself:
  case "reference.conf" => MergeStrategy.concat

