Posted to user@spark.apache.org by Jeremy Lee <un...@gmail.com> on 2014/06/04 14:49:18 UTC

Can't seem to link "external/twitter" classes from my own app

Man, this has been hard going. Six days, and I finally got a "Hello World"
App working that I wrote myself.

Now I'm trying to make a minimal streaming app based on the twitter
examples (running standalone right now while learning), and when I run it
like this:

bin/spark-submit --class "SimpleApp"
SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar

I'm getting this error:

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/streaming/twitter/TwitterUtils$

Which I'm guessing is because I haven't put in a dependency on
"external/twitter" in the .sbt, but _how_? I can't find any docs on it.
Here's my build file so far:

simple.sbt
------------------------------------------
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" %
"1.0.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
------------------------------------------
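
(For context, the app itself is basically the bundled Twitter streaming
example cut down to the bare minimum -- roughly this sketch, not my exact
source:)

SimpleApp.scala (roughly)
------------------------------------------
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object SimpleApp {
  def main(args: Array[String]) {
    // Twitter credentials are read from the twitter4j.oauth.* system properties
    val conf = new SparkConf().setAppName("Simple Project")
    val ssc = new StreamingContext(conf, Seconds(10))

    // This is the call that needs spark-streaming-twitter on the classpath
    val tweets = TwitterUtils.createStream(ssc, None)
    tweets.map(_.getText).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
------------------------------------------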

I've tried a few obvious things like adding:

libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"

libraryDependencies += "org.apache.spark" %% "spark-external-twitter" %
"1.0.0"

because, well, that would match the naming scheme implied so far, but it
errors.


Also, I just realized I don't completely understand if:
(a) the "spark-submit" command _sends_ the .jar to all the workers, or
(b) the "spark-submit" commands sends a _job_ to the workers, which are
supposed to already have the jar file installed (or in hdfs), or
(c) the Context is supposed to list the jars to be distributed. (is that
deprecated?)

One part of the documentation says:

 "Once you have an assembled jar you can call the bin/spark-submit script
as shown here while passing your jar."

but another says:

"application-jar: Path to a bundled jar including your application and all
dependencies. The URL must be globally visible inside of your cluster, for
instance, an hdfs:// path or a file:// path that is present on all nodes."

I suppose both could be correct if you take a certain point of view.

-- 
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers

Re: Can't seem to link "external/twitter" classes from my own app

Posted by prabeesh k <pr...@gmail.com>.
Hi Jeremy,

If you are using *addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")*
in "project/plugin.sbt", you also need to edit
"project/project/build.scala" to use the same sbt-assembly version
(0.11.4), like this:

import sbt._

object Plugins extends Build {
  lazy val root = Project("root", file(".")) dependsOn(
    uri("git://github.com/sbt/sbt-assembly.git#0.11.4")
  )
}


Then try *sbt assembly.*
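
For reference, a rough sketch of how the pieces usually sit (treat the file
names and the assembly jar name below as guesses -- sbt prints the real jar
path at the end of the assembly run, and depending on your sbt-assembly
version you may also need to wire its assemblySettings into the build file;
check the plugin's README):

SimpleApp/
  simple.sbt                      <- the build file from your first mail
  project/plugin.sbt              <- addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")
  src/main/scala/SimpleApp.scala

$ sbt assembly
$ bin/spark-submit --class "SimpleApp" \
    SimpleApp/target/scala-2.10/<assembly-jar-name-that-sbt-reports>.jar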

Let me know whether it works or not.

Regards,
prabeesh



On Thu, Jun 5, 2014 at 1:16 PM, Nick Pentreath <ni...@gmail.com>
wrote:

> Great - well we do hope we hear from you, since the user list is for
> interesting success stories and anecdotes, as well as blog posts etc too :)
>
>
> On Thu, Jun 5, 2014 at 9:40 AM, Jeremy Lee <unorthodox.engineers@gmail.com
> > wrote:
>
>> Oh. Yes of course. *facepalm*
>>
>> I'm sure I typed that at first, but at some point my fingers decided to
>> grammar-check me. Stupid fingers. I wonder what "sbt assemble" does? (apart
>> from error) It certainly takes a while to do it.
>>
>> Thanks for the maven offer, but I'm not scheduled to learn that until
>> after Scala, streaming, graphx, mllib, HDFS, sbt, Python, and yarn. I'll
>> probably need to know it for yarn, but I'm really hoping to put it off
>> until then. (fortunately I already knew about linux, AWS, eclipse, git,
>> java, distributed programming and ssh keyfiles, or I would have been in
>> real trouble)
>>
>> Ha! OK, that worked for the Kafka project... fails on the other old 0.9
>> Twitter project, but who cares... now for mine....
>>
>> HAHA! YES!! Oh thank you! I have the equivalent of "hello world" that
>> uses one external library! Now the compiler and I can have a _proper_
>> conversation.
>>
>> Hopefully you won't be hearing from me for a while.
>>
>>
>>
>> On Thu, Jun 5, 2014 at 3:06 PM, Nick Pentreath <ni...@gmail.com>
>> wrote:
>>
>>> The "magic incantation" is "sbt assembly" (not "assemble").
>>>
>>> Actually I find maven with their assembly plugins to be very easy (mvn
>>> package). I can send a Pom.xml for a skeleton project if you need
>>> —
>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>
>>>
>>> On Thu, Jun 5, 2014 at 6:59 AM, Jeremy Lee <
>>> unorthodox.engineers@gmail.com> wrote:
>>>
>>>> Hmm.. That's not working so well for me. First, I needed to add a
>>>> "project/plugin.sbt" file with the contents:
>>>>
>>>> addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")
>>>>
>>>> Before 'sbt/sbt assemble' worked at all. And I'm not sure about that
>>>> version number, but "0.9.1" isn't working much better and "11.4" is the
>>>> latest one recommended by the sbt project site. Where did you get your
>>>> version from?
>>>>
>>>> Second, even when I do get it to build a .jar, spark-submit is still
>>>> telling me the external.twitter library is missing.
>>>>
>>>> I tried using your github project as-is, but it also complained about
>>>> the missing plugin.. I'm trying it with various versions now to see if I
>>>> can get that working, even though I don't know anything about kafka. Hmm,
>>>> and no. Here's what I get:
>>>>
>>>>  [info] Set current project to Simple Project (in build
>>>> file:/home/ubuntu/spark-1.0.0/SparkKafka/)
>>>> [error] Not a valid command: assemble
>>>> [error] Not a valid project ID: assemble
>>>> [error] Expected ':' (if selecting a configuration)
>>>> [error] Not a valid key: assemble (similar: assembly, assemblyJarName,
>>>> assemblyDirectory)
>>>> [error] assemble
>>>> [error]
>>>>
>>>> I also found this project which seemed to be exactly what I was after:
>>>>  https://github.com/prabeesh/SparkTwitterAnalysis
>>>>
>>>> ...but it was for Spark 0.9, and though I updated all the version
>>>> references to "1.0.0", that one doesn't work either. I can't even get it to
>>>> build.
>>>>
>>>> *sigh*
>>>>
>>>> Is it going to be easier to just copy the external/ source code into my
>>>> own project? Because I will... especially if creating "Uberjars" takes this
>>>> long every... single... time...
>>>>
>>>>
>>>>
>>>> On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee <
>>>> unorthodox.engineers@gmail.com> wrote:
>>>>
>>>>> Thanks Patrick!
>>>>>
>>>>> Uberjars. Cool. I'd actually heard of them. And thanks for the link to
>>>>> the example! I shall work through that today.
>>>>>
>>>>> I'm still learning sbt and its many options... the last new framework
>>>>> I learned was node.js, and I think I've been rather spoiled by "npm".
>>>>>
>>>>> At least it's not maven. Please, oh please don't make me learn maven
>>>>> too. (The only people who seem to like it have Software Stockholm Syndrome:
>>>>> "I know maven kidnapped me and beat me up, but if you spend long enough
>>>>> with it, you eventually start to sympathize and see its point of view".)
>>>>>
>>>>>
>>>>> On Thu, Jun 5, 2014 at 3:39 AM, Patrick Wendell <pw...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey Jeremy,
>>>>>>
>>>>>> The issue is that you are using one of the external libraries and
>>>>>> these aren't actually packaged with Spark on the cluster, so you need
>>>>>> to create an uber jar that includes them.
>>>>>>
>>>>>> You can look at the example here (I recently did this for a kafka
>>>>>> project and the idea is the same):
>>>>>>
>>>>>> https://github.com/pwendell/kafka-spark-example
>>>>>>
>>>>>> You'll want to make an uber jar that includes these packages (run sbt
>>>>>> assembly) and then submit that jar to spark-submit. Also, I'd try
>>>>>> running it locally first (if you aren't already) just to make the
>>>>>> debugging simpler.
>>>>>>
>>>>>> - Patrick
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>> > Ah sorry, this may be the thing I learned for the day. The issue is
>>>>>> > that classes from that particular artifact are missing though. Worth
>>>>>> > interrogating the resulting .jar file with "jar tf" to see if it
>>>>>> made
>>>>>> > it in?
>>>>>> >
>>>>>> > On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath <
>>>>>> nick.pentreath@gmail.com> wrote:
>>>>>> >> @Sean, the %% syntax in SBT should automatically add the Scala
>>>>>> major version
>>>>>> >> qualifier (_2.10, _2.11 etc) for you, so that does appear to be
>>>>>> correct
>>>>>> >> syntax for the build.
>>>>>> >>
>>>>>> >> I seemed to run into this issue with some missing Jackson deps,
>>>>>> and solved
>>>>>> >> it by including the jar explicitly on the driver class path:
>>>>>> >>
>>>>>> >> bin/spark-submit --driver-class-path
>>>>>> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class
>>>>>> "SimpleApp"
>>>>>> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>>>>> >>
>>>>>> >> Seems redundant to me since I thought that the JAR as argument is
>>>>>> copied to
>>>>>> >> driver and made available. But this solved it for me so perhaps
>>>>>> give it a
>>>>>> >> try?
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>> Those aren't the names of the artifacts:
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>>>>>> >>>
>>>>>> >>> The name is "spark-streaming-twitter_2.10"
>>>>>> >>>
>>>>>> >>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
>>>>>> >>> <un...@gmail.com> wrote:
>>>>>> >>> > Man, this has been hard going. Six days, and I finally got a
>>>>>> "Hello
>>>>>> >>> > World"
>>>>>> >>> > App working that I wrote myself.
>>>>>> >>> >
>>>>>> >>> > Now I'm trying to make a minimal streaming app based on the
>>>>>> twitter
>>>>>> >>> > examples, (running standalone right now while learning) and
>>>>>> when running
>>>>>> >>> > it
>>>>>> >>> > like this:
>>>>>> >>> >
>>>>>> >>> > bin/spark-submit --class "SimpleApp"
>>>>>> >>> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>>>>> >>> >
>>>>>> >>> > I'm getting this error:
>>>>>> >>> >
>>>>>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>>> >>> > org/apache/spark/streaming/twitter/TwitterUtils$
>>>>>> >>> >
>>>>>> >>> > Which I'm guessing is because I haven't put in a dependency to
>>>>>> >>> > "external/twitter" in the .sbt, but _how_? I can't find any
>>>>>> docs on it.
>>>>>> >>> > Here's my build file so far:
>>>>>> >>> >
>>>>>> >>> > simple.sbt
>>>>>> >>> > ------------------------------------------
>>>>>> >>> > name := "Simple Project"
>>>>>> >>> >
>>>>>> >>> > version := "1.0"
>>>>>> >>> >
>>>>>> >>> > scalaVersion := "2.10.4"
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-core" %
>>>>>> "1.0.0"
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming"
>>>>>> % "1.0.0"
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.apache.spark" %%
>>>>>> "spark-streaming-twitter" %
>>>>>> >>> > "1.0.0"
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" %
>>>>>> "3.0.3"
>>>>>> >>> >
>>>>>> >>> > resolvers += "Akka Repository" at "
>>>>>> http://repo.akka.io/releases/"
>>>>>> >>> > ------------------------------------------
>>>>>> >>> >
>>>>>> >>> > I've tried a few obvious things like adding:
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external" %
>>>>>> "1.0.0"
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.apache.spark" %%
>>>>>> "spark-external-twitter" %
>>>>>> >>> > "1.0.0"
>>>>>> >>> >
>>>>>> >>> > because, well, that would match the naming scheme implied so
>>>>>> far, but it
>>>>>> >>> > errors.
>>>>>> >>> >
>>>>>> >>> >
>>>>>> >>> > Also, I just realized I don't completely understand if:
>>>>>> >>> > (a) the "spark-submit" command _sends_ the .jar to all the
>>>>>> workers, or
>>>>>> >>> > (b) the "spark-submit" commands sends a _job_ to the workers,
>>>>>> which are
>>>>>> >>> > supposed to already have the jar file installed (or in hdfs), or
>>>>>> >>> > (c) the Context is supposed to list the jars to be distributed.
>>>>>> (is that
>>>>>> >>> > deprecated?)
>>>>>> >>> >
>>>>>> >>> > One part of the documentation says:
>>>>>> >>> >
>>>>>> >>> >  "Once you have an assembled jar you can call the
>>>>>> bin/spark-submit
>>>>>> >>> > script as
>>>>>> >>> > shown here while passing your jar."
>>>>>> >>> >
>>>>>> >>> > but another says:
>>>>>> >>> >
>>>>>> >>> > "application-jar: Path to a bundled jar including your
>>>>>> application and
>>>>>> >>> > all
>>>>>> >>> > dependencies. The URL must be globally visible inside of your
>>>>>> cluster,
>>>>>> >>> > for
>>>>>> >>> > instance, an hdfs:// path or a file:// path that is present on
>>>>>> all
>>>>>> >>> > nodes."
>>>>>> >>> >
>>>>>> >>> > I suppose both could be correct if you take a certain point of
>>>>>> view.
>>>>>> >>> >
>>>>>> >>> > --
>>>>>> >>> > Jeremy Lee  BCompSci(Hons)
>>>>>> >>> >   The Unorthodox Engineers
>>>>>> >>
>>>>>> >>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jeremy Lee  BCompSci(Hons)
>>>>>   The Unorthodox Engineers
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jeremy Lee  BCompSci(Hons)
>>>>   The Unorthodox Engineers
>>>>
>>>
>>>
>>
>>
>> --
>> Jeremy Lee  BCompSci(Hons)
>>   The Unorthodox Engineers
>>
>
>

Re: Can't seem to link "external/twitter" classes from my own app

Posted by Jeremy Lee <un...@gmail.com>.
I shan't be far. I'm committed now. Spark and I are going to have a very
interesting future together, but hopefully future messages will be about
the algorithms and modules, and less about "how do I run make?".

I suspect doing this at the exact moment of the 0.9 -> 1.0.0 transition
hasn't helped me. (I literally had the documentation changing on me between
page reloads last Thursday, after days of studying the old version. I
thought I was going crazy until the new version number appeared in the
corner and the release email went out.)

The last time I entered into a serious relationship with a piece of
software like this was with a little company called Cognos. :-) And then
Microsoft asked us for some advice about a thing called "OLAP Server" they
were making. (But I don't think they listened as hard as they should have.)

Oh, the things I'm going to do with Spark! If it hadn't existed, I would
have had to make it.

(My honors thesis was in distributed computing. I once created an
incrementally compiled language that could pause execution, decompile, move
to another machine, recompile, restore state and continue while preserving
all active network connections. discuss.)




-- 
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers

Re: Can't seem to link "external/twitter" classes from my own app

Posted by Nick Pentreath <ni...@gmail.com>.
Great - well we do hope we hear from you, since the user list is for
interesting success stories and anecdotes, as well as blog posts etc too :)



Re: Can't seem to link "external/twitter" classes from my own app

Posted by Jeremy Lee <un...@gmail.com>.
Oh. Yes of course. *facepalm*

I'm sure I typed that at first, but at some point my fingers decided to
grammar-check me. Stupid fingers. I wonder what "sbt assemble" does? (apart
from error) It certainly takes a while to do it.

Thanks for the maven offer, but I'm not scheduled to learn that until after
Scala, streaming, graphx, mllib, HDFS, sbt, Python, and yarn. I'll probably
need to know it for yarn, but I'm really hoping to put it off until then.
(fortunately I already knew about linux, AWS, eclipse, git, java,
distributed programming and ssh keyfiles, or I would have been in real
trouble)

Ha! OK, that worked for the Kafka project... fails on the other old 0.9
Twitter project, but who cares... now for mine....

HAHA! YES!! Oh thank you! I have the equivalent of "hello world" that uses
one external library! Now the compiler and I can have a _proper_
conversation.

Hopefully you won't be hearing from me for a while.





-- 
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers

Re: Can't seem to link "external/twitter" classes from my own app

Posted by Nick Pentreath <ni...@gmail.com>.
The "magic incantation" is "sbt assembly" (not "assemble").


Actually I find Maven with its assembly plugins to be very easy (mvn package). I can send a pom.xml for a skeleton project if you need one.
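
One thing that helps keep the uber jar manageable (just a sketch against the
versions in the build file from the first mail -- double-check it against your
setup): mark the core Spark artifacts as "provided", since the cluster already
ships those, so the assembled jar mostly just carries spark-streaming-twitter
and twitter4j:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"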
—
Sent from Mailbox


Re: Can't seem to link "external/twitter" classes from my own app

Posted by Jeremy Lee <un...@gmail.com>.
Hmm.. That's not working so well for me. First, I needed to add a
"project/plugin.sbt" file with the contents:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")

Before 'sbt/sbt assemble' worked at all. And I'm not sure about that
version number, but "0.9.1" isn't working much better and "11.4" is the
latest one recommended by the sbt project site. Where did you get your
version from?

Second, even when I do get it to build a .jar, spark-submit is still
telling me the external.twitter library is missing.

I tried using your github project as-is, but it also complained about the
missing plugin. I'm trying it with various versions now to see if I can
get that working, even though I don't know anything about kafka. Hmm, and
no. Here's what I get:

[info] Set current project to Simple Project (in build
file:/home/ubuntu/spark-1.0.0/SparkKafka/)
[error] Not a valid command: assemble
[error] Not a valid project ID: assemble
[error] Expected ':' (if selecting a configuration)
[error] Not a valid key: assemble (similar: assembly, assemblyJarName,
assemblyDirectory)
[error] assemble
[error]

I also found this project which seemed to be exactly what I was after:
https://github.com/prabeesh/SparkTwitterAnalysis

...but it was for Spark 0.9, and though I updated all the version
references to "1.0.0", that one doesn't work either. I can't even get it to
build.

*sigh*

Is it going to be easier to just copy the external/ source code into my own
project? Because I will... especially if creating "Uberjars" takes this
long every... single... time...



>> >>> > libraryDependencies += "org.apache.spark" %%
>> "spark-external-twitter" %
>> >>> > "1.0.0"
>> >>> >
>> >>> > because, well, that would match the naming scheme implied so far,
>> but it
>> >>> > errors.
>> >>> >
>> >>> >
>> >>> > Also, I just realized I don't completely understand if:
>> >>> > (a) the "spark-submit" command _sends_ the .jar to all the workers,
>> or
>> >>> > (b) the "spark-submit" commands sends a _job_ to the workers, which
>> are
>> >>> > supposed to already have the jar file installed (or in hdfs), or
>> >>> > (c) the Context is supposed to list the jars to be distributed. (is
>> that
>> >>> > deprecated?)
>> >>> >
>> >>> > One part of the documentation says:
>> >>> >
>> >>> >  "Once you have an assembled jar you can call the bin/spark-submit
>> >>> > script as
>> >>> > shown here while passing your jar."
>> >>> >
>> >>> > but another says:
>> >>> >
>> >>> > "application-jar: Path to a bundled jar including your application
>> and
>> >>> > all
>> >>> > dependencies. The URL must be globally visible inside of your
>> cluster,
>> >>> > for
>> >>> > instance, an hdfs:// path or a file:// path that is present on all
>> >>> > nodes."
>> >>> >
>> >>> > I suppose both could be correct if you take a certain point of view.
>> >>> >
>> >>> > --
>> >>> > Jeremy Lee  BCompSci(Hons)
>> >>> >   The Unorthodox Engineers
>> >>
>> >>
>>
>
>
>
> --
> Jeremy Lee  BCompSci(Hons)
>   The Unorthodox Engineers
>



-- 
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers

Re: Can't seem to link "external/twitter" classes from my own app

Posted by Jeremy Lee <un...@gmail.com>.
Thanks Patrick!

Uberjars. Cool. I'd actually heard of them. And thanks for the link to the
example! I shall work through that today.

I'm still learning sbt and its many options... the last new framework I
learned was node.js, and I think I've been rather spoiled by "npm".

At least it's not maven. Please, oh please don't make me learn maven too.
(The only people who seem to like it have Software Stockholm Syndrome: "I
know maven kidnapped me and beat me up, but if you spend long enough with
it, you eventually start to sympathize and see its point of view".)


On Thu, Jun 5, 2014 at 3:39 AM, Patrick Wendell <pw...@gmail.com> wrote:

> Hey Jeremy,
>
> The issue is that you are using one of the external libraries and
> these aren't actually packaged with Spark on the cluster, so you need
> to create an uber jar that includes them.
>
> You can look at the example here (I recently did this for a kafka
> project and the idea is the same):
>
> https://github.com/pwendell/kafka-spark-example
>
> You'll want to make an uber jar that includes these packages (run sbt
> assembly) and then submit that jar to spark-submit. Also, I'd try
> running it locally first (if you aren't already) just to make the
> debugging simpler.
>
> - Patrick
>
>
> On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
> > Ah sorry, this may be the thing I learned for the day. The issue is
> > that classes from that particular artifact are missing though. Worth
> > interrogating the resulting .jar file with "jar tf" to see if it made
> > it in?
> >
> > On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath <ni...@gmail.com>
> wrote:
> >> @Sean, the %% syntax in SBT should automatically add the Scala major
> version
> >> qualifier (_2.10, _2.11 etc) for you, so that does appear to be correct
> >> syntax for the build.
> >>
> >> I seemed to run into this issue with some missing Jackson deps, and
> solved
> >> it by including the jar explicitly on the driver class path:
> >>
> >> bin/spark-submit --driver-class-path
> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class
> "SimpleApp"
> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
> >>
> >> Seems redundant to me since I thought that the JAR as argument is
> copied to
> >> driver and made available. But this solved it for me so perhaps give it
> a
> >> try?
> >>
> >>
> >>
> >> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
> >>>
> >>> Those aren't the names of the artifacts:
> >>>
> >>>
> >>>
> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
> >>>
> >>> The name is "spark-streaming-twitter_2.10"
> >>>
> >>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
> >>> <un...@gmail.com> wrote:
> >>> > Man, this has been hard going. Six days, and I finally got a "Hello
> >>> > World"
> >>> > App working that I wrote myself.
> >>> >
> >>> > Now I'm trying to make a minimal streaming app based on the twitter
> >>> > examples, (running standalone right now while learning) and when
> running
> >>> > it
> >>> > like this:
> >>> >
> >>> > bin/spark-submit --class "SimpleApp"
> >>> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
> >>> >
> >>> > I'm getting this error:
> >>> >
> >>> > Exception in thread "main" java.lang.NoClassDefFoundError:
> >>> > org/apache/spark/streaming/twitter/TwitterUtils$
> >>> >
> >>> > Which I'm guessing is because I haven't put in a dependency to
> >>> > "external/twitter" in the .sbt, but _how_? I can't find any docs on
> it.
> >>> > Here's my build file so far:
> >>> >
> >>> > simple.sbt
> >>> > ------------------------------------------
> >>> > name := "Simple Project"
> >>> >
> >>> > version := "1.0"
> >>> >
> >>> > scalaVersion := "2.10.4"
> >>> >
> >>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
> >>> >
> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" %
> "1.0.0"
> >>> >
> >>> > libraryDependencies += "org.apache.spark" %%
> "spark-streaming-twitter" %
> >>> > "1.0.0"
> >>> >
> >>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
> >>> >
> >>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
> >>> > ------------------------------------------
> >>> >
> >>> > I've tried a few obvious things like adding:
> >>> >
> >>> > libraryDependencies += "org.apache.spark" %% "spark-external" %
> "1.0.0"
> >>> >
> >>> > libraryDependencies += "org.apache.spark" %%
> "spark-external-twitter" %
> >>> > "1.0.0"
> >>> >
> >>> > because, well, that would match the naming scheme implied so far,
> but it
> >>> > errors.
> >>> >
> >>> >
> >>> > Also, I just realized I don't completely understand if:
> >>> > (a) the "spark-submit" command _sends_ the .jar to all the workers,
> or
> >>> > (b) the "spark-submit" commands sends a _job_ to the workers, which
> are
> >>> > supposed to already have the jar file installed (or in hdfs), or
> >>> > (c) the Context is supposed to list the jars to be distributed. (is
> that
> >>> > deprecated?)
> >>> >
> >>> > One part of the documentation says:
> >>> >
> >>> >  "Once you have an assembled jar you can call the bin/spark-submit
> >>> > script as
> >>> > shown here while passing your jar."
> >>> >
> >>> > but another says:
> >>> >
> >>> > "application-jar: Path to a bundled jar including your application
> and
> >>> > all
> >>> > dependencies. The URL must be globally visible inside of your
> cluster,
> >>> > for
> >>> > instance, an hdfs:// path or a file:// path that is present on all
> >>> > nodes."
> >>> >
> >>> > I suppose both could be correct if you take a certain point of view.
> >>> >
> >>> > --
> >>> > Jeremy Lee  BCompSci(Hons)
> >>> >   The Unorthodox Engineers
> >>
> >>
>



-- 
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers

Re: Can't seem to link "external/twitter" classes from my own app

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Jeremy,

The issue is that you are using one of the external libraries and
these aren't actually packaged with Spark on the cluster, so you need
to create an uber jar that includes them.

You can look at the example here (I recently did this for a kafka
project and the idea is the same):

https://github.com/pwendell/kafka-spark-example

You'll want to make an uber jar that includes these packages (run sbt
assembly) and then submit that jar to spark-submit. Also, I'd try
running it locally first (if you aren't already) just to make the
debugging simpler.
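
A rough sketch of the flow (the jar path below is only an example -- use
whatever file name sbt prints at the end of the assembly task, and
local[2] just means "run locally with two threads"):

sbt assembly

bin/spark-submit --class "SimpleApp" --master "local[2]" \
  SimpleApp/target/scala-2.10/SimpleApp-assembly-1.0.jar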

- Patrick


On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
> Ah sorry, this may be the thing I learned for the day. The issue is
> that classes from that particular artifact are missing though. Worth
> interrogating the resulting .jar file with "jar tf" to see if it made
> it in?
>
> On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath <ni...@gmail.com> wrote:
>> @Sean, the %% syntax in SBT should automatically add the Scala major version
>> qualifier (_2.10, _2.11 etc) for you, so that does appear to be correct
>> syntax for the build.
>>
>> I seemed to run into this issue with some missing Jackson deps, and solved
>> it by including the jar explicitly on the driver class path:
>>
>> bin/spark-submit --driver-class-path
>> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class "SimpleApp"
>> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>
>> Seems redundant to me since I thought that the JAR as argument is copied to
>> driver and made available. But this solved it for me so perhaps give it a
>> try?
>>
>>
>>
>> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>> Those aren't the names of the artifacts:
>>>
>>>
>>> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>>>
>>> The name is "spark-streaming-twitter_2.10"
>>>
>>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
>>> <un...@gmail.com> wrote:
>>> > Man, this has been hard going. Six days, and I finally got a "Hello
>>> > World"
>>> > App working that I wrote myself.
>>> >
>>> > Now I'm trying to make a minimal streaming app based on the twitter
>>> > examples, (running standalone right now while learning) and when running
>>> > it
>>> > like this:
>>> >
>>> > bin/spark-submit --class "SimpleApp"
>>> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>> >
>>> > I'm getting this error:
>>> >
>>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>> > org/apache/spark/streaming/twitter/TwitterUtils$
>>> >
>>> > Which I'm guessing is because I haven't put in a dependency to
>>> > "external/twitter" in the .sbt, but _how_? I can't find any docs on it.
>>> > Here's my build file so far:
>>> >
>>> > simple.sbt
>>> > ------------------------------------------
>>> > name := "Simple Project"
>>> >
>>> > version := "1.0"
>>> >
>>> > scalaVersion := "2.10.4"
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" %
>>> > "1.0.0"
>>> >
>>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
>>> >
>>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>> > ------------------------------------------
>>> >
>>> > I've tried a few obvious things like adding:
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-external-twitter" %
>>> > "1.0.0"
>>> >
>>> > because, well, that would match the naming scheme implied so far, but it
>>> > errors.
>>> >
>>> >
>>> > Also, I just realized I don't completely understand if:
>>> > (a) the "spark-submit" command _sends_ the .jar to all the workers, or
>>> > (b) the "spark-submit" commands sends a _job_ to the workers, which are
>>> > supposed to already have the jar file installed (or in hdfs), or
>>> > (c) the Context is supposed to list the jars to be distributed. (is that
>>> > deprecated?)
>>> >
>>> > One part of the documentation says:
>>> >
>>> >  "Once you have an assembled jar you can call the bin/spark-submit
>>> > script as
>>> > shown here while passing your jar."
>>> >
>>> > but another says:
>>> >
>>> > "application-jar: Path to a bundled jar including your application and
>>> > all
>>> > dependencies. The URL must be globally visible inside of your cluster,
>>> > for
>>> > instance, an hdfs:// path or a file:// path that is present on all
>>> > nodes."
>>> >
>>> > I suppose both could be correct if you take a certain point of view.
>>> >
>>> > --
>>> > Jeremy Lee  BCompSci(Hons)
>>> >   The Unorthodox Engineers
>>
>>

Re: Can't seem to link "external/twitter" classes from my own app

Posted by Sean Owen <so...@cloudera.com>.
Ah sorry, this may be the thing I learned for the day. The issue is
that classes from that particular artifact are missing though. Worth
interrogating the resulting .jar file with "jar tf" to see if it made
it in?
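
Something along these lines (using the jar path from your spark-submit
command):

jar tf SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar | grep twitter

should list org/apache/spark/streaming/twitter/TwitterUtils entries if the
dependency actually made it into the jar.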

On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath <ni...@gmail.com> wrote:
> @Sean, the %% syntax in SBT should automatically add the Scala major version
> qualifier (_2.10, _2.11 etc) for you, so that does appear to be correct
> syntax for the build.
>
> I seemed to run into this issue with some missing Jackson deps, and solved
> it by including the jar explicitly on the driver class path:
>
> bin/spark-submit --driver-class-path
> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class "SimpleApp"
> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>
> Seems redundant to me since I thought that the JAR as argument is copied to
> driver and made available. But this solved it for me so perhaps give it a
> try?
>
>
>
> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> Those aren't the names of the artifacts:
>>
>>
>> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>>
>> The name is "spark-streaming-twitter_2.10"
>>
>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
>> <un...@gmail.com> wrote:
>> > Man, this has been hard going. Six days, and I finally got a "Hello
>> > World"
>> > App working that I wrote myself.
>> >
>> > Now I'm trying to make a minimal streaming app based on the twitter
>> > examples, (running standalone right now while learning) and when running
>> > it
>> > like this:
>> >
>> > bin/spark-submit --class "SimpleApp"
>> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>> >
>> > I'm getting this error:
>> >
>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>> > org/apache/spark/streaming/twitter/TwitterUtils$
>> >
>> > Which I'm guessing is because I haven't put in a dependency to
>> > "external/twitter" in the .sbt, but _how_? I can't find any docs on it.
>> > Here's my build file so far:
>> >
>> > simple.sbt
>> > ------------------------------------------
>> > name := "Simple Project"
>> >
>> > version := "1.0"
>> >
>> > scalaVersion := "2.10.4"
>> >
>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>> >
>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>> >
>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" %
>> > "1.0.0"
>> >
>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
>> >
>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>> > ------------------------------------------
>> >
>> > I've tried a few obvious things like adding:
>> >
>> > libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
>> >
>> > libraryDependencies += "org.apache.spark" %% "spark-external-twitter" %
>> > "1.0.0"
>> >
>> > because, well, that would match the naming scheme implied so far, but it
>> > errors.
>> >
>> >
>> > Also, I just realized I don't completely understand if:
>> > (a) the "spark-submit" command _sends_ the .jar to all the workers, or
>> > (b) the "spark-submit" commands sends a _job_ to the workers, which are
>> > supposed to already have the jar file installed (or in hdfs), or
>> > (c) the Context is supposed to list the jars to be distributed. (is that
>> > deprecated?)
>> >
>> > One part of the documentation says:
>> >
>> >  "Once you have an assembled jar you can call the bin/spark-submit
>> > script as
>> > shown here while passing your jar."
>> >
>> > but another says:
>> >
>> > "application-jar: Path to a bundled jar including your application and
>> > all
>> > dependencies. The URL must be globally visible inside of your cluster,
>> > for
>> > instance, an hdfs:// path or a file:// path that is present on all
>> > nodes."
>> >
>> > I suppose both could be correct if you take a certain point of view.
>> >
>> > --
>> > Jeremy Lee  BCompSci(Hons)
>> >   The Unorthodox Engineers
>
>

Re: Can't seem to link "external/twitter" classes from my own app

Posted by Nick Pentreath <ni...@gmail.com>.
@Sean, the %% syntax in SBT should automatically add the Scala major
version qualifier (_2.10, _2.11 etc) for you, so that does appear to be
correct syntax for the build.
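
In other words, with scalaVersion set to 2.10.x, these two lines should
resolve to the same artifact:

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

libraryDependencies += "org.apache.spark" % "spark-streaming-twitter_2.10" % "1.0.0"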

I seemed to run into this issue with some missing Jackson deps, and solved
it by including the jar explicitly on the driver class path:

bin/spark-submit --driver-class-path
SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class
"SimpleApp" SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar

Seems redundant to me since I thought that the JAR as argument is copied to
driver and made available. But this solved it for me so perhaps give it a
try?



On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:

> Those aren't the names of the artifacts:
>
>
> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>
> The name is "spark-streaming-twitter_2.10"
>
> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
> <un...@gmail.com> wrote:
> > Man, this has been hard going. Six days, and I finally got a "Hello
> World"
> > App working that I wrote myself.
> >
> > Now I'm trying to make a minimal streaming app based on the twitter
> > examples, (running standalone right now while learning) and when running
> it
> > like this:
> >
> > bin/spark-submit --class "SimpleApp"
> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
> >
> > I'm getting this error:
> >
> > Exception in thread "main" java.lang.NoClassDefFoundError:
> > org/apache/spark/streaming/twitter/TwitterUtils$
> >
> > Which I'm guessing is because I haven't put in a dependency to
> > "external/twitter" in the .sbt, but _how_? I can't find any docs on it.
> > Here's my build file so far:
> >
> > simple.sbt
> > ------------------------------------------
> > name := "Simple Project"
> >
> > version := "1.0"
> >
> > scalaVersion := "2.10.4"
> >
> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
> >
> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
> >
> > libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" %
> > "1.0.0"
> >
> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
> >
> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
> > ------------------------------------------
> >
> > I've tried a few obvious things like adding:
> >
> > libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
> >
> > libraryDependencies += "org.apache.spark" %% "spark-external-twitter" %
> > "1.0.0"
> >
> > because, well, that would match the naming scheme implied so far, but it
> > errors.
> >
> >
> > Also, I just realized I don't completely understand if:
> > (a) the "spark-submit" command _sends_ the .jar to all the workers, or
> > (b) the "spark-submit" commands sends a _job_ to the workers, which are
> > supposed to already have the jar file installed (or in hdfs), or
> > (c) the Context is supposed to list the jars to be distributed. (is that
> > deprecated?)
> >
> > One part of the documentation says:
> >
> >  "Once you have an assembled jar you can call the bin/spark-submit
> script as
> > shown here while passing your jar."
> >
> > but another says:
> >
> > "application-jar: Path to a bundled jar including your application and
> all
> > dependencies. The URL must be globally visible inside of your cluster,
> for
> > instance, an hdfs:// path or a file:// path that is present on all
> nodes."
> >
> > I suppose both could be correct if you take a certain point of view.
> >
> > --
> > Jeremy Lee  BCompSci(Hons)
> >   The Unorthodox Engineers
>

Re: Can't seem to link "external/twitter" classes from my own app

Posted by Sean Owen <so...@cloudera.com>.
Those aren't the names of the artifacts:

http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22

The name is "spark-streaming-twitter_2.10"
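
So spelled out with the Scala suffix, the dependency line would be, for
example:

libraryDependencies += "org.apache.spark" % "spark-streaming-twitter_2.10" % "1.0.0"

(a single % here, since the artifact id already carries the _2.10 suffix).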

On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
<un...@gmail.com> wrote:
> Man, this has been hard going. Six days, and I finally got a "Hello World"
> App working that I wrote myself.
>
> Now I'm trying to make a minimal streaming app based on the twitter
> examples, (running standalone right now while learning) and when running it
> like this:
>
> bin/spark-submit --class "SimpleApp"
> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>
> I'm getting this error:
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/spark/streaming/twitter/TwitterUtils$
>
> Which I'm guessing is because I haven't put in a dependency to
> "external/twitter" in the .sbt, but _how_? I can't find any docs on it.
> Here's my build file so far:
>
> simple.sbt
> ------------------------------------------
> name := "Simple Project"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
>
> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>
> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>
> libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" %
> "1.0.0"
>
> libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
>
> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
> ------------------------------------------
>
> I've tried a few obvious things like adding:
>
> libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
>
> libraryDependencies += "org.apache.spark" %% "spark-external-twitter" %
> "1.0.0"
>
> because, well, that would match the naming scheme implied so far, but it
> errors.
>
>
> Also, I just realized I don't completely understand if:
> (a) the "spark-submit" command _sends_ the .jar to all the workers, or
> (b) the "spark-submit" commands sends a _job_ to the workers, which are
> supposed to already have the jar file installed (or in hdfs), or
> (c) the Context is supposed to list the jars to be distributed. (is that
> deprecated?)
>
> One part of the documentation says:
>
>  "Once you have an assembled jar you can call the bin/spark-submit script as
> shown here while passing your jar."
>
> but another says:
>
> "application-jar: Path to a bundled jar including your application and all
> dependencies. The URL must be globally visible inside of your cluster, for
> instance, an hdfs:// path or a file:// path that is present on all nodes."
>
> I suppose both could be correct if you take a certain point of view.
>
> --
> Jeremy Lee  BCompSci(Hons)
>   The Unorthodox Engineers