Posted to user@spark.apache.org by durin <ma...@simon-schaefer.net> on 2014/07/15 00:47:07 UTC

import org.apache.spark.streaming.twitter._ in Shell

I'm using Spark > 1.0.0 (a three-week-old build of the latest code).
Following this tutorial
<http://ampcamp.berkeley.edu/big-data-mini-course/realtime-processing-with-spark-streaming.html>,
I want to read some tweets from Twitter.
When trying to execute  in the Spark-Shell, I get

The tutorial builds an app via sbt/sbt. Are there any special requirements
for importing the TwitterUtils in the shell?


Best regards,
Simon




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/import-org-apache-spark-streaming-twitter-in-Shell-tp9665.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Tathagata Das <ta...@gmail.com>.
I guess this is not clearly documented. At a high level, any class that is
in the package

org.apache.spark.streaming.XXX   where XXX is in { twitter, kafka, flume,
zeromq, mqtt }

is not available in the Spark shell.

I have added this to the larger JIRA of things-to-add-to-streaming-docs
https://issues.apache.org/jira/browse/SPARK-2419

Thanks for bringing this to attention.

TD



Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by durin <ma...@simon-schaefer.net>.
Thanks. Can I see somewhere in the API docs that a class is not available in
the shell, or do I have to find out by trial and error?




Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Tathagata Das <ta...@gmail.com>.
Yes, what Nick said is the recommended way. In most use cases, a Spark
Streaming program in production is not run from the shell. Hence, we chose
not to make the external modules (twitter, kafka, etc.) available to the
Spark shell, to avoid conflicts between the dependencies they bring in and
Spark's own dependencies. That said, you could tweak things at your own risk
as Praveen suggested. Specifically for twitter, I think it's quite fine, as
the dependencies of the twitter4j library are pretty thin and do not
conflict with Spark's dependencies (at least in the current version).

TD



Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Nick Pentreath <ni...@gmail.com>.
You could try the following: create a minimal project using sbt or Maven,
add spark-streaming-twitter as a dependency, run sbt assembly (or mvn
package) on that to create a fat jar (with Spark as provided dependency),
and add that to the shell classpath when starting up.
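
The project described above could be sketched roughly as follows. This is a minimal, hypothetical build.sbt, not a tested recipe: the project name, the Spark version (1.0.1) and the Scala version (2.10.4) are assumptions to be adjusted to your own build, and the sbt-assembly plugin has to be enabled separately as its documentation describes for your sbt version.

```scala
// build.sbt: hypothetical minimal project whose only purpose is to
// produce a fat jar containing the Twitter receiver and twitter4j.
name := "twitter-shell-deps"

version := "0.1"

// Assumption: must match the Scala version your Spark build uses.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Spark Streaming is already on the shell classpath, so it is marked
  // "provided" to keep it out of the fat jar.
  "org.apache.spark" %% "spark-streaming" % "1.0.1" % "provided",
  // The external module that contains TwitterUtils.
  // Depending on how this module declares its own Spark dependency,
  // Spark classes may still arrive transitively and need to be excluded.
  "org.apache.spark" %% "spark-streaming-twitter" % "1.0.1"
)
```

After `sbt assembly`, the resulting jar can be handed to the shell at startup, for example through the `--jars` option of spark-shell if your Spark build supports it.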



Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Praveen Seluka <ps...@qubole.com>.
If you want to make the Twitter* classes available in your shell, I believe
you could do the following:
1. Change the module ordering in the parent pom: move external/twitter before
assembly.
2. In assembly/pom.xml, add the external/twitter dependency. This will package
twitter* into the assembly jar.

Now when spark-shell is launched, the assembly jar is on the classpath, and
hence twitter* too. I think this will work (I remember trying this some time
back).
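
Once the Twitter classes are on the shell classpath, whether through a rebuilt assembly or an extra jar, a shell session could look roughly like the sketch below. It assumes the `sc` SparkContext that spark-shell creates, a 2-second batch interval chosen only for illustration, and placeholder OAuth credentials passed through twitter4j's system properties.

```scala
// Paste into spark-shell. `sc` is the SparkContext provided by the shell.
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter._

// twitter4j reads its OAuth credentials from these system properties.
// The values here are placeholders, not working credentials.
System.setProperty("twitter4j.oauth.consumerKey", "<consumer key>")
System.setProperty("twitter4j.oauth.consumerSecret", "<consumer secret>")
System.setProperty("twitter4j.oauth.accessToken", "<access token>")
System.setProperty("twitter4j.oauth.accessTokenSecret", "<access token secret>")

// Assumption: 2-second batches, purely for illustration.
val ssc = new StreamingContext(sc, Seconds(2))

// Passing None makes the receiver fall back to twitter4j's default
// authorization, built from the system properties set above.
val tweets = TwitterUtils.createStream(ssc, None)

// Print the text of the first few tweets of every batch.
tweets.map(status => status.getText).print()

ssc.start()

// Later, to stop streaming but keep the shell's SparkContext alive:
// ssc.stop(false)
```

As noted elsewhere in this thread, a shell is not the best place for a long-running stream, so this is mainly useful for short experiments.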



Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Nicholas Chammas <ni...@gmail.com>.
Hmm, I'd like to clarify something from your comments, Tathagata.

Going forward, is Twitter Streaming functionality not supported from the
shell? What should users do if they'd like to process live Tweets from the
shell?

Nick



Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Nicholas Chammas <ni...@gmail.com>.
>
> At some point, you were able to access TwitterUtils from spark shell using
> Spark 1.0.0+ ?


Yep.


> If yes, then what change in Spark caused it to not work any more?


It still works for me. I was just commenting on your remark that it doesn't
work through the shell, which I now understand to apply to versions of
Spark before 1.0.0.

Nick

Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Tathagata Das <ta...@gmail.com>.
Oh right, that could have happened only after Spark 1.0.0. So let me
clarify. At some point, you were able to access TwitterUtils from the Spark
shell using Spark 1.0.0+? If yes, then what change in Spark caused it to
not work any more?

TD



Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Nicholas Chammas <ni...@gmail.com>.
If we're talking about the issue you captured in SPARK-2464
<https://issues.apache.org/jira/browse/SPARK-2464>, then it was a newly
launched EC2 cluster on 1.0.1.



Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Tathagata Das <ta...@gmail.com>.
Did you update your Spark version recently, after which you noticed this
problem? If you were using Spark 0.8 or below, then twitter would have
worked in the Spark shell. In Spark 0.9, we moved those dependencies out of
core Spark so that they can be updated more freely without raising
dependency-related concerns in the core of Spark Streaming.

TD



Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Nicholas Chammas <ni...@gmail.com>.
On Mon, Jul 14, 2014 at 6:52 PM, Tathagata Das <ta...@gmail.com>
wrote:

> The twitter functionality is not available through the shell.
>

I've been processing Tweets live from the shell, though not for a long
time. That's how I uncovered the problem with the Twitter receiver not
deregistering, btw.

Did I misunderstand your comment?

Nick

Re: import org.apache.spark.streaming.twitter._ in Shell

Posted by Tathagata Das <ta...@gmail.com>.
The twitter functionality is not available through the shell.
1) We separated this non-core functionality into separate subprojects so
that their dependencies do not collide with or pollute those of core Spark.
2) A shell is not really the best way to start a long-running stream.

It's best to use twitter through a separate project.

TD

