Posted to reviews@spark.apache.org by ezhulenev <gi...@git.apache.org> on 2014/10/01 17:11:50 UTC

[GitHub] spark pull request: [SPARK-3760] [STREAMING] Add Twitter4j FilterQ...

GitHub user ezhulenev opened a pull request:

    https://github.com/apache/spark/pull/2618

    [SPARK-3760] [STREAMING] Add Twitter4j FilterQuery to spark streaming twitter API

    TwitterUtils.createStream(...) allows users to specify keywords that restrict the tweets that are returned. However, FilterQuery from Twitter4j has several other options, including location, which was requested in SPARK-2788. The best solution is to add an alternative createStream method that takes a FilterQuery as an argument instead of keywords.
    
    This PR is a replacement for #1717 and #2098
    
    I can't use the name "createStream" for the new method. I want to keep the public API stable and binary compatible, and introducing a new method with the same name causes a scalac compilation error: multiple overloaded alternatives of createStream define default arguments
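
To illustrate the restriction described above, here is a minimal Scala sketch. `FilterQuery` and `StorageLevel` below are simplified stand-ins, not the real Twitter4j and Spark classes, and `StreamFactory` with its string return values is hypothetical, used only so the snippet compiles on its own. It shows the shape of overload that scalac rejects (left as a comment) and the distinct-name workaround this PR takes.

```scala
// Simplified stand-ins (assumptions), not the real Twitter4j / Spark types.
final case class FilterQuery(track: Seq[String])
final case class StorageLevel(name: String)

// Hypothetical holder object; the real method lives in TwitterUtils.
object StreamFactory {
  val DefaultLevel: StorageLevel = StorageLevel("MEMORY_AND_DISK_SER_2")

  // Existing-style method: keyword filters, with default arguments.
  def createStream(
      filters: Seq[String] = Nil,
      storageLevel: StorageLevel = DefaultLevel): String =
    s"keywords=${filters.mkString(",")} level=${storageLevel.name}"

  // A second overload of the same name that ALSO declares a default, e.g.
  //   def createStream(query: FilterQuery,
  //                    storageLevel: StorageLevel = DefaultLevel): String
  // does not compile: only one overloaded alternative of a method may
  // define default arguments.

  // Workaround used in this PR: a different method name, so defaults are allowed.
  def createQueryStream(
      query: FilterQuery,
      storageLevel: StorageLevel = DefaultLevel): String =
    s"query=${query.track.mkString(",")} level=${storageLevel.name}"
}
```

Existing callers of `createStream` are unaffected by this approach, at the cost of a second method name in the public API.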

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ezhulenev/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2618.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2618
    
----
commit abd3c36990bcbb952ec07cb328a1f7e9c8adc3da
Author: ezhulenev <ev...@gmail.com>
Date:   2014-10-01T15:08:03Z

    [SPARK-3760] Add Twitter4j FilterQuery to spark streaming twitter API

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3760] [STREAMING] Add Twitter4j FilterQ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2618#discussion_r18296675
  
    --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala ---
    @@ -32,6 +32,26 @@ object TwitterUtils {
        *        authorization; this uses the system properties twitter4j.oauth.consumerKey,
        *        twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
        *        twitter4j.oauth.accessTokenSecret
    +   * @param query Twitter4j filter query, or None to return a stream of random sample
    +   *              of all public statuses
    +   * @param storageLevel Storage level to use for storing the received objects
    +   */
    +  def createQueryStream(
    --- End diff --
    
    I believe you will run into a problem with default params when you try to add another overload to `createStream`. Scala allows only one version of `createStream` with default params, so you will have to manually add versions of `createStream`, just as you have done for `JavaStreamingContext`.
    
    Also please 
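
The manual-overload approach suggested in this comment might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the PR: `FilterQuery` and `StorageLevel` are simplified stand-ins for the Twitter4j and Spark classes, and `QueryStreams` with its string return values is hypothetical. The keyword-based method keeps its defaults, while the `FilterQuery` variants are written out one by one without defaults.

```scala
// Simplified stand-ins (assumptions), not the real Twitter4j / Spark types.
final case class FilterQuery(track: Seq[String])
final case class StorageLevel(name: String)

// Hypothetical holder object; the real methods would live in TwitterUtils.
object QueryStreams {
  val DefaultLevel: StorageLevel = StorageLevel("MEMORY_AND_DISK_SER_2")

  // The one overload that is allowed to keep default arguments.
  def createStream(
      filters: Seq[String] = Nil,
      storageLevel: StorageLevel = DefaultLevel): String =
    s"keywords=${filters.mkString(",")} level=${storageLevel.name}"

  // FilterQuery variants spelled out explicitly, with no default arguments.
  // The shorter form delegates to the full form with the default level.
  def createStream(query: FilterQuery): String =
    createStream(query, DefaultLevel)

  def createStream(query: FilterQuery, storageLevel: StorageLevel): String =
    s"query=${query.track.mkString(",")} level=${storageLevel.name}"
}
```

Because the argument types differ (`Seq[String]` versus `FilterQuery`), calls resolve to the intended overload, and the name stays consistent with the existing API.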




[GitHub] spark pull request: [SPARK-3760] [STREAMING] Add Twitter4j FilterQ...

Posted by ezhulenev-at-pellucid <gi...@git.apache.org>.
Github user ezhulenev-at-pellucid commented on the pull request:

    https://github.com/apache/spark/pull/2618#issuecomment-57514150
  
    @tdas I pushed an update fixing the issues from your comments.




[GitHub] spark pull request: [SPARK-3760] [STREAMING] Add Twitter4j FilterQ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2618#issuecomment-57479894
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-3760] [STREAMING] Add Twitter4j FilterQ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2618#discussion_r18296730
  
    --- Diff: external/twitter/src/test/java/org/apache/spark/streaming/twitter/JavaTwitterStreamSuite.java ---
    @@ -42,5 +44,7 @@ public void testTwitterStream() {
         JavaDStream<Status> test5 = TwitterUtils.createStream(ssc, auth, filters);
         JavaDStream<Status> test6 = TwitterUtils.createStream(ssc,
           auth, filters, StorageLevel.MEMORY_AND_DISK_SER_2());
    +    JavaDStream<Status> test7 = TwitterUtils.createStream(ssc, query);
    +    JavaDStream<Status> test8 = TwitterUtils.createStream(ssc, auth, query);
    --- End diff --
    
    I think you added 4 variations of `createStream` for `JavaStreamingContext`, so there should be 4 tests added here.




[GitHub] spark pull request: [SPARK-3760] [STREAMING] Add Twitter4j FilterQ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2618#discussion_r18296494
  
    --- Diff: external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala ---
    @@ -32,6 +32,26 @@ object TwitterUtils {
        *        authorization; this uses the system properties twitter4j.oauth.consumerKey,
        *        twitter4j.oauth.consumerSecret, twitter4j.oauth.accessToken and
        *        twitter4j.oauth.accessTokenSecret
    +   * @param query Twitter4j filter query, or None to return a stream of random sample
    +   *              of all public statuses
    +   * @param storageLevel Storage level to use for storing the received objects
    +   */
    +  def createQueryStream(
    --- End diff --
    
    Why create a `createQueryStream` and not `createStream`? It's better to keep things consistent by naming it `createStream`.




[GitHub] spark pull request: [SPARK-3760] [STREAMING] Add Twitter4j FilterQ...

Posted by ezhulenev <gi...@git.apache.org>.
Github user ezhulenev closed the pull request at:

    https://github.com/apache/spark/pull/2618




[GitHub] spark pull request: [SPARK-3760] [STREAMING] Add Twitter4j FilterQ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2618#issuecomment-58572887
  
    Can one of the admins verify this patch?

