You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by isaacl <gi...@git.apache.org> on 2016/07/26 19:52:14 UTC

[GitHub] spark pull request #14372: Make offsetRanges variable volatile

GitHub user isaacl opened a pull request:

    https://github.com/apache/spark/pull/14372

    Make offsetRanges variable volatile

    the foreach and transform closures can be executed on different driver threads.  The java example uses an atomic reference, but the scala example doesn't have anything.
    
    
    ## What changes were proposed in this pull request?
    documentation update
    
    ## How was this patch tested?
    ran locally

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/isaacl/spark patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14372.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14372
    
----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    @isaacl could you close this PR, please? You can open a new one when you find out the correct way. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    Yeah, paging @tdas, why was the current structure better? doesn't seem guaranteed to work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14372: Make offsetRanges variable volatile

Posted by isaacl <gi...@git.apache.org>.
Github user isaacl closed the pull request at:

    https://github.com/apache/spark/pull/14372


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    You cant achieve *everything* in foreachRDD. For examples, windowing, updateStateByKey, etc. In that case only the general case (i.e. existing one) would work, not the foreachRDD. So I agree with this patch that the var should be transient.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    I generally try to write my code the way Sean suggested (single foreachRDD) whenever possible.
    
    Sounds like if the transformation isn't guaranteed to run before the foreachRDD later in the chain, this needs to be changed to blocking on a concurrent queue or something.
    
    This kind of points out a longer-term issue that the code snippets in docs don't necessarily even have to compile, much less work correctly.  Would be nice to extract them from tests.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by isaacl <gi...@git.apache.org>.
Github user isaacl commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    On spark local at least, the transform and foreachrdd closures do run on different threads. So without volatile there's no explicit happens before relationship (spark might have some implicit locking though). Since there's no happens before the foreachrdd isn't guaranteed to see the write from the transform closure. Of course, as you said, there's also no explicit guarantee that the transform closure will run before the foreach closure. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    I think you can accomplish just about anything with `foreachRDD` that you can accomplish with other operations; you just write the thing you want to do to the RDD directly instead of relying on the pass-through DStream wrappers.
    
    At least, the point of this example is to show _some_ reasonable generic operation, and if this different structure is simpler and possibly more correct, IMHO seems like a win.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by isaacl <gi...@git.apache.org>.
Github user isaacl commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    @koeninger Looks like you added this documentation in b305e377fb


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    Yeah fair point, but if the point is just showing use of offsetRanges, we only need some reasonably generic operation to exhibit it.
    
    Doesn't this rely on the transforms and maps being perfectly interleaved? it may happen to execute that way but is that a safe assumption? 
    
    I don't actually think `volatile` is necessarily the issue here since I don't see that two threads are involved. I was referring to the correctness of assuming a certain order of execution.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    Hm, backing up a bit, wouldn't it be safer and more straightforward to just ...
    
    ```
    directKafkaStream.foreachRDD { rdd =>
      offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ...
      rdd.map(...)
      ...
      for (o <- offsetRanges) {
        ...
      }
    }
    ```
    
    Storing the reference the way this does now doesn't seem guaranteed to work at a higher level than just the issues that `volatile` might resolve.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by isaacl <gi...@git.apache.org>.
Github user isaacl commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    It relies on some implicit behavior (i.e. that driver threads share a process), but I think this assumption does currently hold.  I like your example more but it is more restrictive on data transformations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by isaacl <gi...@git.apache.org>.
Github user isaacl commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    @srowen see also: https://github.com/apache/spark/pull/6863#issuecomment-113315111  Looks like the earlier PR matched your example at first


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14372: Make offsetRanges variable volatile

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14372
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org