You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2015/10/28 19:15:21 UTC

[GitHub] spark pull request: [SPARK-10342] [SPARK-10309] [SPARK-10474] [SPA...

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/9241#issuecomment-151939041
  
    @JoshRosen This PR is already huge, i will stop to add new things (but will fix any problems), please do another round review, thanks!
    
    For thread-safety, here what I'm got:
    
    1) Without calling spill(), the operators should only be used by single thread, no safety problems.
    
    2) spill() could be triggered in two cases, triggered by itself, or by other operators. we can check `trigger ==  this` in `spill()`, so it's still in the same thread, so safety problems.
    
    3) if it's triggered by other operators (right now cache will not trigger spill()), we only spill the data into disk when it's in scanning stage (building is finished), so the in-memory sorter or memory pages are read-only, we only need to synchronize the iterator and change it.
    
    4) During scanning, the iterator will only use one record in one page, we can't free this page, because the downstream is currently using it (used by UnsafeRow or other objects). In BytesToBytesMap, we just skip the current page, and dump all others into disk. In UnsafeExternalSorter, we keep the page that is used by current record (having the same baseObject), free it when loading the next record. In ShuffleExternalSorter, the spill() will not trigger during scanning.
    
    5) In order to avoid deadlock, we didn't call acquireMemory during spill (so we reused the pointer array in InMemorySorter). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org