Posted to reviews@spark.apache.org by thunterdb <gi...@git.apache.org> on 2017/02/17 20:13:56 UTC

[GitHub] spark issue #14299: Ensure broadcasted variables are destroyed even in case ...

Github user thunterdb commented on the issue:

    https://github.com/apache/spark/pull/14299
  
    @AnthonyTruchet thank you for the PR. This is definitely worth fixing for large deployments. Now, as you noticed, this portion of the code does not quite follow best engineering practices... Instead of adding an extra layer of nesting, would you mind making the following changes?
    
    ```scala
      def fit[S <: Iterable[String]](dataset: RDD[S]): Word2VecModel = {
        ...
        val expTable = sc.broadcast(createExpTable())
        val bcVocab = sc.broadcast(vocab)
        val bcVocabHash = sc.broadcast(vocabHash)
        try { fit0(expTable, ...) } finally {
          ...
        }
      }

      private def fit0(...): Unit = {
        // Put all the content here.
        // Note that the inner code also includes some broadcasts; you may
        // want to fix those as well if you can.
      }
    ```
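
    For concreteness, the `finally` block is where the broadcasts should get cleaned up. A minimal sketch, assuming the three broadcasts above (whether to `destroy()` them or only `unpersist()` is a separate judgment call):

    ```scala
      try {
        fit0(expTable, bcVocab, bcVocabHash)
      } finally {
        // Release the broadcast data even when fit0 throws;
        // the exception still propagates to the caller.
        expTable.destroy()
        bcVocab.destroy()
        bcVocabHash.destroy()
      }
    ```

    (If a `destroy()` can itself throw, a more defensive cleanup would guard each call, but that is beyond this sketch.)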
    
    I personally agree about resource management and scala-arm. Unfortunately, we try to keep Scala dependencies to a minimum, because they can be very tedious to migrate from one Scala version to another.
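
    For illustration only, the part of scala-arm we would actually use here fits in a couple of lines without the dependency; this `withBroadcast` helper is hypothetical, not an existing Spark API:

    ```scala
    import org.apache.spark.broadcast.Broadcast

    // Hypothetical loan-pattern helper (not in Spark): run `body` with a
    // broadcast and guarantee it is destroyed afterwards, success or failure.
    def withBroadcast[T, R](bc: Broadcast[T])(body: Broadcast[T] => R): R =
      try body(bc) finally bc.destroy()

    // Usage sketch:
    //   withBroadcast(sc.broadcast(vocab)) { bcVocab => ... }
    ```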

