Posted to reviews@spark.apache.org by akopich <gi...@git.apache.org> on 2014/12/02 13:20:15 UTC

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

Github user akopich commented on the pull request:

    https://github.com/apache/spark/pull/1269#issuecomment-65223123
  
    @jkbradley, thank you for your comments!
    
    It seems like we should discuss the API for this set of models first.
    As far as I can understand, you do not intend to give users the ability to implement their own regularizers. Is that right?
    
    In that case it is possible to implement a facade class and make all the other classes private. But there is still an issue.
    
    First, note that `RobustDocumentParameters` inherits from `DocumentParameters`, and `RobustGlobalParameters` inherits from `GlobalParameters`. Now imagine the facade class. Suppose it has a parameter `robust: Boolean` and a method like
    
    `def infer(documents: RDD[Seq[String]]): (RDD[DocumentParameters], GlobalParameters)`
    
    If `robust == false`, everything is fine, but how am I supposed to return `RobustDocumentParameters` and `RobustGlobalParameters` when `robust == true`? It is certainly possible, but the user would have to explicitly cast the returned values.
    
    The only way I see to solve this problem is to define a pair of facades (robust and non-robust), one inheriting from the other. Perhaps that is just because I am not a Scala expert; if there is a better solution, please give me a hint.
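    For what it is worth, the pair-of-facades idea can work without casts, because Scala allows an override to narrow its return type. The sketch below is illustrative only: the class bodies are stubs, and `Seq` stands in for `RDD` so the example is self-contained; only the class names come from the PR.

```scala
// Minimal sketch, not the PR's actual code: stub classes stand in for the
// real model parameters, and Seq replaces RDD to keep the example runnable.
class DocumentParameters
class RobustDocumentParameters extends DocumentParameters
class GlobalParameters
class RobustGlobalParameters extends GlobalParameters

class PLSA {
  def infer(documents: Seq[Seq[String]]): (Seq[DocumentParameters], GlobalParameters) =
    (documents.map(_ => new DocumentParameters), new GlobalParameters)
}

class RobustPLSA extends PLSA {
  // Tuple2 and Seq are covariant, so this signature is a legal narrowing
  // of the parent's: callers of RobustPLSA statically get the robust types.
  override def infer(documents: Seq[Seq[String]]): (Seq[RobustDocumentParameters], RobustGlobalParameters) =
    (documents.map(_ => new RobustDocumentParameters), new RobustGlobalParameters)
}

object Demo extends App {
  // No cast needed: the compiler already knows these are the robust types.
  val (docParams, global) = new RobustPLSA().infer(Seq(Seq("word")))
  assert(global.isInstanceOf[RobustGlobalParameters])
}
```

    With this shape, code written against `PLSA` still works on a `RobustPLSA`, while code that holds a `RobustPLSA` reference gets the robust parameter types for free.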
    
    By the way, you said `PLSA` and `RobustPLSA` are very similar and suffer from code duplication. Their code is similar but not identical (e.g., in the robust version I have to pass the background and noise matrices to almost every function). I have done my best to avoid duplication by inheriting both classes from `AbstractPLSA`. If there is a way to generalize the code further, please point it out.
    
    You also mentioned mllib/linalg/. Should I use `DenseMatrix` instead of `Array[Array[Float]]`? I dislike this idea because `DenseMatrix` stores `Double` values, which seems to be a waste of memory and network resources.
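    The memory concern is easy to quantify with a back-of-the-envelope sketch (the matrix size below is an arbitrary illustration, and JVM object overhead is ignored): a `Double`-backed dense matrix needs exactly twice the element storage of a `Float`-backed one.

```scala
object MatrixMemory extends App {
  // Raw element storage only, ignoring JVM object overhead.
  val rows = 10000
  val cols = 10000
  val floatBytes  = rows.toLong * cols * java.lang.Float.BYTES   // 4 bytes per element
  val doubleBytes = rows.toLong * cols * java.lang.Double.BYTES  // 8 bytes per element
  // Switching Float -> Double doubles both heap use and shuffle traffic.
  println(s"Float: ${floatBytes >> 20} MB, Double: ${doubleBytes >> 20} MB")
}
```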


