Posted to dev@spot.apache.org by "Lujan Moreno, Gustavo" <gu...@intel.com> on 2017/07/21 19:06:23 UTC

Re: Spark EM LDA Optimizer support

I would suggest supporting both for now. In my experiments Online takes more iterations to converge (although I haven’t measured time, Online is supposed to be faster). spark.mllib doesn’t allow scoring unseen records with EM, only training. The new spark.ml does allow training with EM and scoring unseen documents with EM, but Ricardo and I found that it is really using Online under the hood. I consider that a bug on Spark’s side. Therefore, what Ricardo is suggesting is a workaround for this bug.
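
For concreteness, a minimal spark.ml sketch of the path being described, assuming hypothetical DataFrames docsDF and newDocsDF that each have a "features" vector column; whether transform really runs Online-style inference under the hood is the observation made above, not something this snippet demonstrates:

    import org.apache.spark.ml.clustering.LDA

    // Train LDA with the EM optimizer in spark.ml.
    val emModel = new LDA()
      .setK(20)
      .setMaxIter(50)
      .setOptimizer("em")
      .fit(docsDF)                              // returns a DistributedLDAModel

    // Score unseen documents; this is the call that appears to rely on a local
    // (Online-style) model rather than EM when computing topic distributions.
    val scored = emModel.transform(newDocsDF)   // adds a "topicDistribution" column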




On 7/21/17, 1:44 PM, "Barona, Ricardo" <ri...@intel.com> wrote:

>Once a saved model is loaded it needs to be converted to a LocalLDAModel if it’s a DistributedLDAModel, but from what I heard, the importance of what you used for training, EM or Online, is in the topics matrix that each one generates. I’m not exactly an expert, but I’d think they are going to be different, right? The topics matrix of a LocalLDAModel coming from a DistributedLDAModel will remain the same, and topic distributions will be calculated based on it. 
>
>On 7/21/17, 1:26 PM, "Edwards, Brandon" <br...@intel.com> wrote:
>
>    A question just came up for me. Is there a true use case for utilizing EM that allows one to carry context from previous models into the future? It seems that once you save to a local model in order to utilize it for future data, from then on you can only use the Online optimizer. If this is correct, I vote for getting rid of EM. I don’t see value in supporting a use case that does not carry context into future models.
>    
>    On 7/21/17, 11:08 AM, "Barona, Ricardo" <ri...@intel.com> wrote:
>    
>        For the last 9 days, I've been working on modifying the Apache Spot LDA wrapper to make it possible to save models, load existing models, and then get topic distributions for the same corpus or for new documents (see https://issues.apache.org/jira/browse/SPOT-196). Until now, the Apache Spot ML module has been running in batch mode, training and getting topic distributions with the same documents it trained on, but that needs to change soon as we are looking to achieve near real time.
>        
>        Earlier this year, Apache Spot enabled the Online optimizer so users can select whether to run LDA using EM or Online; EM was the first option we implemented, and we later decided it was a good idea to offer Online as well.
>        
>        With the intention of keeping support for both the EM and Online optimizers, I modified the code in such a way that you can train with either one but only get topic distributions through a LocalLDAModel. The reason is that only LocalLDAModel supports getting topic distributions for new documents. The problem with that approach is that a very simple unit test we have is now failing, because when I convert the DistributedLDAModel to a LocalLDAModel, the document concentration parameter remains the value originally provided for EM, which doesn't necessarily work for the LocalLDAModel.topicDistributions method.
>        
>        Take a look at https://issues.apache.org/jira/secure/attachment/12878382/everythingOK.png. There you can see the expected result from training and getting topic distributions with EM only or Online only on a two-document, one-word-per-document data set.
>        
>        Then, here is the problem I explained before about converting DistributedLDAModel to LocalLDAModel: https://issues.apache.org/jira/secure/attachment/12878381/notSoOk.png
>        
>        A possible solution is to implement a custom function that converts a DistributedLDAModel to a LocalLDAModel (see https://issues.apache.org/jira/secure/attachment/12878380/possibleSolution.png and the code below):
>        
>        package org.apache.spark.mllib.clustering
>        
>        import org.apache.spark.mllib.linalg.{Matrix, Vector}
>        
>        object SpotLDA {
>          /**
>            * Creates a new LocalLDAModel, allowing alpha and beta to be reset (although we only need alpha).
>            * @param topicsMatrix Distributed LDA model topicsMatrix
>            * @param alpha New value for alpha, e.g. if the model was trained with alpha = 1.002 using the EM optimizer,
>            *              this method allows you to reset alpha to something like 0.0009 and get topic distributions
>            *              with the desired document concentration.
>            * @param beta New value for beta
>            * @return LocalLDAModel
>            */
>          def toLocal(topicsMatrix: Matrix, alpha: Vector, beta: Double): LocalLDAModel = {
>        
>            new LocalLDAModel(topicsMatrix, alpha, beta)
>          }
>        }
>        
>        The only disadvantage I see here is that users will need to provide 3 parameters if they are using the EM optimizer, instead of only 2:
>        
>        -          EM alpha
>        
>        -          EM beta
>        
>        -          Online alpha
>        Or provide only 2 parameters if they prefer to work with the Online optimizer only:
>        
>        -          Online alpha
>        
>        -          Online beta
>        
>        When I discussed this with Gustavo, he suggested we even set a “default” value for Online alpha, so that if users only configure EM alpha and EM beta the application will keep working.
>        
>        That all being said, here is the big question I’d like to ask: should we keep supporting both the EM and Online optimizers and have users configure the required parameters, or do you think it is time to let EM go and keep only the Online optimizer?
>        
>        My vote is to keep both, but let me know what you think.
>        
>        Thanks,
>        Ricardo Barona
>        
>    
>    
>
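
As a reference for the discussion above, a minimal sketch of how the proposed SpotLDA.toLocal helper and the three parameters (EM alpha, EM beta, Online alpha) could be wired together; corpus and newDocuments are hypothetical RDD[(Long, Vector)] inputs and the numeric values are only illustrative:

    import org.apache.spark.mllib.clustering.{DistributedLDAModel, EMLDAOptimizer, LDA, SpotLDA}
    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    // Train with the EM optimizer (EM requires a document concentration > 1.0, e.g. 1.002).
    val emModel = new LDA()
      .setK(20)
      .setOptimizer(new EMLDAOptimizer)
      .setDocConcentration(1.002)            // "EM alpha"
      .setTopicConcentration(1.001)          // "EM beta"
      .run(corpus)
      .asInstanceOf[DistributedLDAModel]

    // Convert to a LocalLDAModel, overriding alpha with an Online-style value
    // ("Online alpha") so topicDistributions behaves as intended on new documents.
    val onlineAlpha: Vector = Vectors.dense(Array.fill(20)(0.0009))
    val localModel = SpotLDA.toLocal(emModel.topicsMatrix, onlineAlpha, beta = 1.001)

    // Get topic distributions for documents that were never seen during training.
    val newTopicDistributions = localModel.topicDistributions(newDocuments)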

Re: Spark EM LDA Optimizer support

Posted by "Edwards, Brandon" <br...@intel.com>.
Yes, I agree that EM makes sense in the batch case. OK, I’m in agreement to keep both in order to allow the batch use case. That was my sticking point: not being sure that batch should be supported. I am good with that, though. And good point, Gustavo, regarding how the initial ‘training’ should be performed on a large set before scoring is initiated. That will be important to put in the documentation.

Re: Spark EM LDA Optimizer support

Posted by "Barona, Ricardo" <ri...@intel.com>.
Just to complement Gustavo’s message: yeah, I think the best approach would be to keep both algorithms and “let” developers use EM or Online when planning to create batch applications, and Online ONLY for near-real-time purposes. 

Thanks!

Re: Spark EM LDA Optimizer support

Posted by "Lujan Moreno, Gustavo" <gu...@intel.com>.
Hi there,

What Ricardo did is not a recommendation from us; it is a workaround for something that we think is a bug, or a wrong implementation of EM, in the new spark.ml library. However, I talked to Ricardo and we concluded it is a bad idea to train with EM and score with Online (which is what spark.ml allows). We also discussed that perhaps the best solution is to provide the user with two options: scoring of new, never-seen documents with the Online optimizer, and batch mode with EM or Online.

Just a little bit of background: when we train and save a model, what we are really saving is the TopicMatrix, which is the word distribution by topic. During the “scoring” phase, what the Online optimizer does is compute the TopicDistribution matrix given that the TopicMatrix is fixed. EM computes both, but in batch mode. We initially thought it didn’t make a difference whether we passed the TopicsMatrix from EM or Online, and maybe that was the logic on the Spark team’s side as well. I now discourage this because there might be statistical inconsistencies in the procedure, although empirical tests showed that it didn’t perform too badly. 

This takes me to the other important point: the Online optimizer is not a magic algorithm. Although it can start scoring a training set in streaming mode, the true parameters will take several documents to converge. That is why I recommend training the Online optimizer with a lot of data (as in batch mode), saving the model with a robust TopicMatrix, and only then starting to score new documents. 

Best,

Gustavo
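
As a reference, a minimal sketch of the recommended flow in spark.mllib terms, assuming a hypothetical SparkContext sc, a large trainingCorpus and newDocuments of type RDD[(Long, Vector)], and an illustrative save path:

    import org.apache.spark.mllib.clustering.{LDA, LocalLDAModel, OnlineLDAOptimizer}

    // 1. Train the Online optimizer on a large batch so the topics matrix (the "TopicMatrix") converges.
    val onlineModel = new LDA()
      .setK(20)
      .setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(0.05))
      .setMaxIterations(200)
      .run(trainingCorpus)
      .asInstanceOf[LocalLDAModel]

    // 2. Save the model once the topics matrix is robust.
    onlineModel.save(sc, "hdfs:///user/spot/lda-model")

    // 3. Later, load the saved model and only infer topic distributions for new documents.
    val loaded = LocalLDAModel.load(sc, "hdfs:///user/spot/lda-model")
    val newTopicDistributions = loaded.topicDistributions(newDocuments)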




Re: Spark EM LDA Optimizer support

Posted by "Edwards, Brandon" <br...@intel.com>.
Could there be some reason why establishing the model using EM, and then performing further training with Online having saved the initial model, is a good thing to do?

Gustavo, I have a question: how do you see what Ricardo implemented as a solution to the fact that, under the hood, spark.ml is really using Online rather than EM? We still need to switch from EM to Online in order to score unseen data, right? I still don’t see any way other than training with EM only on the first batch.



Re: Spark EM LDA Optimizer support

Posted by "Barona, Ricardo" <ri...@intel.com>.
I was wondering something similar the other day. What made the Apache Spark team offer an option to convert the EM-resulting model into a local LDA model (Online)? I’m talking about DistributedLDAModel.toLocal.
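
For reference, the call in question is a one-liner in spark.mllib; a sketch, with distributedModel as a hypothetical EM-trained DistributedLDAModel:

    import org.apache.spark.mllib.clustering.LocalLDAModel

    // As noted earlier in the thread, the converted model keeps the topics matrix and the
    // document concentration exactly as they were set for the EM training run.
    val localModel: LocalLDAModel = distributedModel.toLocal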

Re: Spark EM LDA Optimizer support

Posted by Giacomo Bernardi <mi...@minux.it>.
That's a very interesting development!

However, let me take a step back: why do we even need EM? From a user perspective, what would be the advantage of running anomaly detection on 1-day batches rather than on a continuously online-learning model? I'm probably missing something, because I don't see the value of the batch use case.

Giacomo


