Posted to reviews@spark.apache.org by somideshmukh <gi...@git.apache.org> on 2016/01/05 13:23:05 UTC

[GitHub] spark pull request: [SPARK-12632][Python][Make Parameter Descripti...

GitHub user somideshmukh opened a pull request:

    https://github.com/apache/spark/pull/10602

    [SPARK-12632][Python][Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation]

    Made changes in the FPM file; the Recommendation file does not contain param changes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/somideshmukh/spark Branch12632-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10602.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10602
    
----
commit 5b53e88794ecb7c9a8a7f8b68aa8a3fb7c3ac7e3
Author: somideshmukh <so...@us.ibm.com>
Date:   2016-01-05T12:18:51Z

    [SPARK-12632][Python][Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation]

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org



Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r48917047
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -68,11 +68,14 @@ def train(cls, data, minSupport=0.3, numPartitions=-1):
             """
             Computes an FP-Growth model that contains frequent itemsets.
     
    -        :param data: The input data set, each element contains a
    -            transaction.
    -        :param minSupport: The minimal support level (default: `0.3`).
    -        :param numPartitions: The number of partitions used by
    -            parallel FP-growth (default: same as input data).
    +         :param data:
    +           The input data set, each element contains a transaction.
    +         :param minSupport:
    +           The minimal support level
    --- End diff --
    
    Please add a period after the end of the description.
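    Taken together, the review comments converge on a common docstring layout: the parameter name alone on the `:param` line, the description indented two further spaces and ending with a period, and the default on its own line. A minimal sketch of that layout, using a stand-in function rather than the real `FPGrowth.train`:

    ```python
    def train(data, minSupport=0.3, numPartitions=-1):
        """
        Computes an FP-Growth model that contains frequent itemsets.

        :param data:
          The input data set, each element contains a transaction.
        :param minSupport:
          The minimal support level.
          (default: 0.3)
        :param numPartitions:
          The number of partitions used by parallel FP-growth.
          (default: same as input data)
        """
    ```

    Each `:param` line lines up with the summary sentence, and every description gets a terminating period before any default line.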



Posted by thunterdb <gi...@git.apache.org>.
Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r49140295
  
    --- Diff: python/pyspark/mllib/recommendation.py ---
    @@ -239,6 +239,17 @@ def train(cls, ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative
             product of two lower-rank matrices of a given rank (number of features). To solve for these
             features, we run a given number of iterations of ALS. This is done using a level of
             parallelism given by `blocks`.
    +		
    +		:param iterations:
    --- End diff --
    
    indentation issues?



Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r48917244
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -130,15 +133,22 @@ def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=320
             """
             Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
     
    -        :param data: The input data set, each element contains a sequnce of itemsets.
    -        :param minSupport: the minimal support level of the sequential pattern, any pattern appears
    -            more than  (minSupport * size-of-the-dataset) times will be output (default: `0.1`)
    -        :param maxPatternLength: the maximal length of the sequential pattern, any pattern appears
    -            less than maxPatternLength will be output. (default: `10`)
    -        :param maxLocalProjDBSize: The maximum number of items (including delimiters used in
    -            the internal storage format) allowed in a projected database before local
    -            processing. If a projected database exceeds this size, another
    -            iteration of distributed prefix growth is run. (default: `32000000`)
    +        :param data:
    +          The input data set, each element contains a sequnce of itemsets.
    +        :param minSupport:
    +          The minimal support level of the sequential pattern, any pattern appears
    +          more than  (minSupport * size-of-the-dataset) times will be output.
    +          default: `0.1`)
    +        :param maxPatternLength:
    +          The maximal length of the sequential pattern, any pattern appears
    +          less than maxPatternLength will be output.
    +          (default: `10`)
    +        :param maxLocalProjDBSize:
    +          The maximum number of items (including delimiters used in
    +          the internal storage format) allowed in a projected database before local
    +          processing. If a projected database exceeds this size, another
    +          iteration of distributed prefix growth is run.
    --- End diff --
    
    Line length can go up to 100 characters, can you please combine these lines and other multi-line descriptions to that limit?
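    A quick, hypothetical way to spot wrapped description lines that could be combined under the 100-character budget (the sample text is abbreviated from the diff above):

    ```python
    # Flag docstring lines that still have room under Spark's 100-character limit,
    # i.e. candidates for combining with the line that follows them.
    sample = """\
    :param maxLocalProjDBSize:
      The maximum number of items (including delimiters used in
      the internal storage format) allowed in a projected database before local
      processing."""

    LIMIT = 100
    lines = [line[4:] if line.startswith("    ") else line for line in sample.splitlines()]
    # A description line is a merge candidate if appending the next line's text
    # (plus a joining space) keeps it within the limit.
    candidates = [
        n for n, (cur, nxt) in enumerate(zip(lines, lines[1:]), 1)
        if cur.startswith("  ") and len(cur) + 1 + len(nxt.strip()) <= LIMIT
    ]
    print(candidates)
    ```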



Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r49795192
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -128,17 +130,25 @@ class PrefixSpan(object):
         @since("1.6.0")
         def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=32000000):
             """
    -        Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
    -
    -        :param data: The input data set, each element contains a sequnce of itemsets.
    -        :param minSupport: the minimal support level of the sequential pattern, any pattern appears
    -            more than  (minSupport * size-of-the-dataset) times will be output (default: `0.1`)
    -        :param maxPatternLength: the maximal length of the sequential pattern, any pattern appears
    -            less than maxPatternLength will be output. (default: `10`)
    -        :param maxLocalProjDBSize: The maximum number of items (including delimiters used in
    -            the internal storage format) allowed in a projected database before local
    -            processing. If a projected database exceeds this size, another
    -            iteration of distributed prefix growth is run. (default: `32000000`)
    +        Finds the complete set of frequent sequential patterns in the input
    +		sequences of itemsets.
    --- End diff --
    
    This should have extra indentation; it should line up with the word "Finds".



Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/10602#issuecomment-181506578
  
    Hi @somideshmukh, any update on this? Let me know if you need any assistance.



Posted by somideshmukh <gi...@git.apache.org>.
Github user somideshmukh closed the pull request at:

    https://github.com/apache/spark/pull/10602



Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r49795250
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -128,17 +130,25 @@ class PrefixSpan(object):
         @since("1.6.0")
         def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=32000000):
             """
    -        Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
    -
    -        :param data: The input data set, each element contains a sequnce of itemsets.
    -        :param minSupport: the minimal support level of the sequential pattern, any pattern appears
    -            more than  (minSupport * size-of-the-dataset) times will be output (default: `0.1`)
    -        :param maxPatternLength: the maximal length of the sequential pattern, any pattern appears
    -            less than maxPatternLength will be output. (default: `10`)
    -        :param maxLocalProjDBSize: The maximum number of items (including delimiters used in
    -            the internal storage format) allowed in a projected database before local
    -            processing. If a projected database exceeds this size, another
    -            iteration of distributed prefix growth is run. (default: `32000000`)
    +        Finds the complete set of frequent sequential patterns in the input
    +		sequences of itemsets.
    +        :param data:
    +          The input data set, each element contains a sequnce of itemsets.
    +        :param minSupport:
    +          The minimal support level of the sequential pattern, any pattern
    +		  appears more than (minSupport * size-of-the-dataset) times will be
    --- End diff --
    
    Same as above: no extra indentation when the line wraps around.



Posted by vijaykiran <gi...@git.apache.org>.
Github user vijaykiran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r48839858
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -68,11 +68,14 @@ def train(cls, data, minSupport=0.3, numPartitions=-1):
             """
             Computes an FP-Growth model that contains frequent itemsets.
     
    -        :param data: The input data set, each element contains a
    -            transaction.
    -        :param minSupport: The minimal support level (default: `0.3`).
    -        :param numPartitions: The number of partitions used by
    -            parallel FP-growth (default: same as input data).
    +         :param data:
    +           The input data set, each element contains a transaction.
    +         :param minSupport:
    +           The minimal support level
    +           (default: `0.3`)
    +         :param numPartitions:The number of partitions used by parallel FP-growth
    --- End diff --
    
    You missed this one :)



Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/10602#issuecomment-174011424
  
    Hi @somideshmukh, it looks like your changes still have a couple of indentation problems. Maybe try changing the indent so it does not use the tab character. If using IntelliJ, you can set this from File -> Settings, then under Editor -> Code Style, for "Default Indent Options" uncheck the box that says "Use tab character" and set "Tab size" to 2, "Indent" to 2, and "Continuation indent" to 4.
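    The tab problem described here is easy to detect mechanically. A small sketch (the snippet is a stand-in, not the actual PR content) that reports lines whose leading whitespace contains a tab:

    ```python
    # Report line numbers whose indentation mixes in tab characters; tabs render
    # at a different width than spaces, which is what breaks the diff alignment.
    snippet = (
        ":param numPartitions:\n"
        "\t\t  The number of partitions used by parallel FP-growth."
    )

    def tab_indented_lines(text):
        hits = []
        for n, line in enumerate(text.splitlines(), 1):
            indent = line[: len(line) - len(line.lstrip())]
            if "\t" in indent:
                hits.append(n)
        return hits

    print(tab_indented_lines(snippet))
    ```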



Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/10602#issuecomment-183471455
  
    Thanks for working on this @somideshmukh; I finished up the remaining work in https://github.com/apache/spark/pull/11186. Could you please close this PR and the other?
    
    cc @mengxr 



Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r49795026
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -67,12 +67,14 @@ class FPGrowth(object):
         def train(cls, data, minSupport=0.3, numPartitions=-1):
             """
             Computes an FP-Growth model that contains frequent itemsets.
    -
    -        :param data: The input data set, each element contains a
    -            transaction.
    -        :param minSupport: The minimal support level (default: `0.3`).
    -        :param numPartitions: The number of partitions used by
    -            parallel FP-growth (default: same as input data).
    +        :param data:
    +          The input data set, each element contains a transaction.
    +        :param minSupport:
    +          The minimal support level.
    +          (default: 0.3)
    +        :param numPartitions:
    +		  The number of partitions used by parallel FP-growth.
    --- End diff --
    
    Too much indentation here



Posted by thunterdb <gi...@git.apache.org>.
Github user thunterdb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r49140273
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -130,15 +133,21 @@ def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=320
             """
             Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
     
    -        :param data: The input data set, each element contains a sequnce of itemsets.
    -        :param minSupport: the minimal support level of the sequential pattern, any pattern appears
    -            more than  (minSupport * size-of-the-dataset) times will be output (default: `0.1`)
    -        :param maxPatternLength: the maximal length of the sequential pattern, any pattern appears
    -            less than maxPatternLength will be output. (default: `10`)
    -        :param maxLocalProjDBSize: The maximum number of items (including delimiters used in
    -            the internal storage format) allowed in a projected database before local
    -            processing. If a projected database exceeds this size, another
    -            iteration of distributed prefix growth is run. (default: `32000000`)
    +        :param data:
    +          The input data set, each element contains a sequnce of itemsets.
    +        :param minSupport:
    +          The minimal support level of the sequential pattern, any pattern appears more than
    --- End diff --
    
    the lines below have indentation issues



Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r48916985
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -130,15 +133,22 @@ def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=320
             """
             Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
     
    -        :param data: The input data set, each element contains a sequnce of itemsets.
    -        :param minSupport: the minimal support level of the sequential pattern, any pattern appears
    -            more than  (minSupport * size-of-the-dataset) times will be output (default: `0.1`)
    -        :param maxPatternLength: the maximal length of the sequential pattern, any pattern appears
    -            less than maxPatternLength will be output. (default: `10`)
    -        :param maxLocalProjDBSize: The maximum number of items (including delimiters used in
    -            the internal storage format) allowed in a projected database before local
    -            processing. If a projected database exceeds this size, another
    -            iteration of distributed prefix growth is run. (default: `32000000`)
    +        :param data:
    +          The input data set, each element contains a sequnce of itemsets.
    +        :param minSupport:
    +          The minimal support level of the sequential pattern, any pattern appears
    +          more than  (minSupport * size-of-the-dataset) times will be output.
    +          default: `0.1`)
    --- End diff --
    
    Yes, please use the format that @vijaykiran has above (although no period after the default).



Posted by vijaykiran <gi...@git.apache.org>.
Github user vijaykiran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r48945561
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -67,12 +67,15 @@ class FPGrowth(object):
         def train(cls, data, minSupport=0.3, numPartitions=-1):
             """
             Computes an FP-Growth model that contains frequent itemsets.
    +         :param data:
    +           The input data set, each element contains a transaction.
    +         :param minSupport:
    +           The minimal support level.
    +           (default: 0.3)
    +         :param numPartitions:
    +		   The number of partitions used by parallel FP-growth.
    --- End diff --
    
    too much indentation here.



Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/10602#issuecomment-169185555
  
    Thanks @somideshmukh! Could you please look at the corrections from @vijaykiran and me, extend the descriptions to the 100-character limit, and add the parameter descriptions to recommendation.py?



Posted by somideshmukh <gi...@git.apache.org>.
Github user somideshmukh commented on the pull request:

    https://github.com/apache/spark/pull/10602#issuecomment-171915841
  
    We have made the same changes and committed the code, but when the pull request is created, it is showing an indentation problem. I am attaching images that show the changes we have made.
    
    ![recommendation-changes-comparision](https://cloud.githubusercontent.com/assets/13100125/12350169/60e28896-bb9b-11e5-8e43-65ec221719af.png)
    
    ![fpm-comparision](https://cloud.githubusercontent.com/assets/13100125/12350160/5622ea4a-bb9b-11e5-871b-94bd9275dadd.png)
    




Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r48917505
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -68,11 +68,14 @@ def train(cls, data, minSupport=0.3, numPartitions=-1):
             """
             Computes an FP-Growth model that contains frequent itemsets.
     
    -        :param data: The input data set, each element contains a
    -            transaction.
    -        :param minSupport: The minimal support level (default: `0.3`).
    -        :param numPartitions: The number of partitions used by
    -            parallel FP-growth (default: same as input data).
    +         :param data:
    --- End diff --
    
    There are extra spaces in the indentations here.  :param should line up with 'Computes' as before



Posted by vijaykiran <gi...@git.apache.org>.
Github user vijaykiran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10602#discussion_r48839934
  
    --- Diff: python/pyspark/mllib/fpm.py ---
    @@ -130,15 +133,22 @@ def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=320
             """
             Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
     
    -        :param data: The input data set, each element contains a sequnce of itemsets.
    -        :param minSupport: the minimal support level of the sequential pattern, any pattern appears
    -            more than  (minSupport * size-of-the-dataset) times will be output (default: `0.1`)
    -        :param maxPatternLength: the maximal length of the sequential pattern, any pattern appears
    -            less than maxPatternLength will be output. (default: `10`)
    -        :param maxLocalProjDBSize: The maximum number of items (including delimiters used in
    -            the internal storage format) allowed in a projected database before local
    -            processing. If a projected database exceeds this size, another
    -            iteration of distributed prefix growth is run. (default: `32000000`)
    +        :param data:
    +          The input data set, each element contains a sequnce of itemsets.
    +        :param minSupport:
    +          The minimal support level of the sequential pattern, any pattern appears
    +          more than  (minSupport * size-of-the-dataset) times will be output.
    +          default: `0.1`)
    --- End diff --
    
    I think the format should be (default: `0.1`).
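    The convention being pointed at here can be captured as a hedged regex check (the names below are illustrative, not from the Spark codebase):

    ```python
    import re

    # The agreed form is "(default: `value`)": open paren, backticked value,
    # close paren, and no trailing period.
    DEFAULT_RE = re.compile(r"^\(default: `[^`]+`\)$")

    assert DEFAULT_RE.match("(default: `0.1`)")          # correct form
    assert not DEFAULT_RE.match("default: `0.1`)")       # the typo flagged above
    assert not DEFAULT_RE.match("(default: `0.1`).")     # trailing period rejected
    ```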



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10602#issuecomment-168986043
  
    Can one of the admins verify this patch?



Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10602#issuecomment-170159780
  
    I just added a note to the parent JIRA about a formatting issue affecting all 5 PRs: [https://issues.apache.org/jira/browse/SPARK-11219?focusedCommentId=15090225&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15090225]
    Could you please check it out & ping when I should review again?  Thank you!

