
Posted to reviews@spark.apache.org by zero323 <gi...@git.apache.org> on 2017/03/09 00:03:15 UTC

[GitHub] spark pull request #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python AP...

GitHub user zero323 opened a pull request:

    https://github.com/apache/spark/pull/17218

    [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for FPGrowth

    ## What changes were proposed in this pull request?
    
    - Add `HasSupport` and `HasConfidence` to `pyspark.ml.param.shared`.
    - Add new module `pyspark.ml.fpm`.
    - Add `FPGrowth` / `FPGrowthModel` wrappers.
    - Provide tests for new features.
    
    ## How was this patch tested?
    
    Unit tests.
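
To make the two thresholds named above concrete, here is a minimal pure-Python sketch of what a minimum support and a minimum confidence select. It is a brute-force illustration for tiny inputs, not the FP-growth algorithm and not the code in this pull request. The function names are hypothetical, and treating both thresholds as inclusive is an assumption.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Count every itemset that occurs in at least min_support * N baskets.

    Brute force over all candidate subsets, so only suitable for tiny inputs.
    The inclusive comparison (>=) is an assumption of this sketch.
    """
    if not 0.0 <= min_support <= 1.0:
        raise ValueError("min_support must be in range [0, 1]")
    baskets = [frozenset(t) for t in transactions]
    min_count = min_support * len(baskets)
    items = sorted(set().union(*baskets)) if baskets else []
    counts = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            n = sum(1 for basket in baskets if itemset <= basket)
            # Skip itemsets that never occur, even when min_support is 0.
            if n > 0 and n >= min_count:
                counts[itemset] = n
    return counts


def association_rules(freq, min_confidence):
    """Build single-consequent rules: confidence = count(itemset) / count(antecedent)."""
    rules = []
    for itemset, n in freq.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Every subset of a frequent itemset is itself frequent,
            # so the antecedent is always present in freq.
            confidence = n / float(freq[antecedent])
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


transactions = [["a", "b"], ["a", "b", "c"], ["a"], ["b", "c"]]
freq = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(freq, min_confidence=0.9)
```

With these four baskets and a support of 0.5, an itemset has to appear in at least two baskets to be kept; a higher confidence threshold then prunes the rules derived from those itemsets.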


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-19281

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17218.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17218
    
----
commit aaabd067d814e6b284ee8d1919977da515f24fb1
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-03-06T02:58:11Z

    Initial implementation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Thanks for the PR!  I'll wait until this isn't "WIP" to review it thoroughly, but I'll make two comments now:
    * The params should not be added to shared.py since they are not shared by any other algorithm.  They can be added later if needed, but I expect them not to be since the documentation for these in particular should be specialized for FPM algorithms.
    * For future reference: Never add stuff directly to shared.py; it should go in the generating file in the same folder.
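
The second point follows from how generated files work: anything edited by hand in the output is lost the next time the generator runs. The sketch below illustrates that idea with a hypothetical template-based generator. The template, the function names, and the simplified mixin shape are all assumptions for illustration and do not reproduce PySpark's actual generating file.

```python
# Hypothetical template for one generated mixin class (simplified on purpose).
TEMPLATE = '''class Has{Name}(object):
    """Mixin for param {name}: {doc}"""

    def __init__(self):
        self._{name} = None

    def set{Name}(self, value):
        self._{name} = value
        return self

    def get{Name}(self):
        return self._{name}
'''


def gen_mixin(name, doc):
    """Render the source of one mixin class from the template."""
    return TEMPLATE.format(name=name, Name=name[0].upper() + name[1:], doc=doc)


def gen_module(params):
    """Render a whole module; rerunning this overwrites any manual edits."""
    header = "# GENERATED FILE: edit the generator, not this output.\n\n"
    return header + "\n\n".join(gen_mixin(n, d) for n, d in params)


source = gen_module([
    ("maxIter", "max number of iterations"),
    ("itemsCol", "items column name."),
])

# Execute the generated source to show that it defines usable classes.
namespace = {}
exec(source, namespace)
mixin = namespace["HasItemsCol"]()
```

Adding a param in such a setup means adding an entry to the list passed to the generator and regenerating, rather than typing a new class into the output.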



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75168/



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by indyragandy <gi...@git.apache.org>.

Github user indyragandy commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Get directly



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107758576
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            raise ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    +            raise ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name.", typeConverter=TypeConverters.toString)
    +
    +    def __init__(self):
    +        super(HasItemsCol, self).__init__()
    +        self._setDefault(itemsCol='items')
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):
    +    """Model fitted by FPGrowth.
    +
    +    .. versionadded:: 2.2.0
    +    """
    +    @property
    +    @since("2.2.0")
    +    def freqItemsets(self):
    +        """DataFrame with two columns:
    +        * `items` - Itemset of the same type as the input column.
    +        * `freq`  - Frequency of the itemset (`LongType`).
    +        """
    +        return self._call_java("freqItemsets")
    +
    +    @property
    +    @since("2.2.0")
    +    def associationRules(self):
    +        """DataFrame with three columns:
    +        * `antecedent`  - Array of the same type as the input column.
    +        * `consequent`  - Single element array of the same type as the input column.
    +        * `confidence`  - Confidence for the rule (`DoubleType`)."""
    +        return self._call_java("associationRules")
    +
    +
    +class FPGrowth(JavaEstimator, HasItemsCol, HasPredictionCol,
    --- End diff --
    
    Mark Experimental



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75131/testReport)** for PR 17218 at commit [`f29198e`](https://github.com/apache/spark/commit/f29198eaac3e96c4789993ebbd20e56c7c54e65d).



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107782064
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    --- End diff --
    
    To be honest, I don't like this approach, so I'll try to make the case for keeping this as-is.
    
    If we depend on the Scala checks, we fail late: the error is delayed until the point where `transform` is called. If that happens in the middle of a complex pipeline, the failure is simply expensive. So my opinion is that if we can fail early without significant overhead, then we should.
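
The fail-early argument can be sketched in plain Python: validate in the setter and raise immediately, so a bad value never reaches a later and more expensive stage. This is a minimal sketch, not the PySpark `Params` machinery; the class name and the default of 0.3 are assumptions for illustration. The check only has an effect because the exception is raised rather than merely constructed.

```python
class MinSupportHolder(object):
    """Hypothetical stand-in for a params mixin with a validated setter."""

    def __init__(self):
        # Illustrative default; an assumption of this sketch.
        self._minSupport = 0.3

    def setMinSupport(self, value):
        """Set minSupport, rejecting out-of-range values at call time."""
        value = float(value)
        if not 0.0 <= value <= 1.0:
            # Raising here is what makes the failure early.
            raise ValueError("Support must be in range [0, 1], got %r" % value)
        self._minSupport = value
        return self

    def getMinSupport(self):
        """Return the current minSupport value."""
        return self._minSupport


holder = MinSupportHolder().setMinSupport(0.5)

try:
    holder.setMinSupport(1.5)
    rejected = False
except ValueError:
    rejected = True
```

The cost of the check is a float conversion and two comparisons per call, which is the "without significant overhead" part of the argument.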



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75131/testReport)** for PR 17218 at commit [`f29198e`](https://github.com/apache/spark/commit/f29198eaac3e96c4789993ebbd20e56c7c54e65d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74230/testReport)** for PR 17218 at commit [`3b10a30`](https://github.com/apache/spark/commit/3b10a30cc03f5c8e1e4dacf045261f1e5d70aa4e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):`
      * `class FPGrowth(JavaEstimator, HasFeaturesCol, HasPredictionCol,`
      * `class HasSupport(Params):`
      * `class HasConfidence(Params):`



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75219/testReport)** for PR 17218 at commit [`21a3606`](https://github.com/apache/spark/commit/21a36066b5bb7f7a58e123de5ad778257031f363).



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107759052
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            raise ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    +            raise ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name.", typeConverter=TypeConverters.toString)
    +
    +    def __init__(self):
    +        super(HasItemsCol, self).__init__()
    +        self._setDefault(itemsCol='items')
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):
    --- End diff --
    
    Also, it'd be good to be able to set minConfidence, itemsCol and predictionCol (for associationRules and transform)
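
One way to read this request: if the fitted model keeps the frequent itemsets and exposes a setter, the rules can be recomputed under a different confidence threshold without refitting. The sketch below shows that idea in plain Python. The class, its constructor, and the default of 0.8 are hypothetical and do not reflect the wrapper in this pull request.

```python
class FittedModelSketch(object):
    """Hypothetical fitted model that stores itemset counts."""

    def __init__(self, freq_itemsets, min_confidence=0.8):
        # Maps frozenset of items -> number of baskets containing it.
        self._freq = dict(freq_itemsets)
        self._min_confidence = min_confidence

    def setMinConfidence(self, value):
        """Change the threshold after fitting; no refit is needed."""
        if not 0.0 <= value <= 1.0:
            raise ValueError("Confidence must be in range [0, 1]")
        self._min_confidence = value
        return self

    def getMinConfidence(self):
        return self._min_confidence

    @property
    def associationRules(self):
        """Recompute rules from the stored counts with the current threshold."""
        rules = []
        for itemset, n in self._freq.items():
            for consequent in itemset:
                antecedent = itemset - {consequent}
                # Single-item itemsets have an empty antecedent and are skipped.
                if antecedent in self._freq:
                    confidence = n / float(self._freq[antecedent])
                    if confidence >= self._min_confidence:
                        rules.append((antecedent, consequent, confidence))
        return rules


model = FittedModelSketch({
    frozenset(["a"]): 3,
    frozenset(["b"]): 3,
    frozenset(["a", "b"]): 2,
})
strict_rules = model.associationRules
relaxed_rules = model.setMinConfidence(0.5).associationRules
```

Here both candidate rules have confidence 2/3, so they are filtered out at 0.8 and appear once the threshold is lowered to 0.5.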



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75114/testReport)** for PR 17218 at commit [`26697bf`](https://github.com/apache/spark/commit/26697bf3235343131762c0b0afbfe4f87065d140).



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107777141
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            raise ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    +            raise ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name.", typeConverter=TypeConverters.toString)
    +
    +    def __init__(self):
    +        super(HasItemsCol, self).__init__()
    +        self._setDefault(itemsCol='items')
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):
    +    """Model fitted by FPGrowth.
    +
    +    .. versionadded:: 2.2.0
    +    """
    +    @property
    +    @since("2.2.0")
    +    def freqItemsets(self):
    +        """DataFrame with two columns:
    --- End diff --
    
    Done.
    
    Side note: should we add this to https://spark.apache.org/contributing.html? (PEP 8 only recommends placing the closing quotes on a separate line.)
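
For reference, the two docstring layouts being compared look like this. The review request itself is not quoted in this thread, so this sketch assumes it was about moving the summary off the opening-quote line. The function names are hypothetical. `inspect.getdoc` normalizes indentation and leading blank lines, so both layouts yield the same documentation text.

```python
import inspect


def closing_quotes_only():
    """Return the frequent itemsets.

    Longer description goes here.
    """


def summary_on_own_line():
    """
    Return the frequent itemsets.

    Longer description goes here.
    """


# Both layouts produce identical cleaned documentation.
same_doc = inspect.getdoc(closing_quotes_only) == inspect.getdoc(summary_on_own_line)
```

Because the rendered text is the same, the choice between the two is purely a project style convention, which is why documenting it in the contributing guide would help.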



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75237/testReport)** for PR 17218 at commit [`66b85e5`](https://github.com/apache/spark/commit/66b85e5fc9c6a57978df0494c4a7174070534636).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    @indyragandy What do you mean by "get directly"?



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74356/testReport)** for PR 17218 at commit [`4fe6257`](https://github.com/apache/spark/commit/4fe6257a014824f037cc8de0e99fe217439ce7b5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74630/
    Test FAILed.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r108054236
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -25,23 +25,21 @@
     
     class HasSupport(Params):
         """
    -    Mixin for param support: [0.0, 1.0].
    +    Mixin for param support.
         """
     
         minSupport = Param(
             Params._dummy(),
             "minSupport",
    -        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    -        "than (minSupport * size-of-the-dataset) times will be output",
    +        """Minimal support level of the frequent pattern. [0.0, 1.0].
    +        Any pattern that appears more than (minSupport * size-of-the-dataset)
    +        times will be output""",
             typeConverter=TypeConverters.toFloat)
     
         def setMinSupport(self, value):
             """
             Sets the value of :py:attr:`minSupport`.
             """
    -        if not (0 <= value <= 1):
    -            raise ValueError("Support must be in range [0, 1]")
    -        return self._set(minSupport=value)
    --- End diff --
    
    This removed too much!  The `return self._set(minSupport=value)` line should remain.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75128/testReport)** for PR 17218 at commit [`d8c2a69`](https://github.com/apache/spark/commit/d8c2a69b4b4bc961dcae05d8f44519574aabf70e).



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75121/testReport)** for PR 17218 at commit [`f5bb151`](https://github.com/apache/spark/commit/f5bb151f4d0fa153a4af0d5e122ccfa8aa479b24).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75113/testReport)** for PR 17218 at commit [`d8f291f`](https://github.com/apache/spark/commit/d8f291f0255a60caf8698a22babe42b03190362b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107757886
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name.", typeConverter=TypeConverters.toString)
    --- End diff --
    
    Remove the period "." from the end of the doc string here.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75121/
    Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75114/testReport)** for PR 17218 at commit [`26697bf`](https://github.com/apache/spark/commit/26697bf3235343131762c0b0afbfe4f87065d140).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107757391
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    --- End diff --
    
    This check happens on the Scala side; let's not replicate it here.
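
A rough sketch of what the setter might look like with the range check dropped. `ParamsStandIn` is a hypothetical stand-in written only for this example so that it runs without Spark; the real class in the diff extends `pyspark.ml.param.Params`, and the range validation is assumed to be done on the Scala side as the comment above says.

```python
class ParamsStandIn(object):
    """Hypothetical stand-in for pyspark.ml.param.Params (illustration only)."""

    def __init__(self):
        self._paramMap = {}

    def _set(self, **kwargs):
        # Store the values and return self so that calls can be chained.
        self._paramMap.update(kwargs)
        return self


class HasSupport(ParamsStandIn):
    """
    Mixin for param support.
    """

    def setMinSupport(self, value):
        """
        Sets the value of minSupport.
        """
        # No range check here: validation is left to the Scala side.
        return self._set(minSupport=value)


params = HasSupport().setMinSupport(0.5)
```

Keeping the Python setter this thin means the valid range is defined in one place only, so the two sides cannot drift apart.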



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75119/testReport)** for PR 17218 at commit [`c478b24`](https://github.com/apache/spark/commit/c478b2407fad9ae1e24b52c4cd29e8ea755c104f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107803159
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1243,6 +1245,43 @@ def test_tweedie_distribution(self):
             self.assertTrue(np.isclose(model2.intercept, 0.6667, atol=1E-4))
     
     
    +class FPGrowthTests(SparkSessionTestCase):
    +    def setUp(self):
    +        super(FPGrowthTests, self).setUp()
    +        self.data = self.spark.createDataFrame(
    +            [([1, 2], ), ([1, 2], ), ([1, 2, 3], ), ([1, 3], )],
    +            ["items"])
    +
    +    def test_association_rules(self):
    +        fp = FPGrowth()
    +        fpm = fp.fit(self.data)
    +
    +        expected_association_rules = self.spark.createDataFrame(
    +            [([3], [1], 1.0), ([2], [1], 1.0)],
    +            ["antecedent", "consequent", "confidence"]
    +        )
    +        actual_association_rules = fpm.associationRules
    +
    --- End diff --
    
    I don't think so. I reported this before on the developers list (http://apache-spark-developers-list.1001551.n3.nabble.com/ML-PYTHON-Collecting-data-in-a-class-extending-SparkSessionTestCase-causes-AttributeError-td21120.html) with a minimal example.  
    
    There is something ugly going on here but it doesn't seem to be related to `FPGrowth` at all.
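
As a sanity check on the expected values in the test quoted above, here is a brute-force sketch in plain Python (no Spark involved) that recomputes the association rules from the four transactions in `setUp`. The thresholds `MIN_SUPPORT = 0.3` and `MIN_CONFIDENCE = 0.8` are assumptions about the defaults; they are not stated in this thread. The support test follows the wording of the `minSupport` param doc ("appears more than minSupport * size-of-the-dataset times"), and the helper names are made up for this example.

```python
from itertools import combinations

# Transactions from FPGrowthTests.setUp in the quoted diff.
transactions = [{1, 2}, {1, 2}, {1, 2, 3}, {1, 3}]

# Assumed defaults (not stated in this thread).
MIN_SUPPORT = 0.3
MIN_CONFIDENCE = 0.8


def frequent_itemsets(transactions, min_support):
    """Count every candidate itemset by brute force and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    threshold = min_support * len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count > threshold:
                result[frozenset(candidate)] = count
    return result


def association_rules(freq, min_confidence):
    """Derive single-consequent rules (antecedent, consequent, confidence)."""
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Every subset of a frequent itemset is itself frequent,
            # so the antecedent is always present in freq.
            confidence = count / float(freq[antecedent])
            if confidence >= min_confidence:
                rules.append((sorted(antecedent), [consequent], confidence))
    return sorted(rules)


freq = frequent_itemsets(transactions, MIN_SUPPORT)
rules = association_rules(freq, MIN_CONFIDENCE)
```

Under these assumptions `rules` contains `[2] -> [1]` and `[3] -> [1]`, each with confidence 1.0, which matches `expected_association_rules` in the test. The reverse rules `[1] -> [2]` (0.75) and `[1] -> [3]` (0.5) fall below the confidence threshold.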



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    LGTM
    Merging with master
    Thanks a lot!



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75241/testReport)** for PR 17218 at commit [`66b85e5`](https://github.com/apache/spark/commit/66b85e5fc9c6a57978df0494c4a7174070534636).



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75136/testReport)** for PR 17218 at commit [`dd67055`](https://github.com/apache/spark/commit/dd67055e3ac22451c8a5aeb3a0b0c9c007f30b67).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r108048659
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,232 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only, since
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not (0 <= value <= 1):
    +            raise ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        """"Minimal confidence for generating Association Rule. [0.0, 1.0]
    +        Note that minConfidence has no effect during fitting.""",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not (0 <= value <= 1):
    +            raise ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name", typeConverter=TypeConverters.toString)
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable,
    +                    HasConfidence, HasItemsCol, HasPredictionCol):
    +    """Model fitted by FPGrowth.
    +
    +    .. note:: Experimental
    --- End diff --
    
    Put this note first in the doc string (see examples elsewhere).
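
A sketch of the layout being asked for, assuming "first" means the `.. note:: Experimental` directive opens the docstring, ahead of the summary sentence. `ExampleModel` and its summary text are placeholders, not the class from the patch.

```python
import inspect


class ExampleModel(object):
    """
    .. note:: Experimental

    Model fitted by a hypothetical estimator.
    """


# The first line of the cleaned docstring is now the note directive.
first_line = inspect.getdoc(ExampleModel).splitlines()[0]
```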



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Jenkins retest this please.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74895/
    Test FAILed.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    @jkbradley Thanks for the comment. I was thinking ahead to `PrefixSpan`, so I wanted to avoid embedding this in `FPGrowth`. I'll put it in the `fpm` module.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107776232
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name.", typeConverter=TypeConverters.toString)
    +
    +    def __init__(self):
    +        super(HasItemsCol, self).__init__()
    +        self._setDefault(itemsCol='items')
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):
    +    """Model fitted by FPGrowth.
    +
    +    .. versionadded:: 2.2.0
    +    """
    +    @property
    +    @since("2.2.0")
    +    def freqItemsets(self):
    +        """DataFrame with two columns:
    +        * `items` - Itemset of the same type as the input column.
    +        * `freq`  - Frequency of the itemset (`LongType`).
    +        """
    +        return self._call_java("freqItemsets")
    +
    +    @property
    +    @since("2.2.0")
    +    def associationRules(self):
    +        """Data with three columns:
    +        * `antecedent`  - Array of the same type as the input column.
    +        * `consequent`  - Single element array of the same type as the input column.
    +        * `confidence`  - Confidence for the rule (`DoubleType`)."""
    +        return self._call_java("associationRules")
    +
    +
    +class FPGrowth(JavaEstimator, HasItemsCol, HasPredictionCol,
    --- End diff --
    
    Done.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107803580
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1243,6 +1245,43 @@ def test_tweedie_distribution(self):
             self.assertTrue(np.isclose(model2.intercept, 0.6667, atol=1E-4))
     
     
    +class FPGrowthTests(SparkSessionTestCase):
    +    def setUp(self):
    +        super(FPGrowthTests, self).setUp()
    +        self.data = self.spark.createDataFrame(
    +            [([1, 2], ), ([1, 2], ), ([1, 2, 3], ), ([1, 3], )],
    +            ["items"])
    +
    +    def test_association_rules(self):
    +        fp = FPGrowth()
    +        fpm = fp.fit(self.data)
    +
    +        expected_association_rules = self.spark.createDataFrame(
    +            [([3], [1], 1.0), ([2], [1], 1.0)],
    +            ["antecedent", "consequent", "confidence"]
    +        )
    +        actual_association_rules = fpm.associationRules
    +
    --- End diff --
    
    Also on a clean build I get a lot of exceptions 
    
    ```
      File "/home/user/Workspace/spark/python/pyspark/ml/wrapper.py", line 105, in __del__
        SparkContext._active_spark_context._gateway.detach(self._java_obj)
      File "/home/user/Workspace/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1870, in detach
    AttributeError: 'NoneType' object has no attribute '_detach'
    Exception ignored in: <bound method JavaParams.__del__ of JavaTransformer_40a1ae03e4cc7b21f140>
    ```
    when running `python/pyspark/ml/tests.py`. I am not sure whether it is related or not.
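
The traceback above comes from a `__del__` method that calls `detach` unconditionally, and the message suggests that the object being detached was `None`. A minimal sketch of a guarded finalizer follows. `RecordingGateway`, `ActiveContext`, `Wrapper` and `release` are hypothetical stand-ins written for this illustration; they are not the PySpark or py4j classes, and this is not necessarily the fix the project adopted.

```python
class RecordingGateway(object):
    """Hypothetical stand-in for the py4j gateway; records detached handles."""

    def __init__(self):
        self.detached = []

    def detach(self, java_obj):
        self.detached.append(java_obj)


class ActiveContext(object):
    """Hypothetical stand-in for SparkContext._active_spark_context."""
    gateway = None  # None models a context that has already been stopped


class Wrapper(object):
    def __init__(self, java_obj=None):
        self._java_obj = java_obj

    def release(self):
        """Detach the handle only if both the gateway and the handle exist."""
        gateway = ActiveContext.gateway
        if gateway is not None and self._java_obj is not None:
            gateway.detach(self._java_obj)
            self._java_obj = None  # make repeated calls harmless

    def __del__(self):
        self.release()


# With no active gateway, neither an empty wrapper nor one holding a
# handle raises when it is released.
Wrapper().release()
Wrapper("handle").release()
```

The guard trades a noisy shutdown for a silent no-op, which seems acceptable in a finalizer because there is nothing useful a caller could do with the exception.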



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74898/testReport)** for PR 17218 at commit [`5f4673e`](https://github.com/apache/spark/commit/5f4673e74049d9f6918f4e029215ae6c8364043e).



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74630/testReport)** for PR 17218 at commit [`9074312`](https://github.com/apache/spark/commit/90743124dd4e71d1e2f5f00e19a522d1f92dff63).



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75237/testReport)** for PR 17218 at commit [`66b85e5`](https://github.com/apache/spark/commit/66b85e5fc9c6a57978df0494c4a7174070534636).



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107770880
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name.", typeConverter=TypeConverters.toString)
    +
    +    def __init__(self):
    +        super(HasItemsCol, self).__init__()
    +        self._setDefault(itemsCol='items')
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):
    +    """Model fitted by FPGrowth.
    +
    +    .. versionadded:: 2.2.0
    +    """
    +    @property
    +    @since("2.2.0")
    +    def freqItemsets(self):
    +        """DataFrame with two columns:
    +        * `items` - Itemset of the same type as the input column.
    +        * `freq`  - Frequency of the itemset (`LongType`).
    +        """
    +        return self._call_java("freqItemsets")
    +
    +    @property
    +    @since("2.2.0")
    +    def associationRules(self):
    +        """Data with three columns:
    +        * `antecedent`  - Array of the same type as the input column.
    +        * `consequent`  - Single element array of the same type as the input column.
    +        * `confidence`  - Confidence for the rule (`DoubleType`)."""
    +        return self._call_java("associationRules")
    +
    +
    +class FPGrowth(JavaEstimator, HasItemsCol, HasPredictionCol,
    +               HasSupport, HasConfidence, JavaMLWritable, JavaMLReadable):
    +    """A parallel FP-growth algorithm to mine frequent itemsets
    --- End diff --
    
    Copy all relevant docs from the Scala doc, please



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r108048696
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,232 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only, since
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not (0 <= value <= 1):
    +            raise ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        """"Minimal confidence for generating Association Rule. [0.0, 1.0]
    +        Note that minConfidence has no effect during fitting.""",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not (0 <= value <= 1):
    +            raise ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name", typeConverter=TypeConverters.toString)
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable,
    +                    HasConfidence, HasItemsCol, HasPredictionCol):
    +    """Model fitted by FPGrowth.
    +
    +    .. note:: Experimental
    +
    +    .. versionadded:: 2.2.0
    +    """
    +    @property
    +    @since("2.2.0")
    +    def freqItemsets(self):
    +        """
    +        DataFrame with two columns:
    +        * `items` - Itemset of the same type as the input column.
    +        * `freq`  - Frequency of the itemset (`LongType`).
    +        """
    +        return self._call_java("freqItemsets")
    +
    +    @property
    +    @since("2.2.0")
    +    def associationRules(self):
    +        """
    +        Data with three columns:
    +        * `antecedent`  - Array of the same type as the input column.
    +        * `consequent`  - Single element array of the same type as the input column.
    --- End diff --
    
    I just realized: If we're leaving open the possibility of returning multiple elements here in the future, then let's not document that this has a single element (else it effectively becomes a guarantee in the API).
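
The revision quoted in this diff also changes the range checks in `setMinSupport` and `setMinConfidence` from a bare `ValueError(...)` expression to `raise ValueError(...)`. The difference can be sketched in isolation as follows; the two function names are hypothetical and exist only for this illustration.

```python
def set_checked_without_raise(value):
    # Earlier revision: the exception object is constructed and then
    # discarded, so an out-of-range value passes through unchanged.
    if not 0 <= value <= 1:
        ValueError("Support must be in range [0, 1]")
    return value


def set_checked(value):
    # Later revision: the exception is actually raised.
    if not (0 <= value <= 1):
        raise ValueError("Support must be in range [0, 1]")
    return value


accepted = set_checked_without_raise(2.0)  # silently accepted

try:
    set_checked(2.0)
    rejected = False
except ValueError:
    rejected = True
```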



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107808584
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -60,6 +60,7 @@
     from pyspark.ml.regression import LinearRegression, DecisionTreeRegressor, \
         GeneralizedLinearRegression
     from pyspark.ml.tuning import *
    +from pyspark.ml.fpm import FPGrowth, FPGrowthModel
    --- End diff --
    
    I fixed the rest but isn't this one already in order?



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    One more thing: Can you please update dev/sparktestsupport/modules.py with the new "fpm" module?



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74898/testReport)** for PR 17218 at commit [`5f4673e`](https://github.com/apache/spark/commit/5f4673e74049d9f6918f4e029215ae6c8364043e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75241/
    Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    @jkbradley I think this is ready for review.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107810416
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1243,6 +1245,43 @@ def test_tweedie_distribution(self):
             self.assertTrue(np.isclose(model2.intercept, 0.6667, atol=1E-4))
     
     
    +class FPGrowthTests(SparkSessionTestCase):
    +    def setUp(self):
    +        super(FPGrowthTests, self).setUp()
    +        self.data = self.spark.createDataFrame(
    +            [([1, 2], ), ([1, 2], ), ([1, 2, 3], ), ([1, 3], )],
    +            ["items"])
    +
    +    def test_association_rules(self):
    +        fp = FPGrowth()
    +        fpm = fp.fit(self.data)
    +
    +        expected_association_rules = self.spark.createDataFrame(
    +            [([3], [1], 1.0), ([2], [1], 1.0)],
    +            ["antecedent", "consequent", "confidence"]
    +        )
    +        actual_association_rules = fpm.associationRules
    +
    --- End diff --
    
    @davies is looking into this
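
The expected association rules in the test quoted above can be checked by hand. The sketch below is a brute-force pure-Python computation, not the FP-growth algorithm and not the Spark API. The thresholds `min_support=0.3` and `min_confidence=0.8` are assumptions chosen for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Count every itemset by enumeration; keep those meeting min_support."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support * n}


def association_rules(freq, min_confidence):
    """Build rules antecedent -> consequent with a single-item consequent."""
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = tuple(i for i in itemset if i != consequent)
            # Every subset of a frequent itemset is itself frequent,
            # so the antecedent is always present in freq.
            confidence = count / freq[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, (consequent,), confidence))
    return sorted(rules)


transactions = [[1, 2], [1, 2], [1, 2, 3], [1, 3]]
freq = frequent_itemsets(transactions, min_support=0.3)
rules = association_rules(freq, min_confidence=0.8)
```

With these inputs, `{1, 2}` occurs 3 times and `{2}` occurs 3 times, so the rule `[2] -> [1]` has confidence 3/3 = 1.0. Likewise `{1, 3}` and `{3}` both occur twice, giving `[3] -> [1]` with confidence 1.0. The reverse rules have confidence 0.75 and 0.5 and fall below the assumed threshold.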



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75230/testReport)** for PR 17218 at commit [`deb2ce7`](https://github.com/apache/spark/commit/deb2ce7f8586ce27aeefefa8b689e558d20f07de).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75121/testReport)** for PR 17218 at commit [`f5bb151`](https://github.com/apache/spark/commit/f5bb151f4d0fa153a4af0d5e122ccfa8aa479b24).



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75113/testReport)** for PR 17218 at commit [`d8f291f`](https://github.com/apache/spark/commit/d8f291f0255a60caf8698a22babe42b03190362b).



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74898/
    Test PASSed.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107774888
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1243,6 +1244,45 @@ def test_tweedie_distribution(self):
             self.assertTrue(np.isclose(model2.intercept, 0.6667, atol=1E-4))
     
     
    +class FPGrowthTests(SparkSessionTestCase):
    +    def setUp(self):
    --- End diff --
    
    Call ```super(FPGrowthTests, self).setUp()``` too
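
A minimal sketch of the suggestion above, under stated assumptions: `SessionTestCase` is a hypothetical stand-in for `SparkSessionTestCase` so that the snippet runs without Spark, and the `base_ready` flag and the sample data are illustrative only. It shows why an overriding `setUp` should delegate to the parent: otherwise whatever the base class prepares per test is silently skipped.

```python
import unittest


class SessionTestCase(unittest.TestCase):
    """Hypothetical stand-in for SparkSessionTestCase (assumption, not Spark code)."""

    def setUp(self):
        # Per-test state that subclasses rely on.
        self.base_ready = True


class FPGrowthTests(SessionTestCase):
    def setUp(self):
        # Run the parent's setup first so its state exists before ours is added.
        super(FPGrowthTests, self).setUp()
        self.data = [[1, 2], [1, 2], [1, 2, 3], [1, 3]]

    def test_base_setup_ran(self):
        self.assertTrue(self.base_ready)
        self.assertEqual(len(self.data), 4)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FPGrowthTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Without the `super(...)` call, `test_base_setup_ran` would raise `AttributeError` because `base_ready` would never be set.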



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74354/
    Test PASSed.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107757033
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    --- End diff --
    
    Remove bounds here.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75119/testReport)** for PR 17218 at commit [`c478b24`](https://github.com/apache/spark/commit/c478b2407fad9ae1e24b52c4cd29e8ea755c104f).



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r108050687
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -423,15 +423,16 @@ def __hash__(self):
             "python/pyspark/ml/"
         ],
         python_test_goals=[
    -        "pyspark.ml.feature",
             "pyspark.ml.classification",
             "pyspark.ml.clustering",
    +        "pyspark.ml.evaluation",
    +        "pyspark.ml.feature",
    +        "pyspark.ml.fpm",
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
             "pyspark.ml.tuning",
             "pyspark.ml.tests",
    --- End diff --
    
    Sure thing. I thought there was some logic in putting tests last. Should I reorder the other modules as well?



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75219/testReport)** for PR 17218 at commit [`21a3606`](https://github.com/apache/spark/commit/21a36066b5bb7f7a58e123de5ad778257031f363).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Sure, I can take a look.  Let me ping @mlnick too, since he marked himself as shepherd.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74827/
    Test FAILed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74895/testReport)** for PR 17218 at commit [`0a3798d`](https://github.com/apache/spark/commit/0a3798d906e1341303b6872d44d5ce68d853aae4).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HasItemsCol(Params):`
      * `class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):`
      * `class FPGrowth(JavaEstimator, HasItemsCol, HasPredictionCol,`



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75230/testReport)** for PR 17218 at commit [`deb2ce7`](https://github.com/apache/spark/commit/deb2ce7f8586ce27aeefefa8b689e558d20f07de).



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107773640
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    --- End diff --
    
    Import "since" too.  I'm not sure why it works without the import.
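
One plausible explanation, not confirmed in this thread: a wildcard import copies every public name from a module that defines no `__all__`, including names that module itself imported. The sketch below demonstrates that general Python behavior with a throwaway module; `fake_util` and `helper` are hypothetical names, and nothing here inspects the actual pyspark modules.

```python
import sys
import types

# Build a throwaway module (hypothetical name) that imports a name for its own use.
fake_util = types.ModuleType("fake_util")
exec(
    "from functools import wraps\n"
    "\n"
    "def helper():\n"
    "    return 'ok'\n",
    fake_util.__dict__,
)
sys.modules["fake_util"] = fake_util

# A wildcard import from a module without __all__ copies all names that do not
# start with an underscore, so the transitively imported `wraps` comes along.
namespace = {}
exec("from fake_util import *", namespace)

leaked = "wraps" in namespace
has_helper = "helper" in namespace
```

An explicit import, as the reviewer asks for, keeps the dependency visible and does not break if the intermediate module later stops importing the name or adds `__all__`.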



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75136/testReport)** for PR 17218 at commit [`dd67055`](https://github.com/apache/spark/commit/dd67055e3ac22451c8a5aeb3a0b0c9c007f30b67).



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107774188
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1243,6 +1244,45 @@ def test_tweedie_distribution(self):
             self.assertTrue(np.isclose(model2.intercept, 0.6667, atol=1E-4))
     
     
    +class FPGrowthTests(SparkSessionTestCase):
    +    def setUp(self):
    +        self.shuffle_partitions = self.spark.conf.get("spark.sql.shuffle.partitions")
    +        self.spark.conf.set("spark.sql.shuffle.partitions", "1")
    --- End diff --
    
    Why 1 partition?  If it's for speed, then I wouldn't bother, unless we want to adjust it for all unit tests.  (I agree that setting it to 4 or so is often much faster than the default of 200.)  But let's do that in another PR.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75128/
    Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75237/
    Test FAILed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75136/
    Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74827/testReport)** for PR 17218 at commit [`0a3798d`](https://github.com/apache/spark/commit/0a3798d906e1341303b6872d44d5ce68d853aae4).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HasItemsCol(Params):`
      * `class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):`
      * `class FPGrowth(JavaEstimator, HasItemsCol, HasPredictionCol,`



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74895/testReport)** for PR 17218 at commit [`0a3798d`](https://github.com/apache/spark/commit/0a3798d906e1341303b6872d44d5ce68d853aae4).



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75218/
    Test FAILed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75218/testReport)** for PR 17218 at commit [`f90b71e`](https://github.com/apache/spark/commit/f90b71ead8c78412a6c4dfcaf4989e59a3bcde3e).



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by hhbyyh <gi...@git.apache.org>.

Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    @jkbradley Regarding the question: in most definitions, association rules are defined between two itemsets, so ArrayType seems the more intuitive choice to me. It just happens that AssociationRules in mllib currently supports only a single item as the consequent, which may change in the future. I also don't think the change would give a noticeable performance improvement for `transform`.
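
To illustrate the point about keeping the consequent as an array, here is a small plain-Python sketch. It is not Spark code: `Rule` and `predict` are hypothetical names, and the matching logic (add the consequents of every rule whose antecedent is contained in the input items) is an assumption about how such rules are typically applied.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    antecedent: List[str]
    consequent: List[str]  # kept as a list even when it holds a single item


def predict(items: List[str], rules: List[Rule]) -> List[str]:
    """Collect consequents of every rule whose antecedent is a subset of items."""
    have = set(items)
    out: List[str] = []
    for rule in rules:
        if set(rule.antecedent) <= have:
            for item in rule.consequent:
                # Skip items the transaction already has and avoid duplicates.
                if item not in have and item not in out:
                    out.append(item)
    return out


rules = [Rule(["a"], ["b"]), Rule(["a", "b"], ["c"]), Rule(["d"], ["e"])]
prediction = predict(["a", "b"], rules)
```

Because the consequent is already a list, a rule with several consequent items would work with the same code and the same schema.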



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75168 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75168/testReport)** for PR 17218 at commit [`4ceff1d`](https://github.com/apache/spark/commit/4ceff1d6226d2e5aaf93cb09b94383cd5d97c98b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17218



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    @hhbyyh OK that seems reasonable; I could see us adding support for multiple items in the future as well.  Thanks for confirming!



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74356/
    Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75128/testReport)** for PR 17218 at commit [`d8c2a69`](https://github.com/apache/spark/commit/d8c2a69b4b4bc961dcae05d8f44519574aabf70e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable,`



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75168 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75168/testReport)** for PR 17218 at commit [`4ceff1d`](https://github.com/apache/spark/commit/4ceff1d6226d2e5aaf93cb09b94383cd5d97c98b).



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74534/testReport)** for PR 17218 at commit [`c9ab242`](https://github.com/apache/spark/commit/c9ab242cf752d9ec66ff7b09ba280f37e40bb9aa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74827/testReport)** for PR 17218 at commit [`0a3798d`](https://github.com/apache/spark/commit/0a3798d906e1341303b6872d44d5ce68d853aae4).



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by indyragandy <gi...@git.apache.org>.

Github user indyragandy commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Get updated



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74354/testReport)** for PR 17218 at commit [`e599f31`](https://github.com/apache/spark/commit/e599f311a1c65b0e70820c2ede773f825d4854bb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74534/testReport)** for PR 17218 at commit [`c9ab242`](https://github.com/apache/spark/commit/c9ab242cf752d9ec66ff7b09ba280f37e40bb9aa).



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74632/
    Test PASSed.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r108054204
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -423,15 +423,16 @@ def __hash__(self):
             "python/pyspark/ml/"
         ],
         python_test_goals=[
    -        "pyspark.ml.feature",
             "pyspark.ml.classification",
             "pyspark.ml.clustering",
    +        "pyspark.ml.evaluation",
    +        "pyspark.ml.feature",
    +        "pyspark.ml.fpm",
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
             "pyspark.ml.tuning",
             "pyspark.ml.tests",
    --- End diff --
    
    Interesting...maybe?  I guess it doesn't really matter, so no need to rearrange more.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74632/testReport)** for PR 17218 at commit [`a2afb74`](https://github.com/apache/spark/commit/a2afb743d5b4ba057cbf2a67de2b595119c372da).



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74230/testReport)** for PR 17218 at commit [`3b10a30`](https://github.com/apache/spark/commit/3b10a30cc03f5c8e1e4dacf045261f1e5d70aa4e).



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74630/testReport)** for PR 17218 at commit [`9074312`](https://github.com/apache/spark/commit/90743124dd4e71d1e2f5f00e19a522d1f92dff63).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HasSupport(Params):`
      * `class HasConfidence(Params):`



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74230/
    Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75114/
    Test PASSed.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r108048613
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,232 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only, since
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not (0 <= value <= 1):
    --- End diff --
    
    On this topic, I agree with you that not checking here could currently cause late failures in a Pipeline.  However, I think the right fix for this is to add PipelineStage and transformSchema() to Python.  I just made a JIRA for it: https://issues.apache.org/jira/browse/SPARK-20099
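
A minimal sketch of the trade-off discussed above, in plain Python rather than pyspark. The class `MinSupportParam` and its methods are hypothetical stand-ins, not the pyspark API: an eager check in the setter fails at the call site, while an unchecked setter defers the failure to a later validation step (the role a `transformSchema()`-style check would presumably play).

```python
class MinSupportParam:
    """Hypothetical stand-in for a Params mixin; not the pyspark API."""

    def __init__(self):
        self._min_support = 0.3  # illustrative default only

    def set_min_support_eager(self, value):
        # Eager check: an out-of-range value fails right at the call site.
        if not (0 <= value <= 1):
            raise ValueError("Support must be in range [0, 1]")
        self._min_support = value
        return self

    def set_min_support_lazy(self, value):
        # No check: the bad value is stored and only surfaces later.
        self._min_support = value
        return self

    def validate(self):
        # Stand-in for a validation step that runs before fitting.
        if not (0 <= self._min_support <= 1):
            raise ValueError("Support must be in range [0, 1]")
```

With the lazy setter, `MinSupportParam().set_min_support_lazy(1.5)` succeeds and the error appears only when `validate()` is called, which is the late failure described in the comment.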



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107758054
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name.", typeConverter=TypeConverters.toString)
    +
    +    def __init__(self):
    --- End diff --
    
    No need for this.  The default will be set in FPGrowth.
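
A minimal sketch of the suggested layout, in plain Python. The `Params` class here is a hypothetical stand-in, not `pyspark.ml.param.Params`, and the default value `"items"` is only illustrative. The mixin declares the accessor without an `__init__`, and the concrete estimator sets the default.

```python
class Params:
    """Hypothetical minimal param container; not the pyspark class."""

    def __init__(self):
        self._defaults = {}
        self._values = {}

    def _set_default(self, **kwargs):
        self._defaults.update(kwargs)
        return self

    def get_or_default(self, name):
        # Prefer an explicitly set value, otherwise fall back to the default.
        return self._values.get(name, self._defaults[name])


class HasItemsCol(Params):
    # The mixin only exposes the accessor: no __init__, no default.
    def get_items_col(self):
        return self.get_or_default("itemsCol")


class FPGrowthSketch(HasItemsCol):
    def __init__(self):
        super().__init__()
        # The concrete estimator owns the default value.
        self._set_default(itemsCol="items")
```

Keeping the default in the estimator means another class reusing `HasItemsCol` can choose a different default without overriding the mixin.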



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107773828
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -60,6 +60,7 @@
     from pyspark.ml.regression import LinearRegression, DecisionTreeRegressor, \
         GeneralizedLinearRegression
     from pyspark.ml.tuning import *
    +from pyspark.ml.fpm import FPGrowth, FPGrowthModel
    --- End diff --
    
    Sort imports alphabetically.  (Feel free to fix others too.)



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74356/testReport)** for PR 17218 at commit [`4fe6257`](https://github.com/apache/spark/commit/4fe6257a014824f037cc8de0e99fe217439ce7b5).



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r108048627
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,232 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only, since
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not (0 <= value <= 1):
    +            raise ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        """"Minimal confidence for generating Association Rule. [0.0, 1.0]
    --- End diff --
    
    Extra quotes here.  Does this come out formatted correctly?
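
A short illustration of why the fourth quote matters, assuming the doc string is closed with a normal `"""` (the quoted diff is cut off before the closing delimiter, so that part is an assumption). Python reads the first three quotes as the opening delimiter and keeps the fourth as part of the string.

```python
# Four opening quotes: three open the literal, the fourth is content.
doc = """"Minimal confidence for generating Association Rule. [0.0, 1.0]"""

# The stored text therefore begins with a stray double-quote character.
starts_with_quote = doc.startswith('"Minimal')
first_char = doc[0]
```

Any rendered parameter description built from such a string would show the leading `"` character.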



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107757685
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    --- End diff --
    
    Match Scala doc: "Note that minConfidence has no effect during fitting."
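
For context on the parameter discussed above, a rule's confidence is conventionally defined as count(antecedent ∪ consequent) / count(antecedent). The sketch below computes it in plain Python for the four transactions used in the `FPGrowthTests` quoted later in this thread. The helper names are hypothetical, and this is not meant to reflect how Spark implements it.

```python
# Transactions taken from the FPGrowthTests data in this thread.
transactions = [{1, 2}, {1, 2}, {1, 2, 3}, {1, 3}]


def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent):
    """Conventional confidence: count(A | C) / count(A)."""
    return count(antecedent | consequent) / count(antecedent)


for antecedent, consequent in [({3}, {1}), ({2}, {1}), ({1}, {2}), ({1}, {3})]:
    print(antecedent, "->", consequent, confidence(antecedent, consequent))
```

Only the first two candidate rules reach a confidence of 1.0, which is consistent with the expected rules in that test.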



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    __Note__: should be retested after #17321



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107809244
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name.", typeConverter=TypeConverters.toString)
    +
    +    def __init__(self):
    +        super(HasItemsCol, self).__init__()
    +        self._setDefault(itemsCol='items')
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):
    --- End diff --
    
    I pushed my first attempt, but I think it will require a bit more discussion. If we enable this here, should we do the same for the rest of the Python models?



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74534/
    Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75113/
    Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Jenkins retest this please.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107757520
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    --- End diff --
    
    omit range here too



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107757555
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    --- End diff --
    
    ditto
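
One detail worth noting in the range check quoted above: as written, `ValueError(...)` is constructed but never raised, so an out-of-range value would pass through silently. A minimal sketch of the difference in plain Python (the function names are hypothetical, and `_set` is replaced by a plain return for illustration):

```python
def set_without_raise(value):
    if not 0 <= value <= 1:
        ValueError("Confidence must be in range [0, 1]")  # built, then discarded
    return value


def set_with_raise(value):
    if not 0 <= value <= 1:
        raise ValueError("Confidence must be in range [0, 1]")
    return value


print(set_without_raise(2.0))  # no exception: the invalid value is accepted

try:
    set_with_raise(2.0)
except ValueError as exc:
    print("rejected:", exc)
```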



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75119/
    Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    @jkbradley As far as I remember, some variants of `PrefixSpan` use confidence, but I doubt we'll encounter this problem any time soon :)
    
    Somewhat related - could you take a look at [SPARK-19899](https://issues.apache.org/jira/browse/SPARK-19899)?



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Issue this PR brought up:
    * Background: AssociationRules currently returns a 1-element array for the consequent (predicted item).  This makes sense because, even though multiple consequents could be predicted for a given itemset, they belong in different rules since they have different confidences.
    * *Question*: Should we change the schema for "consequent" to be a single item, rather than an array of a single item?
    * CCing people who have worked on this: @zero323 @mlnick @hhbyyh @felixcheung 
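
To make the schema question concrete, here is a sketch in plain Python that uses the rule rows from the test data in this thread as plain tuples. It assumes, as the background note says, that every consequent array holds exactly one element; the function name is hypothetical.

```python
# (antecedent, consequent, confidence) rows in the current array-valued shape.
rules = [([3], [1], 1.0), ([2], [1], 1.0)]


def unwrap_consequent(rows):
    """Replace each 1-element consequent array with its single item."""
    unwrapped = []
    for antecedent, consequent, confidence in rows:
        if len(consequent) != 1:
            raise ValueError("expected exactly one consequent item")
        unwrapped.append((antecedent, consequent[0], confidence))
    return unwrapped


print(unwrap_consequent(rules))
```

The single-item form removes one level of indexing for callers, at the cost of a schema that could not represent multi-item consequents if they were ever introduced.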



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    > True, we should do it for all models. And you're right that it's more involved than I was thinking. Specifically, rather than calling setParams from _create_model, I'd want us to call _copyValues from fit() in order to eliminate duplicate code. Would you mind removing the Params from the model, and we can work on adding them in more carefully for the next release? Thanks a lot!
    >
    > I dug up the existing JIRA for this issue: https://issues.apache.org/jira/browse/SPARK-10931
    
    I removed the code and I'll be following SPARK-10931. One possible challenge (here and for parameter validation) is the high latency of Py4J calls. With large pipelines it can build up pretty fast.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74632/testReport)** for PR 17218 at commit [`a2afb74`](https://github.com/apache/spark/commit/a2afb743d5b4ba057cbf2a67de2b595119c372da).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HasSupport(Params):`
      * `class HasConfidence(Params):`



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #74354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74354/testReport)** for PR 17218 at commit [`e599f31`](https://github.com/apache/spark/commit/e599f311a1c65b0e70820c2ede773f825d4854bb).



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75241/testReport)** for PR 17218 at commit [`66b85e5`](https://github.com/apache/spark/commit/66b85e5fc9c6a57978df0494c4a7174070534636).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r108048426
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -423,15 +423,16 @@ def __hash__(self):
             "python/pyspark/ml/"
         ],
         python_test_goals=[
    -        "pyspark.ml.feature",
             "pyspark.ml.classification",
             "pyspark.ml.clustering",
    +        "pyspark.ml.evaluation",
    +        "pyspark.ml.feature",
    +        "pyspark.ml.fpm",
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
             "pyspark.ml.tuning",
             "pyspark.ml.tests",
    --- End diff --
    
    As long as you're at it, switch tuning & tests to alphabetize them
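
A quick way to check the requested ordering is to compare the list with its sorted form. The sketch below uses the goal names visible in the diff above, in plain Python; it is only an illustration and not part of `modules.py`.

```python
python_test_goals = [
    "pyspark.ml.classification",
    "pyspark.ml.clustering",
    "pyspark.ml.evaluation",
    "pyspark.ml.feature",
    "pyspark.ml.fpm",
    "pyspark.ml.linalg.__init__",
    "pyspark.ml.recommendation",
    "pyspark.ml.regression",
    "pyspark.ml.tuning",
    "pyspark.ml.tests",
]

# Plain string sorting puts "tests" before "tuning".
print(python_test_goals == sorted(python_test_goals))
print(sorted(python_test_goals)[-2:])
```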



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107801964
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1243,6 +1245,43 @@ def test_tweedie_distribution(self):
             self.assertTrue(np.isclose(model2.intercept, 0.6667, atol=1E-4))
     
     
    +class FPGrowthTests(SparkSessionTestCase):
    +    def setUp(self):
    +        super(FPGrowthTests, self).setUp()
    +        self.data = self.spark.createDataFrame(
    +            [([1, 2], ), ([1, 2], ), ([1, 2, 3], ), ([1, 3], )],
    +            ["items"])
    +
    +    def test_association_rules(self):
    +        fp = FPGrowth()
    +        fpm = fp.fit(self.data)
    +
    +        expected_association_rules = self.spark.createDataFrame(
    +            [([3], [1], 1.0), ([2], [1], 1.0)],
    +            ["antecedent", "consequent", "confidence"]
    +        )
    +        actual_association_rules = fpm.associationRules
    +
    --- End diff --
    
    Maybe it's from not calling the parent setUp and tearDown
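
The suggestion above can be sketched with plain `unittest`. Here a dictionary stands in for `spark.conf` (the initial value "200" is illustrative), and `BaseCase` is a hypothetical stand-in for `SparkSessionTestCase`. The point is only that a subclass overriding `setUp`/`tearDown` should call the parent versions and restore any setting it changes.

```python
import unittest

# Hypothetical stand-in for spark.conf; the value is illustrative.
conf = {"spark.sql.shuffle.partitions": "200"}


class BaseCase(unittest.TestCase):
    def setUp(self):
        self.base_ready = True  # state that subclasses rely on

    def tearDown(self):
        self.base_ready = False


class ShufflePartitionsCase(BaseCase):
    def setUp(self):
        super().setUp()  # keep the parent's setup
        self.saved = conf["spark.sql.shuffle.partitions"]
        conf["spark.sql.shuffle.partitions"] = "1"

    def tearDown(self):
        conf["spark.sql.shuffle.partitions"] = self.saved  # restore first
        super().tearDown()

    def test_setting_applied(self):
        self.assertTrue(self.base_ready)
        self.assertEqual(conf["spark.sql.shuffle.partitions"], "1")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShufflePartitionsCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If `super().setUp()` were omitted, `base_ready` would never be set and the test would fail with an `AttributeError`, which is the kind of symptom a missing parent call tends to produce.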



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107784062
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1243,6 +1244,45 @@ def test_tweedie_distribution(self):
             self.assertTrue(np.isclose(model2.intercept, 0.6667, atol=1E-4))
     
     
    +class FPGrowthTests(SparkSessionTestCase):
    +    def setUp(self):
    +        self.shuffle_partitions = self.spark.conf.get("spark.sql.shuffle.partitions")
    +        self.spark.conf.set("spark.sql.shuffle.partitions", "1")
    --- End diff --
    
    Performance only.
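
The quoted `setUp` saves the current `spark.sql.shuffle.partitions` value before overriding it. A minimal sketch of the matching save-and-restore pattern follows, assuming a `tearDown` that puts the saved value back. `FakeConf` and the starting value `"200"` are assumptions made so the example runs without Spark.

```python
import unittest

KEY = "spark.sql.shuffle.partitions"


class FakeConf(object):
    """Hypothetical stand-in for a runtime config object with get/set."""

    def __init__(self):
        self._values = {KEY: "200"}  # assumed starting value for the example

    def get(self, key):
        return self._values[key]

    def set(self, key, value):
        self._values[key] = value


CONF = FakeConf()


class ShufflePartitionsTests(unittest.TestCase):
    def setUp(self):
        # Remember the current setting, then shrink it to speed up small tests.
        self.shuffle_partitions = CONF.get(KEY)
        CONF.set(KEY, "1")

    def tearDown(self):
        # Restore the saved setting so later tests see the original value.
        CONF.set(KEY, self.shuffle_partitions)

    def test_override_visible(self):
        self.assertEqual(CONF.get(KEY), "1")


def run_partition_tests():
    """Run ShufflePartitionsTests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShufflePartitionsTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```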



[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    True, if minSupport can be shared, then that's OK.  Confidence won't be shared, though.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75131/



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r108048733
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,232 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only, since
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not (0 <= value <= 1):
    +            raise ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        """"Minimal confidence for generating Association Rule. [0.0, 1.0]
    +        Note that minConfidence has no effect during fitting.""",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not (0 <= value <= 1):
    +            raise ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name", typeConverter=TypeConverters.toString)
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable,
    +                    HasConfidence, HasItemsCol, HasPredictionCol):
    +    """Model fitted by FPGrowth.
    +
    +    .. note:: Experimental
    +
    +    .. versionadded:: 2.2.0
    +    """
    +    @property
    +    @since("2.2.0")
    +    def freqItemsets(self):
    +        """
    +        DataFrame with two columns:
    +        * `items` - Itemset of the same type as the input column.
    +        * `freq`  - Frequency of the itemset (`LongType`).
    +        """
    +        return self._call_java("freqItemsets")
    +
    +    @property
    +    @since("2.2.0")
    +    def associationRules(self):
    +        """
    +        Data with three columns:
    +        * `antecedent`  - Array of the same type as the input column.
    +        * `consequent`  - Single element array of the same type as the input column.
    +        * `confidence`  - Confidence for the rule (`DoubleType`).
    +        """
    +        self._transfer_params_to_java()
    +        return self._call_java("associationRules")
    +
    +    @keyword_only
    +    @since("2.2.0")
    +    def setParams(self, minConfidence=0.8, itemsCol="items", predictionCol="prediction"):
    +        """
    +        setParams(self, minConfidence=0.8, itemsCol="items", predictionCol="prediction")
    +        """
    +        kwargs = self._input_kwargs
    +        return self._set(**kwargs)
    +
    +
    +class FPGrowth(JavaEstimator, HasItemsCol, HasPredictionCol,
    +               HasSupport, HasConfidence, JavaMLWritable, JavaMLReadable):
    +    """A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +    Li et al., PFP: Parallel FP-Growth for Query Recommendation [LI2008]_.
    +    PFP distributes computation in such a way that each worker executes an
    +    independent group of mining tasks. The FP-Growth algorithm is described in
    +    Han et al., Mining frequent patterns without candidate generation [HAN2000]_
    +
    +    .. [LI2008] http://dx.doi.org/10.1145/1454008.1454027
    +    .. [HAN2000] http://dx.doi.org/10.1145/335191.335372
    +
    +    .. note:: Experimental
    --- End diff --
    
    I didn't see this before, so now this is noted twice.  Just put it once at the beginning of the docstring.
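
One possible layout for the suggestion above, with the Experimental note stated once at the start of the class docstring. `ExampleEstimator` and `count_experimental_notes` are hypothetical names used only for this sketch; they are not part of the patch.

```python
class ExampleEstimator(object):
    """
    .. note:: Experimental

    Hypothetical estimator used only to illustrate docstring layout.

    .. versionadded:: 2.2.0
    """


def count_experimental_notes(cls):
    """Count how many times the Experimental note appears in a class docstring."""
    return cls.__doc__.count(".. note:: Experimental")
```

A check like `count_experimental_notes(ExampleEstimator)` makes it easy to spot a docstring that carries the note more than once.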



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75219/



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107801822
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1243,6 +1245,43 @@ def test_tweedie_distribution(self):
             self.assertTrue(np.isclose(model2.intercept, 0.6667, atol=1E-4))
     
     
    +class FPGrowthTests(SparkSessionTestCase):
    +    def setUp(self):
    +        super(FPGrowthTests, self).setUp()
    +        self.data = self.spark.createDataFrame(
    +            [([1, 2], ), ([1, 2], ), ([1, 2, 3], ), ([1, 3], )],
    +            ["items"])
    +
    +    def test_association_rules(self):
    +        fp = FPGrowth()
    +        fpm = fp.fit(self.data)
    +
    +        expected_association_rules = self.spark.createDataFrame(
    +            [([3], [1], 1.0), ([2], [1], 1.0)],
    +            ["antecedent", "consequent", "confidence"]
    +        )
    +        actual_association_rules = fpm.associationRules
    +
    --- End diff --
    
    Try inserting ```actual_association_rules.collect()``` here.  I get a weird error.
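
For reference, the expected rules in the quoted test can be checked by hand. The sketch below is a brute-force calculation in plain Python, not the FP-growth algorithm and not Spark code. It assumes a minimum support of 0.3 (not shown in the quoted diff) and uses the minimum confidence of 0.8 from the quoted `setParams` signature.

```python
from itertools import combinations

# Transactions from the quoted test fixture.
transactions = [[1, 2], [1, 2], [1, 2, 3], [1, 3]]
min_support = 0.3     # assumption: not shown in the quoted diff
min_confidence = 0.8  # default in the quoted setParams signature


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


def association_rules():
    """Enumerate single-consequent rules that meet both thresholds."""
    items = sorted({i for t in transactions for i in t})
    threshold = min_support * len(transactions)
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            count = support_count(itemset)
            if count < threshold:
                continue  # itemset is not frequent enough
            for consequent in itemset:
                antecedent = [i for i in itemset if i != consequent]
                confidence = count / float(support_count(antecedent))
                if confidence >= min_confidence:
                    rules.append((antecedent, [consequent], confidence))
    return rules
```

Under these assumptions only `[2] -> [1]` (3 of 3 transactions) and `[3] -> [1]` (2 of 2) survive, both with confidence 1.0, which agrees with the `expected_association_rules` rows in the diff. The reverse rules have confidence 0.75 and 0.5 and are filtered out.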



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    **[Test build #75218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75218/testReport)** for PR 17218 at commit [`f90b71e`](https://github.com/apache/spark/commit/f90b71ead8c78412a6c4dfcaf4989e59a3bcde3e).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107761241
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name.", typeConverter=TypeConverters.toString)
    +
    +    def __init__(self):
    +        super(HasItemsCol, self).__init__()
    +        self._setDefault(itemsCol='items')
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):
    +    """Model fitted by FPGrowth.
    +
    +    .. versionadded:: 2.2.0
    +    """
    +    @property
    +    @since("2.2.0")
    +    def freqItemsets(self):
    +        """DataFrame with two columns:
    --- End diff --
    
    Python style: put triple-quotes on a line by themselves (here and elsewhere below).
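
To make the two layouts concrete, here is a small sketch that puts them side by side, using the docstring text from the quoted diff. The function names are hypothetical. It relies on the standard library's `inspect.getdoc`, which normalizes indentation and surrounding blank lines, to suggest that the choice is purely stylistic.

```python
import inspect


def compact_style():
    """DataFrame with two columns:
    * `items` - Itemset of the same type as the input column.
    * `freq`  - Frequency of the itemset (`LongType`).
    """


def separate_line_style():
    """
    DataFrame with two columns:
    * `items` - Itemset of the same type as the input column.
    * `freq`  - Frequency of the itemset (`LongType`).
    """


def same_rendered_doc():
    """Compare the cleaned docstrings of the two layouts."""
    return inspect.getdoc(compact_style) == inspect.getdoc(separate_line_style)
```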



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75230/



[GitHub] spark pull request #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17218#discussion_r107758565
  
    --- Diff: python/pyspark/ml/fpm.py ---
    @@ -0,0 +1,211 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import keyword_only
    +from pyspark.ml.util import *
    +from pyspark.ml.wrapper import JavaEstimator, JavaModel
    +from pyspark.ml.param.shared import *
    +
    +__all__ = ["FPGrowth", "FPGrowthModel"]
    +
    +
    +class HasSupport(Params):
    +    """
    +    Mixin for param support: [0.0, 1.0].
    +    """
    +
    +    minSupport = Param(
    +        Params._dummy(),
    +        "minSupport",
    +        "Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears more "
    +        "than (minSupport * size-of-the-dataset) times will be output",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinSupport(self, value):
    +        """
    +        Sets the value of :py:attr:`minSupport`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Support must be in range [0, 1]")
    +        return self._set(minSupport=value)
    +
    +    def getMinSupport(self):
    +        """
    +        Gets the value of minSupport or its default value.
    +        """
    +        return self.getOrDefault(self.minSupport)
    +
    +
    +class HasConfidence(Params):
    +    """
    +    Mixin for param confidence: [0.0, 1.0].
    +    """
    +
    +    minConfidence = Param(
    +        Params._dummy(),
    +        "minConfidence",
    +        "Minimal confidence for generating Association Rule. [0.0, 1.0]",
    +        typeConverter=TypeConverters.toFloat)
    +
    +    def setMinConfidence(self, value):
    +        """
    +        Sets the value of :py:attr:`minConfidence`.
    +        """
    +        if not 0 <= value <= 1:
    +            ValueError("Confidence must be in range [0, 1]")
    +        return self._set(minConfidence=value)
    +
    +    def getMinConfidence(self):
    +        """
    +        Gets the value of minConfidence or its default value.
    +        """
    +        return self.getOrDefault(self.minConfidence)
    +
    +
    +class HasItemsCol(Params):
    +    """
    +    Mixin for param itemsCol: items column name.
    +    """
    +
    +    itemsCol = Param(Params._dummy(), "itemsCol",
    +                     "items column name.", typeConverter=TypeConverters.toString)
    +
    +    def __init__(self):
    +        super(HasItemsCol, self).__init__()
    +        self._setDefault(itemsCol='items')
    +
    +    def setItemsCol(self, value):
    +        """
    +        Sets the value of :py:attr:`itemsCol`.
    +        """
    +        return self._set(itemsCol=value)
    +
    +    def getItemsCol(self):
    +        """
    +        Gets the value of itemsCol or its default value.
    +        """
    +        return self.getOrDefault(self.itemsCol)
    +
    +
    +class FPGrowthModel(JavaModel, JavaMLWritable, JavaMLReadable):
    --- End diff --
    
    Mark this class as Experimental.
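
Separately, the revision quoted above builds a `ValueError` in `setMinSupport` and `setMinConfidence` without raising it, while the later 232-line revision quoted elsewhere in this thread uses `raise`. The sketch below isolates that difference with two hypothetical free functions; it omits the `Params` machinery entirely.

```python
def set_min_support_unchecked(value):
    """Mirrors the earlier revision: the exception is built but never raised."""
    if not 0 <= value <= 1:
        ValueError("Support must be in range [0, 1]")  # no effect without raise
    return value


def set_min_support_checked(value):
    """Mirrors the later revision: out-of-range values are rejected."""
    if not (0 <= value <= 1):
        raise ValueError("Support must be in range [0, 1]")
    return value
```

With the first form, a call such as `set_min_support_unchecked(2.0)` silently accepts the out-of-range value.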



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17218: [SPARK-19281][PYTHON][ML] spark.ml Python API for FPGrow...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17218
  
    Merged build finished. Test PASSed.

