Posted to commits@hivemall.apache.org by my...@apache.org on 2019/06/28 06:43:20 UTC
[incubator-hivemall] branch master updated: Fixed feature binning documentation
This is an automated email from the ASF dual-hosted git repository.
myui pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hivemall.git
The following commit(s) were added to refs/heads/master by this push:
new c0a317d Fixed feature binning documentation
c0a317d is described below
commit c0a317dba281081432c9c873b8c07d42ff58aa92
Author: Makoto Yui <my...@apache.org>
AuthorDate: Fri Jun 28 15:43:05 2019 +0900
Fixed feature binning documentation
---
docs/gitbook/ft_engineering/binning.md | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/docs/gitbook/ft_engineering/binning.md b/docs/gitbook/ft_engineering/binning.md
index 4634f92..2f36578 100644
--- a/docs/gitbook/ft_engineering/binning.md
+++ b/docs/gitbook/ft_engineering/binning.md
@@ -21,8 +21,6 @@ Feature binning is a method of dividing quantitative variables into categorical
If the number of bins is set to 3, the bin ranges become something like `[-Inf, 1], (1, 10], (10, Inf]`.
-*Note: This feature is supported from Hivemall v0.5-rc.1 or later.*
-
<!-- toc -->
# Usage
@@ -205,23 +203,23 @@ FROM
# Function Signatures
-### UDAF `build_bins(weight, num_of_bins[, auto_shrink])`
+### UDAF `build_bins(weight num_of_bins [, auto_shrink=false])`
#### Input
| weight: int|bigint|float|double | num\_of\_bins: `int` | [auto\_shrink: `boolean` = false] |
| :-: | :-: | :-: |
-| weight | 2 <= | behavior when separations are repeated: T=\>skip, F=\>exception |
+| weight | greather than or equals to 2 | behavior when separations are repeated: T=\>skip, F=\>exception |
#### Output
| quantiles: `array<double>` |
| :-: |
-| array of separation value |
+| thresholds of bins based on quantiles |
> #### Note
> There is the possibility quantiles are repeated because of too many `num_of_bins` or too few data.
-> If `auto_shrink` is true, skip duplicated quantiles. If not, throw an exception.
+> If `auto_shrink` is set to true, skip duplicated quantiles. If not, throw an exception.
### UDF `feature_binning(features, quantiles_map)`
@@ -229,15 +227,15 @@ FROM
| features: `array<features::string>` | quantiles\_map: `map<string, array<double>>` |
| :-: | :-: |
-| serialized feature | entry:: key: col name, val: quantiles |
+| feature vector | a map where key=column name and value=quantiles |
#### Output
| features: `array<feature::string>` |
| :-: |
-| serialized and binned features |
+| binned features |
-### UDF `feature_binning((weight, quantiles)`
+### UDF `feature_binning(weight, quantiles)`
#### Input