Posted to commits@hivemall.apache.org by my...@apache.org on 2019/06/28 16:56:39 UTC

[incubator-hivemall-site] branch asf-site updated: Added a usage of feature_binning UDF

This is an automated email from the ASF dual-hosted git repository.

myui pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hivemall-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new eb4c16e  Added a usage of feature_binning UDF
eb4c16e is described below

commit eb4c16ed01465b18176f43a018b4fdf07b7015a8
Author: Makoto Yui <my...@apache.org>
AuthorDate: Sat Jun 29 01:56:26 2019 +0900

    Added a usage of feature_binning UDF
---
 userguide/ft_engineering/binning.html | 70 ++++++++++++++++++++++++++++++-----
 1 file changed, 61 insertions(+), 9 deletions(-)

diff --git a/userguide/ft_engineering/binning.html b/userguide/ft_engineering/binning.html
index 1d4f235..c0102b3 100644
--- a/userguide/ft_engineering/binning.html
+++ b/userguide/ft_engineering/binning.html
@@ -2382,10 +2382,11 @@
 <!-- toc --><div id="toc" class="toc">
 
 <ul>
-<li><a href="#usage">Usage</a><ul>
-<li><a href="#feature-vector-trasformation-by-applying-feature-binning">Feature Vector trasformation by applying Feature Binning</a></li>
+<li><a href="#data-preparation">Data Preparation</a><ul>
+<li><a href="#custom-rule-for-binning">Custom rule for binning</a></li>
+<li><a href="#binning-based-on-quantiles">Binning based on quantiles</a></li>
 <li><a href="#practical-example">Practical Example</a></li>
-<li><a href="#get-a-mapping-table-by-feature-binning">Get a mapping table by Feature Binning</a></li>
+<li><a href="#create-a-mapping-table-by-feature-binning">Create a mapping table by Feature Binning</a></li>
 </ul>
 </li>
 <li><a href="#function-signatures">Function Signatures</a><ul>
@@ -2397,7 +2398,7 @@
 </ul>
 
 </div><!-- tocstop -->
-<h1 id="usage">Usage</h1>
+<h1 id="data-preparation">Data Preparation</h1>
 <p>Prepare sample data (<em>users</em> table) first as follows:</p>
 <pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">users</span> (
   <span class="hljs-keyword">rowid</span> <span class="hljs-built_in">int</span>, <span class="hljs-keyword">name</span> <span class="hljs-keyword">string</span>, age <span class="hljs-built_in">int</span>, gender <span class="hljs-keyword">string</span>
@@ -2448,8 +2449,59 @@
 </tr>
 </tbody>
 </table>
-<h2 id="feature-vector-trasformation-by-applying-feature-binning">Feature Vector trasformation by applying Feature Binning</h2>
-<p>Now, converting <code>age</code> values into 3 bins.</p>
+<h2 id="custom-rule-for-binning">Custom rule for binning</h2>
+<p>You can provide a custom binning rule as a map from a feature name to an ascending array of bin boundaries:</p>
+<pre><code class="lang-sql"><span class="hljs-keyword">select</span> 
+  features <span class="hljs-keyword">as</span> original,
+  feature_binning(
+    features,
+    <span class="hljs-comment">-- [-INF, 10.0], (10.0, 20.0], (20.0, 30.0], (30.0, 40.0], (40.0, INF]</span>
+    <span class="hljs-keyword">map</span>(<span class="hljs-string">&apos;age&apos;</span>, <span class="hljs-built_in">array</span>(-infinity(), <span class="hljs-number">10.0</span>, <span class="hljs-number">20.0</span>, <span class="hljs-number">30.0</span>, <span class="hljs-number">40.0</span>, infinity()))
+  ) <span class="hljs-keyword">as</span> binned
+<span class="hljs-keyword">from</span>
+  <span class="hljs-keyword">input</span>;
+</code></pre>
+<table>
+<thead>
+<tr>
+<th style="text-align:left">original</th>
+<th style="text-align:left">binned</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td style="text-align:left">[&quot;name#Jacob&quot;,&quot;gender#Male&quot;,&quot;age:20.0&quot;]</td>
+<td style="text-align:left">[&quot;name#Jacob&quot;,&quot;gender#Male&quot;,&quot;age:1&quot;]</td>
+</tr>
+<tr>
+<td style="text-align:left">[&quot;name#Mason&quot;,&quot;gender#Male&quot;,&quot;age:22.0&quot;]</td>
+<td style="text-align:left">[&quot;name#Mason&quot;,&quot;gender#Male&quot;,&quot;age:2&quot;]</td>
+</tr>
+<tr>
+<td style="text-align:left">[&quot;name#Sophia&quot;,&quot;gender#Female&quot;,&quot;age:35.0&quot;]</td>
+<td style="text-align:left">[&quot;name#Sophia&quot;,&quot;gender#Female&quot;,&quot;age:3&quot;]</td>
+</tr>
+<tr>
+<td style="text-align:left">[&quot;name#Ethan&quot;,&quot;gender#Male&quot;,&quot;age:55.0&quot;]</td>
+<td style="text-align:left">[&quot;name#Ethan&quot;,&quot;gender#Male&quot;,&quot;age:4&quot;]</td>
+</tr>
+<tr>
+<td style="text-align:left">[&quot;name#Emma&quot;,&quot;gender#Female&quot;,&quot;age:15.0&quot;]</td>
+<td style="text-align:left">[&quot;name#Emma&quot;,&quot;gender#Female&quot;,&quot;age:1&quot;]</td>
+</tr>
+<tr>
+<td style="text-align:left">[&quot;name#Noah&quot;,&quot;gender#Male&quot;,&quot;age:46.0&quot;]</td>
+<td style="text-align:left">[&quot;name#Noah&quot;,&quot;gender#Male&quot;,&quot;age:4&quot;]</td>
+</tr>
+<tr>
+<td style="text-align:left">[&quot;name#Isabella&quot;,&quot;gender#Female&quot;,&quot;age:20.0&quot;]</td>
+<td style="text-align:left">[&quot;name#Isabella&quot;,&quot;gender#Female&quot;,&quot;age:1&quot;]</td>
+</tr>
+</tbody>
+</table>
+<h2 id="binning-based-on-quantiles">Binning based on quantiles</h2>
+<p>You can apply feature binning based on <a href="https://en.wikipedia.org/wiki/Quantile" target="_blank">quantiles</a>. </p>
+<p>Suppose you want to convert <code>age</code> values into 3 bins:</p>
 <pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span>
   <span class="hljs-keyword">map</span>(<span class="hljs-string">&apos;age&apos;</span>, build_bins(age, <span class="hljs-number">3</span>)) <span class="hljs-keyword">AS</span> quantiles_map
 <span class="hljs-keyword">FROM</span>
@@ -2458,7 +2510,7 @@
 <blockquote>
 <p>{&quot;age&quot;:[-Infinity,18.333333333333332,30.666666666666657,Infinity]}</p>
 </blockquote>
-<p>In the above query result, you can find 4 values for age in <code>quantiles_map</code>. It&apos;s a threshold of 3 bins. </p>
+<p>The above query result contains 4 values for age in <code>quantiles_map</code>. They are the boundaries of 3 bins.</p>
 <pre><code class="lang-sql">WITH bins as (
   <span class="hljs-keyword">SELECT</span>
     <span class="hljs-keyword">map</span>(<span class="hljs-string">&apos;age&apos;</span>, build_bins(age, <span class="hljs-number">3</span>)) <span class="hljs-keyword">AS</span> quantiles_map
@@ -2582,7 +2634,7 @@ bins <span class="hljs-keyword">as</span> (
 </tr>
 </tbody>
 </table>
-<h2 id="get-a-mapping-table-by-feature-binning">Get a mapping table by Feature Binning</h2>
+<h2 id="create-a-mapping-table-by-feature-binning">Create a mapping table by Feature Binning</h2>
 <pre><code class="lang-sql">WITH bins AS (
   <span class="hljs-keyword">SELECT</span> build_bins(age, <span class="hljs-number">3</span>) <span class="hljs-keyword">AS</span> quantiles
   <span class="hljs-keyword">FROM</span> <span class="hljs-keyword">users</span>
@@ -2777,7 +2829,7 @@ Apache Hivemall is an effort undergoing incubation at The Apache Software Founda
     <script>
         var gitbook = gitbook || [];
         gitbook.push(function() {
-            gitbook.page.hasChanged({"page":{"title":"Feature Binning","level":"3.4","depth":1,"next":{"title":"Feature Paring","level":"3.5","depth":1,"path":"ft_engineering/pairing.md","ref":"ft_engineering/pairing.md","articles":[{"title":"Polynomial features","level":"3.5.1","depth":2,"path":"ft_engineering/polynomial.md","ref":"ft_engineering/polynomial.md","articles":[]}]},"previous":{"title":"Feature Selection","level":"3.3","depth":1,"path":"ft_engineering/selection.md","ref":"ft [...]
+            gitbook.page.hasChanged({"page":{"title":"Feature Binning","level":"3.4","depth":1,"next":{"title":"Feature Paring","level":"3.5","depth":1,"path":"ft_engineering/pairing.md","ref":"ft_engineering/pairing.md","articles":[{"title":"Polynomial features","level":"3.5.1","depth":2,"path":"ft_engineering/polynomial.md","ref":"ft_engineering/polynomial.md","articles":[]}]},"previous":{"title":"Feature Selection","level":"3.3","depth":1,"path":"ft_engineering/selection.md","ref":"ft [...]
         });
     </script>
 </div>
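The bin numbers in the custom-rule example above follow a right-inclusive convention: a value equal to a boundary falls into the lower bin (age 20.0 maps to bin 1, age 22.0 maps to bin 2). The following Python sketch reproduces that mapping so a rule can be checked outside Hive. It is not Hivemall's implementation. The helper names bin_index and binning are hypothetical, and the behaviour that only "name:value" features with a matching rule are rewritten while "name#value" features pass through unchanged is an assumption inferred from the example output.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence


def bin_index(value: float, thresholds: Sequence[float]) -> int:
    """Return the bin number of `value` for ascending `thresholds`.

    Bin i covers (thresholds[i], thresholds[i + 1]], i.e. it is right-inclusive,
    matching the example output. A value equal to the lowest threshold is
    clamped into bin 0 (assumption based on the "[-INF, 10.0]" comment).
    """
    return max(bisect_left(thresholds, value) - 1, 0)


def binning(features: List[str], rules: Dict[str, Sequence[float]]) -> List[str]:
    """Rewrite quantitative features "name:value" that have a rule as "name:bin"."""
    out = []
    for feature in features:
        name, sep, raw = feature.partition(":")
        if sep and name in rules:
            out.append(f"{name}:{bin_index(float(raw), rules[name])}")
        else:
            # Categorical features ("name#value") and features without a rule pass through.
            out.append(feature)
    return out


INF = float("inf")
rules = {"age": [-INF, 10.0, 20.0, 30.0, 40.0, INF]}
print(binning(["name#Jacob", "gender#Male", "age:20.0"], rules))
# prints ['name#Jacob', 'gender#Male', 'age:1']
```

Using bisect_left rather than bisect_right is what makes each bin right-inclusive; switching to bisect_right would move a value such as 20.0 up into bin 2.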