Posted to commits@hivemall.apache.org by my...@apache.org on 2016/11/17 12:18:51 UTC
[08/12] incubator-hivemall-site git commit: Fixed links in the user guide
http://git-wip-us.apache.org/repos/asf/incubator-hivemall-site/blob/68241a08/userguide/getting_started/input-format.html
----------------------------------------------------------------------
diff --git a/userguide/getting_started/input-format.html b/userguide/getting_started/input-format.html
index 8e7e876..d1236c7 100644
--- a/userguide/getting_started/input-format.html
+++ b/userguide/getting_started/input-format.html
@@ -999,6 +999,21 @@
</li>
+ <li class="chapter " data-level="5.6" data-path="../binaryclass/titanic_rf.html">
+
+ <a href="../binaryclass/titanic_rf.html">
+
+
+ <b>5.6.</b>
+
+ Kaggle Titanic Tutorial
+
+ </a>
+
+
+
+ </li>
+
@@ -1678,11 +1693,11 @@ Here, we use <a href="http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_
</div><!-- tocstop -->
<h1 id="input-format-for-classification">Input Format for Classification</h1>
-<p>The classifiers of Hivemall takes 2 (or 3) arguments: <em>features</em>, <em>label</em>, and <em>options</em> (a.k.a. <a href="http://en.wikipedia.org/wiki/Hyperparameter" target="_blank">hyperparameters</a>). The first two arguments of training functions (e.g., <a href="https://github.com/myui/hivemall/wiki/a9a-binary-classification-(logistic-regression" target="_blank">logress</a>) and <a href="https://github.com/myui/hivemall/wiki/news20-binary-classification-%232-(CW,-AROW,-SCW" target="_blank">train_scw</a>)) represents training examples. </p>
+<p>Hivemall classifiers take 2 (or 3) arguments: <em>features</em>, <em>label</em>, and <em>options</em> (a.k.a. <a href="http://en.wikipedia.org/wiki/Hyperparameter" target="_blank">hyperparameters</a>). The first two arguments of a training function represent training examples.</p>
<p>In Statistics, <em>features</em> and <em>label</em> are called <a href="http://www.oswego.edu/~srp/stats/variable_types.htm" target="_blank">Explanatory variable and Response Variable</a>, respectively.</p>
<h1 id="features-format-for-classification-and-regression">Features format (for classification and regression)</h1>
<p>The format of <em>features</em> is common between (binary and multi-class) classification and regression.
-Hivemall accepts ARRAY<INT|BIGINT|TEXT> for the type of <em>features</em> column.</p>
+Hivemall accepts <code>ARRAY&lt;INT|BIGINT|TEXT&gt;</code> for the type of the <em>features</em> column.</p>
<p>Hivemall uses a <em>sparse</em> data format (cf. <a href="http://netlib.org/linalg/html_templates/node91.html" target="_blank">Compressed Row Storage</a>) which is similar to <a href="http://stackoverflow.com/questions/12112558/read-write-data-in-libsvm-format" target="_blank">LIBSVM</a> and <a href="https://github.com/JohnLangford/vowpal_wabbit/wiki/Input-format" target="_blank">Vowpal Wabbit</a>.</p>
<p>The format of each feature in an array is as follows:</p>
<pre><code>feature ::= <index>:<weight> or <index>
@@ -1692,7 +1707,7 @@ weight ::= <FLOAT>
</code></pre><p>The <em>index</em> is usually a number (INT or BIGINT) starting from 1.
Here is an example of <em>features</em>.</p>
<pre><code>10:3.4 123:0.5 34567:0.231
-</code></pre><p><em>Note:</em> As mentioned later, <em>index</em> "0" is reserved for a <a href="https://github.com/myui/hivemall/wiki/Using-explicit-addBias(" target="_blank">Bias/Dummy variable</a>-for-a-better-prediction).</p>
+</code></pre><p><em>Note:</em> As mentioned later, <em>index</em> "0" is reserved for a <a href="../tips/addbias.html">Bias/Dummy variable</a>.</p>
<p>In addition to numbers, you can use a TEXT value for an index. For example, you can use array("height:1.5", "length:2.0") for the features.</p>
<pre><code>"height:1.5" "length:2.0"
</code></pre><h2 id="quantitative-and-categorical-variables">Quantitative and Categorical variables</h2>
@@ -1708,11 +1723,11 @@ Here is an instance of a features.</p>
</code></pre><p>Note that 1.0 is used as the weight when <em>weight</em> is omitted.</p>
<h2 id="biasdummy-variable-in-features">Bias/Dummy Variable in features</h2>
<p>Note that "0" is reserved for a Bias variable (called dummy variable in Statistics). </p>
-<p>The <a href="https://github.com/myui/hivemall/wiki/Using-explicit-addBias(" target="_blank">addBias</a>-for-a-better-prediction) function is Hivemall appends "0:1.0" as an element of array in <em>features</em>.</p>
+<p>The <a href="../tips/addbias.html">addBias</a> function in Hivemall appends "0:1.0" as an element of the array in <em>features</em>.</p>
<h2 id="feature-hashing">Feature hashing</h2>
-<p>Hivemall supports <a href="http://en.wikipedia.org/wiki/Feature_hashing" target="_blank">feature hashing/hashing trick</a> through <a href="https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-dataset#converting-feature-representation-by-feature-hashing" target="_blank">mhash function</a>.</p>
+<p>Hivemall supports <a href="http://en.wikipedia.org/wiki/Feature_hashing" target="_blank">feature hashing/hashing trick</a> through <a href="../ft_engineering/hashing.html#mhash-function">mhash function</a>.</p>
<p>The mhash function takes a feature (i.e., <em>index</em>) in TEXT format and generates a hash value in the range from 1 to 2^24 (=16777216) by default.</p>
-<p>Feature hashing is useful where the dimension of feature vector (i.e., the number of elements in <em>features</em>) is so large. Consider applying <a href="(https:/github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-dataset#converting-feature-representation-by-feature-hashing">mhash function</a>) when a prediction model does not fit in memory and OutOfMemory exception happens.</p>
+<p>Feature hashing is useful when the dimension of the feature vector (i.e., the number of elements in <em>features</em>) is very large. Consider applying the <a href="../ft_engineering/hashing.html#mhash-function">mhash function</a> when a prediction model does not fit in memory and an OutOfMemory exception occurs.</p>
<p>In general, you don't need to use mhash when the dimension of the feature vector is less than 16777216.
If a feature <em>index</em> is a very long TEXT value (e.g., "xxxxxxx-yyyyyy-weight:55.3") and consumes a large amount of memory, consider using mhash as follows:</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- feature is v0.3.2 or before</span>
@@ -1725,7 +1740,7 @@ feature(mhash(extract_feature("xxxxxxx-yyyyyy-weight:55.3")), extract_
<p>43352:55.3</p>
</blockquote>
<h2 id="feature-normalization">Feature Normalization</h2>
-<p>Feature (weight) normalization is important in machine learning. Please refer <a href="https://github.com/myui/hivemall/wiki/Feature-scaling" target="_blank">https://github.com/myui/hivemall/wiki/Feature-scaling</a> for detail.</p>
+<p>Feature (weight) normalization is important in machine learning. Please refer to <a href="../ft_engineering/scaling.html">this article</a> for details.</p>
<hr>
<h1 id="label-format-in-binary-classification">Label format in Binary Classification</h1>
<p>The <em>label</em> must be an <em>INT</em> typed column and the values are positive (+1) or negative (-1) as follows:</p>
@@ -1815,7 +1830,25 @@ feature(mhash(extract_feature("xxxxxxx-yyyyyy-weight:55.3")), extract_
<span class="hljs-keyword">from</span>
<span class="hljs-keyword">table</span>;
</code></pre>
-<p><div id="page-footer"><hr><p><sub><font color="gray">
+<p><div id="page-footer"><hr><!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+<p><sub><font color="gray">
Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator.
</font></sub></p>
</div></p>
@@ -1852,7 +1885,7 @@ Apache Hivemall is an effort undergoing incubation at The Apache Software Founda
<script>
var gitbook = gitbook || [];
gitbook.push(function() {
- gitbook.page.hasChanged({"page":{"title":"Input Format","level":"1.2.3","depth":2,"next":{"title":"Tips for Effective Hivemall","level":"1.3","depth":1,"path":"tips/README.md","ref":"tips/README.md","articles":[{"title":"Explicit addBias() for better prediction","level":"1.3.1","depth":2,"path":"tips/addbias.md","ref":"tips/addbias.md","articles":[]},{"title":"Use rand_amplify() to better prediction results","level":"1.3.2","depth":2,"path":"tips/rand_amplify.md","ref":"tips/rand_amplify.md","articles":[]},{"title":"Real-time Prediction on RDBMS","level":"1.3.3","depth":2,"path":"tips/rt_prediction.md","ref":"tips/rt_prediction.md","articles":[]},{"title":"Ensemble learning for stable prediction","level":"1.3.4","depth":2,"path":"tips/ensemble_learning.md","ref":"tips/ensemble_learning.md","articles":[]},{"title":"Mixing models for a better prediction convergence (MIX server)","level":"1.3.5","depth":2,"path":"tips/mixserver.md","ref":"tips/mixserver.md","articles":[]},{
"title":"Run Hivemall on Amazon Elastic MapReduce","level":"1.3.6","depth":2,"path":"tips/emr.md","ref":"tips/emr.md","articles":[]}]},"previous":{"title":"Install as permanent functions","level":"1.2.2","depth":2,"path":"getting_started/permanent-functions.md","ref":"getting_started/permanent-functions.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/
docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"defau
lt","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"getting_started/input-format.md","mtime":"2016-11-12T07:18:00.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-14T10:40:22.987Z"},"basePath":"..","book":{"language":""}});
+ gitbook.page.hasChanged({"page":{"title":"Input Format","level":"1.2.3","depth":2,"next":{"title":"Tips for Effective Hivemall","level":"1.3","depth":1,"path":"tips/README.md","ref":"tips/README.md","articles":[{"title":"Explicit addBias() for better prediction","level":"1.3.1","depth":2,"path":"tips/addbias.md","ref":"tips/addbias.md","articles":[]},{"title":"Use rand_amplify() to better prediction results","level":"1.3.2","depth":2,"path":"tips/rand_amplify.md","ref":"tips/rand_amplify.md","articles":[]},{"title":"Real-time Prediction on RDBMS","level":"1.3.3","depth":2,"path":"tips/rt_prediction.md","ref":"tips/rt_prediction.md","articles":[]},{"title":"Ensemble learning for stable prediction","level":"1.3.4","depth":2,"path":"tips/ensemble_learning.md","ref":"tips/ensemble_learning.md","articles":[]},{"title":"Mixing models for a better prediction convergence (MIX server)","level":"1.3.5","depth":2,"path":"tips/mixserver.md","ref":"tips/mixserver.md","articles":[]},{
"title":"Run Hivemall on Amazon Elastic MapReduce","level":"1.3.6","depth":2,"path":"tips/emr.md","ref":"tips/emr.md","articles":[]}]},"previous":{"title":"Install as permanent functions","level":"1.2.2","depth":2,"path":"getting_started/permanent-functions.md","ref":"getting_started/permanent-functions.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/
docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"defau
lt","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"getting_started/input-format.md","mtime":"2016-11-17T10:42:51.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-17T12:16:14.647Z"},"basePath":"..","book":{"language":""}});
});
</script>
</div>
http://git-wip-us.apache.org/repos/asf/incubator-hivemall-site/blob/68241a08/userguide/getting_started/installation.html
----------------------------------------------------------------------
diff --git a/userguide/getting_started/installation.html b/userguide/getting_started/installation.html
index f223bf0..6908e93 100644
--- a/userguide/getting_started/installation.html
+++ b/userguide/getting_started/installation.html
@@ -999,6 +999,21 @@
</li>
+ <li class="chapter " data-level="5.6" data-path="../binaryclass/titanic_rf.html">
+
+ <a href="../binaryclass/titanic_rf.html">
+
+
+ <b>5.6.</b>
+
+ Kaggle Titanic Tutorial
+
+ </a>
+
+
+
+ </li>
+
@@ -1664,7 +1679,25 @@ source /home/myui/tmp/define-all.hive;
<pre><code>$ hive
add jar /tmp/hivemall-core-xxx-with-dependencies.jar;
source /tmp/define-all.hive;
-</code></pre><p><div id="page-footer"><hr><p><sub><font color="gray">
+</code></pre><p><div id="page-footer"><hr><!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+<p><sub><font color="gray">
Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator.
</font></sub></p>
</div></p>
@@ -1701,7 +1734,7 @@ Apache Hivemall is an effort undergoing incubation at The Apache Software Founda
<script>
var gitbook = gitbook || [];
gitbook.push(function() {
- gitbook.page.hasChanged({"page":{"title":"Installation","level":"1.2.1","depth":2,"next":{"title":"Install as permanent functions","level":"1.2.2","depth":2,"path":"getting_started/permanent-functions.md","ref":"getting_started/permanent-functions.md","articles":[]},"previous":{"title":"Getting Started","level":"1.2","depth":1,"path":"getting_started/README.md","ref":"getting_started/README.md","articles":[{"title":"Installation","level":"1.2.1","depth":2,"path":"getting_started/installation.md","ref":"getting_started/installation.md","articles":[]},{"title":"Install as permanent functions","level":"1.2.2","depth":2,"path":"getting_started/permanent-functions.md","ref":"getting_started/permanent-functions.md","articles":[]},{"title":"Input Format","level":"1.2.3","depth":2,"path":"getting_started/input-format.md","ref":"getting_started/input-format.md","articles":[]}]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callout
s","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{
"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/
"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"getting_started/installation.md","mtime":"2016-11-12T07:18:00.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-14T10:40:22.987Z"},"basePath":"..","book":{"language":""}});
+ gitbook.page.hasChanged({"page":{"title":"Installation","level":"1.2.1","depth":2,"next":{"title":"Install as permanent functions","level":"1.2.2","depth":2,"path":"getting_started/permanent-functions.md","ref":"getting_started/permanent-functions.md","articles":[]},"previous":{"title":"Getting Started","level":"1.2","depth":1,"path":"getting_started/README.md","ref":"getting_started/README.md","articles":[{"title":"Installation","level":"1.2.1","depth":2,"path":"getting_started/installation.md","ref":"getting_started/installation.md","articles":[]},{"title":"Install as permanent functions","level":"1.2.2","depth":2,"path":"getting_started/permanent-functions.md","ref":"getting_started/permanent-functions.md","articles":[]},{"title":"Input Format","level":"1.2.3","depth":2,"path":"getting_started/input-format.md","ref":"getting_started/input-format.md","articles":[]}]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callout
s","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{
"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/
"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"getting_started/installation.md","mtime":"2016-11-16T08:39:12.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-17T12:16:14.647Z"},"basePath":"..","book":{"language":""}});
});
</script>
</div>
http://git-wip-us.apache.org/repos/asf/incubator-hivemall-site/blob/68241a08/userguide/getting_started/permanent-functions.html
----------------------------------------------------------------------
diff --git a/userguide/getting_started/permanent-functions.html b/userguide/getting_started/permanent-functions.html
index e99496d..c55834f 100644
--- a/userguide/getting_started/permanent-functions.html
+++ b/userguide/getting_started/permanent-functions.html
@@ -999,6 +999,21 @@
</li>
+ <li class="chapter " data-level="5.6" data-path="../binaryclass/titanic_rf.html">
+
+ <a href="../binaryclass/titanic_rf.html">
+
+
+ <b>5.6.</b>
+
+ Kaggle Titanic Tutorial
+
+ </a>
+
+
+
+ </li>
+
@@ -1651,7 +1666,6 @@
-->
<p>Hive v0.13 or later supports <a href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/DropFunction" target="_blank">permanent functions</a> that live across sessions.</p>
<p>Permanent functions are useful when you use Hive through HiveServer or want to avoid installing Hivemall in every session.</p>
-<p><em>Note: This feature is supported since hivemall-0.3 beta 3 or later.</em></p>
<!-- toc --><div id="toc" class="toc">
<ul>
@@ -1683,7 +1697,25 @@ source /tmp/define-all-as-permanent.hive;
> hivemall.adagrad
</code></pre>
<div class="panel panel-warning"><div class="panel-heading"><h3 class="panel-title" id="caution"><i class="fa fa-exclamation-triangle"></i> Caution</h3></div><div class="panel-body"><p>You need to specify the "hivemall." prefix to call Hivemall UDFs in your queries if the UDFs are loaded into a non-default schema, in this case <em>hivemall</em>.</p></div></div>
-<p><div id="page-footer"><hr><p><sub><font color="gray">
+<p><div id="page-footer"><hr><!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+<p><sub><font color="gray">
Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator.
</font></sub></p>
</div></p>
@@ -1720,7 +1752,7 @@ Apache Hivemall is an effort undergoing incubation at The Apache Software Founda
<script>
var gitbook = gitbook || [];
gitbook.push(function() {
- gitbook.page.hasChanged({"page":{"title":"Install as permanent functions","level":"1.2.2","depth":2,"next":{"title":"Input Format","level":"1.2.3","depth":2,"path":"getting_started/input-format.md","ref":"getting_started/input-format.md","articles":[]},"previous":{"title":"Installation","level":"1.2.1","depth":2,"path":"getting_started/installation.md","ref":"getting_started/installation.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall
/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector"
:"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"getting_started/permanent-functions.md","mtime":"2016-11-14T11:10:11.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-14T11:11:31.970Z"},"basePath":"..","book":{"language":""}});
+ gitbook.page.hasChanged({"page":{"title":"Install as permanent functions","level":"1.2.2","depth":2,"next":{"title":"Input Format","level":"1.2.3","depth":2,"path":"getting_started/input-format.md","ref":"getting_started/input-format.md","articles":[]},"previous":{"title":"Installation","level":"1.2.1","depth":2,"path":"getting_started/installation.md","ref":"getting_started/installation.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall
/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector"
:"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"getting_started/permanent-functions.md","mtime":"2016-11-17T09:55:29.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-17T12:16:14.647Z"},"basePath":"..","book":{"language":""}});
});
</script>
</div>
http://git-wip-us.apache.org/repos/asf/incubator-hivemall-site/blob/68241a08/userguide/index.html
----------------------------------------------------------------------
diff --git a/userguide/index.html b/userguide/index.html
index a2c73fd..d25c46e 100644
--- a/userguide/index.html
+++ b/userguide/index.html
@@ -997,6 +997,21 @@
</li>
+ <li class="chapter " data-level="5.6" data-path="binaryclass/titanic_rf.html">
+
+ <a href="binaryclass/titanic_rf.html">
+
+
+ <b>5.6.</b>
+
+ Kaggle Titanic Tutorial
+
+ </a>
+
+
+
+ </li>
+
@@ -1660,7 +1675,25 @@ Apache Hivemall is a collection of machine learning algorithms and versatile dat
Thus, it can be considered a cross-platform library for machine learning; prediction models built by a batch query of Apache Hive can be used on Apache Spark/Pig, and conversely, prediction models built by Apache Spark can be used from Apache Hive/Pig.</p>
<div style="text-align:center"><img src="resources/images/techstack.png" width="80%" height="80%"></div>
-<p><div id="page-footer"><hr><p><sub><font color="gray">
+<p><div id="page-footer"><hr><!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+<p><sub><font color="gray">
Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator.
</font></sub></p>
</div></p>
@@ -1697,7 +1730,7 @@ Apache Hivemall is an effort undergoing incubation at The Apache Software Founda
<script>
var gitbook = gitbook || [];
gitbook.push(function() {
- gitbook.page.hasChanged({"page":{"title":"Introduction","level":"1.1","depth":1,"next":{"title":"Getting Started","level":"1.2","depth":1,"path":"getting_started/README.md","ref":"getting_started/README.md","articles":[{"title":"Installation","level":"1.2.1","depth":2,"path":"getting_started/installation.md","ref":"getting_started/installation.md","articles":[]},{"title":"Install as permanent functions","level":"1.2.2","depth":2,"path":"getting_started/permanent-functions.md","ref":"getting_started/permanent-functions.md","articles":[]},{"title":"Input Format","level":"1.2.3","depth":2,"path":"getting_started/input-format.md","ref":"getting_started/input-format.md","articles":[]}]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","p
df":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https
://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"README.md","mtime":"2016-11-14T10:15:30.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"20
16-11-14T10:40:22.987Z"},"basePath":".","book":{"language":""}});
+ gitbook.page.hasChanged({"page":{"title":"Introduction","level":"1.1","depth":1,"next":{"title":"Getting Started","level":"1.2","depth":1,"path":"getting_started/README.md","ref":"getting_started/README.md","articles":[{"title":"Installation","level":"1.2.1","depth":2,"path":"getting_started/installation.md","ref":"getting_started/installation.md","articles":[]},{"title":"Install as permanent functions","level":"1.2.2","depth":2,"path":"getting_started/permanent-functions.md","ref":"getting_started/permanent-functions.md","articles":[]},{"title":"Input Format","level":"1.2.3","depth":2,"path":"getting_started/input-format.md","ref":"getting_started/input-format.md","articles":[]}]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","p
df":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https
://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"README.md","mtime":"2016-11-16T08:39:12.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"20
16-11-17T12:16:14.647Z"},"basePath":".","book":{"language":""}});
});
</script>
</div>
http://git-wip-us.apache.org/repos/asf/incubator-hivemall-site/blob/68241a08/userguide/misc/generic_funcs.html
----------------------------------------------------------------------
diff --git a/userguide/misc/generic_funcs.html b/userguide/misc/generic_funcs.html
index eec951a..f5edcc2 100644
--- a/userguide/misc/generic_funcs.html
+++ b/userguide/misc/generic_funcs.html
@@ -999,6 +999,21 @@
</li>
+ <li class="chapter " data-level="5.6" data-path="../binaryclass/titanic_rf.html">
+
+ <a href="../binaryclass/titanic_rf.html">
+
+
+ <b>5.6.</b>
+
+ Kaggle Titanic Tutorial
+
+ </a>
+
+
+
+ </li>
+
@@ -1650,53 +1665,74 @@
under the License.
-->
<p>This page describes a list of useful Hivemall generic functions.</p>
+<!-- toc --><div id="toc" class="toc">
+
+<ul>
+<li><a href="#array-functions">Array functions</a><ul>
+<li><a href="#array-udfs">Array UDFs</a></li>
+<li><a href="#array-udafs">Array UDAFs</a></li>
+</ul>
+</li>
+<li><a href="#bitset-functions">Bitset functions</a><ul>
+<li><a href="#bitset-udf">Bitset UDF</a></li>
+<li><a href="#bitset-udaf">Bitset UDAF</a></li>
+</ul>
+</li>
+<li><a href="#compression-functions">Compression functions</a></li>
+<li><a href="#map-functions">Map functions</a><ul>
+<li><a href="#map-udfs">Map UDFs</a></li>
+<li><a href="#map-udafs">MAP UDAFs</a></li>
+</ul>
+</li>
+<li><a href="#mapreduce-functions">MapReduce functions</a></li>
+<li><a href="#math-functions">Math functions</a></li>
+<li><a href="#text-processing-functions">Text processing functions</a></li>
+<li><a href="#other-functions">Other functions</a></li>
+</ul>
+
+</div><!-- tocstop -->
<h1 id="array-functions">Array functions</h1>
<h2 id="array-udfs">Array UDFs</h2>
<ul>
-<li><code>array_concat(array<ANY> x1, array<ANY> x2, ..)</code> - Returns a concatenated array</li>
-</ul>
-<pre><code class="lang-sql">select array_concat(array(1),array(2,3));
-> [1,2,3]
+<li><p><code>array_concat(array<ANY> x1, array<ANY> x2, ..)</code> - Returns a concatenated array</p>
+<pre><code class="lang-sql"> select array_concat(array(1),array(2,3));
+ > [1,2,3]
</code></pre>
-<ul>
-<li><code>array_intersect(array<ANY> x1, array<ANY> x2, ..)</code> - Returns the intersection of the given arrays</li>
-</ul>
-<pre><code class="lang-sql">select array_intersect(array(1,3,4),array(2,3,4),array(3,5));
-> [3]
+</li>
+<li><p><code>array_intersect(array<ANY> x1, array<ANY> x2, ..)</code> - Returns the intersection of the given arrays</p>
+<pre><code class="lang-sql"> select array_intersect(array(1,3,4),array(2,3,4),array(3,5));
+ > [3]
</code></pre>
-<ul>
-<li><code>array_remove(array<int|text> original, int|text|array<int> target)</code> - Returns the original array with the target removed</li>
-</ul>
-<pre><code class="lang-sql">select array_remove(array(1,null,3),array(null));
-> [3]
+</li>
+<li><p><code>array_remove(array<int|text> original, int|text|array<int> target)</code> - Returns the original array with the target removed</p>
+<pre><code class="lang-sql"> select array_remove(array(1,null,3),array(null));
+ > [3]
-select array_remove(array("aaa","bbb"),"bbb");
-> ["aaa"]
+ select array_remove(array("aaa","bbb"),"bbb");
+ > ["aaa"]
</code></pre>
-<ul>
-<li><code>sort_and_uniq_array(array<int>)</code> - Takes an array of type int and returns a sorted array in a natural order with duplicate elements eliminated</li>
-</ul>
-<pre><code class="lang-sql">select sort_and_uniq_array(array(3,1,1,-2,10));
-> [-2,1,3,10]
+</li>
+<li><p><code>sort_and_uniq_array(array<int>)</code> - Takes an array of type INT and returns a sorted array in a natural order with duplicate elements eliminated</p>
+<pre><code class="lang-sql"> select sort_and_uniq_array(array(3,1,1,-2,10));
+ > [-2,1,3,10]
</code></pre>
-<ul>
-<li><code>subarray_endwith(array<int|text> original, int|text key)</code> - Returns an array that ends with the specified key</li>
-</ul>
-<pre><code class="lang-sql">select subarray_endwith(array(1,2,3,4), 3);
-> [1,2,3]
+</li>
+<li><p><code>subarray_endwith(array<int|text> original, int|text key)</code> - Returns an array that ends with the specified key</p>
+<pre><code class="lang-sql"> select subarray_endwith(array(1,2,3,4), 3);
+ > [1,2,3]
</code></pre>
-<ul>
-<li><code>subarray_startwith(array<int|text> original, int|text key)</code> - Returns an array that starts with the specified key</li>
-</ul>
-<pre><code class="lang-sql">select subarray_startwith(array(1,2,3,4), 2);
-> [2,3,4]
+</li>
+<li><p><code>subarray_startwith(array<int|text> original, int|text key)</code> - Returns an array that starts with the specified key</p>
+<pre><code class="lang-sql"> select subarray_startwith(array(1,2,3,4), 2);
+ > [2,3,4]
</code></pre>
-<ul>
-<li><code>subarray(array<int> orignal, int fromIndex, int toIndex)</code> - Returns a slice of the original array between the inclusive fromIndex and the exclusive toIndex</li>
-</ul>
-<pre><code class="lang-sql">select subarray(array(1,2,3,4,5,6), 2,4);
-> [3,4]
+</li>
+<li><p><code>subarray(array<int> orignal, int fromIndex, int toIndex)</code> - Returns a slice of the original array between the inclusive <code>fromIndex</code> and the exclusive <code>toIndex</code></p>
+<pre><code class="lang-sql"> select subarray(array(1,2,3,4,5,6), 2,4);
+ > [3,4]
</code></pre>
+</li>
+</ul>
<h2 id="array-udafs">Array UDAFs</h2>
<ul>
<li><p><code>array_avg(array<NUMBER>)</code> - Returns an array<double> in which each element is the mean of a set of numbers</double></p>
@@ -1707,41 +1743,40 @@ select array_remove(array("aaa","bbb"),"bbb");
<h1 id="bitset-functions">Bitset functions</h1>
<h2 id="bitset-udf">Bitset UDF</h2>
<ul>
-<li><code>to_bits(int[] indexes)</code> - Returns a bitset representation of the given indexes as long[]</li>
-</ul>
-<pre><code class="lang-sql">select to_bits(array(1,2,3,128));
->[14,-9223372036854775808]
+<li><p><code>to_bits(int[] indexes)</code> - Returns a bitset representation of the given indexes as long[]</p>
+<pre><code class="lang-sql"> select to_bits(array(1,2,3,128));
+ >[14,-9223372036854775808]
</code></pre>
-<ul>
-<li><code>unbits(long[] bitset)</code> - Returns a long array of the given bitset representation</li>
-</ul>
-<pre><code class="lang-sql">select unbits(to_bits(array(1,4,2,3)));
-> [1,2,3,4]
+</li>
+<li><p><code>unbits(long[] bitset)</code> - Returns a long array of the given bitset representation</p>
+<pre><code class="lang-sql"> select unbits(to_bits(array(1,4,2,3)));
+ > [1,2,3,4]
</code></pre>
-<ul>
-<li><code>bits_or(array<long> b1, array<long> b2, ..)</code> - Returns the logical OR of the given bitsets</li>
-</ul>
-<pre><code class="lang-sql">select unbits(bits_or(to_bits(array(1,4)),to_bits(array(2,3))));
-> [1,2,3,4]
+</li>
+<li><p><code>bits_or(array<long> b1, array<long> b2, ..)</code> - Returns the logical OR of the given bitsets</p>
+<pre><code class="lang-sql"> select unbits(bits_or(to_bits(array(1,4)),to_bits(array(2,3))));
+ > [1,2,3,4]
</code></pre>
+</li>
+</ul>
<h2 id="bitset-udaf">Bitset UDAF</h2>
<ul>
<li><code>bits_collect(int|long x)</code> - Returns a bitset in array<long></long></li>
</ul>
<h1 id="compression-functions">Compression functions</h1>
<ul>
-<li><code>deflate(TEXT data [, const int compressionLevel])</code> - Returns a compressed BINARY obeject by using Deflater.
-The compression level must be in range [-1,9]</li>
-</ul>
-<pre><code class="lang-sql">select base91(deflate('aaaaaaaaaaaaaaaabbbbccc'));
-> AA+=kaIM|WTt!+wbGAA
+<li><p><code>deflate(TEXT data [, const int compressionLevel])</code> - Returns a compressed BINARY object by using Deflater.
+The compression level must be in range [-1,9]</p>
+<pre><code class="lang-sql"> select base91(deflate('aaaaaaaaaaaaaaaabbbbccc'));
+ > AA+=kaIM|WTt!+wbGAA
</code></pre>
-<ul>
-<li><code>inflate(BINARY compressedData)</code> - Returns a decompressed STRING by using Inflater</li>
-</ul>
-<pre><code class="lang-sql">select inflate(unbase91(base91(deflate('aaaaaaaaaaaaaaaabbbbccc'))));
-> aaaaaaaaaaaaaaaabbbbccc
+</li>
+<li><p><code>inflate(BINARY compressedData)</code> - Returns a decompressed STRING by using Inflater</p>
+<pre><code class="lang-sql"> select inflate(unbase91(base91(deflate('aaaaaaaaaaaaaaaabbbbccc'))));
+ > aaaaaaaaaaaaaaaabbbbccc
</code></pre>
+</li>
+</ul>
<h1 id="map-functions">Map functions</h1>
<h2 id="map-udfs">Map UDFs</h2>
<ul>
@@ -1766,81 +1801,88 @@ The compression level must be in range [-1,9]</li>
</ul>
<h1 id="math-functions">Math functions</h1>
<ul>
-<li><code>sigmoid(x)</code> - Returns 1.0 / (1.0 + exp(-x))</li>
+<li><code>sigmoid(x)</code> - Returns <code>1.0 / (1.0 + exp(-x))</code></li>
</ul>
<h1 id="text-processing-functions">Text processing functions</h1>
<ul>
-<li><code>base91(binary)</code> - Convert the argument from binary to a BASE91 string</li>
-</ul>
-<pre><code class="lang-sql">select base91(deflate('aaaaaaaaaaaaaaaabbbbccc'));
-> AA+=kaIM|WTt!+wbGAA
+<li><p><code>base91(binary)</code> - Convert the argument from binary to a BASE91 string</p>
+<pre><code class="lang-sql"> select base91(deflate('aaaaaaaaaaaaaaaabbbbccc'));
+ > AA+=kaIM|WTt!+wbGAA
</code></pre>
-<ul>
-<li><code>unbase91(string)</code> - Convert a BASE91 string to a binary</li>
-</ul>
-<pre><code class="lang-sql">select inflate(unbase91(base91(deflate('aaaaaaaaaaaaaaaabbbbccc'))));
-> aaaaaaaaaaaaaaaabbbbccc
+</li>
+<li><p><code>unbase91(string)</code> - Convert a BASE91 string to a binary</p>
+<pre><code class="lang-sql"> select inflate(unbase91(base91(deflate('aaaaaaaaaaaaaaaabbbbccc'))));
+ > aaaaaaaaaaaaaaaabbbbccc
</code></pre>
-<ul>
-<li><code>normalize_unicode(string str [, string form])</code> - Transforms <code>str</code> with the specified normalization form. The <code>form</code> takes one of NFC (default), NFD, NFKC, or NFKD</li>
-</ul>
-<pre><code class="lang-sql">select normalize_unicode('ハンカクカナ','NFKC');
-> ハンカクカナ
+</li>
+<li><p><code>normalize_unicode(string str [, string form])</code> - Transforms <code>str</code> with the specified normalization form. The <code>form</code> takes one of NFC (default), NFD, NFKC, or NFKD</p>
+<pre><code class="lang-sql"> select normalize_unicode('ハンカクカナ','NFKC');
+ > ハンカクカナ
-select normalize_unicode('㈱㌧㌦Ⅲ','NFKC');
-> (株)トンドルIII
+ select normalize_unicode('㈱㌧㌦Ⅲ','NFKC');
+ > (株)トンドルIII
</code></pre>
-<ul>
+</li>
<li><p><code>split_words(string query [, string regex])</code> - Returns an array<text> containing split strings</text></p>
</li>
<li><p><code>is_stopword(string word)</code> - Returns whether the word is an English stopword</p>
</li>
<li><p><code>tokenize(string englishText [, boolean toLowerCase])</code> - Returns words in array<string></string></p>
</li>
-<li><p><code>tokenize_ja(String line [, const string mode = "normal", const list<string> stopWords, const list<string> stopTags])</code> - returns tokenized strings in array<string></string></p>
-</li>
-</ul>
-<pre><code class="lang-sql">select tokenize_ja("kuromojiを使った分かち書きのテストです。第二引数にはnormal/search/extendedを指定できます。デフォルトではnormalモードです。");
+<li><p><code>tokenize_ja(String line [, const string mode = "normal", const list<string> stopWords, const list<string> stopTags])</code> - Returns tokenized strings in array<string>. Refer to <a href="tokenizer.html">this article</a> for details.</string></p>
+<pre><code class="lang-sql"> select tokenize_ja("kuromojiを使った分かち書きのテストです。第二引数にはnormal/search/extendedを指定できます。デフォルトではnormalモードです。");
-> ["kuromoji","使う","分かち書き","テスト","第","二","引数","normal","search","extended","指定","デフォルト","normal"," モード"]
+ > ["kuromoji","使う","分かち書き","テスト","第","二","引数","normal","search","extended","指定","デフォルト","normal"," モード"]
</code></pre>
-<p><a href="https://github.com/myui/hivemall/wiki/Tokenizer" target="_blank">https://github.com/myui/hivemall/wiki/Tokenizer</a></p>
+</li>
+</ul>
<h1 id="other-functions">Other functions</h1>
<ul>
<li><p><code>convert_label(const int|const float)</code> - Convert from -1|1 to 0.0f|1.0f, or from 0.0f|1.0f to -1|1</p>
</li>
-<li><p><code>each_top_k(int K, Object group, double cmpKey, *)</code> - Returns top-K values (or tail-K values when k is less than 0)</p>
+<li><p><code>each_top_k(int K, Object group, double cmpKey, *)</code> - Returns top-K values (or tail-K values when K is less than 0). Refer to <a href="topk.html">this article</a> for details.</p>
</li>
-</ul>
-<p><a href="https://github.com/myui/hivemall/wiki/Efficient-Top-k-computation-on-Apache-Hive-using-Hivemall-UDTF" target="_blank">https://github.com/myui/hivemall/wiki/Efficient-Top-k-computation-on-Apache-Hive-using-Hivemall-UDTF</a></p>
-<ul>
-<li><code>generate_series(const int|bigint start, const int|bigint end)</code> - Generate a series of values, from start to end</li>
-</ul>
-<pre><code class="lang-sql">WITH dual as (
- <span class="hljs-keyword">select</span> <span class="hljs-number">1</span>
-)
-<span class="hljs-keyword">select</span> generate_series(<span class="hljs-number">1</span>,<span class="hljs-number">9</span>)
-<span class="hljs-keyword">from</span> dual;
-
-1
-2
-3
-4
-5
-6
-7
-8
-9
+<li><p><code>generate_series(const int|bigint start, const int|bigint end)</code> - Generate a series of values, from start to end</p>
+<pre><code class="lang-sql"> <span class="hljs-keyword">select</span> generate_series(<span class="hljs-number">1</span>,<span class="hljs-number">9</span>);
+
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
</code></pre>
-<p>A similar function to PostgreSQL's <code>generate_series</code>.
-<a href="http://www.postgresql.org/docs/current/static/functions-srf.html" target="_blank">http://www.postgresql.org/docs/current/static/functions-srf.html</a></p>
-<ul>
-<li><code>x_rank(KEY)</code> - Generates a pseudo sequence number starting from 1 for each key
-<div id="page-footer"><hr><p><sub><font color="gray">
+<p> A similar function to PostgreSQL's <code>generate_series</code>.
+ <a href="http://www.postgresql.org/docs/current/static/functions-srf.html" target="_blank">http://www.postgresql.org/docs/current/static/functions-srf.html</a></p>
+</li>
+<li><p><code>x_rank(KEY)</code> - Generates a pseudo sequence number starting from 1 for each key</p>
+</li>
+</ul>
+<p><div id="page-footer"><hr><!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+<p><sub><font color="gray">
Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator.
</font></sub></p>
-</div></li>
-</ul>
+</div></p>
</section>
@@ -1874,7 +1916,7 @@ Apache Hivemall is an effort undergoing incubation at The Apache Software Founda
<script>
var gitbook = gitbook || [];
gitbook.push(function() {
- gitbook.page.hasChanged({"page":{"title":"List of generic Hivemall functions","level":"2.1","depth":1,"next":{"title":"Efficient Top-K query processing","level":"2.2","depth":1,"path":"misc/topk.md","ref":"misc/topk.md","articles":[]},"previous":{"title":"Map-side Join causes ClassCastException on Tez","level":"1.5.5","depth":2,"path":"troubleshooting/mapjoin_classcastex.md","ref":"troubleshooting/mapjoin_classcastex.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github
.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel
":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"misc/generic_funcs.md","mtime":"2016-11-12T07:18:00.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-14T10:40:22.987Z"},"basePath":"..","book":{"language":""}});
+ gitbook.page.hasChanged({"page":{"title":"List of generic Hivemall functions","level":"2.1","depth":1,"next":{"title":"Efficient Top-K query processing","level":"2.2","depth":1,"path":"misc/topk.md","ref":"misc/topk.md","articles":[]},"previous":{"title":"Map-side Join causes ClassCastException on Tez","level":"1.5.5","depth":2,"path":"troubleshooting/mapjoin_classcastex.md","ref":"troubleshooting/mapjoin_classcastex.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github
.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel
":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"misc/generic_funcs.md","mtime":"2016-11-17T11:12:15.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-17T12:16:14.647Z"},"basePath":"..","book":{"language":""}});
});
</script>
</div>
http://git-wip-us.apache.org/repos/asf/incubator-hivemall-site/blob/68241a08/userguide/misc/tokenizer.html
----------------------------------------------------------------------
diff --git a/userguide/misc/tokenizer.html b/userguide/misc/tokenizer.html
index e0d3959..af62ee8 100644
--- a/userguide/misc/tokenizer.html
+++ b/userguide/misc/tokenizer.html
@@ -999,6 +999,21 @@
</li>
+ <li class="chapter " data-level="5.6" data-path="../binaryclass/titanic_rf.html">
+
+ <a href="../binaryclass/titanic_rf.html">
+
+
+ <b>5.6.</b>
+
+ Kaggle Titanic Tutorial
+
+ </a>
+
+
+
+ </li>
+
@@ -1671,7 +1686,25 @@
<p>["kuromoji","使う","分かち書き","テスト","第","二","引数","normal","search","extended","指定","デフォルト","normal","モード"]</p>
</blockquote>
<p>For detailed APIs, please refer to the Javadoc of <a href="https://lucene.apache.org/core/5_3_1/analyzers-kuromoji/org/apache/lucene/analysis/ja/JapaneseAnalyzer.html" target="_blank">JapaneseAnalyzer</a> as well.
-<div id="page-footer"><hr><p><sub><font color="gray">
+<div id="page-footer"><hr><!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+<p><sub><font color="gray">
Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator.
</font></sub></p>
</div></p>
@@ -1708,7 +1741,7 @@ Apache Hivemall is an effort undergoing incubation at The Apache Software Founda
<script>
var gitbook = gitbook || [];
gitbook.push(function() {
- gitbook.page.hasChanged({"page":{"title":"English/Japanese Text Tokenizer","level":"2.3","depth":1,"next":{"title":"Feature Scaling","level":"3.1","depth":1,"path":"ft_engineering/scaling.md","ref":"ft_engineering/scaling.md","articles":[]},"previous":{"title":"Efficient Top-K query processing","level":"2.2","depth":1,"path":"misc/topk.md","ref":"misc/topk.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"d
ownloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,
h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"misc/tokenizer.md","mtime":"2016-11-12T07:18:00.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-14T10:40:22.987Z"},"basePath":"..","book":{"language":""}});
+ gitbook.page.hasChanged({"page":{"title":"English/Japanese Text Tokenizer","level":"2.3","depth":1,"next":{"title":"Feature Scaling","level":"3.1","depth":1,"path":"ft_engineering/scaling.md","ref":"ft_engineering/scaling.md","articles":[]},"previous":{"title":"Efficient Top-K query processing","level":"2.2","depth":1,"path":"misc/topk.md","ref":"misc/topk.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"d
ownloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,
h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"misc/tokenizer.md","mtime":"2016-11-16T08:39:12.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-17T12:16:14.647Z"},"basePath":"..","book":{"language":""}});
});
</script>
</div>
http://git-wip-us.apache.org/repos/asf/incubator-hivemall-site/blob/68241a08/userguide/misc/topk.html
----------------------------------------------------------------------
diff --git a/userguide/misc/topk.html b/userguide/misc/topk.html
index 92a9d2f..8d4690c 100644
--- a/userguide/misc/topk.html
+++ b/userguide/misc/topk.html
@@ -999,6 +999,21 @@
</li>
+ <li class="chapter " data-level="5.6" data-path="../binaryclass/titanic_rf.html">
+
+ <a href="../binaryclass/titanic_rf.html">
+
+
+ <b>5.6.</b>
+
+ Kaggle Titanic Tutorial
+
+ </a>
+
+
+
+ </li>
+
@@ -1652,7 +1667,25 @@
<p><code>each_top_k(int k, ANY group, double value, arg1, arg2, ..., argN)</code> returns the top-k records for each <code>group</code>. It returns a relation consisting of <code>(int rank, double value, arg1, arg2, ..., argN)</code>.</p>
<p>This function is particularly useful for applying a similarity/distance function where the computational complexity is <strong>O(nm)</strong>.</p>
<p><code>each_top_k</code> is very fast when compared to other methods running top-k queries (e.g., <a href="https://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/" target="_blank"><code>rank/distribute by</code></a>) in Hive.</p>
-<h2 id="caution">Caution</h2>
+<!-- toc --><div id="toc" class="toc">
+
+<ul>
+<li><a href="#caution">Caution</a></li>
+<li><a href="#usage">Usage</a><ul>
+<li><a href="#efficient-top-k-query-processing-using-eachtopk">Efficient Top-k Query Processing using <code>each_top_k</code></a></li>
+<li><a href="#top-k-clicks">top-k clicks</a></li>
+<li><a href="#top-k-similarity-computation">Top-k similarity computation</a><ul>
+<li><a href="#explicit-grouping-using-distribute-by-and-sort-by">Explicit grouping using <code>distribute by</code> and <code>sort by</code></a></li>
+<li><a href="#parallelization-of-similarity-computation-using-with-clause">Parallelization of similarity computation using WITH clause</a></li>
+</ul>
+</li>
+<li><a href="#tail-k">tail-K</a></li>
+</ul>
+</li>
+</ul>
+
+</div><!-- tocstop -->
+<h1 id="caution">Caution</h1>
<ul>
<li><code>each_top_k</code> is supported from Hivemall v0.3.2-3 or later.</li>
<li>This UDTF assumes that input records are sorted by <code>group</code>. Use <code>DISTRIBUTE BY group SORT BY group</code> to ensure that. Or, you can use <code>LEFT OUTER JOIN</code> for certain cases.</li>
@@ -1662,7 +1695,8 @@
<li>If k is less than 0, reverse order is used and <code>tail-K</code> records are returned for each <code>group</code>.</li>
<li>Note that this function returns <a href="http://www.michaelpollmeier.com/selecting-top-k-items-from-a-list-efficiently-in-java-groovy/" target="_blank">a pseudo ranking</a> for top-k. It always returns <code>at-most K</code> records for each group. The ranking scheme is similar to <code>dense_rank</code> but slightly different in certain cases.</li>
</ul>
-<h1 id="efficient-top-k-query-processing-using-eachtopk">Efficient Top-k Query Processing using <code>each_top_k</code></h1>
+<h1 id="usage">Usage</h1>
+<h2 id="efficient-top-k-query-processing-using-eachtopk">Efficient Top-k Query Processing using <code>each_top_k</code></h2>
<p>Efficient processing of Top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data.
Our Hive extension <code>each_top_k</code> helps run Top-k queries efficiently.</p>
<ul>
@@ -1778,7 +1812,6 @@ Do null handling like <code>if(value is null, -1, value)</code> to avoid null.</
<p>If <code>k</code> is less than 0, reverse order is used and tail-K records are returned for each <code>group</code>.</p>
<p>The ranking semantics of <code>each_top_k</code> follow SQL's <code>dense_rank</code> and then limit results by <code>k</code>.</p>
<div class="panel panel-warning"><div class="panel-heading"><h3 class="panel-title" id="caution"><i class="fa fa-exclamation-triangle"></i> Caution</h3></div><div class="panel-body"><p><code>each_top_k</code> is beneficial where the number of grouping keys is large. If the number of grouping keys is not so large (e.g., less than 100), consider using <code>rank() over</code> instead.</p></div></div>
-<h1 id="usage">Usage</h1>
<h2 id="top-k-clicks">top-k clicks</h2>
<p><a href="http://stackoverflow.com/questions/9390698/hive-getting-top-n-records-in-group-by-query/32559050#32559050" target="_blank">http://stackoverflow.com/questions/9390698/hive-getting-top-n-records-in-group-by-query/32559050#32559050</a></p>
<pre><code class="lang-sql"><span class="hljs-keyword">set</span> hivevar:k=<span class="hljs-number">5</span>;
@@ -2422,7 +2455,7 @@ Apache Hivemall is an effort undergoing incubation at The Apache Software Founda
<script>
var gitbook = gitbook || [];
gitbook.push(function() {
- gitbook.page.hasChanged({"page":{"title":"Efficient Top-K query processing","level":"2.2","depth":1,"next":{"title":"English/Japanese Text Tokenizer","level":"2.3","depth":1,"path":"misc/tokenizer.md","ref":"misc/tokenizer.md","articles":[]},"previous":{"title":"List of generic Hivemall functions","level":"2.1","depth":1,"path":"misc/generic_funcs.md","ref":"misc/generic_funcs.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitt
er":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,
*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"misc/topk.md","mtime":"2016-11-16T08:32:05.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-16T08:36:45.392Z"},"basePath":"..","book":{"language":""}});
+ gitbook.page.hasChanged({"page":{"title":"Efficient Top-K query processing","level":"2.2","depth":1,"next":{"title":"English/Japanese Text Tokenizer","level":"2.3","depth":1,"path":"misc/tokenizer.md","ref":"misc/tokenizer.md","articles":[]},"previous":{"title":"List of generic Hivemall functions","level":"2.1","depth":1,"path":"misc/generic_funcs.md","ref":"misc/generic_funcs.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitt
er":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,
*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"misc/topk.md","mtime":"2016-11-17T09:58:26.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-17T12:16:14.647Z"},"basePath":"..","book":{"language":""}});
});
</script>
</div>