Posted to commits@hivemall.apache.org by my...@apache.org on 2016/11/17 12:18:47 UTC

[04/12] incubator-hivemall-site git commit: Fixed links in the user guide

http://git-wip-us.apache.org/repos/asf/incubator-hivemall-site/blob/68241a08/userguide/regression/kddcup12tr2_lr_amplify.html
----------------------------------------------------------------------
diff --git a/userguide/regression/kddcup12tr2_lr_amplify.html b/userguide/regression/kddcup12tr2_lr_amplify.html
index ef32b5d..b40d2e1 100644
--- a/userguide/regression/kddcup12tr2_lr_amplify.html
+++ b/userguide/regression/kddcup12tr2_lr_amplify.html
@@ -999,6 +999,21 @@
             
         </li>
     
+        <li class="chapter " data-level="5.6" data-path="../binaryclass/titanic_rf.html">
+            
+                <a href="../binaryclass/titanic_rf.html">
+            
+                    
+                        <b>5.6.</b>
+                    
+                    Kaggle Titanic Tutorial
+            
+                </a>
+            
+
+            
+        </li>
+    
 
     
         
@@ -1651,7 +1666,7 @@
 -->
 <p>This article explains <em>amplify</em> technique that is useful for improving prediction score.</p>
 <p>Iterations are mandatory in machine learning (e.g., in <a href="http://en.wikipedia.org/wiki/Stochastic_gradient_descent" target="_blank">stochastic gradient descent</a>) to get good prediction models. However, MapReduce is known to be not suited for iterative algorithms because IN/OUT of each MapReduce job is through HDFS.</p>
-<p>In this example, we show how Hivemall deals with this problem. We use <a href="https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-dataset" target="_blank">KDD Cup 2012, Track 2 Task</a> as an example.</p>
+<p>In this example, we show how Hivemall deals with this problem. We use <a href="kddcup12tr2_dataset.html">KDD Cup 2012, Track 2 Task</a> as an example.</p>
 <p><strong>WARNING</strong>: rand_amplify() is supported in v0.2-beta1 and later.</p>
 <hr>
 <h1 id="amplify-training-examples-in-map-phase-and-shuffle-them-in-reduce-phase">Amplify training examples in Map phase and shuffle them in Reduce phase</h1>
@@ -1690,7 +1705,7 @@ So, we recommend users to use an amplified view for training as follows:</p>
 </code></pre>
 <p>The above query is executed by 2 MapReduce jobs as shown below:</p>
 <p><img src="../resources/images/amplify.png" alt="amplifier"></p>
-<p>Using <em>trainning_x3</em>  instead of the plain training table results in higher and better AUC (0.746214) in <a href="https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-(regression\" target="_blank">this</a>) example.</p>
+<p>Using <em>trainning_x3</em>  instead of the plain training table results in higher and better AUC (0.746214) in <a href="kddcup12tr2_lr.html#evaluation">this example</a>.</p>
 <p>A problem in amplify() is that the shuffle (copy) and merge phase of the stage 1 could become a bottleneck.
 When the training table is so large that involves 100 Map tasks, the merge operator needs to merge at least 100 files by (external) merge sort! </p>
 <p>Note that the actual bottleneck is not M/R iterations but shuffling training instance. Iteration without shuffling (as in <a href="http://spark.incubator.apache.org/examples.html" target="_blank">the Spark example</a>) causes very slow convergence and results in requiring more iterations. Shuffling cannot be avoided even in iterative MapReduce variants.</p>
@@ -1713,7 +1728,7 @@ The rand_amplify UDTF outputs rows in a random order when the local buffer speci
 <p><img src="../resources/images/randamplify.png" alt="Random amplify"></p>
 <p>The map-local multiplication and shuffling has no bottleneck in the merge phase and the query is efficiently executed within a single MapReduce job.</p>
 <p><img src="../resources/images/randamplify_elapsed.png" alt="rand_amplify elapsed"></p>
-<p>Using <em>rand_amplify</em> results in a better AUC (0.743392) in <a href="https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-(regression\" target="_blank">this</a>) example.</p>
+<p>Using <em>rand_amplify</em> results in a better AUC (0.743392) in <a href="kddcup12tr2_lr.html#evaluation">this example</a>.</p>
 <hr>
 <h1 id="conclusion">Conclusion</h1>
 <p>We recommend users to use <em>amplify()</em> for small training inputs and to use <em>rand_amplify()</em> for large training inputs to get a better accuracy in a reasonable training time.</p>
@@ -1743,7 +1758,25 @@ The rand_amplify UDTF outputs rows in a random order when the local buffer speci
 </tr>
 </tbody>
 </table>
-<p><div id="page-footer"><hr><p><sub><font color="gray">
+<p><div id="page-footer"><hr><!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+<p><sub><font color="gray">
 Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator.
 </font></sub></p>
 </div></p>
@@ -1780,7 +1813,7 @@ Apache Hivemall is an effort undergoing incubation at The Apache Software Founda
     <script>
         var gitbook = gitbook || [];
         gitbook.push(function() {
-            gitbook.page.hasChanged({"page":{"title":"Logistic Regression with Amplifier","level":"7.2.3","depth":2,"next":{"title":"AdaGrad, AdaDelta","level":"7.2.4","depth":2,"path":"regression/kddcup12tr2_adagrad.md","ref":"regression/kddcup12tr2_adagrad.md","articles":[]},"previous":{"title":"Logistic Regression, Passive Aggressive","level":"7.2.2","depth":2,"path":"regression/kddcup12tr2_lr.md","ref":"regression/kddcup12tr2_lr.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"regression/kddcup12tr2_lr_amplify.md","mtime":"2016-11-14T09:52:36.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-14T10:40:22.987Z"},"basePath":"..","book":{"language":""}});
+            gitbook.page.hasChanged({"page":{"title":"Logistic Regression with Amplifier","level":"7.2.3","depth":2,"next":{"title":"AdaGrad, AdaDelta","level":"7.2.4","depth":2,"path":"regression/kddcup12tr2_adagrad.md","ref":"regression/kddcup12tr2_adagrad.md","articles":[]},"previous":{"title":"Logistic Regression, Passive Aggressive","level":"7.2.2","depth":2,"path":"regression/kddcup12tr2_lr.md","ref":"regression/kddcup12tr2_lr.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme-api","edit-link","github","splitter","sitemap","etoc","callouts","toggle-chapters","anchorjs","codeblock-filename","expandable-chapters","multipart","codeblock-filename","katex","emphasize","localized-footer"],"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"pluginsConfig":{"emphasize":{},"callouts":{},"etoc":{"maxdepth":3,"mindepth":1,"notoc":true},"github":{"url":"https://github.com/apache/incubator-hivemall/"},"splitter":{},"search":{},"downloadpdf":{"base":"https://github.com/apache/incubator-hivemall/docs/gitbook","label":"PDF","multilingual":false},"multipart":{},"localized-footer":{"filename":"FOOTER.md"},"lunr":{"maxIndexSize":1000000,"ignoreSpecialCharacters":false},"katex":{},"fontsettings":{"theme":"white","family":"sans","size":2,"font":"sans"},"highlight":{},"codeblock-filename":{},"sitemap":{"hostname":"http://hivemall.incubator.apache.org/"},"theme-api":{"languages":[],"split":false,"theme":"dark"},"sharing":{"facebook":true,"twitter":true,"google":false,"weibo":false,"instapaper":false,"vk":false,"all":["facebook","google","twitter","weibo","instapaper"]},"edit-link":{"label":"Edit","base":"https://github.com/apache/incubator-hivemall/docs/gitbook"},"theme-default":{"styles":{"website":"styles/website.css","pdf":"styles/pdf.css","epub":"styles/epub.css","mobi":"styles/mobi.css","ebook":"styles/ebook.css","print":"styles/print.css"},"showLevel":true},"anchorjs":{"selector":"h1,h2,h3,*:not(.callout) > h4,h5"},"toggle-chapters":{},"expandable-chapters":{}},"theme":"default","pdf":{"pageNumbers":true,"fontSize":12,"fontFamily":"Arial","paperSize":"a4","chapterMark":"pagebreak","pageBreaksBefore":"/","margin":{"right":62,"left":62,"top":56,"bottom":56}},"structure":{"langs":"LANGS.md","readme":"README.md","glossary":"GLOSSARY.md","summary":"SUMMARY.md"},"variables":{},"title":"Hivemall User Manual","links":{"sidebar":{"<i class=\"fa fa-home\"></i> Home":"http://hivemall.incubator.apache.org/"}},"gitbook":"3.x.x","description":"User Manual for Apache Hivemall"},"file":{"path":"regression/kddcup12tr2_lr_amplify.md","mtime":"2016-11-17T11:40:35.000Z","type":"markdown"},"gitbook":{"version":"3.2.2","time":"2016-11-17T12:16:14.647Z"},"basePath":"..","book":{"language":""}});
         });
     </script>
 </div>

http://git-wip-us.apache.org/repos/asf/incubator-hivemall-site/blob/68241a08/userguide/resources/images/kddtrack2tables.png
----------------------------------------------------------------------
diff --git a/userguide/resources/images/kddtrack2tables.png b/userguide/resources/images/kddtrack2tables.png
new file mode 100644
index 0000000..90012db
Binary files /dev/null and b/userguide/resources/images/kddtrack2tables.png differ