Posted to commits@zeppelin.apache.org by ah...@apache.org on 2017/02/28 04:16:57 UTC

svn commit: r1784686 - in /zeppelin/site/docs/0.8.0-SNAPSHOT: assets/themes/zeppelin/img/pig_zeppelin_tutorial.png interpreter/pig.html

Author: ahyoungryu
Date: Tue Feb 28 04:16:57 2017
New Revision: 1784686

URL: http://svn.apache.org/viewvc?rev=1784686&view=rev
Log: (empty)

Added:
    zeppelin/site/docs/0.8.0-SNAPSHOT/assets/themes/zeppelin/img/pig_zeppelin_tutorial.png   (with props)
Modified:
    zeppelin/site/docs/0.8.0-SNAPSHOT/interpreter/pig.html

Added: zeppelin/site/docs/0.8.0-SNAPSHOT/assets/themes/zeppelin/img/pig_zeppelin_tutorial.png
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.8.0-SNAPSHOT/assets/themes/zeppelin/img/pig_zeppelin_tutorial.png?rev=1784686&view=auto
==============================================================================
Binary file - no diff available.

Propchange: zeppelin/site/docs/0.8.0-SNAPSHOT/assets/themes/zeppelin/img/pig_zeppelin_tutorial.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: zeppelin/site/docs/0.8.0-SNAPSHOT/interpreter/pig.html
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.8.0-SNAPSHOT/interpreter/pig.html?rev=1784686&r1=1784685&r2=1784686&view=diff
==============================================================================
--- zeppelin/site/docs/0.8.0-SNAPSHOT/interpreter/pig.html (original)
+++ zeppelin/site/docs/0.8.0-SNAPSHOT/interpreter/pig.html Tue Feb 28 04:16:57 2017
@@ -210,12 +210,17 @@
 <h2>Supported interpreter type</h2>
 
 <ul>
-<li><p><code>%pig.script</code> (default)</p>
+<li><p><code>%pig.script</code> (default Pig interpreter, so you can use <code>%pig</code>)</p>
 
-<p>All the pig script can run in this type of interpreter, and display type is plain text.</p></li>
+<p><code>%pig.script</code> works like the Pig Grunt shell: anything you can run in the Grunt shell can be run in the <code>%pig.script</code> interpreter. Use it for Pig scripts whose output you don’t need to visualize, such as data munging.</p></li>
 <li><p><code>%pig.query</code></p>
 
-<p>Almost the same as <code>%pig.script</code>. The only difference is that you don&#39;t need to add alias in the last statement. And the display type is table.   </p></li>
+<p><code>%pig.query</code> differs slightly from <code>%pig.script</code>. It is used for exploratory data analysis in Pig Latin, where you can take advantage of Zeppelin’s visualization support. The last statement is treated differently in <code>%pig.query</code> in two ways:</p>
+
+<ul>
+<li>The last statement in <code>%pig.query</code> has no Pig alias (see the examples below).</li>
+<li>The last statement in <code>%pig.query</code> must be on a single line.</li>
+</ul></li>
 </ul>
 
 <h2>Supported runtime mode</h2>
@@ -249,8 +254,8 @@
 <h3>How to configure interpreter</h3>
 
 <p>At the Interpreters menu, you have to create a new Pig interpreter. Pig interpreter has below properties by default.
-And you can set any pig properties here which will be passed to pig engine. (like tez.queue.name &amp; mapred.job.queue.name).
-Besides, we use paragraph title as job name if it exists, else use the last line of pig script. So you can use that to find app running in YARN RM UI.</p>
+You can also set any Pig properties here, and they will be passed to the Pig engine (for example, tez.queue.name and mapred.job.queue.name).
+The paragraph title is used as the job name if it exists; otherwise the last line of the Pig script is used. You can use that name to find the application in the YARN RM UI.</p>
 
 <table class="table-configuration">
     <tr>
@@ -290,21 +295,42 @@ Besides, we use paragraph title as job n
 <h5>pig</h5>
 <div class="highlight"><pre><code class="text language-text" data-lang="text">%pig
 
-raw_data = load &#39;dataset/sf_crime/train.csv&#39; using PigStorage(&#39;,&#39;) as (Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y);
-b = group raw_data all;
-c = foreach b generate COUNT($1);
-dump c;
+bankText = load &#39;bank.csv&#39; using PigStorage(&#39;;&#39;);
+bank = foreach bankText generate $0 as age, $1 as job, $2 as marital, $3 as education, $5 as balance; 
+bank = filter bank by age != &#39;&quot;age&quot;&#39;;
+bank = foreach bank generate (int)age, REPLACE(job,&#39;&quot;&#39;,&#39;&#39;) as job, REPLACE(marital, &#39;&quot;&#39;, &#39;&#39;) as marital, (int)(REPLACE(balance, &#39;&quot;&#39;, &#39;&#39;)) as balance;
+store bank into &#39;clean_bank.csv&#39; using PigStorage(&#39;;&#39;); -- This statement is optional; it just shows that %pig.script is mostly used for data munging before querying the data.
 </code></pre></div>
 <h5>pig.query</h5>
+
+<p>Count the number of records for each age under 30.</p>
+<div class="highlight"><pre><code class="text language-text" data-lang="text">%pig.query
+
+bank_data = filter bank by age &lt; 30;
+b = group bank_data by age;
+foreach b generate group, COUNT($1);
+</code></pre></div>
+<p>The same as above, but using a dynamic text form so that the user can specify the variable maxAge in a text box (see the screenshot below). Dynamic forms are a very useful Zeppelin feature; see this <a href="../manual/dynamicform.html">link</a> for details.</p>
 <div class="highlight"><pre><code class="text language-text" data-lang="text">%pig.query
 
-b = foreach raw_data generate Category;
-c = group b by Category;
-foreach c generate group as category, COUNT($1) as count;
+bank_data = filter bank by age &lt; ${maxAge=40};
+b = group bank_data by age;
+foreach b generate group, COUNT($1) as count;
 </code></pre></div>
+<p>Count the number of records for each age for a specific marital status, again using a dynamic form. The user can choose the marital status from a dropdown list (see the screenshot below).</p>
+<div class="highlight"><pre><code class="text language-text" data-lang="text">%pig.query
+
+bank_data = filter bank by marital==&#39;${marital=single,single|divorced|married}&#39;;
+b = group bank_data by age;
+foreach b generate group, COUNT($1) as count;
+</code></pre></div>
+<p>The above examples are taken from the Pig tutorial note in Zeppelin; see that note for details. Here&#39;s the screenshot.</p>
+
+<p><img class="img-responsive" width="1024px" style="margin:0 auto; padding: 26px;" src="../assets/themes/zeppelin/img/pig_zeppelin_tutorial.png" /></p>
+
 <p>Data is shared between <code>%pig</code> and <code>%pig.query</code>, so that you can do some common work in <code>%pig</code>, and do different kinds of query based on the data of <code>%pig</code>. 
-Besides, we recommend you to specify alias explicitly so that the visualization can display the column name correctly. Here, we name <code>COUNT($1)</code> as <code>count</code>, if you don&#39;t do this,
-then we will name it using position, here we will use <code>col_1</code> to represent <code>COUNT($1)</code> if you don&#39;t specify alias for it. There&#39;s one pig tutorial note in zeppelin for your reference.</p>
+We also recommend specifying aliases explicitly so that the visualization displays the column names correctly. In the second and third <code>%pig.query</code> examples above, <code>COUNT($1)</code> is named <code>count</code>. If you don&#39;t specify an alias,
+the column is named by its position. For example, in the first <code>%pig.query</code> example above, the chart uses <code>col_1</code> to represent <code>COUNT($1)</code>.</p>
 
   </div>
 </div>