Posted to commits@zeppelin.apache.org by ah...@apache.org on 2017/02/28 04:16:57 UTC
svn commit: r1784686 - in /zeppelin/site/docs/0.8.0-SNAPSHOT:
assets/themes/zeppelin/img/pig_zeppelin_tutorial.png interpreter/pig.html
Author: ahyoungryu
Date: Tue Feb 28 04:16:57 2017
New Revision: 1784686
URL: http://svn.apache.org/viewvc?rev=1784686&view=rev
Log: (empty)
Added:
zeppelin/site/docs/0.8.0-SNAPSHOT/assets/themes/zeppelin/img/pig_zeppelin_tutorial.png (with props)
Modified:
zeppelin/site/docs/0.8.0-SNAPSHOT/interpreter/pig.html
Added: zeppelin/site/docs/0.8.0-SNAPSHOT/assets/themes/zeppelin/img/pig_zeppelin_tutorial.png
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.8.0-SNAPSHOT/assets/themes/zeppelin/img/pig_zeppelin_tutorial.png?rev=1784686&view=auto
==============================================================================
Binary file - no diff available.
Propchange: zeppelin/site/docs/0.8.0-SNAPSHOT/assets/themes/zeppelin/img/pig_zeppelin_tutorial.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Modified: zeppelin/site/docs/0.8.0-SNAPSHOT/interpreter/pig.html
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.8.0-SNAPSHOT/interpreter/pig.html?rev=1784686&r1=1784685&r2=1784686&view=diff
==============================================================================
--- zeppelin/site/docs/0.8.0-SNAPSHOT/interpreter/pig.html (original)
+++ zeppelin/site/docs/0.8.0-SNAPSHOT/interpreter/pig.html Tue Feb 28 04:16:57 2017
@@ -210,12 +210,17 @@
<h2>Supported interpreter type</h2>
<ul>
-<li><p><code>%pig.script</code> (default)</p>
+<li><p><code>%pig.script</code> (the default Pig interpreter, so you can also invoke it as <code>%pig</code>)</p>
-<p>All the pig script can run in this type of interpreter, and display type is plain text.</p></li>
+<p><code>%pig.script</code> is like the Pig grunt shell. Anything you can run in the Pig grunt shell can be run in the <code>%pig.script</code> interpreter. It is used for running Pig scripts where you don’t need to visualize the data, which makes it suitable for data munging.</p></li>
<li><p><code>%pig.query</code></p>
-<p>Almost the same as <code>%pig.script</code>. The only difference is that you don't need to add alias in the last statement. And the display type is table. </p></li>
+<p><code>%pig.query</code> is a little different from <code>%pig.script</code>. It is used for exploratory data analysis via Pig Latin, where you can leverage Zeppelin’s visualization ability. There are two minor differences between <code>%pig.script</code> and <code>%pig.query</code>, both concerning the last statement:</p>
+
+<ul>
+<li>No Pig alias in the last statement in <code>%pig.query</code> (see the examples below).</li>
+<li>The last statement in <code>%pig.query</code> must be on a single line.</li>
+</ul></li>
</ul>
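The two rules for the last statement suggest that `%pig.query` binds that final line to an alias itself and then reads its result. This page does not describe the interpreter's internals, so the Python sketch below is only a hypothetical illustration of that idea: `rewrite_query` and the generated alias `tmp_alias` are our own names, not Zeppelin APIs, and we assume the last non-empty line of the paragraph is the query statement.

```python
from __future__ import annotations

import re

# Hypothetical check: a statement that starts with "name =" (but not "==").
ALIAS_ASSIGNMENT = re.compile(r"^\s*[A-Za-z_]\w*\s*=(?!=)")


def rewrite_query(script: str, alias: str = "tmp_alias") -> tuple[str, str]:
    """Sketch: bind the last line of a %pig.query paragraph to a generated alias."""
    lines = [ln for ln in script.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty paragraph")
    *body, last = lines
    # Rule 1: the last statement must not carry its own alias.
    if ALIAS_ASSIGNMENT.match(last):
        raise ValueError("last statement of %pig.query must not have an alias")
    # Rule 2: only the final line is rebound, so it must be a whole statement.
    last = last.strip().rstrip(";")
    rewritten = "\n".join(body + [f"{alias} = {last};"])
    return rewritten, alias
```

Because only the final line is rebound, a last statement spread over several lines would be cut in the middle, which is one plausible reason for the single-line rule.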
<h2>Supported runtime mode</h2>
@@ -249,8 +254,8 @@
<h3>How to configure interpreter</h3>
<p>In the Interpreters menu, create a new Pig interpreter. The Pig interpreter has the following properties by default.
-And you can set any pig properties here which will be passed to pig engine. (like tez.queue.name & mapred.job.queue.name).
-Besides, we use paragraph title as job name if it exists, else use the last line of pig script. So you can use that to find app running in YARN RM UI.</p>
+You can also set any Pig property here, and it will be passed to the Pig engine (e.g. tez.queue.name and mapred.job.queue.name).
+The paragraph title is used as the job name if it exists; otherwise the last line of the Pig script is used. You can use this name to find the application in the YARN ResourceManager UI.</p>
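The job-name rule can be expressed as a small function. This is a hypothetical Python sketch (`job_name` is our own name, not part of Zeppelin), and it assumes that blank trailing lines are ignored when picking the last line of the script.

```python
from typing import Optional


def job_name(paragraph_title: Optional[str], script: str) -> str:
    """Sketch: use the paragraph title if set, else the script's last non-empty line."""
    if paragraph_title and paragraph_title.strip():
        return paragraph_title.strip()
    lines = [ln.strip() for ln in script.splitlines() if ln.strip()]
    return lines[-1] if lines else ""
```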
<table class="table-configuration">
<tr>
@@ -290,21 +295,42 @@ Besides, we use paragraph title as job n
<h5>pig</h5>
<div class="highlight"><pre><code class="text language-text" data-lang="text">%pig
-raw_data = load 'dataset/sf_crime/train.csv' using PigStorage(',') as (Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y);
-b = group raw_data all;
-c = foreach b generate COUNT($1);
-dump c;
+bankText = load 'bank.csv' using PigStorage(';');
+bank = foreach bankText generate $0 as age, $1 as job, $2 as marital, $3 as education, $5 as balance;
+bank = filter bank by age != '"age"';
+bank = foreach bank generate (int)age, REPLACE(job,'"','') as job, REPLACE(marital, '"', '') as marital, (int)(REPLACE(balance, '"', '')) as balance;
+store bank into 'clean_bank.csv' using PigStorage(';'); -- this statement is optional; it just shows that %pig.script is mostly used for data munging before querying the data.
</code></pre></div>
<h5>pig.query</h5>
+
+<p>Count the records for each age, where age is less than 30:</p>
+<div class="highlight"><pre><code class="text language-text" data-lang="text">%pig.query
+
+bank_data = filter bank by age < 30;
+b = group bank_data by age;
+foreach b generate group, COUNT($1);
+</code></pre></div>
+<p>The same as above, but using a dynamic text form so that users can specify the variable maxAge in a textbox (see the screenshot below). Dynamic forms are a very useful Zeppelin feature; see this <a href="../manual/dynamicform.html">link</a> for details.</p>
<div class="highlight"><pre><code class="text language-text" data-lang="text">%pig.query
-b = foreach raw_data generate Category;
-c = group b by Category;
-foreach c generate group as category, COUNT($1) as count;
+bank_data = filter bank by age < ${maxAge=40};
+b = group bank_data by age;
+foreach b generate group, COUNT($1) as count;
</code></pre></div>
+<p>Count the records for each age for a specific marital status, again using a dynamic form. Users can choose the marital status from the dropdown list (see the screenshot below).</p>
+<div class="highlight"><pre><code class="text language-text" data-lang="text">%pig.query
+
+bank_data = filter bank by marital=='${marital=single,single|divorced|married}';
+b = group bank_data by age;
+foreach b generate group, COUNT($1) as count;
+</code></pre></div>
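Both forms above follow Zeppelin's dynamic form syntax: `${name=default}` renders a textbox and `${name=default,opt1|opt2|...}` renders a dropdown list. To show what the Pig engine ultimately receives, the hypothetical Python sketch below substitutes form values into the paragraph text. `substitute_forms` is our own name, and it deliberately ignores other dynamic form features (such as display names for options); it is not Zeppelin's implementation.

```python
from __future__ import annotations

import re
from typing import Dict, Optional

# Matches ${name}, ${name=default} and ${name=default,opt1|opt2|...}
FORM_PATTERN = re.compile(r"\$\{([^=}]+)(?:=([^,}]*)(?:,([^}]*))?)?\}")


def substitute_forms(script: str, params: Optional[Dict[str, str]] = None) -> str:
    """Sketch: replace each dynamic form with the user's value or its default."""
    params = params or {}

    def replace(m: re.Match[str]) -> str:
        name, default, options = m.group(1), m.group(2), m.group(3)
        value = params.get(name, default if default is not None else "")
        # A dropdown only accepts one of its listed options.
        if options is not None and value not in options.split("|"):
            raise ValueError(f"{value!r} is not an option of form {name!r}")
        return value

    return FORM_PATTERN.sub(replace, script)
```

With no user input, the defaults are used, so `${maxAge=40}` becomes `40` and the dropdown falls back to `single`.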
+<p>The examples above are included in the Pig tutorial note in Zeppelin; check it for details. Here is a screenshot:</p>
+
+<p><img class="img-responsive" width="1024px" style="margin:0 auto; padding: 26px;" src="../assets/themes/zeppelin/img/pig_zeppelin_tutorial.png" /></p>
+
<p>Data is shared between <code>%pig</code> and <code>%pig.query</code>, so you can do common work in <code>%pig</code> and then run different kinds of queries on the data it produces.
-Besides, we recommend you to specify alias explicitly so that the visualization can display the column name correctly. Here, we name <code>COUNT($1)</code> as <code>count</code>, if you don't do this,
-then we will name it using position, here we will use <code>col_1</code> to represent <code>COUNT($1)</code> if you don't specify alias for it. There's one pig tutorial note in zeppelin for your reference.</p>
+We also recommend specifying aliases explicitly so that the visualization displays the column names correctly. In the second and third <code>%pig.query</code> examples above, <code>COUNT($1)</code> is aliased as <code>count</code>. If you don't do this,
+columns are named by position. E.g. in the first <code>%pig.query</code> example above, the chart uses <code>col_1</code> to represent <code>COUNT($1)</code>.</p>
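The naming rule can be sketched in a few lines of Python. `column_names` is a hypothetical helper, and the zero-based position is inferred from the example above, where the second column, `COUNT($1)`, becomes `col_1`.

```python
from typing import List, Optional


def column_names(aliases: List[Optional[str]]) -> List[str]:
    """Sketch: keep explicit aliases, fall back to col_<position> otherwise."""
    return [a if a else f"col_{i}" for i, a in enumerate(aliases)]
```

For `foreach b generate group, COUNT($1);` the fields would be described as `["group", None]`, giving the chart columns `group` and `col_1`.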
</div>
</div>