Posted to commits@zeppelin.apache.org by ah...@apache.org on 2017/01/12 03:10:16 UTC

zeppelin git commit: ZEPPELIN-1867. Update document for pig interpreter and add one sample note

Repository: zeppelin
Updated Branches:
  refs/heads/master 0da08d1d7 -> 3d2d4b6f9


ZEPPELIN-1867. Update document for pig interpreter and add one sample note

### What is this PR for?
* Minor update to the Pig interpreter documentation
* Add one sample Pig tutorial note that does the same thing as the Spark tutorial note (a short sketch of the note's flow is shown below).
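A rough sketch of the note's flow, condensed from the paragraphs in `notebook/2C57UKYWR/note.json` further down (the note itself is the authoritative version; its own description also mentions first using the shell interpreter to download `bank.csv` and upload it to HDFS):

```
-- %pig paragraph: load and clean the bank data (assumes bank.csv is already on HDFS)
bankText = load 'bank.csv' using PigStorage(';');
bank = foreach bankText generate $0 as age, $1 as job, $2 as marital, $3 as education, $5 as balance;
bank = filter bank by age != '"age"';   -- drop the CSV header row
bank = foreach bank generate (int)age, REPLACE(job,'"','') as job, REPLACE(marital,'"','') as marital, (int)(REPLACE(balance,'"','')) as balance;

-- %pig.query paragraph: query the cleaned data and visualize the result
bank_data = filter bank by age < 30;
b = group bank_data by age;
foreach b generate group, COUNT($1) as count;   -- explicit alias so the chart shows a proper column name
```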

### What type of PR is it?
[Improvement | Documentation ]

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-1867

### How should this be tested?
Tested manually

### Screenshots (if appropriate)
![image](https://cloud.githubusercontent.com/assets/164491/21839221/8a4ffa04-d811-11e6-9096-f4f9da22ea49.png)

### Questions:
* Do the license files need to be updated? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

Author: Jeff Zhang <zj...@apache.org>

Closes #1830 from zjffdu/ZEPPELIN-1867 and squashes the following commits:

1c0d819 [Jeff Zhang] rename note name
50198a1 [Jeff Zhang] add more description of tutorial note
88385f2 [Jeff Zhang] Add pig tutorial note
25216f8 [Jeff Zhang] ZEPPELIN-1867. Update document for pig interpreter and add one sample note


Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/3d2d4b6f
Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/3d2d4b6f
Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/3d2d4b6f

Branch: refs/heads/master
Commit: 3d2d4b6f9804ecc5c157c1b4a3885ee01890884e
Parents: 0da08d1
Author: Jeff Zhang <zj...@apache.org>
Authored: Wed Jan 11 15:22:19 2017 +0800
Committer: ahyoungryu <ah...@apache.org>
Committed: Thu Jan 12 12:10:09 2017 +0900

----------------------------------------------------------------------
 docs/interpreter/pig.md      |  11 +-
 notebook/2C57UKYWR/note.json | 316 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 325 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/zeppelin/blob/3d2d4b6f/docs/interpreter/pig.md
----------------------------------------------------------------------
diff --git a/docs/interpreter/pig.md b/docs/interpreter/pig.md
index 227656b..a778169 100644
--- a/docs/interpreter/pig.md
+++ b/docs/interpreter/pig.md
@@ -26,6 +26,7 @@ group: manual
 ## Supported runtime mode
   - Local
   - MapReduce
+  - Tez_Local (Only Tez 0.7 is supported)
   - Tez  (Only Tez 0.7 is supported)
 
 ## How to use
@@ -40,6 +41,10 @@ group: manual
 
     HADOOP\_CONF\_DIR needs to be specified in `ZEPPELIN_HOME/conf/zeppelin-env.sh`.
 
+- Tez Local Mode
+    
+    Nothing needs to be done for Tez local mode.
+    
 - Tez Mode
 
     HADOOP\_CONF\_DIR and TEZ\_CONF\_DIR need to be specified in `ZEPPELIN_HOME/conf/zeppelin-env.sh`.
@@ -57,7 +62,7 @@ At the Interpreters menu, you have to create a new Pig interpreter. Pig interpre
     <tr>
         <td>zeppelin.pig.execType</td>
         <td>mapreduce</td>
-        <td>Execution mode for pig runtime. local | mapreduce | tez </td>
+        <td>Execution mode for pig runtime. local | mapreduce | tez_local | tez </td>
     </tr>
     <tr>
         <td>zeppelin.pig.includeJobStats</td>
@@ -94,4 +99,6 @@ c = group b by Category;
 foreach c generate group as category, COUNT($1) as count;
 ```
 
-Data is shared between `%pig` and `%pig.query`, so that you can do some common work in `%pig`, and do different kinds of query based on the data of `%pig`.
+Data is shared between `%pig` and `%pig.query`, so you can do some common preparation work in `%pig` and then run different kinds of queries on that data in `%pig.query`.
+Besides, we recommend specifying aliases explicitly so that the visualization can display column names correctly. Here we name `COUNT($1)` as `count`; if you don't specify an alias,
+the column is named by position, so `col_1` would be used to represent `COUNT($1)`. There is one Pig tutorial note in Zeppelin for your reference.
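To make the alias recommendation concrete, here is a minimal before/after sketch based on the `%pig.query` example above (it only assumes `c` is the grouped relation from that example):

```
-- Without an explicit alias the result column is named by position, e.g. col_1
foreach c generate group as category, COUNT($1);

-- With an explicit alias the visualization shows a meaningful column name
foreach c generate group as category, COUNT($1) as count;
```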

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/3d2d4b6f/notebook/2C57UKYWR/note.json
----------------------------------------------------------------------
diff --git a/notebook/2C57UKYWR/note.json b/notebook/2C57UKYWR/note.json
new file mode 100644
index 0000000..2b6ef8f
--- /dev/null
+++ b/notebook/2C57UKYWR/note.json
@@ -0,0 +1,316 @@
+{
+  "paragraphs": [
+    {
+      "text": "%md\n\n\n### [Apache Pig](http://pig.apache.org/) is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets.\n\nPig\u0027s language layer currently consists of a textual language called Pig Latin, which has the following key properties:\n\n* Ease of programming. It is trivial to achieve parallel execution of simple, \"embarrassingly parallel\" data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.\n* Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on 
 semantics rather than efficiency.\n* Extensibility. Users can create their own functions to do special-purpose processing.\n",
+      "user": "user1",
+      "dateUpdated": "Jan 6, 2017 3:55:03 PM",
+      "config": {
+        "colWidth": 12.0,
+        "enabled": true,
+        "results": {},
+        "editorSetting": {
+          "language": "markdown",
+          "editOnDblClick": true
+        },
+        "editorMode": "ace/mode/markdown",
+        "editorHide": true,
+        "tableHide": false
+      },
+      "settings": {
+        "params": {},
+        "forms": {}
+      },
+      "results": {
+        "code": "SUCCESS",
+        "msg": [
+          {
+            "type": "HTML",
+            "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch3\u003e\u003ca href\u003d\"http://pig.apache.org/\"\u003eApache Pig\u003c/a\u003e is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets.\u003c/h3\u003e\n\u003cp\u003ePig\u0026rsquo;s language layer currently consists of a textual language called Pig Latin, which has the following key properties:\u003c/p\u003e\n\u003cul\u003e\n  \u003cli\u003eEase of programming. It is trivial to achieve parallel execution of simple, \u0026ldquo;embarrassingly parallel\u0026rdquo; data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write,
  understand, and maintain.\u003c/li\u003e\n  \u003cli\u003eOptimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.\u003c/li\u003e\n  \u003cli\u003eExtensibility. Users can create their own functions to do special-purpose processing.\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e"
+          }
+        ]
+      },
+      "apps": [],
+      "jobName": "paragraph_1483277502513_1156234051",
+      "id": "20170101-213142_1565013608",
+      "dateCreated": "Jan 1, 2017 9:31:42 PM",
+      "dateStarted": "Jan 6, 2017 3:55:03 PM",
+      "dateFinished": "Jan 6, 2017 3:55:04 PM",
+      "status": "FINISHED",
+      "progressUpdateIntervalMs": 500
+    },
+    {
+      "text": "%md\n\nThis pig tutorial use pig to do the same thing as spark tutorial. The default mode is mapreduce, you can also use other modes like local/tez_local/tez. For mapreduce mode, you need to have hadoop installed and export `HADOOP_CONF_DIR` in `zeppelin-env.sh`\n\nThe tutorial consists of 3 steps.\n\n* Use shell interpreter to download bank.csv and upload it to hdfs\n* use `%pig` to process the data\n* use `%pig.query` to query the data",
+      "user": "user1",
+      "dateUpdated": "Jan 6, 2017 3:55:18 PM",
+      "config": {
+        "colWidth": 12.0,
+        "enabled": true,
+        "results": {},
+        "editorSetting": {
+          "language": "markdown",
+          "editOnDblClick": true
+        },
+        "editorMode": "ace/mode/markdown",
+        "editorHide": true,
+        "tableHide": false
+      },
+      "settings": {
+        "params": {},
+        "forms": {}
+      },
+      "results": {
+        "code": "SUCCESS",
+        "msg": [
+          {
+            "type": "HTML",
+            "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003cp\u003eThis pig tutorial use pig to do the same thing as spark tutorial. The default mode is mapreduce, you can also use other modes like local/tez_local/tez. For mapreduce mode, you need to have hadoop installed and export \u003ccode\u003eHADOOP_CONF_DIR\u003c/code\u003e in \u003ccode\u003ezeppelin-env.sh\u003c/code\u003e\u003c/p\u003e\n\u003cp\u003eThe tutorial consists of 3 steps.\u003c/p\u003e\n\u003cul\u003e\n  \u003cli\u003eUse shell interpreter to download bank.csv and upload it to hdfs\u003c/li\u003e\n  \u003cli\u003euse \u003ccode\u003e%pig\u003c/code\u003e to process the data\u003c/li\u003e\n  \u003cli\u003euse \u003ccode\u003e%pig.query\u003c/code\u003e to query the data\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e"
+          }
+        ]
+      },
+      "apps": [],
+      "jobName": "paragraph_1483689316217_-629483391",
+      "id": "20170106-155516_1050601059",
+      "dateCreated": "Jan 6, 2017 3:55:16 PM",
+      "dateStarted": "Jan 6, 2017 3:55:18 PM",
+      "dateFinished": "Jan 6, 2017 3:55:18 PM",
+      "status": "FINISHED",
+      "progressUpdateIntervalMs": 500
+    },
+    {
+      "text": "%pig\n\nbankText \u003d load \u0027bank.csv\u0027 using PigStorage(\u0027;\u0027);\nbank \u003d foreach bankText generate $0 as age, $1 as job, $2 as marital, $3 as education, $5 as balance; \nbank \u003d filter bank by age !\u003d \u0027\"age\"\u0027;\nbank \u003d foreach bank generate (int)age, REPLACE(job,\u0027\"\u0027,\u0027\u0027) as job, REPLACE(marital, \u0027\"\u0027, \u0027\u0027) as marital, (int)(REPLACE(balance, \u0027\"\u0027, \u0027\u0027)) as balance;\n\n-- The following statement is optional, it depends on whether your needs.\n-- store bank into \u0027clean_bank.csv\u0027 using PigStorage(\u0027;\u0027);\n\n\n",
+      "user": "user1",
+      "dateUpdated": "Jan 6, 2017 3:57:11 PM",
+      "config": {
+        "colWidth": 12.0,
+        "editorMode": "ace/mode/pig",
+        "results": {},
+        "enabled": true,
+        "editorSetting": {
+          "language": "pig",
+          "editOnDblClick": false
+        }
+      },
+      "settings": {
+        "params": {},
+        "forms": {}
+      },
+      "results": {
+        "code": "SUCCESS",
+        "msg": []
+      },
+      "apps": [],
+      "jobName": "paragraph_1483277250237_-466604517",
+      "id": "20161228-140640_1560978333",
+      "dateCreated": "Jan 1, 2017 9:27:30 PM",
+      "dateStarted": "Jan 6, 2017 3:57:11 PM",
+      "dateFinished": "Jan 6, 2017 3:57:13 PM",
+      "status": "FINISHED",
+      "progressUpdateIntervalMs": 500
+    },
+    {
+      "text": "%pig.query\n\nbank_data \u003d filter bank by age \u003c 30;\nb \u003d group bank_data by age;\nforeach b generate group, COUNT($1);\n\n",
+      "user": "user1",
+      "dateUpdated": "Jan 6, 2017 3:57:15 PM",
+      "config": {
+        "colWidth": 4.0,
+        "editorMode": "ace/mode/pig",
+        "results": {
+          "0": {
+            "graph": {
+              "mode": "multiBarChart",
+              "height": 300.0,
+              "optionOpen": false
+            },
+            "helium": {}
+          }
+        },
+        "enabled": true,
+        "editorSetting": {
+          "language": "pig",
+          "editOnDblClick": false
+        }
+      },
+      "settings": {
+        "params": {},
+        "forms": {}
+      },
+      "results": {
+        "code": "SUCCESS",
+        "msg": [
+          {
+            "type": "TABLE",
+            "data": "group\tnull\n19\t4\n20\t3\n21\t7\n22\t9\n23\t20\n24\t24\n25\t44\n26\t77\n27\t94\n28\t103\n29\t97\n"
+          }
+        ]
+      },
+      "apps": [],
+      "jobName": "paragraph_1483277250238_-465450270",
+      "id": "20161228-140730_1903342877",
+      "dateCreated": "Jan 1, 2017 9:27:30 PM",
+      "dateStarted": "Jan 6, 2017 3:57:15 PM",
+      "dateFinished": "Jan 6, 2017 3:57:16 PM",
+      "status": "FINISHED",
+      "progressUpdateIntervalMs": 500
+    },
+    {
+      "text": "%pig.query\n\nbank_data \u003d filter bank by age \u003c ${maxAge\u003d40};\nb \u003d group bank_data by age;\nforeach b generate group, COUNT($1);",
+      "user": "user1",
+      "dateUpdated": "Jan 6, 2017 3:57:18 PM",
+      "config": {
+        "colWidth": 4.0,
+        "editorMode": "ace/mode/pig",
+        "results": {
+          "0": {
+            "graph": {
+              "mode": "pieChart",
+              "height": 300.0,
+              "optionOpen": false
+            },
+            "helium": {}
+          }
+        },
+        "enabled": true,
+        "editorSetting": {
+          "language": "pig",
+          "editOnDblClick": false
+        }
+      },
+      "settings": {
+        "params": {
+          "maxAge": "36"
+        },
+        "forms": {
+          "maxAge": {
+            "name": "maxAge",
+            "defaultValue": "40",
+            "hidden": false
+          }
+        }
+      },
+      "results": {
+        "code": "SUCCESS",
+        "msg": [
+          {
+            "type": "TABLE",
+            "data": "group\tnull\n19\t4\n20\t3\n21\t7\n22\t9\n23\t20\n24\t24\n25\t44\n26\t77\n27\t94\n28\t103\n29\t97\n30\t150\n31\t199\n32\t224\n33\t186\n34\t231\n35\t180\n"
+          }
+        ]
+      },
+      "apps": [],
+      "jobName": "paragraph_1483277250239_-465835019",
+      "id": "20161228-154918_1551591203",
+      "dateCreated": "Jan 1, 2017 9:27:30 PM",
+      "dateStarted": "Jan 6, 2017 3:57:18 PM",
+      "dateFinished": "Jan 6, 2017 3:57:19 PM",
+      "status": "FINISHED",
+      "progressUpdateIntervalMs": 500
+    },
+    {
+      "text": "%pig.query\n\nbank_data \u003d filter bank by marital\u003d\u003d\u0027${marital\u003dsingle,single|divorced|married}\u0027;\nb \u003d group bank_data by age;\nforeach b generate group, COUNT($1) as c;\n\n\n",
+      "user": "user1",
+      "dateUpdated": "Jan 6, 2017 3:57:24 PM",
+      "config": {
+        "colWidth": 4.0,
+        "editorMode": "ace/mode/pig",
+        "results": {
+          "0": {
+            "graph": {
+              "mode": "scatterChart",
+              "height": 300.0,
+              "optionOpen": false
+            },
+            "helium": {}
+          }
+        },
+        "enabled": true,
+        "editorSetting": {
+          "language": "pig",
+          "editOnDblClick": false
+        }
+      },
+      "settings": {
+        "params": {
+          "marital": "married"
+        },
+        "forms": {
+          "marital": {
+            "name": "marital",
+            "defaultValue": "single",
+            "options": [
+              {
+                "value": "single"
+              },
+              {
+                "value": "divorced"
+              },
+              {
+                "value": "married"
+              }
+            ],
+            "hidden": false
+          }
+        }
+      },
+      "results": {
+        "code": "SUCCESS",
+        "msg": [
+          {
+            "type": "TABLE",
+            "data": "group\tc\n23\t3\n24\t11\n25\t11\n26\t18\n27\t26\n28\t23\n29\t37\n30\t56\n31\t104\n32\t105\n33\t103\n34\t142\n35\t109\n36\t117\n37\t100\n38\t99\n39\t88\n40\t105\n41\t97\n42\t91\n43\t79\n44\t68\n45\t76\n46\t82\n47\t78\n48\t91\n49\t87\n50\t74\n51\t63\n52\t66\n53\t75\n54\t56\n55\t68\n56\t50\n57\t78\n58\t67\n59\t56\n60\t36\n61\t15\n62\t5\n63\t7\n64\t6\n65\t4\n66\t7\n67\t5\n68\t1\n69\t5\n70\t5\n71\t5\n72\t4\n73\t6\n74\t2\n75\t3\n76\t1\n77\t5\n78\t2\n79\t3\n80\t6\n81\t1\n83\t2\n86\t1\n87\t1\n"
+          }
+        ]
+      },
+      "apps": [],
+      "jobName": "paragraph_1483277250240_-480070728",
+      "id": "20161228-142259_575675591",
+      "dateCreated": "Jan 1, 2017 9:27:30 PM",
+      "dateStarted": "Jan 6, 2017 3:57:20 PM",
+      "dateFinished": "Jan 6, 2017 3:57:20 PM",
+      "status": "FINISHED",
+      "progressUpdateIntervalMs": 500
+    },
+    {
+      "text": "%pig\n",
+      "dateUpdated": "Jan 1, 2017 9:27:30 PM",
+      "config": {},
+      "settings": {
+        "params": {},
+        "forms": {}
+      },
+      "apps": [],
+      "jobName": "paragraph_1483277250240_-480070728",
+      "id": "20161228-155036_1854903164",
+      "dateCreated": "Jan 1, 2017 9:27:30 PM",
+      "status": "READY",
+      "errorMessage": "",
+      "progressUpdateIntervalMs": 500
+    }
+  ],
+  "name": "Zeppelin Tutorial/Basic Features (Pig)",
+  "id": "2C57UKYWR",
+  "angularObjects": {
+    "2C3DR183X:shared_process": [],
+    "2C5VH924X:shared_process": [],
+    "2C686X8ZH:shared_process": [],
+    "2C66Z9XPQ:shared_process": [],
+    "2C3JKFMJU:shared_process": [],
+    "2C69WE69N:shared_process": [],
+    "2C3RWCVAG:shared_process": [],
+    "2C4HKDCQW:shared_process": [],
+    "2C4BJDRRZ:shared_process": [],
+    "2C6V3D44K:shared_process": [],
+    "2C3VECEG2:shared_process": [],
+    "2C5SRRXHM:shared_process": [],
+    "2C5DCRVGM:shared_process": [],
+    "2C66GE1VB:shared_process": [],
+    "2C3PTPMUH:shared_process": [],
+    "2C48Y7FSJ:shared_process": [],
+    "2C4ZD49PF:shared_process": [],
+    "2C63XW4XE:shared_process": [],
+    "2C4UB1UZA:shared_process": [],
+    "2C5S1R21W:shared_process": [],
+    "2C3SQSB7V:shared_process": []
+  },
+  "config": {},
+  "info": {}
+}
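For readability, the two dynamic-form `%pig.query` paragraphs embedded (JSON-escaped) in the note above correspond roughly to the following plain Pig Latin; `${maxAge=40}` and `${marital=single,single|divorced|married}` are Zeppelin dynamic forms (a text input and a select box):

```
-- %pig.query with a text input form; maxAge defaults to 40
bank_data = filter bank by age < ${maxAge=40};
b = group bank_data by age;
foreach b generate group, COUNT($1);

-- %pig.query with a select form; marital defaults to single (options: single | divorced | married)
bank_data = filter bank by marital == '${marital=single,single|divorced|married}';
b = group bank_data by age;
foreach b generate group, COUNT($1) as c;
```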