You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@carbondata.apache.org by ma...@apache.org on 2018/05/17 14:30:58 UTC

[45/50] [abbrv] carbondata git commit: [CARBONDATA-2370] Added document for presto multinode setup for carbondata

[CARBONDATA-2370] Added document for presto multinode setup for carbondata

Added document for presto multinode setup for carbondata

This closes #2199


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/3d23fa69
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/3d23fa69
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/3d23fa69

Branch: refs/heads/spark-2.3
Commit: 3d23fa693a604701e3ab3b574b20f21d089e8b43
Parents: 1c5b526
Author: Geetika Gupta <ge...@knoldus.in>
Authored: Fri Apr 20 16:47:35 2018 +0530
Committer: chenliang613 <ch...@huawei.com>
Committed: Mon May 14 21:43:24 2018 +0800

----------------------------------------------------------------------
 .../Presto_Cluster_Setup_For_Carbondata.md      | 133 +++++++++++++++++++
 1 file changed, 133 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d23fa69/integration/presto/Presto_Cluster_Setup_For_Carbondata.md
----------------------------------------------------------------------
diff --git a/integration/presto/Presto_Cluster_Setup_For_Carbondata.md b/integration/presto/Presto_Cluster_Setup_For_Carbondata.md
new file mode 100644
index 0000000..082b8fe
--- /dev/null
+++ b/integration/presto/Presto_Cluster_Setup_For_Carbondata.md
@@ -0,0 +1,133 @@
+# Presto Multinode Cluster setup For Carbondata
+
+## Installing Presto
+
+  1. Download the 0.187 version of Presto using:
+  `wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz`
+
+  2. Extract Presto tar file: `tar zxvf presto-server-0.187.tar.gz`.
+
+  3. Download the Presto CLI for the coordinator and name it presto.
+
+  ```
+    wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar
+
+    mv presto-cli-0.187-executable.jar presto
+
+    chmod +x presto
+  ```
+
+ ## Create Configuration Files
+
+  1. Create `etc` folder in presto-server-0.187 directory.
+  2. Create `config.properties`, `jvm.config`, `log.properties`, and `node.properties` files.
+  3. Install uuid to generate a node.id.
+
+      ```
+      sudo apt-get install uuid
+
+      uuid
+      ```
+
+
+##### Contents of your node.properties file
+
+  ```
+  node.environment=production
+  node.id=<generated uuid>
+  node.data-dir=/home/ubuntu/data
+  ```
+
+##### Contents of your jvm.config file
+
+  ```
+  -server
+  -Xmx16G
+  -XX:+UseG1GC
+  -XX:G1HeapRegionSize=32M
+  -XX:+UseGCOverheadLimit
+  -XX:+ExplicitGCInvokesConcurrent
+  -XX:+HeapDumpOnOutOfMemoryError
+  -XX:OnOutOfMemoryError=kill -9 %p
+  ```
+
+##### Contents of your log.properties file
+  ```
+  com.facebook.presto=INFO
+  ```
+
+ The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`.
+
+## Coordinator Configurations
+
+  ##### Contents of your config.properties
+  ```
+  coordinator=true
+  node-scheduler.include-coordinator=false
+  http-server.http.port=8086
+  query.max-memory=50GB
+  query.max-memory-per-node=2GB
+  discovery-server.enabled=true
+  discovery.uri=<coordinator_ip>:8086
+  ```
+The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers.
+
+**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`.
+
+Also relation between below two configuration-properties should be like:
+If, `query.max-memory-per-node=30GB`
+Then, `query.max-memory=<30GB * number of nodes>`.
+
+## Worker Configurations
+
+##### Contents of your config.properties
+
+  ```
+  coordinator=false
+  http-server.http.port=8086
+  query.max-memory=50GB
+  query.max-memory-per-node=2GB
+  discovery.uri=<coordinator_ip>:8086
+  ```
+
+**Note**: `jvm.config` and `node.properties` files are same for all the nodes (worker + coordinator). All the nodes should have different `node.id`.
+
+## Catalog Configurations
+
+1. Create a folder named `catalog` in etc directory of presto on all the nodes of the cluster including the coordinator.
+
+##### Configuring Carbondata in Presto
+1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes.
+
+## Add Plugins
+
+1. Create a directory named `carbondata` in plugin directory of presto.
+2. Copy `carbondata` jars to `plugin/carbondata` directory on all nodes.
+
+## Start Presto Server on all nodes
+
+```
+./presto-server-0.187/bin/launcher start
+```
+To run it as a background process.
+
+```
+./presto-server-0.187/bin/launcher run
+```
+To run it in foreground.
+
+## Start Presto CLI
+```
+./presto
+```
+To connect to carbondata catalog use the following command:
+
+```
+./presto --server <coordinator_ip>:8086 --catalog carbondata --schema <schema_name>
+```
+Execute the following command to ensure the workers are connected.
+
+```
+select * from system.runtime.nodes;
+```
+Now you can use the Presto CLI on the coordinator to query data sources in the catalog using the Presto workers.