You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by sraghunandan <gi...@git.apache.org> on 2018/08/03 04:11:34 UTC
[GitHub] carbondata pull request #2568: [Presto-integration-Technical-note] created d...
Github user sraghunandan commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2568#discussion_r207434322
--- Diff: integration/presto/presto-integration-technical-note.md ---
@@ -0,0 +1,253 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to you under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+# Presto Integration Technical Note
+Presto Integration with Carbon data include the below steps:
+
+* Setting up Presto Cluster
+
+* Setting up cluster to use carbondata as a catalog along with other catalogs provided by presto.
+
+In this technical note we will first learn about the above two points and after that we will see how we can do performance tuning with Presto.
+
+## **Let us begin with the first step of Presto Cluster Setup:**
+
+
+* ### Installing Presto
+
+ 1. Download the 0.187 version of Presto using:
+ `wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz`
+
+ 2. Extract Presto tar file: `tar zxvf presto-server-0.187.tar.gz`.
+
+ 3. Download the Presto CLI for the coordinator and name it presto.
+
+ ```
+ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar
+
+ mv presto-cli-0.187-executable.jar presto
+
+ chmod +x presto
+ ```
+
+### Create Configuration Files
+
+ 1. Create `etc` folder in presto-server-0.187 directory.
+ 2. Create `config.properties`, `jvm.config`, `log.properties`, and `node.properties` files.
+ 3. Install uuid to generate a node.id.
+
+ ```
+ sudo apt-get install uuid
+
+ uuid
+ ```
+
+
+##### Contents of your node.properties file
+
+ ```
+ node.environment=production
+ node.id=<generated uuid>
+ node.data-dir=/home/ubuntu/data
+ ```
+
+##### Contents of your jvm.config file
+
+ ```
+ -server
+ -Xmx16G
+ -XX:+UseG1GC
+ -XX:G1HeapRegionSize=32M
+ -XX:+UseGCOverheadLimit
+ -XX:+ExplicitGCInvokesConcurrent
+ -XX:+HeapDumpOnOutOfMemoryError
+ -XX:OnOutOfMemoryError=kill -9 %p
+ ```
+
+##### Contents of your log.properties file
+ ```
+ com.facebook.presto=INFO
+ ```
+
+ The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`.
+
+### Coordinator Configurations
+
+##### Contents of your config.properties
+ ```
+ coordinator=true
+ node-scheduler.include-coordinator=false
+ http-server.http.port=8086
+ query.max-memory=50GB
+ query.max-memory-per-node=2GB
+ discovery-server.enabled=true
+ discovery.uri=<coordinator_ip>:8086
+ ```
+The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers.
+
+**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`.
+
+Also relation between below two configuration-properties should be like:
+If, `query.max-memory-per-node=30GB`
+Then, `query.max-memory=<30GB * number of nodes>`.
+
+### Worker Configurations
+
+##### Contents of your config.properties
+
+ ```
+ coordinator=false
+ http-server.http.port=8086
+ query.max-memory=50GB
+ query.max-memory-per-node=2GB
+ discovery.uri=<coordinator_ip>:8086
+ ```
+
+**Note**: `jvm.config` and `node.properties` files are same for all the nodes (worker + coordinator). All the nodes should have different `node.id`.(generated by uuid command).
+
+### **With this we are ready with the Presto Cluster setup but to integrate with carbon data further steps are required which are as follows:**
+
+### Catalog Configurations
+
+1. Create a folder named `catalog` in etc directory of presto on all the nodes of the cluster including the coordinator.
+
+##### Configuring Carbondata in Presto
+1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes.
+
+### Add Plugins
+
+1. Create a directory named `carbondata` in plugin directory of presto.
+2. Copy `carbondata` jars to `plugin/carbondata` directory on all nodes.
+
+### Start Presto Server on all nodes
+
+```
+./presto-server-0.187/bin/launcher start
+```
+To run it as a background process.
+
+```
+./presto-server-0.187/bin/launcher run
+```
+To run it in foreground.
+
+### Start Presto CLI
+```
+./presto
+```
+To connect to carbondata catalog use the following command:
+
+```
+./presto --server <coordinator_ip>:8086 --catalog carbondata --schema <schema_name>
+```
+Execute the following command to ensure the workers are connected.
+
+```
+select * from system.runtime.nodes;
+```
+Now you can use the Presto CLI on the coordinator to query data sources in the catalog using the Presto workers.
+
+**Note :** Create Tables and data loads should be done before executing queries as we can not create carbon table from this interface.
+
+## **Presto Performance Tuning**
+
+**Performance Optimizations according to data types and schema:**
+
+- When the data could be stored in Int as well as String. Example: keys for a table then using Int gives a better performance
--- End diff --
change the sentence to specify the suggestion first
---