Posted to commits@carbondata.apache.org by ch...@apache.org on 2017/02/04 02:38:20 UTC
[21/35] incubator-carbondata-site git commit: Updated website for
CarbonData release 1.0.0
http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/0d4cdb1c/content/docs/latest/installation-guide.html
----------------------------------------------------------------------
diff --git a/content/docs/latest/installation-guide.html b/content/docs/latest/installation-guide.html
new file mode 100644
index 0000000..168f895
--- /dev/null
+++ b/content/docs/latest/installation-guide.html
@@ -0,0 +1,330 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+--><h1>Installation Guide</h1><p>This tutorial guides you through the installation and configuration
+ of CarbonData in the following two modes:</p>
+<ul>
+ <li><a href="#installing-spark-cluster">Installing and
+ Configuring CarbonData on Standalone Spark Cluster</a></li>
+ <li><a href="#installing-yarn-cluster">Installing and
+ Configuring CarbonData on "Spark on YARN" Cluster</a></li>
+</ul><p>followed by:</p>
+<ul>
+ <li><a href="#query-execution">Query Execution using CarbonData
+ Thrift Server</a></li>
+</ul><h2 id="installing-spark-cluster">Installing and Configuring CarbonData on Standalone Spark Cluster</h2><h3>
+ Prerequisites</h3>
+<ul>
+ <li><p>Hadoop HDFS and Yarn should be installed and running.</p></li>
+ <li><p>Spark should be installed and running on all the cluster nodes.</p></li>
+ <li><p>CarbonData user should have permission to access HDFS.</p></li>
+</ul><h3>Procedure</h3>
+<ul>
+ <li><p><a
+ href="https://github.com/apache/incubator-carbondata/blob/master/build/README.md" target="_blank">Build
+ the CarbonData</a> project, take the assembly jar from
+ "./assembly/target/scala-2.10/carbondata_xxx.jar", and put it in the <code>"<SPARK_HOME>/carbonlib"</code>
+ folder.</p>
+ <p>NOTE: Create the carbonlib folder if it does not exist inside the <code>"<SPARK_HOME>"</code>
+ path.</p></li>
+ <li><p>Add the carbonlib folder path to the Spark classpath: edit the <code>"<SPARK_HOME>/conf/spark-env.sh"</code>
+ file and append <br/><code>"<SPARK_HOME>/carbonlib/*"</code>
+ to the existing value of SPARK_CLASSPATH.</p></li>
+ <li><p>Copy the carbon.properties.template file from the "./conf/" folder of the CarbonData
+ repository to <code>"<SPARK_HOME>/conf/carbon.properties"</code>.</p></li>
+ <li><p>Copy the "carbonplugins" folder from the "./processing/" folder of the CarbonData
+ repository to the <code>"<SPARK_HOME>/carbonlib"</code> folder.</p>
+ <p>NOTE: carbonplugins contains the .kettle folder.</p></li>
+ <li><p>On the Spark node, configure the properties listed in the following table in the <code>"<SPARK_HOME>/conf/spark-defaults.conf"</code>
+ file.</p></li>
+</ul>
+<table class="table table-striped table-bordered">
+ <thead>
+ <tr>
+ <th>Property</th>
+ <th>Description</th>
+ <th>Value</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>carbon.kettle.home</td>
+ <td>Path that CarbonData uses internally to create the graph for loading data.
+ </td>
+ <td>$SPARK_HOME/carbonlib/carbonplugins</td>
+ </tr>
+ <tr>
+ <td>spark.driver.extraJavaOptions</td>
+ <td>A string of extra JVM options to pass to the driver. For instance, GC settings or other
+ logging.
+ </td>
+ <td>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</td>
+ </tr>
+ <tr>
+ <td>spark.executor.extraJavaOptions</td>
+ <td>A string of extra JVM options to pass to executors. For instance, GC settings or other
+ logging. NOTE: You can enter multiple values separated by spaces.
+ </td>
+ <td>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</td>
+ </tr>
+ </tbody>
+</table>
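+<p>As a rough illustration, the file-copy steps of the procedure above could be scripted as follows. This is a minimal sketch, not part of CarbonData: the function name <code>install_carbondata_files</code> and its two arguments (the CarbonData checkout and the Spark home) are hypothetical, and it assumes the project has already been built. The SPARK_CLASSPATH edit is left as a manual step.</p>

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CarbonData).
# Usage: install_carbondata_files <carbondata_repo> <spark_home>
install_carbondata_files() {
    repo=$1
    spark_home=$2

    # Create the carbonlib (and conf) folder if it does not exist yet.
    mkdir -p "$spark_home/carbonlib" "$spark_home/conf" || return 1

    # Copy the assembly jar produced by the build into carbonlib.
    cp "$repo"/assembly/target/scala-2.10/carbondata_*.jar \
        "$spark_home/carbonlib/" || return 1

    # Copy the properties template under its final file name.
    cp "$repo/conf/carbon.properties.template" \
        "$spark_home/conf/carbon.properties" || return 1

    # Copy carbonplugins, including its hidden .kettle folder.
    cp -R "$repo/processing/carbonplugins" "$spark_home/carbonlib/" || return 1
}

# Example call (both paths are placeholders):
#   install_carbondata_files /path/to/incubator-carbondata "$SPARK_HOME"
```

+<p>Each step returns a non-zero status on failure, so a missing jar or template stops the helper instead of leaving a partial installation unnoticed.</p>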
+<ul>
+ <li>Add the following properties to <code>"<SPARK_HOME>/conf/carbon.properties"</code>:
+ </li>
+</ul>
+<table class="table table-striped table-bordered">
+ <thead>
+ <tr>
+ <th>Property</th>
+ <th>Required</th>
+ <th>Description</th>
+ <th>Example</th>
+ <th>Remark</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>carbon.storelocation</td>
+ <td>NO</td>
+ <td>Location where CarbonData will create the store and write the data in its own
+ format.
+ </td>
+ <td>hdfs://HOSTNAME:PORT/Opt/CarbonStore</td>
+ <td>An HDFS directory is recommended.</td>
+ </tr>
+ <tr>
+ <td>carbon.kettle.home</td>
+ <td>YES</td>
+ <td>Path that will be used by CarbonData internally to create graph for loading the data.
+ </td>
+ <td>$SPARK_HOME/carbonlib/carbonplugins</td>
+ <td></td>
+ </tr>
+ </tbody>
+</table>
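+<p>The properties in the table above can be edited by hand. For repeatable setups, a small helper along these lines may be convenient. It is only a sketch: <code>set_carbon_property</code> is a hypothetical name, and it assumes plain <code>key=value</code> lines with no spaces around the equals sign.</p>

```shell
#!/bin/sh
# Hypothetical helper: set or replace one key=value entry in a properties file.
# Usage: set_carbon_property <file> <key> <value>
set_carbon_property() {
    file=$1
    key=$2
    value=$3
    tmp="$file.tmp.$$"

    # Make sure the file exists so awk has something to read.
    touch "$file" || return 1

    # Keep every line whose key (text before the first '=') differs,
    # which drops any earlier assignment of the same key.
    awk -F= -v k="$key" '$1 != k' "$file" > "$tmp" || return 1

    # Append the new assignment and swap the file into place.
    printf '%s=%s\n' "$key" "$value" >> "$tmp" || return 1
    mv "$tmp" "$file"
}

# Example calls (values taken from the table; the file path is a placeholder):
#   set_carbon_property "$SPARK_HOME/conf/carbon.properties" \
#       carbon.storelocation "hdfs://HOSTNAME:PORT/Opt/CarbonStore"
#   set_carbon_property "$SPARK_HOME/conf/carbon.properties" \
#       carbon.kettle.home "$SPARK_HOME/carbonlib/carbonplugins"
```

+<p>Running the helper twice with the same key leaves a single entry, so it can be re-run safely after changing a value.</p>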
+<ul>
+ <li>Verify the installation. For example:</li>
+</ul><p><code>
+ ./spark-shell --master spark://HOSTNAME:PORT --total-executor-cores 2
+ --executor-memory 2G
+</code></p><p>NOTE: Make sure you have permission to access the CarbonData JARs and files that the
+ driver and executors use to start.</p><p>To get started with CarbonData: <a
+ href="mainpage.html?page=quickStart">Quick Start</a>, <a href="mainpage.html?page=ddl">DDL
+ Operations on CarbonData</a></p>
+ <h2 id="installing-yarn-cluster">Installing and Configuring CarbonData on "Spark on YARN"
+ Cluster</h2><p>This section provides the procedure to install CarbonData on "Spark on YARN"
+ cluster.</p><h3>Prerequisites</h3>
+<ul>
+ <li>Hadoop HDFS and Yarn should be installed and running.</li>
+ <li>Spark should be installed and running on all the clients.</li>
+ <li>CarbonData user should have permission to access HDFS.</li>
+</ul><h3>Procedure</h3><p>The following steps apply only to driver nodes. (A driver node is a node
+ that starts the Spark context.)</p>
+<ul>
+ <li><p><a
+ href="https://github.com/apache/incubator-carbondata/blob/master/build/README.md" target="_blank">Build
+ the CarbonData</a> project, take the assembly jar from
+ "./assembly/target/scala-2.10/carbondata_xxx.jar", and put it in the <code>"<SPARK_HOME>/carbonlib"</code>
+ folder.</p>
+ <p>NOTE: Create the carbonlib folder if it does not exist inside the <code>"<SPARK_HOME>"</code>
+ path.</p></li>
+ <li><p>Copy the "carbonplugins" folder from the "./processing/" folder of the CarbonData
+ repository to the <code>"<SPARK_HOME>/carbonlib"</code> folder. carbonplugins contains the
+ .kettle folder.</p></li>
+ <li><p>Copy the "carbon.properties.template" file from the conf folder of the CarbonData
+ repository to <code>"<SPARK_HOME>/conf/carbon.properties"</code>.</p></li>
+ <li>Modify the parameters in "spark-defaults.conf", located in the <code>"<SPARK_HOME>/conf"</code> folder.
+ </li>
+</ul>
+<table class="table table-striped table-bordered">
+ <thead>
+ <tr>
+ <th>Property</th>
+ <th>Description</th>
+ <th>Value</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>spark.master</td>
+ <td>Set this value to run Spark on YARN.</td>
+ <td>Set "yarn-client" to run Spark on YARN in client mode.</td>
+ </tr>
+ <tr>
+ <td>spark.yarn.dist.files</td>
+ <td>Comma-separated list of files to be placed in the working directory of each executor.
+ </td>
+ <td><code>"<YOUR_SPARK_HOME_PATH>"/conf/carbon.properties</code></td>
+ </tr>
+ <tr>
+ <td>spark.yarn.dist.archives</td>
+ <td>Comma-separated list of archives to be extracted into the working directory of each
+ executor.
+ </td>
+ <td><code>"<YOUR_SPARK_HOME_PATH>"/carbonlib/carbondata_xxx.jar</code></td>
+ </tr>
+ <tr>
+ <td>spark.executor.extraJavaOptions</td>
+ <td>A string of extra JVM options to pass to executors. For instance, GC settings or other
+ logging. NOTE: You can enter multiple values separated by spaces.
+ </td>
+ <td><code>-Dcarbon.properties.filepath="<YOUR_SPARK_HOME_PATH>"/conf/carbon.properties</code>
+ </td>
+ </tr>
+ <tr>
+ <td>spark.executor.extraClassPath</td>
+ <td>Extra classpath entries to prepend to the classpath of executors. NOTE: If
+ SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append its values to the
+ spark.driver.extraClassPath parameter below.
+ </td>
+ <td><code>
+ "<YOUR_SPARK_HOME_PATH>"/carbonlib/carbondata_xxx.jar</code>
+ </td>
+ </tr>
+ <tr>
+ <td>spark.driver.extraClassPath</td>
+ <td>Extra classpath entries to prepend to the classpath of the driver. NOTE: If
+ SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append its value to
+ this parameter (spark.driver.extraClassPath).
+ </td>
+ <td><code>
+ "<YOUR_SPARK_HOME_PATH>"/carbonlib/carbondata_xxx.jar</code>
+ </td>
+ </tr>
+ <tr>
+ <td>spark.driver.extraJavaOptions</td>
+ <td>A string of extra JVM options to pass to the driver. For instance, GC settings or other
+ logging.
+ </td>
+ <td><code>-Dcarbon.properties.filepath="<YOUR_SPARK_HOME_PATH>"/conf/carbon.properties</code>
+ </td>
+ </tr>
+ <tr>
+ <td>carbon.kettle.home</td>
+ <td>Path that will be used by CarbonData internally to create graph for loading the data.
+ </td>
+ <td><code>"<YOUR_SPARK_HOME_PATH>"/carbonlib/carbonplugins</code></td>
+ </tr>
+ </tbody>
+</table>
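+<p>To make the table above concrete, the following sketch prints the corresponding <code>spark-defaults.conf</code> entries for a given Spark home and assembly jar name. The function name <code>print_yarn_spark_defaults</code> is hypothetical, and the sketch assumes the jar sits directly in the carbonlib folder, as in the build step of the procedure. Review the output before appending it to your configuration.</p>

```shell
#!/bin/sh
# Hypothetical helper: print spark-defaults.conf entries for "Spark on YARN".
# Usage: print_yarn_spark_defaults <spark_home> <assembly_jar_name>
print_yarn_spark_defaults() {
    spark_home=$1
    # Assumption: the assembly jar was copied directly into carbonlib.
    jar="$spark_home/carbonlib/$2"
    props="$spark_home/conf/carbon.properties"

    # spark-defaults.conf uses whitespace between the key and the value.
    cat <<EOF
spark.master                     yarn-client
spark.yarn.dist.files            $props
spark.yarn.dist.archives         $jar
spark.executor.extraJavaOptions  -Dcarbon.properties.filepath=$props
spark.executor.extraClassPath    $jar
spark.driver.extraClassPath      $jar
spark.driver.extraJavaOptions    -Dcarbon.properties.filepath=$props
carbon.kettle.home               $spark_home/carbonlib/carbonplugins
EOF
}

# Example call (the jar name is a placeholder):
#   print_yarn_spark_defaults "$SPARK_HOME" carbondata_xxx.jar
```

+<p>Printing instead of writing keeps the helper free of side effects. Redirect the output into <code>spark-defaults.conf</code> only after checking it against any entries that already exist there.</p>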
+<ul>
+ <li>Add the following properties to <code><SPARK_HOME>/conf/carbon.properties</code>:
+ </li>
+</ul>
+<table class="table table-striped table-bordered">
+ <thead>
+ <tr>
+ <th>Property</th>
+ <th>Required</th>
+ <th>Description</th>
+ <th>Example</th>
+ <th>Remark</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>carbon.storelocation</td>
+ <td>NO</td>
+ <td>Location where CarbonData will create the store and write the data in its own format.
+ </td>
+ <td>hdfs://HOSTNAME:PORT/Opt/CarbonStore</td>
+ <td>An HDFS directory is recommended.</td>
+ </tr>
+ <tr>
+ <td>carbon.kettle.home</td>
+ <td>YES</td>
+ <td>Path that will be used by CarbonData internally to create graph for loading the data.
+ </td>
+ <td>$SPARK_HOME/carbonlib/carbonplugins</td>
+ <td></td>
+ </tr>
+ </tbody>
+</table>
+<ul>
+ <li>Verify the installation.</li>
+</ul><p><code>
+ ./bin/spark-shell --master yarn-client --driver-memory 1g
+ --executor-cores 2 --executor-memory 2G
+</code></p><p>NOTE: Make sure you have permission to access the CarbonData JARs and files that the
+ driver and executors use to start.</p><p>To get started with CarbonData: <a
+ href="mainpage.html?page=quickStart">Quick Start</a>, <a href="mainpage.html?page=ddl">DDL Operations on CarbonData</a></p>
+
+<h2 id="query-execution">
+ Query Execution Using CarbonData Thrift Server</h2><h3>Starting CarbonData Thrift Server</h3><p>
+ a. cd <code><SPARK_HOME></code></p><p>b. Run the following command to start the CarbonData
+ thrift server.</p><p><code>
+ ./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true
+ --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer
+ $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
+</code></p>
+<table class="table table-striped table-bordered">
+ <thead>
+ <tr>
+ <th>Parameter</th>
+ <th>Description</th>
+ <th>Example</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>CARBON_ASSEMBLY_JAR</td>
+ <td>CarbonData assembly jar name present in the <code>"<SPARK_HOME>"/carbonlib/</code>
+ folder.
+ </td>
+ <td>carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar</td>
+ </tr>
+ <tr>
+ <td>carbon_store_path</td>
+ <td>A parameter to the CarbonThriftServer class. This is the HDFS path where CarbonData
+ files will be kept. It is strongly recommended to use the same value as the
+ carbon.storelocation parameter in carbon.properties.
+ </td>
+ <td><code>hdfs://<host_name>:54310/user/hive/warehouse/carbon.store</code></td>
+ </tr>
+ </tbody>
+</table><h3>Examples</h3>
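+<p>The launch command described above can also be assembled from its three inputs. The helper below is a sketch only: <code>thrift_server_cmd</code> is a hypothetical name, it prints the command rather than running it, and it assumes none of the paths contain spaces.</p>

```shell
#!/bin/sh
# Hypothetical helper: print the spark-submit command for CarbonThriftServer.
# Usage: thrift_server_cmd <spark_home> <assembly_jar_name> <carbon_store_path>
thrift_server_cmd() {
    spark_home=$1
    jar=$2
    store=$3

    # Build the command piece by piece so each option stays readable.
    cmd="$spark_home/bin/spark-submit"
    cmd="$cmd --conf spark.sql.hive.thriftServer.singleSession=true"
    cmd="$cmd --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer"
    cmd="$cmd $spark_home/carbonlib/$jar $store"

    printf '%s\n' "$cmd"
}

# Example call (the jar name and store path are placeholders):
#   thrift_server_cmd "$SPARK_HOME" carbondata_xxx.jar \
#       hdfs://hacluster/user/hive/warehouse/carbon.store
```

+<p>Printing the command first makes it easy to confirm that the jar name and store path are the intended ones before starting the server.</p>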
+<ul>
+ <li>Start with default memory and executors</li>
+</ul><p><pre><code>
+ ./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true \
+ --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
+ $SPARK_HOME/carbonlib/carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar \
+ hdfs://hacluster/user/hive/warehouse/carbon.store
+</code></pre></p>
+<ul>
+ <li>Start with fixed executors and resources</li>
+</ul><p><pre><code>
+ ./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true \
+ --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
+ --num-executors 3 --driver-memory 20g --executor-memory 250g \
+ --executor-cores 32 \
+ /srv/OSCON/BigData/HACluster/install/spark/sparkJdbc/lib/carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar \
+ hdfs://hacluster/user/hive/warehouse/carbon.store
+</code></pre></p><h3>Connecting to CarbonData Thrift Server Using Beeline</h3>
+<p>
+<pre><code>
+ cd $SPARK_HOME
+ ./bin/beeline -u jdbc:hive2://thriftserver_host:port
+
+ # Example:
+ ./bin/beeline -u jdbc:hive2://10.10.10.10:10000
+</code></pre>
+</p>