Posted to commits@carbondata.apache.org by ch...@apache.org on 2017/02/04 02:38:20 UTC

[21/35] incubator-carbondata-site git commit: Updated website for CarbonData release 1.0.0

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/0d4cdb1c/content/docs/latest/installation-guide.html
----------------------------------------------------------------------
diff --git a/content/docs/latest/installation-guide.html b/content/docs/latest/installation-guide.html
new file mode 100644
index 0000000..168f895
--- /dev/null
+++ b/content/docs/latest/installation-guide.html
@@ -0,0 +1,330 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+--><h1>Installation Guide</h1><p>This tutorial guides you through the installation and configuration
+    of CarbonData in the following two modes:</p>
+<ul>
+    <li><a href="#installing-spark-cluster">Installing and
+        Configuring CarbonData on Standalone Spark Cluster</a></li>
+    <li><a href="#installing-yarn-cluster">Installing and
+        Configuring CarbonData on "Spark on YARN" Cluster</a></li>
+</ul><p>followed by:</p>
+<ul>
+    <li><a href="#query-execution">Query Execution using CarbonData
+        Thrift Server</a></li>
+</ul><h2 id="installing-spark-cluster">Installing and Configuring CarbonData on Standalone Spark Cluster</h2><h3>
+    Prerequisites</h3>
+<ul>
+    <li><p>Hadoop HDFS and YARN should be installed and running.</p></li>
+    <li><p>Spark should be installed and running on all the cluster nodes.</p></li>
+    <li><p>The CarbonData user should have permission to access HDFS.</p></li>
+</ul><h3>Procedure</h3>
+<ul>
+    <li><p><a
+            href="https://github.com/apache/incubator-carbondata/blob/master/build/README.md" target="_blank">Build
+        the CarbonData</a> project and copy the assembly jar from
+        "./assembly/target/scala-2.10/carbondata_xxx.jar" into the <code>&quot;&lt;SPARK_HOME&gt;/carbonlib&quot;</code>
+        folder.</p>
+        <p>NOTE: Create the carbonlib folder if it does not exist inside the <code>&quot;&lt;SPARK_HOME&gt;&quot;</code>
+            path.</p></li>
+    <li><p>Add the carbonlib folder path to the Spark classpath. (Edit the <code>&quot;&lt;SPARK_HOME&gt;/conf/spark-env.sh&quot;</code>
+        file and modify the value of SPARK_CLASSPATH by appending <br/><code>&quot;&lt;SPARK_HOME&gt;/carbonlib/*&quot;</code>
+        to the existing value.)</p></li>
+    <li><p>Copy "carbon.properties.template" from the "./conf/" folder of the CarbonData repository
+        to <code>&quot;&lt;SPARK_HOME&gt;/conf/carbon.properties&quot;</code>.</p></li>
+    <li><p>Copy the "carbonplugins" folder from the "./processing/" folder of the CarbonData repository
+        to the <code>&quot;&lt;SPARK_HOME&gt;/carbonlib&quot;</code> folder. (A shell sketch of these
+        copy steps follows this list.)</p>
+        <p>NOTE: carbonplugins will contain the .kettle folder.</p></li>
+    <li><p>On the Spark node, configure the properties mentioned in the following table in the <code>&quot;&lt;SPARK_HOME&gt;/conf/spark-defaults.conf&quot;</code>
+        file.</p></li>
+</ul>
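+<p>As a convenience, the copy steps above can be scripted. The following is a minimal
+    sketch, assuming the CarbonData repository has been cloned and built under a hypothetical
+    "~/incubator-carbondata" path and that SPARK_HOME is set; the assembly jar name varies by
+    build, so adjust the wildcard to match yours.</p>
+<pre><code>
+    # Hypothetical repository path; adjust to your environment.
+    CARBON_REPO=~/incubator-carbondata
+
+    # Create the carbonlib folder if it does not exist.
+    mkdir -p "$SPARK_HOME/carbonlib"
+
+    # Copy the assembly jar (name varies by version and build profile).
+    cp "$CARBON_REPO"/assembly/target/scala-2.10/carbondata_*.jar "$SPARK_HOME/carbonlib/"
+
+    # Copy the carbonplugins folder (contains the .kettle folder).
+    cp -r "$CARBON_REPO/processing/carbonplugins" "$SPARK_HOME/carbonlib/"
+
+    # Copy the properties template into place.
+    cp "$CARBON_REPO/conf/carbon.properties.template" "$SPARK_HOME/conf/carbon.properties"
+</code></pre>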
+<table class="table table-striped table-bordered">
+    <thead>
+    <tr>
+        <th>Property</th>
+        <th>Description</th>
+        <th>Value</th>
+    </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td>carbon.kettle.home</td>
+        <td>Path used by CarbonData internally to create the graph for data loading.
+        </td>
+        <td>$SPARK_HOME/carbonlib/carbonplugins</td>
+    </tr>
+    <tr>
+        <td>spark.driver.extraJavaOptions</td>
+        <td>A string of extra JVM options to pass to the driver. For instance, GC settings or other
+            logging.
+        </td>
+        <td>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</td>
+    </tr>
+    <tr>
+        <td>spark.executor.extraJavaOptions</td>
+        <td>A string of extra JVM options to pass to executors. For instance, GC settings or other
+            logging. NOTE: You can enter multiple values separated by spaces.
+        </td>
+        <td>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</td>
+    </tr>
+    </tbody>
+</table>
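+<p>For reference, a minimal sketch of the resulting configuration, assuming a literal
+    "/opt/spark" install path in place of &lt;SPARK_HOME&gt;:</p>
+<pre><code>
+    # &lt;SPARK_HOME&gt;/conf/spark-env.sh -- append carbonlib to the classpath
+    SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/spark/carbonlib/*
+
+    # &lt;SPARK_HOME&gt;/conf/spark-defaults.conf
+    carbon.kettle.home                /opt/spark/carbonlib/carbonplugins
+    spark.driver.extraJavaOptions     -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
+    spark.executor.extraJavaOptions   -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
+</code></pre>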
+<ul>
+    <li>Add the following properties to the <code>&quot;&lt;SPARK_HOME&gt;/conf/carbon.properties&quot;</code> file:
+    </li>
+    </li>
+</ul>
+<table class="table table-striped table-bordered">
+    <thead>
+    <tr>
+        <th>Property</th>
+        <th>Required</th>
+        <th>Description</th>
+        <th>Example</th>
+        <th>Remark</th>
+    </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td>carbon.storelocation</td>
+        <td>NO</td>
+        <td>Location where CarbonData will create the store and write the data in its own
+            format.
+        </td>
+        <td>hdfs://HOSTNAME:PORT/Opt/CarbonStore</td>
+        <td>It is recommended to set this to an HDFS directory.</td>
+    </tr>
+    <tr>
+        <td>carbon.kettle.home</td>
+        <td>YES</td>
+        <td>Path used by CarbonData internally to create the graph for data loading.
+        </td>
+        <td>$SPARK_HOME/carbonlib/carbonplugins</td>
+        <td></td>
+    </tr>
+    </tbody>
+</table>
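+<p>A minimal sketch of the corresponding <code>carbon.properties</code> entries, again
+    assuming a "/opt/spark" install path and a hypothetical HDFS namenode address:</p>
+<pre><code>
+    # &lt;SPARK_HOME&gt;/conf/carbon.properties
+    carbon.storelocation=hdfs://namenode:54310/Opt/CarbonStore
+    carbon.kettle.home=/opt/spark/carbonlib/carbonplugins
+</code></pre>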
+<ul>
+    <li>Verify the installation. For example:</li>
+</ul><p><code>
+    ./spark-shell --master spark://HOSTNAME:PORT --total-executor-cores 2
+    --executor-memory 2G
+</code></p><p>NOTE: Make sure that the user starting the driver and executors has read permission
+    on the CarbonData JARs and files.</p><p>To get started with CarbonData: <a
+        href="mainpage.html?page=quickStart">Quick Start</a>, <a href="mainpage.html?page=ddl">DDL
+    Operations on CarbonData</a>.</p>
+   <h2 id="installing-yarn-cluster">Installing and Configuring CarbonData on "Spark on YARN"
+    Cluster</h2><p>This section provides the procedure to install CarbonData on a "Spark on YARN"
+    cluster.</p><h3>Prerequisites</h3>
+<ul>
+    <li>Hadoop HDFS and YARN should be installed and running.</li>
+    <li>Spark should be installed and running on all the client nodes.</li>
+    <li>The CarbonData user should have permission to access HDFS.</li>
+</ul><h3>Procedure</h3><p>The following steps are only for driver nodes. (Driver nodes are the nodes
+    on which the Spark context is started.)</p>
+<ul>
+    <li><p><a
+            href="https://github.com/apache/incubator-carbondata/blob/master/build/README.md" target="_blank">Build
+        the CarbonData</a> project and copy the assembly jar from
+        "./assembly/target/scala-2.10/carbondata_xxx.jar" into the <code>&quot;&lt;SPARK_HOME&gt;/carbonlib&quot;</code>
+        folder.</p>
+        <p>NOTE: Create the carbonlib folder if it does not exist inside the <code>&quot;&lt;SPARK_HOME&gt;&quot;</code>
+            path.</p></li>
+    <li><p>Copy "carbonplugins" folder to <code>&quot;&lt;SPARK_HOME&gt;/carbonlib&quot;</code>
+        folder from "./processing/" folder of CarbonData repository. carbonplugins will contain
+        .kettle folder.</p></li>
+    <li><p>Copy the "carbon.properties.template" to <code>&quot;&lt;SPARK_HOME&gt;/conf/carbon.properties&quot;</code>
+        folder from conf folder of CarbonData repository.</p></li>
+    <li>Modify the parameters in "spark-default.conf" located in the <code>&quot;&lt;SPARK_HOME&gt;/conf</code>"
+    </li>
+</ul>
+<table class="table table-striped table-bordered">
+    <thead>
+    <tr>
+        <th>Property</th>
+        <th>Description</th>
+        <th>Value</th>
+    </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td>spark.master</td>
+        <td>Set this value to run Spark on YARN.</td>
+        <td>Set to "yarn-client" to run Spark in YARN client mode.</td>
+    </tr>
+    <tr>
+        <td>spark.yarn.dist.files</td>
+        <td>Comma-separated list of files to be placed in the working directory of each executor.
+        </td>
+        <td><code>&quot;&lt;YOUR_SPARK_HOME_PATH&gt;&quot;/conf/carbon.properties</code></td>
+    </tr>
+    <tr>
+        <td>spark.yarn.dist.archives</td>
+        <td>Comma-separated list of archives to be extracted into the working directory of each
+            executor.
+        </td>
+        <td><code>&quot;&lt;YOUR_SPARK_HOME_PATH&gt;&quot;/carbonlib/carbondata_xxx.jar</code></td>
+    </tr>
+    <tr>
+        <td>spark.executor.extraJavaOptions</td>
+        <td>A string of extra JVM options to pass to executors. NOTE: You can enter
+            multiple values separated by spaces.
+        </td>
+        <td><code>-Dcarbon.properties.filepath=&quot;&lt;YOUR_SPARK_HOME_PATH&gt;&quot;/conf/carbon.properties</code>
+        </td>
+    </tr>
+    <tr>
+        <td>spark.executor.extraClassPath</td>
+        <td>Extra classpath entries to prepend to the classpath of executors. NOTE: If
+            SPARK_CLASSPATH is defined in spark-env.sh, then comment it out and append its
+            value to this parameter.
+        </td>
+        <td><code>
+            &quot;&lt;YOUR_SPARK_HOME_PATH&gt;&quot;/carbonlib/carbondata_xxx.jar</code>
+        </td>
+    </tr>
+    <tr>
+        <td>spark.driver.extraClassPath</td>
+        <td>Extra classpath entries to prepend to the classpath of the driver. NOTE: If
+            SPARK_CLASSPATH is defined in spark-env.sh, then comment it out and append its
+            value to this parameter.
+        </td>
+        <td><code>
+            &quot;&lt;YOUR_SPARK_HOME_PATH&gt;&quot;/carbonlib/carbondata_xxx.jar</code>
+        </td>
+    </tr>
+    <tr>
+        <td>spark.driver.extraJavaOptions</td>
+        <td>A string of extra JVM options to pass to the driver. For instance, GC settings or other
+            logging.
+        </td>
+        <td><code>-Dcarbon.properties.filepath=&quot;&lt;YOUR_SPARK_HOME_PATH&gt;&quot;/conf/carbon.properties</code>
+        </td>
+    </tr>
+    <tr>
+        <td>carbon.kettle.home</td>
+        <td>Path used by CarbonData internally to create the graph for data loading.
+        </td>
+        <td><code>&quot;&lt;YOUR_SPARK_HOME_PATH&gt;&quot;/carbonlib/carbonplugins</code></td>
+    </tr>
+    </tbody>
+</table>
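+<p>Putting the table together, a minimal <code>spark-defaults.conf</code> sketch for the
+    "Spark on YARN" case; "/opt/spark" stands in for &lt;YOUR_SPARK_HOME_PATH&gt; and the
+    assembly jar name varies by build.</p>
+<pre><code>
+    spark.master                      yarn-client
+    spark.yarn.dist.files             /opt/spark/conf/carbon.properties
+    spark.yarn.dist.archives          /opt/spark/carbonlib/carbondata_xxx.jar
+    spark.executor.extraJavaOptions   -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
+    spark.executor.extraClassPath     /opt/spark/carbonlib/carbondata_xxx.jar
+    spark.driver.extraClassPath       /opt/spark/carbonlib/carbondata_xxx.jar
+    spark.driver.extraJavaOptions     -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
+    carbon.kettle.home                /opt/spark/carbonlib/carbonplugins
+</code></pre>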
+<ul>
+    <li>Add the following properties to the <code>&lt;SPARK_HOME&gt;/conf/carbon.properties</code> file:
+    </li>
+</ul>
+<table class="table table-striped table-bordered">
+    <thead>
+    <tr>
+        <th>Property</th>
+        <th>Required</th>
+        <th>Description</th>
+        <th>Example</th>
+        <th>Remark</th>
+    </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td>carbon.storelocation</td>
+        <td>NO</td>
+        <td>Location where CarbonData will create the store and write the data in its own format.
+        </td>
+        <td>hdfs://HOSTNAME:PORT/Opt/CarbonStore</td>
+        <td>It is recommended to set this to an HDFS directory.</td>
+    </tr>
+    <tr>
+        <td>carbon.kettle.home</td>
+        <td>YES</td>
+        <td>Path used by CarbonData internally to create the graph for data loading.
+        </td>
+        <td>$SPARK_HOME/carbonlib/carbonplugins</td>
+        <td></td>
+    </tr>
+    </tbody>
+</table>
+<ul>
+    <li>Verify the installation. For example:</li>
+</ul><p><code>
+    ./bin/spark-shell --master yarn-client --driver-memory 1g
+    --executor-cores 2 --executor-memory 2G
+</code> NOTE: Make sure that the user starting the driver and executors has read permission on the
+    CarbonData JARs and files.</p><p>To get started with CarbonData: <a
+        href="mainpage.html?page=quickStart">Quick Start</a>, <a href="mainpage.html?page=ddl">DDL Operations on CarbonData</a>.</p>
+
+<h2 id="query-execution">
+    Query Execution Using CarbonData Thrift Server</h2><h3>Starting CarbonData Thrift Server</h3><p>
+    a. cd <code>&lt;SPARK_HOME&gt;</code></p><p>b. Run the following command to start the CarbonData
+    Thrift Server.</p><p><code>
+    ./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true
+    --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer
+    $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR &lt;carbon_store_path&gt;
+</code></p>
+<table class="table table-striped table-bordered">
+    <thead>
+    <tr>
+        <th>Parameter</th>
+        <th>Description</th>
+        <th>Example</th>
+    </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td>CARBON_ASSEMBLY_JAR</td>
+        <td>CarbonData assembly jar name present in the <code>&quot;&lt;SPARK_HOME&gt;&quot;/carbonlib/</code>
+            folder.
+        </td>
+        <td>carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar</td>
+    </tr>
+    <tr>
+        <td>carbon_store_path</td>
+        <td>This is a parameter to the CarbonThriftServer class. This is an HDFS path where
+            CarbonData files will be kept. It is strongly recommended to set this to the same
+            value as the carbon.storelocation parameter in carbon.properties.
+        </td>
+        <td><code>hdfs://&lt;host_name&gt;:54310/user/hive/warehouse/carbon.store</code></td>
+    </tr>
+    </tbody>
+</table><h3>Examples</h3>
+<ul>
+    <li>Start with default memory and executors</li>
+</ul><pre><code>
+    ./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true \
+    --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
+    $SPARK_HOME/carbonlib/carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar \
+    hdfs://hacluster/user/hive/warehouse/carbon.store
+</code></pre>
+<ul>
+    <li>Start with fixed executors and resources</li>
+</ul><pre><code>
+    ./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true \
+    --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
+    --num-executors 3 --driver-memory 20g --executor-memory 250g \
+    --executor-cores 32 \
+    /srv/OSCON/BigData/HACluster/install/spark/sparkJdbc/lib/carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar \
+    hdfs://hacluster/user/hive/warehouse/carbon.store
+</code></pre><h3>Connecting to CarbonData Thrift Server Using Beeline</h3>
+<pre><code>
+    cd &lt;SPARK_HOME&gt;
+    ./bin/beeline -u jdbc:hive2://thriftserver_host:port
+
+    Example:
+    ./bin/beeline -u jdbc:hive2://10.10.10.10:10000
+</code></pre>
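+<p>Once connected, a quick way to verify the session is to issue a trivial statement from
+    the Beeline prompt; this is a minimal sketch, and any SQL the Thrift Server supports
+    would serve equally well.</p>
+<pre><code>
+    -- Run from the Beeline prompt after connecting.
+    show databases;
+    show tables;
+</code></pre>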