Posted to commits@crail.apache.org by pe...@apache.org on 2018/09/06 11:01:07 UTC

[2/5] incubator-crail git commit: Documentation: HDFS Adapter

Documentation: HDFS Adapter

Add Spark HDFS Adapter documentation.

Signed-off-by: Jonas Pfefferle <pe...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/incubator-crail/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-crail/commit/f1dcb0d2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-crail/tree/f1dcb0d2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-crail/diff/f1dcb0d2

Branch: refs/heads/master
Commit: f1dcb0d20b6b492861e32bb3a919217cf17a98ac
Parents: 0e536ca
Author: Jonas Pfefferle <pe...@apache.org>
Authored: Wed Aug 15 10:45:54 2018 +0200
Committer: Jonas Pfefferle <pe...@apache.org>
Committed: Thu Sep 6 12:59:41 2018 +0200

----------------------------------------------------------------------
 doc/source/spark.rst | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-crail/blob/f1dcb0d2/doc/source/spark.rst
----------------------------------------------------------------------
diff --git a/doc/source/spark.rst b/doc/source/spark.rst
index 3f222ad..b999ed8 100644
--- a/doc/source/spark.rst
+++ b/doc/source/spark.rst
@@ -1,9 +1,40 @@
 Spark
 =====
 
+Crail can be used to increase performance or enhance flexibility in
+`Apache Spark <https://spark.apache.org/>`_. We provide multiple plugins that
+allow Crail to be used for:
+
+* :ref:`HDFS Adapter`: input and output
+* :ref:`Spark-IO`: shuffle data and broadcast store
+
+HDFS Adapter
+------------
+
+The Crail HDFS adapter is provided with every Crail :ref:`deployment <Deploy Crail>`.
+The HDFS adapter allows any HDFS path to be replaced with a path on Crail.
+However, to use it for input and output in Spark, the Crail jar file paths
+have to be added to the Spark configuration file :code:`spark-defaults.conf`:
+
+.. code-block:: bash
+
+   spark.driver.extraClassPath      $CRAIL_HOME/jars/*
+   spark.executor.extraClassPath    $CRAIL_HOME/jars/*
+
+Data in Crail can be accessed by prepending the value of :code:`crail.namenode.address`
+from :ref:`crail-site.conf` to any HDFS path. For example, :code:`crail://localhost:9060/test`
+accesses :code:`/test` in Crail.
+Note that Crail works independently of HDFS and does not interact with HDFS in
+any way. However, Crail does not completely replace HDFS, since it does not offer
+durability or fault tolerance (cf. :ref:`Introduction`).
+A good fit for Crail is, for example, inter-job data that can be recomputed
+from the original data in HDFS.
+
 Spark-IO
 --------
 
+
+
 Crail-TeraSort
 --------------
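
As an illustration of the addressing scheme described in the HDFS Adapter section of the diff above, the following is a minimal sketch of how a Crail URI could be built by prepending the :code:`crail.namenode.address` value to an absolute path. The helper name ``to_crail_uri`` is hypothetical and not part of Crail or Spark; the address and path are the ones used in the documentation's own example.

```python
from urllib.parse import urlsplit


def to_crail_uri(namenode_address: str, path: str) -> str:
    """Prepend a crail.namenode.address value to an absolute path.

    Hypothetical helper for illustration only; not a Crail or Spark API.
    """
    parts = urlsplit(namenode_address)
    # The namenode address is expected to look like crail://host:port
    if parts.scheme != "crail" or not parts.netloc:
        raise ValueError(f"not a Crail namenode address: {namenode_address!r}")
    # Only absolute paths can be appended directly to the address
    if not path.startswith("/"):
        raise ValueError(f"path must be absolute: {path!r}")
    return namenode_address.rstrip("/") + path


# Values taken from the example in the documentation above
print(to_crail_uri("crail://localhost:9060", "/test"))
```

Assuming the Crail jars are on the Spark driver and executor class paths as configured above, the resulting string could then be passed wherever Spark accepts an HDFS path, for example as the argument to ``spark.read.text`` in PySpark.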