Posted to commits@carbondata.apache.org by ch...@apache.org on 2017/08/11 03:40:04 UTC

[4/4] carbondata-site git commit: Fixed the Failed to load PDF issue

Fixed the Failed to load PDF issue


Project: http://git-wip-us.apache.org/repos/asf/carbondata-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata-site/commit/167fcae3
Tree: http://git-wip-us.apache.org/repos/asf/carbondata-site/tree/167fcae3
Diff: http://git-wip-us.apache.org/repos/asf/carbondata-site/diff/167fcae3

Branch: refs/heads/asf-site
Commit: 167fcae375c2fb7f4e11255968da932456dab505
Parents: edabb2a
Author: chenliang613 <ch...@apache.org>
Authored: Fri Aug 11 11:39:43 2017 +0800
Committer: chenliang613 <ch...@apache.org>
Committed: Fri Aug 11 11:39:43 2017 +0800

----------------------------------------------------------------------
 content/configuration-parameters.html    |  24 ++++++++
 content/ddl-operation-on-carbondata.html |  25 ++++++++-
 content/dml-operation-on-carbondata.html |  78 ++++++++++++++++++++++++--
 content/pdf/maven-pdf-plugin.pdf         | Bin 155540 -> 233933 bytes
 4 files changed, 121 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/167fcae3/content/configuration-parameters.html
----------------------------------------------------------------------
diff --git a/content/configuration-parameters.html b/content/configuration-parameters.html
index 96a91c6..d3c96ea 100644
--- a/content/configuration-parameters.html
+++ b/content/configuration-parameters.html
@@ -292,6 +292,30 @@
 <td>The Number of partitions to use when shuffling data for sort. If user don't configurate or configurate it less than 1, it uses the number of map tasks as reduce tasks. In general, we recommend 2-3 tasks per CPU core in your cluster.</td>
 <td></td>
 </tr>
+<tr>
+<td>carbon.options.bad.records.logger.enable</td>
+<td>false</td>
+<td>Whether to create logs with details about bad records.</td>
+<td></td>
+</tr>
+<tr>
+<td>carbon.bad.records.action</td>
+<td>fail</td>
+<td>This property supports four actions for bad records: FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE, it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT, bad records are written to the raw CSV instead of being loaded. If set to IGNORE, bad records are neither loaded nor written to the raw CSV. If set to FAIL, data loading fails if any bad records are found.</td>
+<td></td>
+</tr>
+<tr>
+<td>carbon.options.is.empty.data.bad.record</td>
+<td>false</td>
+<td>If false, empty data ("" or '' or ,,) is not treated as a bad record; if true, it is.</td>
+<td></td>
+</tr>
+<tr>
+<td>carbon.options.bad.record.path</td>
+<td></td>
+<td>Specifies the HDFS path where bad records are stored. By default the value is null. This path must be configured by the user if the bad record logger is enabled or the bad record action is REDIRECT.</td>
+<td></td>
+</tr>
 </tbody>
 </table>
 <ul>

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/167fcae3/content/ddl-operation-on-carbondata.html
----------------------------------------------------------------------
diff --git a/content/ddl-operation-on-carbondata.html b/content/ddl-operation-on-carbondata.html
index 5c0cd9e..19cb64d 100644
--- a/content/ddl-operation-on-carbondata.html
+++ b/content/ddl-operation-on-carbondata.html
@@ -275,7 +275,13 @@ By default inverted index is enabled. The user can disable the inverted index cr
 <li>
 <p>All dimensions except complex datatype columns are part of multi dimensional key(MDK). This behavior can be overridden by using TBLPROPERTIES. If the user wants to keep any column (except columns of complex datatype) in multi dimensional key then he can keep the columns either in DICTIONARY_EXCLUDE or DICTIONARY_INCLUDE.</p>
 </li>
+<li>
+<p><strong>Sort Columns Configuration</strong></p>
+<p>The "SORT_COLUMNS" property lets users specify which columns belong to the MDK index. If the user does not specify "SORT_COLUMNS", the MDK index is built by default from all dimension columns except complex datatype columns.</p>
+</li>
 </ul>
+<pre><code>       TBLPROPERTIES ('SORT_COLUMNS'='column1, column3')
+</code></pre>
 <h3>
 <a id="example" class="anchor" href="#example" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Example:</h3>
 <pre><code>    CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
@@ -290,8 +296,25 @@ By default inverted index is enabled. The user can disable the inverted index cr
       STORED BY 'carbondata'
       TBLPROPERTIES ('DICTIONARY_EXCLUDE'='storeCity',
                      'DICTIONARY_INCLUDE'='productNumber',
-                     'NO_INVERTED_INDEX'='productBatch')
+                     'NO_INVERTED_INDEX'='productBatch',
+                     'SORT_COLUMNS'='productName,storeCity')
+</code></pre>
+<ul>
+<li><strong>SORT_COLUMNS</strong></li>
+</ul>
+<pre><code>This table property specifies the order of the sort columns.
+</code></pre>
+<pre><code>    TBLPROPERTIES('SORT_COLUMNS'='column1, column3')
 </code></pre>
+<p>NOTE:</p>
+<ul>
+<li>
+<p>If this property is not specified, SORT_COLUMNS defaults to all dimension columns (excluding complex columns).</p>
+</li>
+<li>
+<p>If this property is specified with an empty argument, the table is loaded without sorting. For example: ('SORT_COLUMNS'='')</p>
+</li>
+</ul>
 <h2>
 <a id="show-table" class="anchor" href="#show-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>SHOW TABLE</h2>
 <p>This command can be used to list all the tables in current database or all the tables of a specific database.</p>
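Reviewer sketch (not part of the commit): the SORT_COLUMNS notes added in the ddl-operation-on-carbondata.html hunk above describe three cases (property absent, property empty, property listing columns). The Python below is only a toy model of those rules as worded in the diff. It is not CarbonData code, and the function name resolve_sort_columns and the dict-based schema representation are hypothetical.

```python
from typing import Dict, List, Optional


def resolve_sort_columns(
    dimensions: Dict[str, bool],
    sort_columns: Optional[str] = None,
) -> List[str]:
    """Toy model of the SORT_COLUMNS rules stated in the diff.

    dimensions maps each dimension column name to True when it is a
    complex datatype column (a hypothetical representation).
    sort_columns is the raw TBLPROPERTIES value, or None when the
    property is not specified at all.
    """
    if sort_columns is None:
        # Not specified: all dimension columns except complex ones.
        return [name for name, is_complex in dimensions.items() if not is_complex]
    if sort_columns.strip() == "":
        # Specified but empty: the table is loaded without sorting.
        return []
    # Specified: keep the listed order, trimming spaces around names.
    return [name.strip() for name in sort_columns.split(",")]
```

For instance, passing 'column1, column3' yields the two names in that order, while passing an empty string yields no sort columns.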

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/167fcae3/content/dml-operation-on-carbondata.html
----------------------------------------------------------------------
diff --git a/content/dml-operation-on-carbondata.html b/content/dml-operation-on-carbondata.html
index d187004..6e27c75 100644
--- a/content/dml-operation-on-carbondata.html
+++ b/content/dml-operation-on-carbondata.html
@@ -304,10 +304,10 @@ column2:dictionaryFilePath2')
 <p>If this option is set to TRUE, then high.cardinality.identify.enable property will be disabled during data load.</p>
 </li>
 </ul>
-</li>
-</ul>
 <h3>
 <a id="example" class="anchor" href="#example" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Example:</h3>
+</li>
+</ul>
 <pre><code>LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable
 options('DELIMITER'=',', 'QUOTECHAR'='"','COMMENTCHAR'='#',
 'FILEHEADER'='empno,empname,designation,doj,workgroupcategory,
@@ -319,6 +319,74 @@ options('DELIMITER'=',', 'QUOTECHAR'='"','COMMENTCHAR'='#',
 'SINGLE_PASS'='TRUE'
 )
 </code></pre>
+<ul>
+<li>
+<p><strong>BAD RECORDS HANDLING:</strong> Methods of handling bad records are as follows:</p>
+<ul>
+<li>
+<p>Load all of the data before dealing with the errors.</p>
+</li>
+<li>
+<p>Clean or delete bad records before loading data or stop the loading when bad records are found.</p>
+</li>
+</ul>
+<pre><code>OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false')
+</code></pre>
+<p>NOTE:</p>
+<ul>
+<li>
+<p>If the REDIRECT option is used, Carbon writes all bad records into a separate CSV file. However, this file must not be used for subsequent data loading because its content may not exactly match the source records. You are advised to cleanse the original source records before further data ingestion. The purpose of this option is to show you which records are bad.</p>
+</li>
+<li>
+<p>If all records in the loaded data are bad records, BAD_RECORDS_ACTION does not apply and the load operation fails.</p>
+</li>
+<li>
+<p>The maximum number of characters per column is 100000. If there are more than 100000 characters in a column, data loading will fail.</p>
+</li>
+</ul>
+</li>
+</ul>
+<h3>
+<a id="example-1" class="anchor" href="#example-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Example:</h3>
+<pre><code>LOAD DATA INPATH 'filepath.csv'
+INTO TABLE tablename
+OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true',
+'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
+'BAD_RECORDS_ACTION'='REDIRECT',
+'IS_EMPTY_DATA_BAD_RECORD'='false');
+</code></pre>
+<p><strong>Bad Records Management Options:</strong></p>
+<table>
+<thead>
+<tr>
+<th>Options</th>
+<th>Default Value</th>
+<th>Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>BAD_RECORDS_LOGGER_ENABLE</td>
+<td>false</td>
+<td>Whether to create logs with details about bad records.</td>
+</tr>
+<tr>
+<td>BAD_RECORDS_ACTION</td>
+<td>FAIL</td>
+<td>The four actions for bad records are:  FORCE: Auto-corrects the data by storing the bad records as NULL.  REDIRECT: Bad records are written to the raw CSV instead of being loaded.  IGNORE: Bad records are neither loaded nor written to the raw CSV.  FAIL: Data loading fails if any bad records are found.  NOTE: If all records in the loaded data are bad records, BAD_RECORDS_ACTION does not apply and the load operation fails.</td>
+</tr>
+<tr>
+<td>IS_EMPTY_DATA_BAD_RECORD</td>
+<td>false</td>
+<td>If false, empty data ("" or '' or ,,) is not treated as a bad record; if true, it is.</td>
+</tr>
+<tr>
+<td>BAD_RECORD_PATH</td>
+<td>-</td>
+<td>Specifies the HDFS path where bad records are stored. By default the value is null. This path must be configured by the user if the bad record logger is enabled or the bad record action is REDIRECT.</td>
+</tr>
+</tbody>
+</table>
 <h2>
 <a id="insert-data-into-a-carbondata-table" class="anchor" href="#insert-data-into-a-carbondata-table" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>INSERT DATA INTO A CARBONDATA TABLE</h2>
 <p>This command inserts data into a CarbonData table. It is defined as a combination of two queries Insert and Select query respectively. It inserts records from a source table into a target CarbonData table. The source table can be a Hive table, Parquet table or a CarbonData table itself. It comes with the functionality to aggregate the records of a table by performing Select query on source table and load its corresponding resultant records into a CarbonData table.</p>
@@ -416,7 +484,7 @@ LIMIT number_of_segments;
 </tbody>
 </table>
 <h3>
-<a id="example-1" class="anchor" href="#example-1" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Example:</h3>
+<a id="example-2" class="anchor" href="#example-2" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Example:</h3>
 <pre><code>SHOW SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 4;
 </code></pre>
 <h2>
@@ -458,7 +526,7 @@ Using this segment ID, you can remove the segment.</p>
 </tbody>
 </table>
 <h3>
-<a id="example-2" class="anchor" href="#example-2" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Example:</h3>
+<a id="example-3" class="anchor" href="#example-3" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Example:</h3>
 <pre><code>DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0);
 DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0,5,8);
 </code></pre>
@@ -499,7 +567,7 @@ WHERE SEGMENT.STARTTIME BEFORE DATE_VALUE
 </tbody>
 </table>
 <h3>
-<a id="example-3" class="anchor" href="#example-3" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Example:</h3>
+<a id="example-4" class="anchor" href="#example-4" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Example:</h3>
 <pre><code> DELETE FROM TABLE CarbonDatabase.CarbonTable 
  WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06';  
 </code></pre>
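Reviewer sketch (not part of the commit): the four BAD_RECORDS_ACTION values documented in the dml-operation-on-carbondata.html changes above can be read as a small decision table. The Python below is a toy model of that wording only, not CarbonData's loader. The function load_with_bad_record_action and the is_bad predicate are hypothetical, and FORCE is simplified here to replacing the whole bad row with NULLs (None).

```python
from typing import Callable, List, Optional, Tuple

Row = List[Optional[str]]


def load_with_bad_record_action(
    rows: List[Row],
    is_bad: Callable[[Row], bool],
    action: str = "FAIL",
) -> Tuple[List[Row], List[Row]]:
    """Return (loaded_rows, redirected_rows) for one of the four actions."""
    action = action.upper()
    if action not in {"FORCE", "REDIRECT", "IGNORE", "FAIL"}:
        raise ValueError(f"unknown action: {action}")

    bad_count = sum(1 for row in rows if is_bad(row))
    # Per the NOTE in the diff: if every record is bad, the load fails
    # whatever action was requested.
    if rows and bad_count == len(rows):
        raise RuntimeError("all records are bad; load fails")

    loaded: List[Row] = []
    redirected: List[Row] = []
    for row in rows:
        if not is_bad(row):
            loaded.append(row)
        elif action == "FAIL":
            raise RuntimeError("bad record found; load fails")
        elif action == "FORCE":
            # Assumed simplification: the bad record is stored as NULLs.
            loaded.append([None] * len(row))
        elif action == "REDIRECT":
            # Stands in for writing the record to the raw CSV.
            redirected.append(row)
        # IGNORE: the record is neither loaded nor redirected.
    return loaded, redirected
```

In this model the default mirrors the documented default of FAIL, so a caller has to opt in to any action that lets a load with bad records succeed.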

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/167fcae3/content/pdf/maven-pdf-plugin.pdf
----------------------------------------------------------------------
diff --git a/content/pdf/maven-pdf-plugin.pdf b/content/pdf/maven-pdf-plugin.pdf
index cb6de01..37389f8 100644
Binary files a/content/pdf/maven-pdf-plugin.pdf and b/content/pdf/maven-pdf-plugin.pdf differ