
Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/05/03 17:42:14 UTC

[GitHub] [carbondata] akashrn5 opened a new pull request #3736: [WIP]correct the link, grammars and content of dml-management document

akashrn5 opened a new pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736


    ### Why is this PR needed?
    
    
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] chetandb commented on a change in pull request #3736: [CARBONDATA-3791]Correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

chetandb commented on a change in pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#discussion_r419334449



##########
File path: docs/dml-of-carbondata.md
##########
@@ -219,61 +218,57 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false')
     ```
 
-  **NOTE:**
-  * BAD_RECORDS_ACTION property can have four type of actions for bad records FORCE, REDIRECT, IGNORE and FAIL.
-  * FAIL option is its Default value. If the FAIL option is used, then data loading fails if any bad records are found.
-  * If the REDIRECT option is used, CarbonData will add all bad records in to a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the original source record for further data ingestion. This option is used to remind you which records are bad records.
-  * If the FORCE option is used, then it auto-converts the data by storing the bad records as NULL before Loading data.
-  * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
-  * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
-  * The default maximum number of characters per column is 32000. If there are more than 32000 characters in a column, please refer to *String longer than 32000 characters* section.
-  * Since Bad Records Path can be specified in create, load and carbon properties. 
-    Therefore, value specified in load will have the highest priority, and value specified in carbon properties will have the least priority.
+    **NOTE:**
+    * BAD_RECORDS_ACTION property can have four types of actions for bad records FORCE, REDIRECT, IGNORE, and FAIL.
+    * FAIL option is its Default value. If the FAIL option is used, then data loading fails if any bad records are found.
+    * If the REDIRECT option is used, CarbonData will add all bad records into a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the source record for further data ingestion. This option is used to remind you which records are bad.
+    * If the FORCE option is used, then it auto-converts the data by storing the bad records as NULL before Loading data.
+    * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
+    * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
+    * The default maximum number of characters per column is 32000. If there are more than 32000 characters in a column, please refer to *String longer than 32000 characters* section.
+    * Since Bad Records Path can be specified in create, load and carbon properties. 
+      Therefore, the value specified in load will have the highest priority, and value specified in carbon properties will have the least priority.
 
-  Example:
+    Example:
 
-  ```
-  LOAD DATA INPATH 'filepath.csv' INTO TABLE tablename
-  OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
-  'BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false')
-  ```
+    ```
+    LOAD DATA INPATH 'filepath.csv' INTO TABLE tablename
+    OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
+    'BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false')
+    ```
 
   - ##### GLOBAL_SORT_PARTITIONS:
 
-    If the SORT_SCOPE is defined as GLOBAL_SORT, then user can specify the number of partitions to use while shuffling data for sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, then it uses the number of map task as reduce task. It is recommended that each reduce task deal with 512MB-1GB data.
+    If the SORT_SCOPE is defined as GLOBAL_SORT, then the user can specify the number of partitions to use while shuffling data for sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, then it uses the number of map tasks as reduce tasks. It is recommended that each reduce task to deal with 512MB-1GB data.

Review comment:
       "It is recommended that each reduce task to deal with 512MB-1GB data." - This can be modified to "It is recommended that each reduce task deals with 512MB-1GB data."
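
For context on the 512MB-1GB guidance quoted above, here is a rough sizing sketch. It is not part of the PR or of CarbonData; the function name `partition_range` and the idea of deriving a GLOBAL_SORT_PARTITIONS value from the input size alone are assumptions made for illustration.

```python
MB = 1024 * 1024


def partition_range(input_bytes, low=512 * MB, high=1024 * MB):
    """Return (min_partitions, max_partitions) such that each reduce task
    would handle roughly between `low` and `high` bytes of input.

    Hypothetical helper: it only does the arithmetic implied by the
    "512MB-1GB per reduce task" recommendation in the doc text.
    """
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    # Fewest partitions: every task takes the upper bound (ceiling division).
    min_parts = max(1, -(-input_bytes // high))
    # Most partitions: every task takes the lower bound (floor division).
    max_parts = max(min_parts, input_bytes // low)
    return min_parts, max_parts


if __name__ == "__main__":
    # Example: a 100 GB load.
    print(partition_range(100 * 1024 * MB))
```

Any value inside the returned range keeps each reduce task within the recommended size, assuming the data is spread evenly across partitions.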

##########
File path: docs/dml-of-carbondata.md
##########
@@ -316,12 +311,12 @@ CarbonData DML statements are documented here,which includes:
   INSERT OVERWRITE TABLE table1 SELECT * FROM TABLE2
   ```
 
-### INSERT DATA INTO CARBONDATA TABLE From Stage Input Files
+## INSERT DATA INTO CARBONDATA TABLE From Stage Input Files
 
   Stage input files are data files written by external application (such as Flink). These files 
   are committed but not loaded into the table. 
   
-  You can use this command to insert them into the table, so that making them visible for query.
+  User can use this command to insert them into the table, so that making them visible for a query.

Review comment:
       "User can use this command to insert them into the table, so that making them visible for a query." can be changed to "User can use this command to insert them into the table, thus making them visible for a query."

##########
File path: docs/dml-of-carbondata.md
##########
@@ -352,18 +347,18 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('batch_file_order'='DESC')
     ```
 
-  Examples:
-  ```
-  INSERT INTO table1 STAGE
-
-  INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5')
-  Note: This command use the default file order, will insert the earliest stage files into the table.
-
-  INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5', 'batch_file_order'='DESC')
-  Note: This command will insert the latest stage files into the table.
-  ```
+    Examples:
+    ```
+    INSERT INTO table1 STAGE
+  
+    INSERT INTO table1 STAGE OPTIONS('batch_file_count' = '5')
+    Note: This command use the default file order, will insert the earliest stage files into the table.

Review comment:
       Change "This command use" to "This command uses"
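
The two notes in the hunk above describe how `batch_file_count` and `batch_file_order` interact. A small model of that behaviour, as I read the doc text, is sketched below. It is not CarbonData code: the function `pick_stage_files`, the `(name, commit_time)` tuples, and ordering by commit time are all assumptions for illustration.

```python
def pick_stage_files(files, batch_file_count=None, batch_file_order="ASC"):
    """Model which stage files one INSERT INTO ... STAGE run would pick.

    files: list of (name, commit_time) tuples (hypothetical representation).
    batch_file_count: maximum number of files to take, or None for all.
    batch_file_order: "ASC" takes the earliest files first (the default
    described in the doc), "DESC" takes the latest first.
    """
    order = batch_file_order.upper()
    if order not in ("ASC", "DESC"):
        raise ValueError("batch_file_order must be 'ASC' or 'DESC'")
    # Sort by commit time; reverse for DESC so the latest files come first.
    ordered = sorted(files, key=lambda f: f[1], reverse=(order == "DESC"))
    if batch_file_count is not None:
        ordered = ordered[:batch_file_count]
    return [name for name, _ in ordered]


if __name__ == "__main__":
    stage = [("f1", 1), ("f2", 2), ("f3", 3)]
    print(pick_stage_files(stage, batch_file_count=2))
    print(pick_stage_files(stage, batch_file_count=2, batch_file_order="DESC"))
```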





[GitHub] [carbondata] kunal642 commented on a change in pull request #3736: [CARBONDATA-3791]Correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

kunal642 commented on a change in pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#discussion_r420117381



##########
File path: docs/dml-of-carbondata.md
##########
@@ -219,61 +218,57 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false')
     ```
 
-  **NOTE:**
-  * BAD_RECORDS_ACTION property can have four type of actions for bad records FORCE, REDIRECT, IGNORE and FAIL.
-  * FAIL option is its Default value. If the FAIL option is used, then data loading fails if any bad records are found.
-  * If the REDIRECT option is used, CarbonData will add all bad records in to a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the original source record for further data ingestion. This option is used to remind you which records are bad records.
-  * If the FORCE option is used, then it auto-converts the data by storing the bad records as NULL before Loading data.
-  * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
-  * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
-  * The default maximum number of characters per column is 32000. If there are more than 32000 characters in a column, please refer to *String longer than 32000 characters* section.
-  * Since Bad Records Path can be specified in create, load and carbon properties. 
-    Therefore, value specified in load will have the highest priority, and value specified in carbon properties will have the least priority.
+    **NOTE:**
+    * BAD_RECORDS_ACTION property can have four types of actions for bad records FORCE, REDIRECT, IGNORE, and FAIL.

Review comment:
       There is a lot of indentation at the start of these lines. If it does not change how the actual doc renders, it would be better to remove it.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3736: [CARBONDATA-3791]Correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623926592


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1218/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3736: [CARBONDATA-3791]Correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623348862


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2921/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3736: [CARBONDATA-3791]Correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623355483


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1203/
   



[GitHub] [carbondata] akashrn5 commented on a change in pull request #3736: [CARBONDATA-3791]Correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

akashrn5 commented on a change in pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#discussion_r420279748



##########
File path: docs/dml-of-carbondata.md
##########
@@ -219,61 +218,57 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false')
     ```
 
-  **NOTE:**
-  * BAD_RECORDS_ACTION property can have four type of actions for bad records FORCE, REDIRECT, IGNORE and FAIL.
-  * FAIL option is its Default value. If the FAIL option is used, then data loading fails if any bad records are found.
-  * If the REDIRECT option is used, CarbonData will add all bad records in to a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the original source record for further data ingestion. This option is used to remind you which records are bad records.
-  * If the FORCE option is used, then it auto-converts the data by storing the bad records as NULL before Loading data.
-  * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
-  * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
-  * The default maximum number of characters per column is 32000. If there are more than 32000 characters in a column, please refer to *String longer than 32000 characters* section.
-  * Since Bad Records Path can be specified in create, load and carbon properties. 
-    Therefore, value specified in load will have the highest priority, and value specified in carbon properties will have the least priority.
+    **NOTE:**
+    * BAD_RECORDS_ACTION property can have four types of actions for bad records FORCE, REDIRECT, IGNORE, and FAIL.

Review comment:
       The indentation is needed. The main bullet points were not structured properly before; if you look at the old document, you will see that these indents are required.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3736: [CARBONDATA-3791]Correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623933556


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2936/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3736: [CARBONDATA-3791]Correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623555888







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3736: [CARBONDATA-3791]Correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

akashrn5 commented on a change in pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#discussion_r419451876



##########
File path: docs/dml-of-carbondata.md
##########
@@ -316,12 +311,12 @@ CarbonData DML statements are documented here,which includes:
   INSERT OVERWRITE TABLE table1 SELECT * FROM TABLE2
   ```
 
-### INSERT DATA INTO CARBONDATA TABLE From Stage Input Files
+## INSERT DATA INTO CARBONDATA TABLE From Stage Input Files
 
   Stage input files are data files written by external application (such as Flink). These files 
   are committed but not loaded into the table. 
   
-  You can use this command to insert them into the table, so that making them visible for query.
+  User can use this command to insert them into the table, so that making them visible for a query.

Review comment:
       done

##########
File path: docs/dml-of-carbondata.md
##########
@@ -219,61 +218,57 @@ CarbonData DML statements are documented here,which includes:
     OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false')
     ```
 
-  **NOTE:**
-  * BAD_RECORDS_ACTION property can have four type of actions for bad records FORCE, REDIRECT, IGNORE and FAIL.
-  * FAIL option is its Default value. If the FAIL option is used, then data loading fails if any bad records are found.
-  * If the REDIRECT option is used, CarbonData will add all bad records in to a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the original source record for further data ingestion. This option is used to remind you which records are bad records.
-  * If the FORCE option is used, then it auto-converts the data by storing the bad records as NULL before Loading data.
-  * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
-  * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
-  * The default maximum number of characters per column is 32000. If there are more than 32000 characters in a column, please refer to *String longer than 32000 characters* section.
-  * Since Bad Records Path can be specified in create, load and carbon properties. 
-    Therefore, value specified in load will have the highest priority, and value specified in carbon properties will have the least priority.
+    **NOTE:**
+    * BAD_RECORDS_ACTION property can have four types of actions for bad records FORCE, REDIRECT, IGNORE, and FAIL.
+    * FAIL option is its Default value. If the FAIL option is used, then data loading fails if any bad records are found.
+    * If the REDIRECT option is used, CarbonData will add all bad records into a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the source record for further data ingestion. This option is used to remind you which records are bad.
+    * If the FORCE option is used, then it auto-converts the data by storing the bad records as NULL before Loading data.
+    * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
+    * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
+    * The default maximum number of characters per column is 32000. If there are more than 32000 characters in a column, please refer to *String longer than 32000 characters* section.
+    * Since Bad Records Path can be specified in create, load and carbon properties. 
+      Therefore, the value specified in load will have the highest priority, and value specified in carbon properties will have the least priority.
 
-  Example:
+    Example:
 
-  ```
-  LOAD DATA INPATH 'filepath.csv' INTO TABLE tablename
-  OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
-  'BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false')
-  ```
+    ```
+    LOAD DATA INPATH 'filepath.csv' INTO TABLE tablename
+    OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true','BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon',
+    'BAD_RECORDS_ACTION'='REDIRECT','IS_EMPTY_DATA_BAD_RECORD'='false')
+    ```
 
   - ##### GLOBAL_SORT_PARTITIONS:
 
-    If the SORT_SCOPE is defined as GLOBAL_SORT, then user can specify the number of partitions to use while shuffling data for sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, then it uses the number of map task as reduce task. It is recommended that each reduce task deal with 512MB-1GB data.
+    If the SORT_SCOPE is defined as GLOBAL_SORT, then the user can specify the number of partitions to use while shuffling data for sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, then it uses the number of map tasks as reduce tasks. It is recommended that each reduce task to deal with 512MB-1GB data.

Review comment:
       done





[GitHub] [carbondata] kunal642 commented on pull request #3736: [CARBONDATA-3791]Correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

kunal642 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-624500865


   LGTM



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3736: [WIP]correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623175013


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1197/
   



[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3736: [WIP]correct the link, grammars and content of dml-management document

Posted by GitBox <gi...@apache.org>.

CarbonDataQA1 commented on pull request #3736:
URL: https://github.com/apache/carbondata/pull/3736#issuecomment-623174960


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2915/
   

