Posted to dev@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/09/05 02:02:19 UTC

[GitHub] [carbondata] Jeromestein opened a new pull request #4213: [CARBONDATA-4280][Doc] Add OVERWRITE keyword explanation in load command

Jeromestein opened a new pull request #4213:
URL: https://github.com/apache/carbondata/pull/4213


    ### Why is this PR needed?
    CarbonData now supports load with OVERWRITE, but apart from partition tables, no test cases or documentation cover this feature.
   
    
    ### What changes were proposed in this PR?
    Add the [OVERWRITE] keyword to dml-of-carbondata.md and explain how to use it with a simple example.
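
    A minimal sketch of the intended usage, assuming the syntax proposed in this PR. The table name is taken from the example in the doc change; the input path is a hypothetical placeholder, and OPTIONS is left out because it is not mandatory.

    ```sql
    -- Default behaviour: rows from the input file are appended to the table.
    -- The path below is a hypothetical placeholder.
    LOAD DATA INPATH 'hdfs://namenode/data/sample.csv'
    INTO TABLE carbon_load_overwrite

    -- With OVERWRITE: the existing table data is replaced by the rows from the input file.
    LOAD DATA INPATH 'hdfs://namenode/data/sample.csv' OVERWRITE
    INTO TABLE carbon_load_overwrite
    ```

    The two statements are meant to be run separately. After the second one, the table should contain only the rows from the most recent load.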
       
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - No (We have added a new testcase in this PR: https://github.com/apache/carbondata/pull/4207)
   
       
   JIRA Issue: https://issues.apache.org/jira/browse/CARBONDATA-4280


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@carbondata.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Jeromestein commented on a change in pull request #4213: [CARBONDATA-4280][Doc] Add OVERWRITE keyword explanation in load command

Posted by GitBox <gi...@apache.org>.
Jeromestein commented on a change in pull request #4213:
URL: https://github.com/apache/carbondata/pull/4213#discussion_r717533716



##########
File path: docs/dml-of-carbondata.md
##########
@@ -266,7 +281,7 @@ CarbonData DML statements are documented here,which includes:
       numPartitions = total size of input data / splitSize
     ```
     The default value is 3, and the range is [1, 300].
- 
+

Review comment:
       OK, I will do it.










[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4213: [CARBONDATA-4280][Doc] Add OVERWRITE keyword explanation in load command

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #4213:
URL: https://github.com/apache/carbondata/pull/4213#discussion_r717214686



##########
File path: docs/dml-of-carbondata.md
##########
@@ -266,7 +281,7 @@ CarbonData DML statements are documented here,which includes:
       numPartitions = total size of input data / splitSize
     ```
     The default value is 3, and the range is [1, 300].
- 
+

Review comment:
       Please revert all the whitespace-only changes below in this PR.







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4213: [CARBONDATA-4280][Doc] Add OVERWRITE keyword explanation in load command

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #4213:
URL: https://github.com/apache/carbondata/pull/4213#discussion_r717215028



##########
File path: docs/dml-of-carbondata.md
##########
@@ -37,13 +37,28 @@ CarbonData DML statements are documented here,which includes:
   This command is used to load csv files to carbondata, OPTIONS are not mandatory for data loading process. 
 
   ```
-  LOAD DATA INPATH 'folder_path'
+  LOAD DATA INPATH 'folder_path' [ OVERWRITE ] 
   INTO TABLE [db_name.]table_name 
   OPTIONS(property_name=property_value, ...)
   ```
   **NOTE**:
-    * Use 'file://' prefix to indicate local input files path, but it just supports local mode.
-    * If run on cluster mode, please upload all input files to distributed file system, for example 'hdfs://' for hdfs.
+   * Use 'file://' prefix to indicate local input files path, but it just supports local mode.
+
+   * If run on cluster mode, please upload all input files to distributed file system, for example 'hdfs://' for hdfs.
+
+* [ OVERWRITE ] :
+
+  ​	By default, new data is appended to the table. If `OVERWRITE` is used, the table is instead overwritten with new data.
+
+  ​	Example:   
+
+  ```sql
+  CREATE TABLE carbon_load_overwrite(id int, name string, city string, age int)

Review comment:
       I think there is no need to add CREATE TABLE here. A load command example should be enough.







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4213: [CARBONDATA-4280][Doc] Add OVERWRITE keyword explanation in load command

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #4213:
URL: https://github.com/apache/carbondata/pull/4213#discussion_r717215028



##########
File path: docs/dml-of-carbondata.md
##########
@@ -37,13 +37,28 @@ CarbonData DML statements are documented here,which includes:
   This command is used to load csv files to carbondata, OPTIONS are not mandatory for data loading process. 
 
   ```
-  LOAD DATA INPATH 'folder_path'
+  LOAD DATA INPATH 'folder_path' [ OVERWRITE ] 
   INTO TABLE [db_name.]table_name 
   OPTIONS(property_name=property_value, ...)
   ```
   **NOTE**:
-    * Use 'file://' prefix to indicate local input files path, but it just supports local mode.
-    * If run on cluster mode, please upload all input files to distributed file system, for example 'hdfs://' for hdfs.
+   * Use 'file://' prefix to indicate local input files path, but it just supports local mode.
+
+   * If run on cluster mode, please upload all input files to distributed file system, for example 'hdfs://' for hdfs.
+
+* [ OVERWRITE ] :
+
+  ​	By default, new data is appended to the table. If `OVERWRITE` is used, the table is instead overwritten with new data.
+
+  ​	Example:   
+
+  ```sql
+  CREATE TABLE carbon_load_overwrite(id int, name string, city string, age int)

Review comment:
       No need to add an example here. Since OVERWRITE is already shown in the syntax, that should be enough.







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4213: [CARBONDATA-4280][Doc] Add OVERWRITE keyword explanation in load command

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #4213:
URL: https://github.com/apache/carbondata/pull/4213#discussion_r717217995



##########
File path: docs/dml-of-carbondata.md
##########
@@ -37,13 +37,28 @@ CarbonData DML statements are documented here,which includes:
   This command is used to load csv files to carbondata, OPTIONS are not mandatory for data loading process. 
 
   ```
-  LOAD DATA INPATH 'folder_path'
+  LOAD DATA INPATH 'folder_path' [ OVERWRITE ] 
   INTO TABLE [db_name.]table_name 
   OPTIONS(property_name=property_value, ...)
   ```
   **NOTE**:
-    * Use 'file://' prefix to indicate local input files path, but it just supports local mode.
-    * If run on cluster mode, please upload all input files to distributed file system, for example 'hdfs://' for hdfs.
+   * Use 'file://' prefix to indicate local input files path, but it just supports local mode.
+
+   * If run on cluster mode, please upload all input files to distributed file system, for example 'hdfs://' for hdfs.
+
+* [ OVERWRITE ] :

Review comment:
       ```suggestion
   * If the OVERWRITE keyword is used, then it will overwrite the existing data in the table with new data.
   ```







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4213: [CARBONDATA-4280][Doc] Add OVERWRITE keyword explanation in load command

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4213:
URL: https://github.com/apache/carbondata/pull/4213#issuecomment-913082606


   Build succeeded with Spark 3.1. Please check CI: http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/310/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4213: [CARBONDATA-4280][Doc] Add OVERWRITE keyword explanation in load command

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4213:
URL: https://github.com/apache/carbondata/pull/4213#issuecomment-913082037








