Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/05/04 05:22:26 UTC

[GitHub] [carbondata] kunal642 opened a new pull request #3738: [HOTFIX ] Fix documentation for various features

kunal642 opened a new pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738


    ### Why is this PR needed?
    Fix documentation for various features
    
    ### What changes were proposed in this PR?
    1. Added doc for writing with Hive
    2. Added doc for the alter upgrade segment command
    3. Fixed other minor issues
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419877825



##########
File path: docs/index-server.md
##########
@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.
 
 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.
 
+## Enabling Size based distribution for Legacy stores

Review comment:
       Please correct the format in line no. 55.







[GitHub] [carbondata] kunal642 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420770234



##########
File path: docs/index-server.md
##########
@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.
 
 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.
 
+## Enabling Size based distribution for Legacy stores
+The default round robin based distribution causes unequal distribution of cache amoung the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.

Review comment:
       done

##########
File path: docs/index-server.md
##########
@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.
 
 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.
 
+## Enabling Size based distribution for Legacy stores
+The default round robin based distribution causes unequal distribution of cache amoung the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.
+This problem can be solved by running the upgrade_segment command which will fill the data size values for each segment in the tablestatus file. Any cache loaded after this can use the traditional size based distribution.

Review comment:
       done

##########
File path: docs/index-server.md
##########
@@ -19,8 +19,8 @@
 
 ## Background
 
-Carbon currently prunes and caches all block/blocklet datamap index information into the driver for
-normal table, for Bloom/Index datamaps the JDBC driver will launch a job to prune and cache the
+Carbon currently prunes and caches all block/blocklet index information into the driver for
+normal table, for Bloom/Index indexes the JDBC driver will launch a job to prune and cache the

Review comment:
       done







[GitHub] [carbondata] kunal642 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420772644



##########
File path: docs/index-server.md
##########
@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.
 
 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.
 
+## Enabling Size based distribution for Legacy stores
+The default round robin based distribution causes unequal distribution of cache amoung the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.

Review comment:
       done







[GitHub] [carbondata] akashrn5 commented on pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-624733114


   LGTM





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419892047



##########
File path: docs/ddl-of-carbondata.md
##########
@@ -608,12 +607,10 @@ CarbonData DDL statements are documented here,which includes:
   This can be SDK output or C++ SDK output. Refer [SDK Guide](./sdk-guide.md) and [C++ SDK Guide](./csdk-guide.md). 
 
   **Note:**
-  1. Dropping of the external table should not delete the files present in the location.
+  1. Dropping of the external table will not delete the files present in the location.
   2. When external table is created on non-transactional table data, 
     external table will be registered with the schema of carbondata files.
-    If multiple files with different schema is present, exception will be thrown.
-    So, If table registered with one schema and files are of different schema, 
-    suggest to drop the external table and create again to register table with new schema.  
+    If multiple files with different schema is present, exception will be thrown.  

Review comment:
       Actually, if multiple files with different schemas are present, we check whether the same column is present with a different schema, and only in that case do we throw an exception. So here we can be more specific.

##########
File path: docs/index-server.md
##########
@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.
 
 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.
 
+## Enabling Size based distribution for Legacy stores
+The default round robin based distribution causes unequal distribution of cache amoung the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.

Review comment:
       ```suggestion
   The default round robin based distribution causes unequal distribution of cache among the executors, which can cause any one of the executors to be bloated with too much cache resulting in performance degrade.
   ```







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-623361798


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1205/
   





[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419871090



##########
File path: docs/index-server.md
##########
@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.
 
 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.
 
+## Enabling Size based distribution for Legacy stores
+The default round robin based distribution causes unequal distribution of cache amoung the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.

Review comment:
       ```suggestion
   The default round robin based distribution causes unequal distribution of cache among the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.
   ```







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-624138362


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2942/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-623310559


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1200/
   





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420575005



##########
File path: docs/ddl-of-carbondata.md
##########
@@ -608,12 +607,10 @@ CarbonData DDL statements are documented here,which includes:
   This can be SDK output or C++ SDK output. Refer [SDK Guide](./sdk-guide.md) and [C++ SDK Guide](./csdk-guide.md). 
 
   **Note:**
-  1. Dropping of the external table should not delete the files present in the location.
+  1. Dropping of the external table will not delete the files present in the location.
   2. When external table is created on non-transactional table data, 
     external table will be registered with the schema of carbondata files.
-    If multiple files with different schema is present, exception will be thrown.
-    So, If table registered with one schema and files are of different schema, 
-    suggest to drop the external table and create again to register table with new schema.  
+    If multiple files with different schema is present, exception will be thrown.  

Review comment:
       It basically checks the latest file and returns data based on the columns present in that file.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-624716268


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1240/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-623307157


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2917/
   





[GitHub] [carbondata] kunal642 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420770408



##########
File path: docs/index-server.md
##########
@@ -19,8 +19,8 @@
 
 ## Background
 
-Carbon currently prunes and caches all block/blocklet datamap index information into the driver for
-normal table, for Bloom/Index datamaps the JDBC driver will launch a job to prune and cache the
+Carbon currently prunes and caches all block/blocklet index information into the driver for
+normal table, for Bloom/Index indexes the JDBC driver will launch a job to prune and cache the
 datamaps in executors.

Review comment:
       done







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-624715918


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2958/
   





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-623366636


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2924/
   





[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419871344



##########
File path: docs/index-server.md
##########
@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.
 
 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.
 
+## Enabling Size based distribution for Legacy stores
+The default round robin based distribution causes unequal distribution of cache amoung the executors, which can cause any 1 of the executors to be bloated with too much cache and cause performance degrade.
+This problem can be solved by running the upgrade_segment command which will fill the data size values for each segment in the tablestatus file. Any cache loaded after this can use the traditional size based distribution.

Review comment:
       ```suggestion
   This problem can be solved by running the `upgrade_segment` command which will fill the data size values for each segment in the tablestatus file. Any cache loaded after this can use the traditional size based distribution.
   ```
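The point made in this hunk (round robin ignores segment size, while size-based distribution needs the per-segment data size that `upgrade_segment` fills into the tablestatus file) can be illustrated with a minimal Python sketch. The function names, the segment sizes, and the greedy "least-loaded executor" strategy are hypothetical assumptions for illustration only; this is not CarbonData's index server implementation.

```python
from typing import List


def round_robin(segment_sizes: List[int], executors: int) -> List[int]:
    """Assign segments to executors in turn, ignoring size (hypothetical sketch).

    Returns the total cache size held by each executor.
    """
    load = [0] * executors
    for i, size in enumerate(segment_sizes):
        load[i % executors] += size
    return load


def size_based(segment_sizes: List[int], executors: int) -> List[int]:
    """Assign each segment, largest first, to the currently least-loaded executor.

    This greedy strategy is an assumed stand-in for "size based distribution";
    it only works when per-segment sizes are known (e.g. after upgrade_segment).
    """
    load = [0] * executors
    for size in sorted(segment_sizes, reverse=True):
        idx = load.index(min(load))
        load[idx] += size
    return load


if __name__ == "__main__":
    # Hypothetical segment data sizes in MB; large and small segments alternate.
    sizes = [400, 10, 400, 10, 400, 10, 400, 10]
    print("round robin:", round_robin(sizes, 2))  # one executor gets every large segment
    print("size based: ", size_based(sizes, 2))   # cache is spread evenly
```

With alternating large and small segments and two executors, round robin places every large segment on the same executor, which is the "bloated executor" case the doc describes, while the size-aware assignment balances the totals.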







[GitHub] [carbondata] kunal642 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420111629



##########
File path: docs/ddl-of-carbondata.md
##########
@@ -608,12 +607,10 @@ CarbonData DDL statements are documented here,which includes:
   This can be SDK output or C++ SDK output. Refer [SDK Guide](./sdk-guide.md) and [C++ SDK Guide](./csdk-guide.md). 
 
   **Note:**
-  1. Dropping of the external table should not delete the files present in the location.
+  1. Dropping of the external table will not delete the files present in the location.
   2. When external table is created on non-transactional table data, 
     external table will be registered with the schema of carbondata files.
-    If multiple files with different schema is present, exception will be thrown.
-    So, If table registered with one schema and files are of different schema, 
-    suggest to drop the external table and create again to register table with new schema.  
+    If multiple files with different schema is present, exception will be thrown.  

Review comment:
       If a column is not present, we also throw an exception, right?







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419235138



##########
File path: docs/ddl-of-carbondata.md
##########
@@ -20,7 +20,6 @@
 CarbonData DDL statements are documented here,which includes:
 
 * [CREATE TABLE](#create-table)
-  * [Dictionary Encoding](#dictionary-encoding-configuration)

Review comment:
       The Datamap keyword still exists in the DDL guide. Please check.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#issuecomment-624139653


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1224/
   





[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419872076



##########
File path: docs/index-server.md
##########
@@ -19,8 +19,8 @@
 
 ## Background
 
-Carbon currently prunes and caches all block/blocklet datamap index information into the driver for
-normal table, for Bloom/Index datamaps the JDBC driver will launch a job to prune and cache the
+Carbon currently prunes and caches all block/blocklet index information into the driver for
+normal table, for Bloom/Index indexes the JDBC driver will launch a job to prune and cache the

Review comment:
       ```suggestion
   normal table, for Bloom/Lucene indexes the JDBC driver will launch a job to prune and cache the
   ```







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
Indhumathi27 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r419872161



##########
File path: docs/index-server.md
##########
@@ -19,8 +19,8 @@
 
 ## Background
 
-Carbon currently prunes and caches all block/blocklet datamap index information into the driver for
-normal table, for Bloom/Index datamaps the JDBC driver will launch a job to prune and cache the
+Carbon currently prunes and caches all block/blocklet index information into the driver for
+normal table, for Bloom/Index indexes the JDBC driver will launch a job to prune and cache the
 datamaps in executors.

Review comment:
       ```suggestion
   indexes in executors.
   ```







[GitHub] [carbondata] kunal642 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420770573



##########
File path: docs/index-server.md
##########
@@ -82,6 +82,15 @@ the pruned blocklets which would be further used for result fetching.
 
 **Note:** Multiple JDBC drivers can connect to the index server to use the cache.
 
+## Enabling Size based distribution for Legacy stores

Review comment:
       done







[GitHub] [carbondata] kunal642 commented on a change in pull request #3738: [CARBONDATA-3791] Fix documentation for various features

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #3738:
URL: https://github.com/apache/carbondata/pull/3738#discussion_r420773716



##########
File path: docs/ddl-of-carbondata.md
##########
@@ -608,12 +607,10 @@ CarbonData DDL statements are documented here,which includes:
   This can be SDK output or C++ SDK output. Refer [SDK Guide](./sdk-guide.md) and [C++ SDK Guide](./csdk-guide.md). 
 
   **Note:**
-  1. Dropping of the external table should not delete the files present in the location.
+  1. Dropping of the external table will not delete the files present in the location.
   2. When external table is created on non-transactional table data, 
     external table will be registered with the schema of carbondata files.
-    If multiple files with different schema is present, exception will be thrown.
-    So, If table registered with one schema and files are of different schema, 
-    suggest to drop the external table and create again to register table with new schema.  
+    If multiple files with different schema is present, exception will be thrown.  

Review comment:
       done



