Posted to dev@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/08/26 08:55:46 UTC

[GitHub] [carbondata] pratyakshsharma opened a new pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

pratyakshsharma opened a new pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210


    ### Why is this PR needed?
    
    
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@carbondata.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-948719754


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/471/
   





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r732718290



##########
File path: docs/configuration-parameters.md
##########
@@ -119,6 +136,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.enable.range.compaction | true | To configure Range-based Compaction to be used or not for RANGE_COLUMN. If true after compaction also the data would be present in ranges. |
 | carbon.si.segment.merge | false | Making this true degrades the LOAD performance. When the number of small files increase for SI segments(it can happen as number of columns will be less and we store position id and reference columns), user can either set to true which will merge the data files for upcoming loads or run SI refresh command which does this job for all segments. (REFRESH INDEX <index_table>) |
 | carbon.partition.data.on.tasklevel | false | When enabled, tasks launched for Local sort partition load will be based on one node one task. Compaction will be performed based on task level for a partition. Load performance might be degraded, because, the number of tasks launched is equal to number of nodes in case of local sort. For compaction, memory consumption will be less, as more number of tasks will be launched for a partition |
+| carbon.minor.compaction.size | (none) | Minor compaction originally worked based on the number of segments (by default 4). However in that scenario, there was no control over the size of segments to be compacted. This parameter was introduced to exclude segments whose size is greater than the configured threshold so that the overall IO and time taken decreases | 

Review comment:
       Can this be set dynamically within the spark session itself? @vikramahuja1001 
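To make the behaviour in the quoted row concrete, here is a rough sketch (not CarbonData's actual implementation; function and variable names are made up for illustration) of how a size threshold like `carbon.minor.compaction.size` excludes large segments from minor compaction:

```python
# Hypothetical sketch of the behaviour described in the quoted row:
# segments whose size exceeds the configured carbon.minor.compaction.size
# threshold are skipped during minor compaction, reducing overall IO.

def segments_for_minor_compaction(segment_sizes_mb, threshold_mb=None):
    """Return segment ids eligible for minor compaction.

    segment_sizes_mb: dict of segment id -> size in MB.
    threshold_mb: value of carbon.minor.compaction.size; None means
    no size-based exclusion (the original number-of-segments behaviour).
    """
    if threshold_mb is None:
        return sorted(segment_sizes_mb)
    return sorted(s for s, size in segment_sizes_mb.items()
                  if size <= threshold_mb)

sizes = {"0": 10, "1": 2048, "2": 30, "3": 25}
print(segments_for_minor_compaction(sizes))       # ['0', '1', '2', '3']
print(segments_for_minor_compaction(sizes, 512))  # ['0', '2', '3'] - '1' excluded
```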







[GitHub] [carbondata] akashrn5 commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r734969269



##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides if LRU cache for storing indexes need to be created on driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are updated to spark. |
+| carbon.use.bitset.pipe.line | true | Carbondata has various optimizations for faster query execution. Setting this property acts like a catalyst for filter queries. If set to true, the bitset is passed from one filter to another, resulting in incremental filtering and improving overall performance |
+
+## Index Configuration
+| Parameter | Default Value | Description |
+|--------------------------------------|---------------|---------------------------------------------------|
+| is.internal.load.call | false | This parameter decides whether the insert call is triggered internally or by the user. If triggered by user, this ensures data does not get loaded into MV directly |
+| carbon.lucene.index.stop.words | false | By default, lucene does not create index for stop words like 'is', 'the' etc. This flag is used to override this behaviour |
+| carbon.load.dateformat.setlenient.enable | false | This property enables parsing of timestamp/date data in load flow if the parsing fails with invalid timestamp data error. For example: 1941-03-15 00:00:00 is valid time in Asia/Calcutta zone and is invalid and will fail to parse in Asia/Shanghai zone as DST is observed and clocks were turned forward 1 hour to 1941-03-15 01:00:00 |

Review comment:
       @pratyakshsharma, there is already documentation and commands covering dynamic properties in carbon; I had forgotten about that. You can refer to the `Dynamic Configuration In CarbonData Using SET-RESET` section for it. So can you please revert just the commit adding this new column and keep the other commits intact? Please also add any dynamically allowed parameter to the `Dynamic Configuration In CarbonData Using SET-RESET` section if not present.
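For readers unfamiliar with the SET-RESET mechanism mentioned above, the following is a conceptual sketch (not CarbonData's actual code; class and property names are illustrative) of the semantics: SET overrides a property for the current session only, and RESET falls back to the globally configured value:

```python
# Conceptual sketch of session-level dynamic configuration:
# SET stores a session override; RESET removes it, so lookups
# fall back to the value loaded from carbon.properties.

class SessionParams:
    def __init__(self, global_props):
        self.global_props = global_props   # values from carbon.properties
        self.session_props = {}            # session-level SET overrides

    def set(self, key, value):             # models: SET key=value
        self.session_props[key] = value

    def reset(self, key):                  # models: RESET key
        self.session_props.pop(key, None)

    def get(self, key, default=None):
        if key in self.session_props:
            return self.session_props[key]
        return self.global_props.get(key, default)

params = SessionParams({"carbon.input.metrics.update.interval": "500000"})
params.set("carbon.input.metrics.update.interval", "100000")
print(params.get("carbon.input.metrics.update.interval"))  # 100000
params.reset("carbon.input.metrics.update.interval")
print(params.get("carbon.input.metrics.update.interval"))  # 500000
```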







[GitHub] [carbondata] pratyakshsharma commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-950859563


   @akashrn5 Please take a pass. This should be good to merge now. 





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-917627073


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5911/
   





[GitHub] [carbondata] pratyakshsharma commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-917606390


   @MarvinLitt Please take a look.





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-947668370


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6072/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-947785004


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4329/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-952726137


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6106/
   





[GitHub] [carbondata] vikramahuja1001 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
vikramahuja1001 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-952557593


   LGTM





[GitHub] [carbondata] akashrn5 commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r728631522



##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommended you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, exception will be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database location.|

Review comment:
       `spark.sql.warehouse.dir` is a Spark property, so there is no need to document it here. Also, there is already a note in the document saying `carbon.storelocation` is deprecated.

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommended you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, exception will be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which further consists of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by default present in conf directory on your base project path. Users can configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |

Review comment:
       If you say the data will be stored in the column in this format, it conveys wrong info, because we store date as an integer (it is a direct dictionary) and time as a long. Instead, you can say this specifies the format in which carbondata parses all incoming date data before finally storing it in the carbondata file, maybe something like that.
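The reviewer's distinction between the parse format and the stored representation can be sketched as follows (an illustration, not CarbonData's actual encoding; the days-since-1970 surrogate here is an assumption for the example):

```python
# Illustrative sketch: carbon.date.format controls how an incoming DATE
# string is *parsed*; the value stored in the file is an integer
# surrogate (direct dictionary), not the formatted string.
from datetime import date, datetime, timedelta

CARBON_DATE_FORMAT = "%Y-%m-%d"   # yyyy-MM-dd, the documented default

def parse_and_encode(raw):
    """Parse per the configured format, then store as an integer."""
    parsed = datetime.strptime(raw, CARBON_DATE_FORMAT).date()
    return (parsed - date(1970, 1, 1)).days  # assumed integer surrogate

def decode(stored):
    """Recover the date value from the integer surrogate."""
    return date(1970, 1, 1) + timedelta(days=stored)

encoded = parse_and_encode("2021-08-26")
print(encoded, decode(encoded))  # 18865 2021-08-26
```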

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommended you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, exception will be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which further consists of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by default present in conf directory on your base project path. Users can configure all the carbondata related properties in this file. |

Review comment:
       This is a system property pointing to the file that contains the carbon properties, so I think this is not the best place to mention it. Maybe you can add this info to the deployment guide or quickstart instead.

##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides if LRU cache for storing indexes need to be created on driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are updated to spark. |
+| carbon.use.bitset.pipe.line | true | Carbondata has various optimizations for faster query execution. Setting this property acts like a catalyst for filter queries. If set to true, the bitset is passed from one filter to another, resulting in incremental filtering and improving overall performance |
+
+## Index Configuration
+| Parameter | Default Value | Description |
+|--------------------------------------|---------------|---------------------------------------------------|
+| is.internal.load.call | false | This parameter decides whether the insert call is triggered internally or by the user. If triggered by user, this ensures data does not get loaded into MV directly |

Review comment:
       this is actually an internal property, no need to add it in the doc

##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides if LRU cache for storing indexes need to be created on driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are updated to spark. |

Review comment:
       for this one, add that it can also be set dynamically within the session

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommended you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, exception will be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which further consists of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000 rows. |

Review comment:
       No need to mention V2 here, as no one is using it. For blocklet size you can just say the size of each blocklet inside a block is 64 MB by default, and that it is recommended not to change it unless there is a specific use case or issue.

##########
File path: docs/configuration-parameters.md
##########
@@ -99,6 +110,12 @@ This section provides the details of all the configurations required for the Car
 | carbon.enable.bad.record.handling.for.insert | false | by default, disable the bad record and converter step during "insert into" |
 | carbon.load.si.repair | true | by default, enable loading for failed segments in SI during load/insert command |
 | carbon.si.repair.limit | (none) | Number of failed segments to be loaded in SI when repairing missing segments in SI, by default load all the missing segments. Supports value from 0 to 2147483646 |
+| carbon.complex.delimiter.level.1 | # | This delimiter is used for parsing complex data type columns. Level 1 delimiter splits the complex type data column in a row (eg., a\001b\001c --> Array = {a,b,c}). |
+| carbon.complex.delimiter.level.2 | $ | This delimiter splits the complex type nested data column in a row. Applies level_1 delimiter & applies level_2 based on complex data type (eg., a\002b\001c\002d --> Array> = {{a,b},{c,d}}). |
+| carbon.complex.delimiter.level.3 | @ | This delimiter splits the complex type nested data column in a row. Applies level_1 delimiter, applies level_2 and then level_3 delimiter based on complex data type. Used in case of nested Complex Map type. (eg., 'a\003b\002b\003c\001aa\003bb\002cc\003dd' --> Array Of Map> = {{a -> b, b -> c},{aa -> bb, cc -> dd}}). |
+| carbon.complex.delimiter.level.4 | (none) | All the levels of delimiters are used for parsing complex data type columns. All the delimiters are applied depending on the complexity of the given data type. Level 4 delimiter will be used for parsing the complex values after level 3 delimiter has been applied already. |
+| enable.unsafe.columnpage | true | This property enables creation of column pages while writing on off heap (unsafe) memory. It is set by default |
+| carbon.lucene.compression.mode | speed | Carbondata supports different types of indices for efficient queries. This parameter decides the compression mode used by lucene index for index writing. In the default mode, writing speed is given more priority rather than the index size. |

Review comment:
       please remove this also, as it's already present in lucene-index-guide.md
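The level-1/level-2 delimiter behaviour described in the complex-delimiter rows quoted above can be sketched like this (a simplified illustration following the table's own examples, not CarbonData's actual parser):

```python
# Sketch of delimiter-level parsing per the examples in the quoted rows:
# the level-1 delimiter splits a complex column value into elements, and
# the level-2 delimiter splits each element again for nested types, e.g.
# a\002b\001c\002d -> [[a, b], [c, d]].

LEVEL_1 = "\x01"
LEVEL_2 = "\x02"

def parse_array(value):
    """Array<T>: split on the level-1 delimiter only."""
    return value.split(LEVEL_1)

def parse_nested_array(value):
    """Array<Array<T>>: level-1 split first, then level-2 on each element."""
    return [element.split(LEVEL_2) for element in value.split(LEVEL_1)]

print(parse_array("a\x01b\x01c"))              # ['a', 'b', 'c']
print(parse_nested_array("a\x02b\x01c\x02d"))  # [['a', 'b'], ['c', 'd']]
```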

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommended you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, exception will be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which further consists of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by default present in conf directory on your base project path. Users can configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of ICarbonLock interface to be used for acquiring the locks in case of concurrent operations |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables the generation of local dictionary. Local dictionary enables to map string and varchar values to numbers which helps in storing the data efficiently. |

Review comment:
       you can remove this property; it's already mentioned in ddl-of-carbondata.md

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommended you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, exception will be thrown when CREATE DATABASE and DROP DATABASE to avoid inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which further consists of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by default present in conf directory on your base project path. Users can configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of ICarbonLock interface to be used for acquiring the locks in case of concurrent operations |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables the generation of local dictionary. Local dictionary enables to map string and varchar values to numbers which helps in storing the data efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page Level data will not be maintained for the blocklet. During fallback, actual data will be retrieved from the encoded page data using local dictionary. NOTE: Memory footprint decreases significantly as compared to when this property is set to false |

Review comment:
       Same comment as above. You can first check the whole project to see if it's already documented; if not you can add it, else avoid the duplication.

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown on CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which further consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is present by default in the conf directory on your base project path. Users can configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in a column with the DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of the ICarbonLock interface to be used for acquiring locks in case of concurrent operations. |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables the generation of a local dictionary. A local dictionary maps string and varchar values to numbers, which helps store the data efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page level data will not be maintained for the blocklet. During fallback, actual data will be retrieved from the encoded page data using the local dictionary. NOTE: The memory footprint decreases significantly compared to when this property is set to false. |
+| spark.deploy.zookeeper.url | (none) | The ZooKeeper URL to connect to for ZooKeeper based locking |

Review comment:
       This is also a Spark property, no need to add it here.

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown on CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which further consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is present by default in the conf directory on your base project path. Users can configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in a column with the DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of the ICarbonLock interface to be used for acquiring locks in case of concurrent operations. |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables the generation of a local dictionary. A local dictionary maps string and varchar values to numbers, which helps store the data efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page level data will not be maintained for the blocklet. During fallback, actual data will be retrieved from the encoded page data using the local dictionary. NOTE: The memory footprint decreases significantly compared to when this property is set to false. |
+| spark.deploy.zookeeper.url | (none) | The ZooKeeper URL to connect to for ZooKeeper based locking |
+| carbon.data.file.version | V3 | This specifies the carbondata file format version. The carbondata file format has evolved over time from V1 to V3 in terms of metadata storage and IO level pruning capabilities. You can find more details [here](https://carbondata.apache.org/file-structure-of-carbondata.html#carbondata-file-format). |
+| spark.carbon.hive.schema.store | false | Carbondata currently supports 2 different types of metastores for storing schemas. This property specifies whether the Hive metastore is to be used for storing and retrieving table schemas |
+| spark.carbon.sqlastbuilder.classname | `org.apache.spark.sql.hive.CarbonSqlAstBuilder` | Carbondata's extension of Spark's `SparkSqlAstBuilder` that converts an ANTLR ParseTree into a logical plan. |

Review comment:
       I think there is no need to mention this, because just configuring the carbon extensions class would be enough for carbon to work, so this will simply confuse the user.

##########
File path: docs/configuration-parameters.md
##########
@@ -99,6 +110,12 @@ This section provides the details of all the configurations required for the Car
 | carbon.enable.bad.record.handling.for.insert | false | by default, disable the bad record and converter step during "insert into" |
 | carbon.load.si.repair | true | by default, enable loading for failed segments in SI during load/insert command |
 | carbon.si.repair.limit | (none) | Number of failed segments to be loaded in SI when repairing missing segments in SI, by default load all the missing segments. Supports value from 0 to 2147483646 |
+| carbon.complex.delimiter.level.1 | # | This delimiter is used for parsing complex data type columns. The level 1 delimiter splits the complex type data column in a row (e.g., a\001b\001c --> Array = {a,b,c}). |
+| carbon.complex.delimiter.level.2 | $ | This delimiter splits the complex type nested data column in a row. Applies the level_1 delimiter and then the level_2 delimiter based on the complex data type (e.g., a\002b\001c\002d --> Array<Struct> = {{a,b},{c,d}}). |
+| carbon.complex.delimiter.level.3 | @ | This delimiter splits the complex type nested data column in a row. Applies the level_1 delimiter, then the level_2 and level_3 delimiters based on the complex data type. Used in case of a nested complex Map type. (e.g., 'a\003b\002b\003c\001aa\003bb\002cc\003dd' --> Array<Map> = {{a -> b, b -> c},{aa -> bb, cc -> dd}}). |
+| carbon.complex.delimiter.level.4 | (none) | All the levels of delimiters are used for parsing complex data type columns. All the delimiters are applied depending on the complexity of the given data type. The level 4 delimiter is used for parsing the complex values after the level 3 delimiter has already been applied. |
+| enable.unsafe.columnpage | true | This property enables creation of column pages in off heap (unsafe) memory while writing. It is enabled by default. |

Review comment:
       You can remove this, as it is already present in the usecases.md file. Else you can just copy the same thing here as well.
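The delimiter levels in the rows quoted above can be sketched in a few lines. This only illustrates the splitting order; the `\001`/`\002` internal representations come from the examples in the table, while the function itself is hypothetical:

```python
def parse_complex(cell, level1="\x01", level2="\x02"):
    # Split a complex-type cell first by the level 1 delimiter
    # (array elements), then by the level 2 delimiter (nested
    # elements), as described in the quoted rows.
    return [element.split(level2) for element in cell.split(level1)]

# 'a\002b\001c\002d' --> {{a,b},{c,d}}
nested = parse_complex("a\x02b\x01c\x02d")
```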




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@carbondata.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-953109873


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/501/
   





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r706813780



##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,18 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| carbon.storelocation | (none) | This parameter defines the path on DFS where carbondata files and metadata will be stored. |

Review comment:
       Taken care of.







[GitHub] [carbondata] pratyakshsharma commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-920810555


   @akashrn5 @Indhumathi27 Can we merge this?





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r732703698



##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown on CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which further consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is present by default in the conf directory on your base project path. Users can configure all the carbondata related properties in this file. |

Review comment:
       got it. 
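The consistency check described in the spark.sql.warehouse.dir row quoted above can be sketched roughly as follows. This is a hypothetical helper, not the actual CarbonData code path:

```python
def check_store_location(carbon_storelocation, spark_warehouse_dir):
    # If the deprecated carbon.storelocation is still configured, it
    # must agree with spark.sql.warehouse.dir; otherwise CREATE DATABASE
    # and DROP DATABASE would see inconsistent database locations.
    if carbon_storelocation is not None and carbon_storelocation != spark_warehouse_dir:
        raise ValueError("carbon.storelocation and spark.sql.warehouse.dir "
                         "are configured to different paths")
    return spark_warehouse_dir

store = check_store_location(None, "hdfs://nn/warehouse")
```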







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-947986431


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6074/
   





[GitHub] [carbondata] akashrn5 commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r736205380



##########
File path: docs/configuration-parameters.md
##########
@@ -70,6 +75,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.load.global.sort.partitions | 0 | The number of partitions to use when shuffling data for global sort. Default value 0 means to use same number of map tasks as reduce tasks. **NOTE:** In general, it is recommended to have 2-3 tasks per CPU core in your cluster. |
 | carbon.sort.size | 100000 | Number of records to hold in memory to sort and write intermediate sort temp files. **NOTE:** Memory required for data loading will increase if you turn this value bigger. Besides each thread will cache this amout of records. The number of threads is configured by *carbon.number.of.cores.while.loading*. |
 | carbon.options.bad.records.logger.enable | false | CarbonData can identify the records that are not conformant to schema and isolate them as bad records. Enabling this configuration will make CarbonData to log such bad records. **NOTE:** If the input data contains many bad records, logging them will slow down the over all data loading throughput. The data load operation status would depend on the configuration in ***carbon.bad.records.action***. |
+| carbon.options.bad.records.action | FAIL | This property supports four bad record actions: FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE, it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT, bad records are written to the raw CSV instead of being loaded. If set to IGNORE, bad records are neither loaded nor written to the raw CSV. If set to FAIL, data loading fails if any bad records are found. |

Review comment:
       This property is actually a load level property. So it is better to mention that carbon first takes the load options and checks for `bad_records_action`; if that is not present, it checks the `carbon.options.bad.records.action` load property; and if that is not configured, it takes the value from the `carbon.bad.records.action` system level property, falling back to the default value, which is FAIL.
   
   This will clarify the priority order that carbon follows.
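The priority order described in the comment above can be captured in a short sketch. The dictionary arguments are hypothetical; the property keys are the real ones:

```python
def resolve_bad_records_action(load_options, properties):
    # Priority: load option bad_records_action
    #   > carbon.options.bad.records.action
    #   > carbon.bad.records.action
    #   > default FAIL.
    return (load_options.get("bad_records_action")
            or properties.get("carbon.options.bad.records.action")
            or properties.get("carbon.bad.records.action")
            or "FAIL")
```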

##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +161,16 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides whether the LRU cache for storing indexes needs to be created on the driver. By default, it is created on the executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are reported to Spark. It can also be set dynamically within the spark session itself. |

Review comment:
       same as above comment

##########
File path: docs/configuration-parameters.md
##########
@@ -119,6 +136,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.enable.range.compaction | true | To configure Range-based Compaction to be used or not for RANGE_COLUMN. If true after compaction also the data would be present in ranges. |
 | carbon.si.segment.merge | false | Making this true degrades the LOAD performance. When the number of small files increase for SI segments(it can happen as number of columns will be less and we store position id and reference columns), user can either set to true which will merge the data files for upcoming loads or run SI refresh command which does this job for all segments. (REFRESH INDEX <index_table>) |
 | carbon.partition.data.on.tasklevel | false | When enabled, tasks launched for Local sort partition load will be based on one node one task. Compaction will be performed based on task level for a partition. Load performance might be degraded, because, the number of tasks launched is equal to number of nodes in case of local sort. For compaction, memory consumption will be less, as more number of tasks will be launched for a partition |
+| carbon.minor.compaction.size | (none) | Minor compaction originally worked based on the number of segments (by default 4). However, in that scenario there was no control over the size of the segments to be compacted. This parameter was introduced to exclude segments whose size is greater than the configured threshold, so that the overall IO and time taken decrease |

Review comment:
       If it is dynamically configurable, please add this property to the dynamic property section.
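The size-based exclusion described in the carbon.minor.compaction.size row quoted above amounts to a simple filter; this sketch is illustrative only, with hypothetical names:

```python
def segments_for_minor_compaction(segment_sizes_mb, threshold_mb):
    # Exclude segments larger than the configured threshold so the
    # overall IO and time taken by minor compaction decrease.
    return [size for size in segment_sizes_mb if size <= threshold_mb]

# With a 512 MB threshold, the 600 MB segment is skipped.
eligible = segments_for_minor_compaction([10, 600, 32], 512)
```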

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,11 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets which further consists of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. It is recommended not to change this value except for some specific use case. |

Review comment:
       ```suggestion
   | carbon.blocklet.size | 64 MB | Carbondata file consist of blocklets which further consists of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. It is recommended not to change this value except for some specific use case. |
   ```







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-952818547


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/496/
   





[GitHub] [carbondata] pratyakshsharma commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-947857235


   retest this please





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-947993263


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4331/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-906448964


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4155/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-906436661


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/300/
   





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
vikramahuja1001 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r728686186



##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides whether the LRU cache for storing indexes needs to be created on the driver. By default, it is created on the executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are reported to Spark. |
+| carbon.use.bitset.pipe.line | true | Carbondata has various optimizations for faster query execution. Enabling this property speeds up filter queries: if set to true, the bitset is passed from one filter to the next, resulting in incremental filtering and improving overall performance |
+
+## Index Configuration
+| Parameter | Default Value | Description |
+|--------------------------------------|---------------|---------------------------------------------------|
+| is.internal.load.call | false | This parameter decides whether the insert call is triggered internally or by the user. If triggered by the user, this ensures data does not get loaded into the MV directly |
+| carbon.lucene.index.stop.words | false | By default, lucene does not create an index for stop words like 'is', 'the' etc. This flag is used to override that behaviour |
+| carbon.load.dateformat.setlenient.enable | false | This property enables lenient parsing of timestamp/date data in the load flow if parsing fails with an invalid timestamp data error. For example: 1941-03-15 00:00:00 is a valid time in the Asia/Calcutta zone, but is invalid and will fail to parse in the Asia/Shanghai zone, as DST is observed and clocks were turned forward 1 hour to 1941-03-15 01:00:00 |

Review comment:
       This property can be set dynamically within a session; please add that detail here. Since there are a lot of properties which can be set dynamically, I think we can add a new separate column to show whether dynamic changes are allowed or not. @akashrn5 what do you think?
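A session-level (dynamic) property overriding a system-level default, as discussed in the comment above, can be modelled roughly like this. This is a toy model with hypothetical names, not CarbonData's actual session parameter handling:

```python
class PropertyStore:
    def __init__(self, system_props):
        # System-level defaults, e.g. loaded from carbon.properties.
        self.system_props = dict(system_props)
        # Session-level overrides, e.g. SET key=value within a session.
        self.session_props = {}

    def set(self, key, value):
        self.session_props[key] = value

    def get(self, key, default=None):
        # A session value wins over the system-level default.
        return self.session_props.get(key, self.system_props.get(key, default))

store = PropertyStore({"carbon.load.dateformat.setlenient.enable": "false"})
store.set("carbon.load.dateformat.setlenient.enable", "true")
```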







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-917626275


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4167/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-948685276


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6081/
   





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r732732134



##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides whether the LRU cache for storing indexes needs to be created on the driver. By default, it is created on the executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are reported to Spark. |
+| carbon.use.bitset.pipe.line | true | Carbondata has various optimizations for faster query execution. Enabling this property speeds up filter queries: if set to true, the bitset is passed from one filter to the next, resulting in incremental filtering and improving overall performance |
+
+## Index Configuration
+| Parameter | Default Value | Description |
+|--------------------------------------|---------------|---------------------------------------------------|
+| is.internal.load.call | false | This parameter decides whether the insert call is triggered internally or by the user. If triggered by the user, this ensures data does not get loaded into the MV directly. |
+| carbon.lucene.index.stop.words | false | By default, lucene does not create indexes for stop words like 'is', 'the', etc. This flag is used to override this behaviour. |
+| carbon.load.dateformat.setlenient.enable | false | This property enables lenient parsing of timestamp/date data in the load flow if parsing fails with an invalid timestamp data error. For example, 1941-03-15 00:00:00 is a valid time in the Asia/Calcutta zone but is invalid in the Asia/Shanghai zone and will fail to parse there, as DST was observed and clocks were turned forward 1 hour to 1941-03-15 01:00:00. |

Review comment:
       Yeah, we can do that actually. But to do this, I would need info about which properties can be configured dynamically. @akashrn5 Need your inputs here. 







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-951990138


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6099/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-947804799


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/462/
   





[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
vikramahuja1001 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r728687723



##########
File path: docs/configuration-parameters.md
##########
@@ -119,6 +136,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.enable.range.compaction | true | To configure Range-based Compaction to be used or not for RANGE_COLUMN. If true after compaction also the data would be present in ranges. |
 | carbon.si.segment.merge | false | Making this true degrades the LOAD performance. When the number of small files increases for SI segments (this can happen as the number of columns will be less and we store position id and reference columns), the user can either set this to true, which will merge the data files for upcoming loads, or run the SI refresh command, which does this job for all segments. (REFRESH INDEX <index_table>) |
 | carbon.partition.data.on.tasklevel | false | When enabled, tasks launched for Local sort partition load will be based on one node one task. Compaction will be performed based on task level for a partition. Load performance might be degraded because the number of tasks launched is equal to the number of nodes in case of local sort. For compaction, memory consumption will be less, as more tasks will be launched for a partition |
+| carbon.minor.compaction.size | (none) | Minor compaction originally worked based on the number of segments (by default 4). However, in that scenario, there was no control over the size of the segments to be compacted. This parameter was introduced to exclude segments whose size is greater than the configured threshold, so that the overall IO and time taken decrease |

Review comment:
       This property is dynamically configurable too.
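
As a minimal sketch of the dynamic form (assuming a Spark session with CarbonData extensions and a hypothetical table name `sales`), the property could be set per session with the `SET` command:

```sql
-- Set the minor compaction size threshold (in MB) for the current session;
-- segments larger than this are excluded from minor compaction.
SET carbon.minor.compaction.size=1024;

-- Trigger a minor compaction on a hypothetical table to observe the effect.
ALTER TABLE sales COMPACT 'MINOR';
```

Properties that are not marked as dynamically configurable would still need to be set in carbon.properties before session startup.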







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-947998844


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/464/
   





[GitHub] [carbondata] MarvinLitt commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
MarvinLitt commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r697783621



##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,18 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| carbon.storelocation | (none) | This parameter defines the path on DFS where carbondata files and metadata will be stored. |

Review comment:
       Does carbon.storelocation still need to be configured now? Or has it changed to spark.sql.warehouse.dir?
   @QiangCai @kumarvishal09 

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,18 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| carbon.storelocation | (none) | This parameter defines the path on DFS where carbondata files and metadata will be stored. |
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which in turn consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by default present in the conf directory on your base project path. Users can configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of the ICarbonLock interface to be used for acquiring the locks in case of concurrent operations. |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables the generation of a local dictionary. A local dictionary maps string and varchar values to numbers, which helps in storing the data efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page Level data will not be maintained for the blocklet. During fallback, actual data will be retrieved from the encoded page data using local dictionary. NOTE: Memory footprint decreases significantly as compared to when this property is set to false |
+| spark.deploy.zookeeper.url | (none) | The zookeeper url to connect to for using zookeeper based locking |
+| carbon.data.file.version | V3 | This specifies carbondata file format version. Carbondata file format has evolved with time from V1 to V3 in terms of metadata storage and IO level pruning capabilities. You can find more details [here](https://carbondata.apache.org/file-structure-of-carbondata.html#carbondata-file-format). |
+| spark.carbon.hive.schema.store | false | Carbondata currently supports 2 different types of metastores for storing schemas. This property specifies if Hive metastore is to be used for storing and retrieving table schemas |
+| spark.carbon.sessionstate.classname | `org.apache.spark.sql.hive.CarbonInMemorySessionStateBuilder` | This parameter determines the implementation of carbon session state to override sql parser and for adding strategies. Currently 2 different implementations are provided out of the box - one implements in memory session state and second implements Hive aware session state. |

Review comment:
       org.apache.spark.sql.hive.CarbonInMemorySessionStateBuilder is not a commonly used value; please give a commonly used value instead
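
For context, several of the properties quoted above would typically sit together in conf/carbon.properties. The snippet below is an illustrative sketch only; the paths, hosts, and the lock implementation class are assumptions and should be adjusted to your deployment.

```properties
# Illustrative conf/carbon.properties fragment (values are examples only).
carbon.storelocation=hdfs://namenode:8020/user/carbon/store
carbon.date.format=yyyy-MM-dd
carbon.data.file.version=V3
carbon.local.dictionary.enable=true
# Zookeeper-based locking (class name and quorum assumed here).
carbon.lock.class=org.apache.carbondata.core.locks.ZooKeeperLocking
spark.deploy.zookeeper.url=zk-host1:2181,zk-host2:2181
```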







[GitHub] [carbondata] akashrn5 commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r732974621



##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** up to which the driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides whether the LRU cache for storing indexes needs to be created on the driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are updated to Spark. |
+| carbon.use.bitset.pipe.line | true | Carbondata has various optimizations for faster query execution. Setting this property acts like a catalyst for filter queries. If set to true, the bitset is passed from one filter to another, resulting in incremental filtering and improving overall performance. |
+
+## Index Configuration
+| Parameter | Default Value | Description |
+|--------------------------------------|---------------|---------------------------------------------------|
+| is.internal.load.call | false | This parameter decides whether the insert call is triggered internally or by the user. If triggered by the user, this ensures data does not get loaded into the MV directly. |
+| carbon.lucene.index.stop.words | false | By default, lucene does not create indexes for stop words like 'is', 'the', etc. This flag is used to override this behaviour. |
+| carbon.load.dateformat.setlenient.enable | false | This property enables lenient parsing of timestamp/date data in the load flow if parsing fails with an invalid timestamp data error. For example, 1941-03-15 00:00:00 is a valid time in the Asia/Calcutta zone but is invalid in the Asia/Shanghai zone and will fail to parse there, as DST was observed and clocks were turned forward 1 hour to 1941-03-15 01:00:00. |

Review comment:
       yeah, it's a good idea, we can update it. @pratyakshsharma you can find the dynamically configurable properties in the constants defined.







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-950954226


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6095/
   





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r736481157



##########
File path: docs/configuration-parameters.md
##########
@@ -70,6 +75,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.load.global.sort.partitions | 0 | The number of partitions to use when shuffling data for global sort. Default value 0 means to use same number of map tasks as reduce tasks. **NOTE:** In general, it is recommended to have 2-3 tasks per CPU core in your cluster. |
 | carbon.sort.size | 100000 | Number of records to hold in memory to sort and write intermediate sort temp files. **NOTE:** Memory required for data loading will increase if you make this value bigger. Besides, each thread will cache this amount of records. The number of threads is configured by *carbon.number.of.cores.while.loading*. |
 | carbon.options.bad.records.logger.enable | false | CarbonData can identify the records that are not conformant to schema and isolate them as bad records. Enabling this configuration will make CarbonData to log such bad records. **NOTE:** If the input data contains many bad records, logging them will slow down the over all data loading throughput. The data load operation status would depend on the configuration in ***carbon.bad.records.action***. |
+| carbon.options.bad.records.action | FAIL | This property has four types of bad record actions: FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE, it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT, bad records are written to the raw CSV instead of being loaded. If set to IGNORE, bad records are neither loaded nor written to the raw CSV. If set to FAIL, data loading fails if any bad records are found. |

Review comment:
       done.
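
For context, the documented actions map onto the load command roughly as below; the table name and HDFS paths are hypothetical:

```sql
-- Redirect bad records to a raw CSV instead of failing the whole load.
LOAD DATA INPATH 'hdfs://namenode:8020/data/sales.csv'
INTO TABLE sales
OPTIONS(
  'BAD_RECORDS_LOGGER_ENABLE'='true',
  'BAD_RECORDS_ACTION'='REDIRECT',
  'BAD_RECORD_PATH'='hdfs://namenode:8020/data/badrecords'
);
```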







[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r736469769



##########
File path: docs/configuration-parameters.md
##########
@@ -119,6 +136,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.enable.range.compaction | true | To configure Range-based Compaction to be used or not for RANGE_COLUMN. If true after compaction also the data would be present in ranges. |
 | carbon.si.segment.merge | false | Making this true degrades the LOAD performance. When the number of small files increases for SI segments (this can happen as the number of columns will be less and we store position id and reference columns), the user can either set this to true, which will merge the data files for upcoming loads, or run the SI refresh command, which does this job for all segments. (REFRESH INDEX <index_table>) |
 | carbon.partition.data.on.tasklevel | false | When enabled, tasks launched for Local sort partition load will be based on one node one task. Compaction will be performed based on task level for a partition. Load performance might be degraded because the number of tasks launched is equal to the number of nodes in case of local sort. For compaction, memory consumption will be less, as more tasks will be launched for a partition |
+| carbon.minor.compaction.size | (none) | Minor compaction originally worked based on the number of segments (by default 4). However, in that scenario, there was no control over the size of the segments to be compacted. This parameter was introduced to exclude segments whose size is greater than the configured threshold, so that the overall IO and time taken decrease |

Review comment:
       added there as well with reference to this.







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-953094668


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4368/
   





[GitHub] [carbondata] asfgit closed pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210


   





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r732717781



##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** up to which the driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides whether the LRU cache for storing indexes needs to be created on the driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are updated to spark. |

Review comment:
       ok. 







[GitHub] [carbondata] pratyakshsharma commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-948540803


   This should be good to land now @vikramahuja1001 @akashrn5 





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-952057177


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/489/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-906447513


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5899/
   





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r733473793



##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** up to which the driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides whether the LRU cache for storing indexes needs to be created on the driver. By default, it is created on executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are updated to Spark. |
+| carbon.use.bitset.pipe.line | true | Carbondata has various optimizations for faster query execution. Setting this property acts like a catalyst for filter queries. If set to true, the bitset is passed from one filter to another, resulting in incremental filtering and improving overall performance. |
+
+## Index Configuration
+| Parameter | Default Value | Description |
+|--------------------------------------|---------------|---------------------------------------------------|
+| is.internal.load.call | false | This parameter decides whether the insert call is triggered internally or by the user. If triggered by the user, this ensures data does not get loaded into the MV directly. |
+| carbon.lucene.index.stop.words | false | By default, lucene does not create indexes for stop words like 'is', 'the', etc. This flag is used to override this behaviour. |
+| carbon.load.dateformat.setlenient.enable | false | This property enables lenient parsing of timestamp/date data in the load flow if parsing fails with an invalid timestamp data error. For example, 1941-03-15 00:00:00 is a valid time in the Asia/Calcutta zone but is invalid in the Asia/Shanghai zone and will fail to parse there, as DST was observed and clocks were turned forward 1 hour to 1941-03-15 01:00:00. |

Review comment:
       alright, let me take care of this.







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-948690043


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4338/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-951029134


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/485/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-951980415


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4356/
   





[GitHub] [carbondata] akashrn5 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-952556114


   LGTM





[GitHub] [carbondata] akashrn5 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-952920956


   retest this please





[GitHub] [carbondata] brijoobopanna commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
brijoobopanna commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-906340398


   add to whitelist
   





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r706813856



##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,18 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| carbon.storelocation | (none) | This parameter defines the path on DFS where carbondata files and metadata will be stored. |
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which further consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default blocklet size was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by default present in conf directory on your base project path. Users can configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of the ICarbonLock interface to be used for acquiring locks during concurrent operations. |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables the generation of a local dictionary. A local dictionary maps string and varchar values to numbers, which helps store the data efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page-level data will not be maintained for the blocklet. During fallback, the actual data is retrieved from the encoded page data using the local dictionary. NOTE: The memory footprint decreases significantly compared to when this property is set to false. |
+| spark.deploy.zookeeper.url | (none) | The zookeeper url to connect to for using zookeeper based locking |
+| carbon.data.file.version | V3 | This specifies carbondata file format version. Carbondata file format has evolved with time from V1 to V3 in terms of metadata storage and IO level pruning capabilities. You can find more details [here](https://carbondata.apache.org/file-structure-of-carbondata.html#carbondata-file-format). |
+| spark.carbon.hive.schema.store | false | Carbondata currently supports 2 different types of metastores for storing schemas. This property specifies if Hive metastore is to be used for storing and retrieving table schemas |
+| spark.carbon.sessionstate.classname | `org.apache.spark.sql.hive.CarbonInMemorySessionStateBuilder` | This parameter determines the implementation of the carbon session state used to override the SQL parser and to add strategies. Currently, two implementations are provided out of the box: one implements an in-memory session state and the other a Hive-aware session state. |

Review comment:
       This is internally taken care of. Hence removed the property itself.
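For illustration, a few of the keys in the table above would sit together in `conf/carbon.properties` like this (a minimal sketch; the values shown are only examples, not recommendations):

```properties
# conf/carbon.properties -- illustrative values only
carbon.date.format=yyyy-MM-dd
carbon.local.dictionary.enable=true
carbon.local.dictionary.decoder.fallback=true
carbon.data.file.version=V3
```

The file location itself can be overridden with `carbon.properties.filepath` if it is not kept under the default `conf` directory.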







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-917625667


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/313/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-906222439


   Can one of the admins verify this patch?





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r732707327



##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend removing the `carbon.storelocation` configuration. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown on CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which further consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default blocklet size was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is by default present in conf directory on your base project path. Users can configure all the carbondata related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of the ICarbonLock interface to be used for acquiring locks during concurrent operations. |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables the generation of a local dictionary. A local dictionary maps string and varchar values to numbers, which helps store the data efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page-level data will not be maintained for the blocklet. During fallback, the actual data is retrieved from the encoded page data using the local dictionary. NOTE: The memory footprint decreases significantly compared to when this property is set to false. |

Review comment:
       Actually these properties were mentioned in the corresponding jira. So I simply followed the jira without any cross verifying. Will remove it anyways.
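Following the recommendation in the diff above, the warehouse path would be set only through the Spark property, not through the deprecated carbon key. A hedged sketch (the HDFS path below is hypothetical):

```properties
# spark-defaults.conf -- set the warehouse dir via Spark only;
# do not also set the deprecated carbon.storelocation
spark.sql.warehouse.dir=hdfs://namenode:8020/user/hive/warehouse
```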







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-951001713


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4352/
   





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-952811278


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4363/
   





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r732565856



##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend removing the `carbon.storelocation` configuration. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown on CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|

Review comment:
       got it.







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#issuecomment-953095991


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6111/
   

