Posted to dev@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/10/14 05:55:04 UTC

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4210: [CARBONDATA-4240]: Added missing properties on the configurations page

akashrn5 commented on a change in pull request #4210:
URL: https://github.com/apache/carbondata/pull/4210#discussion_r728631522



##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown during CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|

Review comment:
       `spark.sql.warehouse.dir` is a Spark property, so there is no need to add documentation for it here. Also, there is already a note in the document saying `carbon.storelocation` is deprecated.
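
       For context, a minimal sketch of the recommended Spark-side setup (the HDFS path is a placeholder):

       ```scala
       import org.apache.spark.sql.SparkSession

       // Set the warehouse location through Spark instead of the
       // deprecated carbon.storelocation property.
       val spark = SparkSession
         .builder()
         .appName("CarbonExample")
         .config("spark.sql.warehouse.dir", "hdfs://namenode:9000/user/warehouse")
         .getOrCreate()
       ```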

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown during CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which in turn consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is present by default in the conf directory under your base project path. Users can configure all carbondata-related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |

Review comment:
       If you say the data will be stored in the column in this format, it conveys wrong info, because we store date as an integer (it is a direct dictionary) and time as a long. Instead, you can say that this specifies the format carbondata uses to parse all incoming date data before finally storing it in the carbondata file, maybe something like that.
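
       An illustrative sketch of where this parse format is set, assuming the standard `CarbonProperties` API:

       ```scala
       import org.apache.carbondata.core.util.CarbonProperties

       // The format only controls how incoming DATE values are parsed;
       // internally carbondata stores dates as integers (direct dictionary),
       // not as strings in this format.
       CarbonProperties.getInstance()
         .addProperty("carbon.date.format", "yyyy-MM-dd")
       ```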

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown during CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which in turn consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is present by default in the conf directory under your base project path. Users can configure all carbondata-related properties in this file. |

Review comment:
       This is a system property, and the file it points to contains the carbon properties; I think this is not the best place to mention it. Maybe you can add this info to the deployment guide or the quick start.
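
       A hedged sketch of the system-property angle (the path is a placeholder); it is normally passed as `-Dcarbon.properties.filepath=<path>` via the driver/executor Java options rather than set in code:

       ```scala
       // carbon.properties.filepath is read as a JVM system property, so
       // setting it programmatically only takes effect if done before
       // carbon first loads the properties file.
       System.setProperty("carbon.properties.filepath",
         "/opt/carbondata/conf/carbon.properties") // placeholder path
       ```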

##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides whether the LRU cache for storing indexes needs to be created on the driver. By default, it is created on the executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are updated to Spark. |
+| carbon.use.bitset.pipe.line | true | Carbondata has various optimizations for faster query execution. This property acts as a catalyst for filter queries: if set to true, the bitset is passed from one filter to another, resulting in incremental filtering and improved overall performance. |
+
+## Index Configuration
+| Parameter | Default Value | Description |
+|--------------------------------------|---------------|---------------------------------------------------|
+| is.internal.load.call | false | This parameter decides whether the insert call was triggered internally or by the user. If triggered by the user, this ensures data does not get loaded into the MV directly. |

Review comment:
       This is actually an internal property; no need to add it to the doc.

##########
File path: docs/configuration-parameters.md
##########
@@ -151,6 +169,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown.<db_name>_<table_name>.column| empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task . |
 | carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore, after the time configured since last access to the cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access will refresh the timer. Default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** At the time when cache is being expired, queries on the table may fail with NullPointerException. |
+| is.driver.instance | false | This parameter decides whether the LRU cache for storing indexes needs to be created on the driver. By default, it is created on the executors. |
+| carbon.input.metrics.update.interval | 500000 | This property determines the number of records queried after which input metrics are updated to Spark. |

Review comment:
       For this one, add that it can also be set dynamically within a session.
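
       A sketch of the dynamic form, assuming an existing SparkSession `spark` (the value shown is just the documented default):

       ```scala
       // Override the property for the current session only.
       spark.sql("SET carbon.input.metrics.update.interval=500000")
       ```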

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown during CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which in turn consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |

Review comment:
       No need to mention V2 here, as no one is using it. For blocklet size you can just say that each blocklet inside a block will be 64 MB by default, and that it is recommended not to change it unless there is a specific use case or issue.
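
       If a per-table example helps, `TABLE_BLOCKLET_SIZE` (in MB) is the DDL-level knob documented in ddl-of-carbondata.md; the table and columns below are placeholders:

       ```scala
       // Per-table override of the 64 MB default; change it only for a
       // specific use case or issue, as noted above.
       spark.sql(
         """CREATE TABLE sales (id INT, amount DOUBLE)
           |STORED AS carbondata
           |TBLPROPERTIES('TABLE_BLOCKLET_SIZE'='64')""".stripMargin)
       ```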

##########
File path: docs/configuration-parameters.md
##########
@@ -99,6 +110,12 @@ This section provides the details of all the configurations required for the Car
 | carbon.enable.bad.record.handling.for.insert | false | by default, disable the bad record and converter step during "insert into" |
 | carbon.load.si.repair | true | by default, enable loading for failed segments in SI during load/insert command |
 | carbon.si.repair.limit | (none) | Number of failed segments to be loaded in SI when repairing missing segments in SI, by default load all the missing segments. Supports value from 0 to 2147483646 |
+| carbon.complex.delimiter.level.1 | # | This delimiter is used for parsing complex data type columns. The level 1 delimiter splits the complex type data column in a row (e.g., a\001b\001c --> Array<String> = {a,b,c}). |
+| carbon.complex.delimiter.level.2 | $ | This delimiter splits the complex type nested data column in a row. The level 1 delimiter is applied first, then level 2, based on the complex data type (e.g., a\002b\001c\002d --> Array<Array<String>> = {{a,b},{c,d}}). |
+| carbon.complex.delimiter.level.3 | @ | This delimiter splits the complex type nested data column in a row. The level 1 delimiter is applied first, then level 2, then level 3, based on the complex data type. Used in case of nested complex Map types (e.g., 'a\003b\002b\003c\001aa\003bb\002cc\003dd' --> Array<Map<String,String>> = {{a -> b, b -> c},{aa -> bb, cc -> dd}}). |
+| carbon.complex.delimiter.level.4 | (none) | All the levels of delimiters are used for parsing complex data type columns, applied depending on the complexity of the given data type. The level 4 delimiter is used for parsing complex values after the level 3 delimiter has already been applied. |
+| enable.unsafe.columnpage | true | This property enables creation of column pages on off-heap (unsafe) memory while writing. It is enabled by default. |
+| carbon.lucene.compression.mode | speed | Carbondata supports different types of indexes for efficient queries. This parameter decides the compression mode used by the lucene index when writing the index. In the default mode, writing speed is prioritized over index size. |

Review comment:
       Please remove this one too, as it is already present in lucene-index-guide.md.
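
       For the delimiter rows in the hunk above, a hedged load sketch using the per-load options from dml-of-carbondata.md (table and path are placeholders):

       ```scala
       // Override the complex-type delimiters for a single load instead of
       // setting the global carbon.complex.delimiter.* properties.
       spark.sql(
         """LOAD DATA INPATH 'hdfs://namenode:9000/data/sample.csv'
           |INTO TABLE complex_table
           |OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$',
           |        'COMPLEX_DELIMITER_LEVEL_2'=':')""".stripMargin)
       ```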

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown during CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which in turn consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is present by default in the conf directory under your base project path. Users can configure all carbondata-related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of the ICarbonLock interface to be used for acquiring locks during concurrent operations. |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables generation of a local dictionary. A local dictionary maps string and varchar values to numbers, which helps store the data efficiently. |

Review comment:
       You can remove this property; it is already mentioned in ddl-of-carbondata.md.
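
       The table-level form that ddl-of-carbondata.md already documents, sketched for reference (table and columns are placeholders):

       ```scala
       // Enable local dictionary per table via DDL rather than the
       // global property.
       spark.sql(
         """CREATE TABLE customers (name STRING, city STRING)
           |STORED AS carbondata
           |TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true')""".stripMargin)
       ```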

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown during CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which in turn consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is present by default in the conf directory under your base project path. Users can configure all carbondata-related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of the ICarbonLock interface to be used for acquiring locks during concurrent operations. |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables generation of a local dictionary. A local dictionary maps string and varchar values to numbers, which helps store the data efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page level data will not be maintained for the blocklet. During fallback, the actual data is retrieved from the encoded page data using the local dictionary. NOTE: The memory footprint decreases significantly compared to when this property is set to false. |

Review comment:
       Same comment as above. First check the whole project to see whether it is already documented; if not, you can add it, otherwise avoid the duplication.

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown during CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which in turn consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is present by default in the conf directory under your base project path. Users can configure all carbondata-related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of the ICarbonLock interface to be used for acquiring locks during concurrent operations. |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables generation of a local dictionary. A local dictionary maps string and varchar values to numbers, which helps store the data efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page level data will not be maintained for the blocklet. During fallback, the actual data is retrieved from the encoded page data using the local dictionary. NOTE: The memory footprint decreases significantly compared to when this property is set to false. |
+| spark.deploy.zookeeper.url | (none) | The ZooKeeper URL to connect to when using ZooKeeper-based locking. |

Review comment:
       This is also a Spark property; no need to add it here.
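
       For the locking setup itself, a hedged sketch of the carbon side; the ZooKeeper URL stays in Spark's own configuration (e.g. spark-defaults.conf):

       ```scala
       import org.apache.carbondata.core.util.CarbonProperties

       // Select ZooKeeper-based locking; the URL to connect to is taken
       // from the Spark property spark.deploy.zookeeper.url, not from carbon.
       CarbonProperties.getInstance()
         .addProperty("carbon.lock.type", "ZOOKEEPERLOCK")
       ```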

##########
File path: docs/configuration-parameters.md
##########
@@ -52,6 +52,17 @@ This section provides the details of all the configurations required for the Car
 | carbon.trash.retention.days | 7 | This parameter specifies the number of days after which the timestamp based subdirectories are expired in the trash folder. Allowed Min value = 0, Allowed Max Value = 365 days|
 | carbon.clean.file.force.allowed | false | This parameter specifies if the clean files operation with force option is allowed or not.|
 | carbon.cdc.minmax.pruning.enabled | false | This parameter defines whether the min max pruning to be performed on the target table based on the source data. It will be useful when data is not sparse across target table which results in better pruning.|
+| spark.sql.warehouse.dir | ../carbon.store | This parameter defines the path on DFS where carbondata files and metadata will be stored. The configuration `carbon.storelocation` has been deprecated. For simplicity, we recommend that you remove the configuration of `carbon.storelocation`. If `carbon.storelocation` and `spark.sql.warehouse.dir` are configured to different paths, an exception will be thrown during CREATE DATABASE and DROP DATABASE to avoid an inconsistent database location.|
+| carbon.blocklet.size | 64 MB | Carbondata files consist of blocklets, which in turn consist of column pages. As per the latest V3 format, the default size of a blocklet is 64 MB. In the V2 format, the default size of a blocklet was 120000 rows. |
+| carbon.properties.filepath | conf/carbon.properties | This file is present by default in the conf directory under your base project path. Users can configure all carbondata-related properties in this file. |
+| carbon.date.format | yyyy-MM-dd | This property specifies the format in which data will be stored in the column with DATE data type. |
+| carbon.lock.class | (none) | This specifies the implementation of the ICarbonLock interface to be used for acquiring locks during concurrent operations. |
+| carbon.local.dictionary.enable | (none) | If set to true, this property enables generation of a local dictionary. A local dictionary maps string and varchar values to numbers, which helps store the data efficiently. |
+| carbon.local.dictionary.decoder.fallback | true | Page level data will not be maintained for the blocklet. During fallback, the actual data is retrieved from the encoded page data using the local dictionary. NOTE: The memory footprint decreases significantly compared to when this property is set to false. |
+| spark.deploy.zookeeper.url | (none) | The ZooKeeper URL to connect to when using ZooKeeper-based locking. |
+| carbon.data.file.version | V3 | This specifies the carbondata file format version. The carbondata file format has evolved over time from V1 to V3 in terms of metadata storage and IO-level pruning capabilities. You can find more details [here](https://carbondata.apache.org/file-structure-of-carbondata.html#carbondata-file-format). |
+| spark.carbon.hive.schema.store | false | Carbondata currently supports two different types of metastores for storing schemas. This property specifies whether the Hive metastore is to be used for storing and retrieving table schemas. |
+| spark.carbon.sqlastbuilder.classname | `org.apache.spark.sql.hive.CarbonSqlAstBuilder` | Carbondata's extension of Spark's `SparkSqlAstBuilder` that converts an ANTLR ParseTree into a logical plan. |

Review comment:
       I think there is no need to mention this, because just configuring the carbon extensions class is enough for carbon to work, so this will simply confuse the user.
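
       A sketch of the extensions-only setup the comment refers to:

       ```scala
       import org.apache.spark.sql.SparkSession

       // Configuring the extensions class alone wires in carbon's parser
       // and rules; the AST builder does not need to be set separately.
       val spark = SparkSession
         .builder()
         .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
         .getOrCreate()
       ```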

##########
File path: docs/configuration-parameters.md
##########
@@ -99,6 +110,12 @@ This section provides the details of all the configurations required for the Car
 | carbon.enable.bad.record.handling.for.insert | false | by default, disable the bad record and converter step during "insert into" |
 | carbon.load.si.repair | true | by default, enable loading for failed segments in SI during load/insert command |
 | carbon.si.repair.limit | (none) | Number of failed segments to be loaded in SI when repairing missing segments in SI, by default load all the missing segments. Supports value from 0 to 2147483646 |
+| carbon.complex.delimiter.level.1 | # | This delimiter is used for parsing complex data type columns. The level 1 delimiter splits the complex type data column in a row (e.g., a\001b\001c --> Array<String> = {a,b,c}). |
+| carbon.complex.delimiter.level.2 | $ | This delimiter splits the complex type nested data column in a row. The level 1 delimiter is applied first, then level 2, based on the complex data type (e.g., a\002b\001c\002d --> Array<Array<String>> = {{a,b},{c,d}}). |
+| carbon.complex.delimiter.level.3 | @ | This delimiter splits the complex type nested data column in a row. The level 1 delimiter is applied first, then level 2, then level 3, based on the complex data type. Used in case of nested complex Map types (e.g., 'a\003b\002b\003c\001aa\003bb\002cc\003dd' --> Array<Map<String,String>> = {{a -> b, b -> c},{aa -> bb, cc -> dd}}). |
+| carbon.complex.delimiter.level.4 | (none) | All the levels of delimiters are used for parsing complex data type columns, applied depending on the complexity of the given data type. The level 4 delimiter is used for parsing complex values after the level 3 delimiter has already been applied. |
+| enable.unsafe.columnpage | true | This property enables creation of column pages on off-heap (unsafe) memory while writing. It is enabled by default. |

Review comment:
       You can remove this, as it is already present in the usecases.md file. Otherwise, you can just copy the same description here.
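
       For completeness, a sketch of where such a property would be set if the description were copied over (same `CarbonProperties` API as above):

       ```scala
       import org.apache.carbondata.core.util.CarbonProperties

       // enable.unsafe.columnpage keeps column pages on off-heap (unsafe)
       // memory during writes; true is already the default.
       CarbonProperties.getInstance()
         .addProperty("enable.unsafe.columnpage", "true")
       ```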




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@carbondata.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org