You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@carbondata.apache.org by Zhangshunyu <gi...@git.apache.org> on 2016/09/22 08:31:51 UTC

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

GitHub user Zhangshunyu opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/189

    [CARBONDATA-267] Set block_size for table on table level

    ## Why raised this pr?
    To configure block file size for each table on column level.
    ## How to solve?
    Add a new parameter in TableSchema, when create table, setting it in table properties and write this info into thrift file.
    ## How to test?
    Pass all the test cases and the new test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Zhangshunyu/incubator-carbondata blocksize922

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/189.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #189
    
----
commit ed40d0f8012297cc9e9cffb3812ef5d141e03879
Author: Zhangshunyu <zh...@huawei.com>
Date:   2016-09-22T08:19:29Z

    Set block_size for table on table level

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by Zhangshunyu <gi...@git.apache.org>.
Github user Zhangshunyu commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81291387
  
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -124,6 +124,7 @@ struct TableSchema{
     	1: required string table_id;  // ID used to
     	2: required list<ColumnSchema> table_columns; // Columns in the table
     	3: required SchemaEvolution schema_evolution; // History of schema evolution of this table
    +	4: optional i32 block_size
    --- End diff --
    
    @manishgupta88 As this point, i hold my own opinion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/189


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by Zhangshunyu <gi...@git.apache.org>.
Github user Zhangshunyu commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r80177528
  
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -124,6 +124,7 @@ struct TableSchema{
     	1: required string table_id;  // ID used to
     	2: required list<ColumnSchema> table_columns; // Columns in the table
     	3: required SchemaEvolution schema_evolution; // History of schema evolution of this table
    +	4: optional i32 block_size
    --- End diff --
    
    @jackylk Each table has his own metadata, this file is just that one......


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81310297
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -419,6 +420,7 @@ class TableNewProcessor(cm: tableModel, sqlContext: SQLContext) {
         schemaEvol
           .setSchemaEvolutionEntryList(new util.ArrayList[SchemaEvolutionEntry]())
         tableSchema.setTableId(UUID.randomUUID().toString)
    +    tableSchema.setBlocksize(Integer.parseInt(cm.tableBlockSize.getOrElse(0).toString))
    --- End diff --
    
    What if user will provide value as 1024M or 1MB, you will get exception and the same has not been handled.
    I think we need to handle


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81279790
  
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -124,6 +124,7 @@ struct TableSchema{
     	1: required string table_id;  // ID used to
     	2: required list<ColumnSchema> table_columns; // Columns in the table
     	3: required SchemaEvolution schema_evolution; // History of schema evolution of this table
    +	4: optional i32 block_size
    --- End diff --
    
    @Zhangshunyu ...I agree with @jackylk ...just like we are storing the table path in metastore, same way block size can also be stored as it is metadata for the table. schema file we write should only describe the structure the table and not the metadata for table


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81474992
  
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -124,6 +124,7 @@ struct TableSchema{
     	1: required string table_id;  // ID used to
     	2: required list<ColumnSchema> table_columns; // Columns in the table
     	3: required SchemaEvolution schema_evolution; // History of schema evolution of this table
    +	4: optional i32 block_size
    --- End diff --
    
    @Zhangshunyu Actually need not change Schema for such extra properties which are being added. We can directly store TABLE_PROPERTIES as a string key value pairs in thrift. So that all such properties configuration parameters can be effectively handled without modifying thrift in future. @jackylk please comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by Zhangshunyu <gi...@git.apache.org>.
Github user Zhangshunyu commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r80176748
  
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -124,6 +124,7 @@ struct TableSchema{
     	1: required string table_id;  // ID used to
     	2: required list<ColumnSchema> table_columns; // Columns in the table
     	3: required SchemaEvolution schema_evolution; // History of schema evolution of this table
    +	4: optional i32 block_size
    --- End diff --
    
    @jackylk If dont save it into file, when we restart thirift-server, the message in memory object would be lost.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81280323
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala ---
    @@ -546,6 +548,18 @@ class CarbonSqlParser()
         colGrpNames.toString()
       }
     
    +  protected def getTableBlockSize(tableProperties: Map[String, String]): Integer = {
    +    var tableBlockSize: Integer = 0
    +    if (tableProperties.get(CarbonCommonConstants.TABLE_BLOCKSIZE).isDefined) {
    +      val blockSizeStr: String = tableProperties.get(CarbonCommonConstants.TABLE_BLOCKSIZE).get
    +      try {
    +        tableBlockSize = Integer.parseInt(blockSizeStr)
    +      } catch {
    +        case e: NumberFormatException => tableBlockSize = 0
    --- End diff --
    
    better assign default value. so that system will be in a consistent state always


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81308653
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala ---
    @@ -546,6 +548,30 @@ class CarbonSqlParser()
         colGrpNames.toString()
       }
     
    +  protected def getTableBlockSize(tableProperties: Map[String, String]): Integer = {
    +    var tableBlockSize: Integer = 0
    +    if (tableProperties.get(CarbonCommonConstants.TABLE_BLOCKSIZE).isDefined) {
    +      val blockSizeStr: String = tableProperties.get(CarbonCommonConstants.TABLE_BLOCKSIZE).get
    +      try {
    +        tableBlockSize = Integer.parseInt(blockSizeStr)
    +      } catch {
    +        case e: NumberFormatException =>
    +          throw new MalformedCarbonCommandException("Invalid table_blocksize value found: " +
    +            s"$blockSizeStr, only int value from 1 to 2048 is supported.")
    +      }
    +      if (tableBlockSize < CarbonCommonConstants.BLOCK_SIZE_MIN_VAL ||
    +        tableBlockSize > CarbonCommonConstants.BLOCK_SIZE_MAX_VAL) {
    +        throw new MalformedCarbonCommandException("Invalid table_blocksize value found: " +
    +          s"$blockSizeStr, only int value from 1 to 2048 is supported.")
    --- End diff --
    
    Please mention the Unit also in comment like GB or MB
    "Invalid table_blocksize value found: " +
     +          s"$blockSizeStr, only int value from 1 MB to 2048 MB is supported"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r80177385
  
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -124,6 +124,7 @@ struct TableSchema{
     	1: required string table_id;  // ID used to
     	2: required list<ColumnSchema> table_columns; // Columns in the table
     	3: required SchemaEvolution schema_evolution; // History of schema evolution of this table
    +	4: optional i32 block_size
    --- End diff --
    
    I think it is better to save it in the table metadata, through metastore for example. 
    By concept, this should not be tied with the file.
    @ravipesala @gvramana any suggestion?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by Zhangshunyu <gi...@git.apache.org>.
Github user Zhangshunyu commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r80177026
  
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -124,6 +124,7 @@ struct TableSchema{
     	1: required string table_id;  // ID used to
     	2: required list<ColumnSchema> table_columns; // Columns in the table
     	3: required SchemaEvolution schema_evolution; // History of schema evolution of this table
    +	4: optional i32 block_size
    --- End diff --
    
    @jackylk Futher more, all the meta data of table would also be stored into file by thrift, if not, when restart all info would be lost


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81279628
  
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java ---
    @@ -196,11 +197,22 @@ public AbstractFactDataWriter(String storeLocation, int measureCount, int mdKeyL
         blockIndexInfoList = new ArrayList<>();
         // get max file size;
         CarbonProperties propInstance = CarbonProperties.getInstance();
    -    this.fileSizeInBytes = Long.parseLong(propInstance
    -        .getProperty(CarbonCommonConstants.MAX_FILE_SIZE,
    -            CarbonCommonConstants.MAX_FILE_SIZE_DEFAULT_VAL))
    -        * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
    -        * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L;
    +    // If blocksize is defined by create table ddl, then use this value otherwise
    +    // use system level property
    +    if (blocksize >= CarbonCommonConstants.MAX_FILE_SIZE_DEFAULT_VAL_MIN_VAL &&
    +        blocksize <= CarbonCommonConstants.MAX_FILE_SIZE_DEFAULT_VAL_MAX_VAL) {
    +      LOGGER.audit("The max file size is set in table level. ");
    +      this.fileSizeInBytes = blocksize * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
    +          * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L;
    +    } else {
    +      LOGGER.audit("The max file size in create table ddl is not set or is invalid, " +
    +          "so here use " + CarbonCommonConstants.MAX_FILE_SIZE + "to set. ");
    --- End diff --
    
    @Zhangshunyu ...I am thinking instead of validating block size here specified by the user, we can validate at the time of creating the table and in error message we can display the range of allowed block size so that user is not confused.
    If user does not specify the block size the consider the default value and update in the schema


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81299063
  
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -124,6 +124,7 @@ struct TableSchema{
     	1: required string table_id;  // ID used to
     	2: required list<ColumnSchema> table_columns; // Columns in the table
     	3: required SchemaEvolution schema_evolution; // History of schema evolution of this table
    +	4: optional i32 block_size
    --- End diff --
    
    @Zhangshunyu ...if you register the property in hive metastore, the proeprty will never be lost even on restart of thrift server as we do for tablepath...you can create table flow


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r80176593
  
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -124,6 +124,7 @@ struct TableSchema{
     	1: required string table_id;  // ID used to
     	2: required list<ColumnSchema> table_columns; // Columns in the table
     	3: required SchemaEvolution schema_evolution; // History of schema evolution of this table
    +	4: optional i32 block_size
    --- End diff --
    
    Why save it to the file? It should be stored as a table level metadata.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81280420
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/carbon/metadata/schema/table/TableSchema.java ---
    @@ -102,6 +107,14 @@ public void setSchemaEvalution(SchemaEvolution schemaEvalution) {
         this.schemaEvalution = schemaEvalution;
       }
     
    +  public int getBlockszie() {
    +    return blockszie;
    +  }
    +
    +  public void setBlockszie(int blockszie) {
    +    this.blockszie = blockszie;
    --- End diff --
    
    provide proper variable name 'blocksize'


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/189#discussion_r81280222
  
    --- Diff: format/src/main/thrift/schema.thrift ---
    @@ -124,6 +124,7 @@ struct TableSchema{
     	1: required string table_id;  // ID used to
     	2: required list<ColumnSchema> table_columns; // Columns in the table
     	3: required SchemaEvolution schema_evolution; // History of schema evolution of this table
    +	4: optional i32 block_size
    --- End diff --
    
    i think we should initialize the block size with default value, Will be better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---