You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Elliot West (JIRA)" <ji...@apache.org> on 2016/08/31 12:44:20 UTC

[jira] [Updated] (HIVE-12274) Increase width of columns used for general configuration in the metastore.

     [ https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliot West updated HIVE-12274:
-------------------------------
    Description: 
h2. Overview
This issue is very similar in principle to HIVE-1364. We are hitting a limit when processing JSON data that has a large nested schema. The struct definition is truncated when inserted into the metastore database column {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.

Given that the purpose of these columns is to hold very loosely defined configuration values it seems rather limiting to impose such a relatively low length bound. One can imagine that valid use cases will arise where reasonable parameter/property values exceed the current limit. 

h2. Context
These limitations were in by the patch attributed to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the reason. However, nowadays the limit can be increased because:
* Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. (source)
* Postgres supports a max of 1GB for {{character}} datatype. (source)
* MySQL can support upto 65535 bytes for the entire row. So long as the {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. (source)
* SQL Server's {{varchar}} max length is 8000 and can go beyond using "varchar(max)" with the same limitation as MySQL being 65535 bytes for the entire row. (source)
* Derby's {{varchar}} can be upto 32672 bytes. (source)

h2. Proposal
Can these columns not use CLOB-like types as for example as used by {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents exist for all targeted database platforms:

* MySQL: {{mediumtext}}
* Postgres: {{text}}
* Oracle: {{CLOB}}
* Derby: {{LONG VARCHAR}}

I'd suggest that the candidates for type change are:

* {{COLUMNS_V2.TYPE_NAME}}
* {{TABLE_PARAMS.PARAM_VALUE}}
* {{SERDE_PARAMS.PARAM_VALUE}}
* {{SD_PARAMS.PARAM_VALUE}}

Finally, will this limitation persist in the work resulting from HIVE-9452?

  was:
This issue is very similar in principle to HIVE-1364. We are hitting a limit when processing JSON data that has a large nested schema. The struct definition is truncated when inserted into the metastore database column {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.

Given that the purpose of these columns is to hold very loosely defined configuration values it seems rather limiting to impose such a relatively low length bound. One can imagine that valid use cases will arise where reasonable parameter/property values exceed the current limit. Can these columns not use CLOB-like types as for example as used by {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents exist for all targeted database platforms:

* MySQL: {{mediumtext}}
* Postgres: {{text}}
* Oracle: {{CLOB}}
* Derby: {{LONG VARCHAR}}

I'd suggest that the candidates for type change are:

* {{COLUMNS_V2.TYPE_NAME}}
* {{TABLE_PARAMS.PARAM_VALUE}}
* {{SERDE_PARAMS.PARAM_VALUE}}
* {{SD_PARAMS.PARAM_VALUE}}

Finally, will this limitation persist in the work resulting from HIVE-9452?


> Increase width of columns used for general configuration in the metastore.
> --------------------------------------------------------------------------
>
>                 Key: HIVE-12274
>                 URL: https://issues.apache.org/jira/browse/HIVE-12274
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 2.0.0
>            Reporter: Elliot West
>              Labels: metastore
>         Attachments: HIVE-12274.example.ddl.hql
>
>
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit when processing JSON data that has a large nested schema. The struct definition is truncated when inserted into the metastore database column {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.
> Given that the purpose of these columns is to hold very loosely defined configuration values it seems rather limiting to impose such a relatively low length bound. One can imagine that valid use cases will arise where reasonable parameter/property values exceed the current limit. 
> h2. Context
> These limitations were in by the patch attributed to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the reason. However, nowadays the limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. (source)
> * Postgres supports a max of 1GB for {{character}} datatype. (source)
> * MySQL can support upto 65535 bytes for the entire row. So long as the {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. (source)
> * SQL Server's {{varchar}} max length is 8000 and can go beyond using "varchar(max)" with the same limitation as MySQL being 65535 bytes for the entire row. (source)
> * Derby's {{varchar}} can be upto 32672 bytes. (source)
> h2. Proposal
> Can these columns not use CLOB-like types as for example as used by {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> * {{COLUMNS_V2.TYPE_NAME}}
> * {{TABLE_PARAMS.PARAM_VALUE}}
> * {{SERDE_PARAMS.PARAM_VALUE}}
> * {{SD_PARAMS.PARAM_VALUE}}
> Finally, will this limitation persist in the work resulting from HIVE-9452?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)