Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/07/20 02:28:21 UTC

[GitHub] [incubator-doris] yangzhg opened a new issue #6276: [Proposal] Support large variable-length string type

yangzhg opened a new issue #6276:
URL: https://github.com/apache/incubator-doris/issues/6276


   ## Background
   
   Doris currently has two string types: CHAR and VARCHAR. CHAR stores fixed-length strings and VARCHAR stores variable-length strings. The maximum length of VARCHAR is 65533. This covers most use cases, but it is not enough when larger strings need to be stored in Doris, so we need to add a new data type, String. String corresponds to BLOB or TEXT storage in MySQL. Its maximum length is 4GB, but we still do not recommend storing strings larger than 64K in Doris.
   
   ## Implementation in other systems
   
   * MySQL: MySQL uses BLOB or TEXT as the storage type for very long strings. MySQL can perform string operations on these types, but performance is not guaranteed. In actual storage, the data is stored in overflow pages. Depending on the version and storage engine, the first n characters are also stored in the data page for indexing.
   * Parquet/ORC: Both formats store large strings directly in the data area, with no special handling other than dictionary encoding.
   
   ## Design
   
   * Add the String type, which represents a string of arbitrary length. For compatibility with MySQL, the maximum length is set to 4G-4, and 4 bytes are used to store the length of the string.
   * Data storage is similar to the VARCHAR type, except that the length identifier is changed to 4 bytes.
   * Indexes are not supported for now; zonemap indexes will be enabled once the zonemap length limit is ready.
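
To make the storage layout in the design list concrete, here is a minimal Python sketch of a value stored behind a 4-byte length identifier. It is not the Doris implementation: the names `encode_string` and `decode_string`, the little-endian byte order, the placement of the length before the data, and the reading of "4G-4" as 2**32 - 4 bytes are all assumptions made for illustration.

```python
import struct
from typing import Tuple

# Assumption: "4G-4" from the proposal is read as 2**32 - 4 bytes.
MAX_STRING_LEN = 2**32 - 4

# 4-byte unsigned length identifier. Little-endian is an assumption;
# the proposal does not state a byte order.
LEN_PREFIX = struct.Struct("<I")


def encode_string(value: bytes) -> bytes:
    """Return the value preceded by its 4-byte length (hypothetical helper)."""
    if len(value) > MAX_STRING_LEN:
        raise ValueError("string exceeds the maximum String length")
    return LEN_PREFIX.pack(len(value)) + value


def decode_string(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read one value at `offset`; return it and the offset of the next value."""
    (length,) = LEN_PREFIX.unpack_from(buf, offset)
    start = offset + LEN_PREFIX.size
    end = start + length
    if end > len(buf):
        raise ValueError("buffer ends before the declared string length")
    return buf[start:end], end
```

With a 4-byte length, a value longer than the VARCHAR limit of 65533 (for example 70000 bytes) can be written and read back with the same code path as a short one.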


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] HappenLee commented on issue #6276: [Proposal] Support large variable-length string type

Posted by GitBox <gi...@apache.org>.
HappenLee commented on issue #6276:
URL: https://github.com/apache/incubator-doris/issues/6276#issuecomment-883311869


   We should support limiting the length of the zonemap index.
   
   In Doris's current scenarios, excessively long zonemap indexes do not contribute significantly to query performance, and instead consume a lot of additional memory and storage. So it is wise to limit the zonemap length for large object types and long strings.
   
   For example, PostgreSQL's BRIN index by default keeps one summary entry per range of 128 pages (`pages_per_range`). The current Doris default page size is 64 KB, so by a similar ratio (64 KB / 128) a reasonable zonemap index length would be about 512 bytes.
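
One way a length-limited zonemap could stay correct is sketched below in Python. This is only an illustration under stated assumptions, not how Doris implements it: the helper names (`truncate_min`, `truncate_max`, `page_may_match`) are hypothetical, values are compared as raw bytes, and the 512-byte limit is the figure suggested above. The minimum can simply be cut to a prefix, because a prefix never sorts after the full value; the maximum has to be rounded up after cutting, otherwise it would no longer be an upper bound.

```python
from typing import Optional

ZONEMAP_MAX_LEN = 512  # bytes; the limit suggested in the comment above


def truncate_min(value: bytes, limit: int = ZONEMAP_MAX_LEN) -> bytes:
    """A prefix always sorts <= the full value, so it is still a lower bound."""
    return value[:limit]


def truncate_max(value: bytes, limit: int = ZONEMAP_MAX_LEN) -> Optional[bytes]:
    """Return a bound of at most `limit` bytes that sorts >= the full value.

    Returns None when no such finite bound exists (the kept prefix is all
    0xFF bytes); callers then treat the maximum as unbounded.
    """
    if len(value) <= limit:
        return value
    prefix = bytearray(value[:limit])
    # Drop trailing 0xFF bytes, then bump the last remaining byte by one.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        return None
    prefix[-1] += 1
    return bytes(prefix)


def page_may_match(zmin: bytes, zmax: Optional[bytes], target: bytes) -> bool:
    """Zone-map check: False means the page can be skipped for `target`."""
    return zmin <= target and (zmax is None or target <= zmax)
```

The truncated bounds are looser than the exact ones, so a page may occasionally be read when it could have been skipped, but a page that contains the target is never skipped.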




[GitHub] [incubator-doris] yangzhg closed issue #6276: [Proposal] Support large variable-length string type

Posted by GitBox <gi...@apache.org>.
yangzhg closed issue #6276:
URL: https://github.com/apache/incubator-doris/issues/6276


   



