Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/04/22 13:08:56 UTC

[GitHub] [incubator-doris] decster opened a new issue #3382: [Proposal] Memory Optimized Column Storage Engine

decster opened a new issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382


   Currently, the underlying storage engine in Doris only supports append-style batch writes; update/upsert/delete by primary key (as in a traditional DBMS or KV store) is not supported.
   
   As Doris becomes more popular, many use cases are starting to require a `realtime update` capability, e.g.:
   
   * eCommerce: real-time analytics on order and inventory status
   * Finance: user account/balance/transfer checking
   * Social media: user/post status updates and statistics
   
   Also, updating a Doris rollup table with each new data batch can be considered a `counter` update, which may also benefit from this storage engine.
   
   Currently, Doris uses a special `REPLACE` column property to `simulate` upsert semantics. It is basically merge-on-read: when scanning a tablet, Doris automatically merges the versions under the same key and keeps only the latest one. In real-time use cases with high ingestion frequency, many segments need to be merged, which becomes a performance bottleneck.
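   To make that cost concrete, here is a minimal Python sketch of REPLACE-style merge-on-read. It assumes each segment is a key-sorted list of hypothetical `(key, version, value)` tuples with at most one row per key per segment; it illustrates the idea and is not Doris' segment format or reader code. Every scan repeats a k-way merge across all segments, so the work grows with the number of segments that frequent loads have produced.

```python
import heapq
from collections.abc import Iterator

# Hypothetical row layout for illustration: (key, version, value).
Row = tuple[int, int, str]


def merge_on_read(segments: list[list[Row]]) -> Iterator[tuple[int, str]]:
    """Yield (key, value) pairs, keeping only the latest version per key.

    Assumes each segment is sorted by key and holds at most one row per key,
    so ordering by (key, -version) is consistent within every segment.
    """
    # k-way merge across all segments: the newest version of a key comes first.
    merged = heapq.merge(*segments, key=lambda row: (row[0], -row[1]))
    last_key = None
    for key, _version, value in merged:
        if key != last_key:  # first row seen for this key is the latest one
            yield key, value
            last_key = key
```

   Merging the segments `[(1, 1, "a"), (2, 1, "b")]`, `[(1, 2, "a2"), (3, 2, "c")]` and `[(2, 3, "b3")]` yields `(1, "a2")`, `(2, "b3")` and `(3, "c")`.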
   
   We propose a new memory-optimized column storage engine to support the `realtime` + `frequent update` use case. It borrows some ideas from Kudu.
   
   The design doc can be found [here](https://decster.github.io/docs/choco.pdf).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] decster commented on issue #3382: [Proposal] Memory Optimized Column Storage Engine

Posted by GitBox <gi...@apache.org>.
decster commented on issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382#issuecomment-620377707


   > So in the M1 and M2 steps, the table's data is stored entirely in memory (or PMEM),
   
   Yes, only in memory; there is currently no design or plan for PMEM.
   
   > and persistence is guaranteed only by the WAL. Am I right?
   
   In M1 and M2 there is no persistence at all: WriteTx files (WAL) will be written to disk, but there is no logic to load them.
   


[GitHub] [incubator-doris] morningman commented on issue #3382: [Proposal] Memory Optimized Column Storage Engine

morningman commented on issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382#issuecomment-620359659


   > Do you mean (1) a MemTablet migrated to an SSD/HDD tablet, or (2) ColumnPages flushed to hard disk?
   > For case 1, we will add partial row update support to the disk storage engine, which may be slower.
   > For case 2, it's allowed.
   
   After a MemTablet is migrated to hard disk, do we need to `seek and update` for an `Update` operation?
   Or do we actually just implement an `Upsert` operation?


[GitHub] [incubator-doris] decster commented on issue #3382: [Proposal] Memory Optimized Column Storage Engine

decster commented on issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382#issuecomment-618145730


   I have a POC project that already has some code and tests. I will create some PRs to refactor and integrate that code into Doris as a starting base and code skeleton, and will create more TODO issues based on it.
   Project location:
   https://github.com/decster/choco
   


[GitHub] [incubator-doris] imay commented on issue #3382: [Proposal] Memory Optimized Column Storage Engine

imay commented on issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382#issuecomment-618114581


   @decster 
   This is a wonderful project!
   This will make Doris perform better in high-concurrency update scenarios.
   This is a long-term project, so I will create a project to track all the issues that belong to it.
   Please add a [Memory Engine] prefix to related issues; it will help maintainers manage them.
   


[GitHub] [incubator-doris] decster commented on issue #3382: [Proposal] Memory Optimized Column Storage Engine

decster commented on issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382#issuecomment-620367207


   > After a MemTablet is migrated to hard disk, do we need to `seek and update` for an `Update` operation?
   > Or do we actually just implement an `Upsert` operation?
   
   A full row update is an upsert, and the existing REPLACE behavior should cover it.
   For a partial row update, one way is to `seek` and read the old versions of the missing columns; another way is to store just the partial row (this needs a new file format) and merge on read.
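   The two partial-row options can be contrasted with a small Python sketch. A row is modeled as a hypothetical `dict` of column name to value, and a partial update as a `(version, columns)` pair; neither reflects an actual Doris file format. `fill_on_write` does the `seek`: it reads the old row at write time and stores a complete row. `merge_partial_rows` stores only the partial rows and rebuilds the full row at read time.

```python
from typing import Optional

# Hypothetical row model: column name -> value.
Row = dict[str, object]


def fill_on_write(old_row: Optional[Row], partial: Row) -> Row:
    """Option 1 (seek): read the old version at write time and store a full
    row, so readers can keep the existing REPLACE semantics."""
    row = dict(old_row) if old_row else {}
    row.update(partial)  # columns in the update overwrite the old values
    return row


def merge_partial_rows(base: Optional[Row], deltas: list[tuple[int, Row]]) -> Row:
    """Option 2 (merge-on-read): store only partial rows and rebuild the
    full row at read time by applying them in ascending version order."""
    row = dict(base) if base else {}
    for _version, cols in sorted(deltas, key=lambda d: d[0]):
        row.update(cols)  # columns missing from this delta keep older values
    return row
```

   The first option pays a point lookup on every write; the second keeps writes cheap but moves the merge into every read, which is the same merge-on-read cost that the `REPLACE` approach already has.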


[GitHub] [incubator-doris] decster edited a comment on issue #3382: [Proposal] Memory Optimized Column Storage Engine

decster edited a comment on issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382#issuecomment-620355243


   @morningman 
   > Does it allow updating a column that has already been flushed to hard disk?
   
   Do you mean (1) a MemTablet migrated to an SSD/HDD tablet, or (2) ColumnPages flushed to hard disk?
   For case 1, we will add partial row update support to the disk storage engine, which may be slower.
   For case 2, it's allowed.
   
   > What about the recovery process? How is the data recovered from the WAL?
   
   The WAL is just a set of WriteTx files: we load the snapshot, then all the committed WriteTx files after the snapshot version.
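   A minimal Python sketch of that recovery order, assuming hypothetical `Snapshot` and `WriteTx` records that carry a version number, a committed flag and the rows they wrote (the real WriteTx file layout is not described in this thread). Transactions already covered by the snapshot, or never committed, are skipped; the rest are replayed oldest first.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical tablet snapshot: state as of `version`."""
    version: int
    rows: dict[int, str] = field(default_factory=dict)


@dataclass
class WriteTx:
    """Hypothetical stand-in for one WriteTx (WAL) file."""
    version: int
    committed: bool
    rows: dict[int, str]  # row key -> value written by this transaction


def recover(snapshot: Snapshot, wal: list[WriteTx]) -> tuple[int, dict[int, str]]:
    """Load the snapshot, then replay committed WriteTx files newer than it."""
    state = dict(snapshot.rows)
    version = snapshot.version
    for tx in sorted(wal, key=lambda t: t.version):
        if not tx.committed or tx.version <= snapshot.version:
            continue  # never committed, or already contained in the snapshot
        state.update(tx.rows)  # upsert semantics: later versions win
        version = tx.version
    return version, state
```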
   


[GitHub] [incubator-doris] morningman commented on issue #3382: [Proposal] Memory Optimized Column Storage Engine

morningman commented on issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382#issuecomment-620370657


   > I wrote a roadmap for this project:
   
   Nice roadmap!
   
   So in the M1 and M2 steps, the table's data is stored entirely in memory (or PMEM), and persistence
   is guaranteed only by the WAL. Am I right?


[GitHub] [incubator-doris] morningman commented on issue #3382: [Proposal] Memory Optimized Column Storage Engine

morningman commented on issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382#issuecomment-619566233


   I have two questions about the design.
   1. Does it allow updating a column that has already been flushed to hard disk?
   2. What about the recovery process? How is the data recovered from the WAL?


[GitHub] [incubator-doris] morningman commented on issue #3382: [Proposal] Memory Optimized Column Storage Engine

morningman commented on issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382#issuecomment-620650411


   > > So in the M1 and M2 steps, the table's data is stored entirely in memory (or PMEM),
   > 
   > Yes, only in memory; there is currently no design or plan for PMEM.
   > 
   > > and persistence is guaranteed only by the WAL. Am I right?
   > 
   > In M1 and M2 there is no persistence at all: WriteTx files (WAL) will be written to disk, but there is no logic to load them.
   
   I see. Thank you~


[GitHub] [incubator-doris] decster edited a comment on issue #3382: [Proposal] Memory Optimized Column Storage Engine

decster edited a comment on issue #3382:
URL: https://github.com/apache/incubator-doris/issues/3382#issuecomment-620362956


   I wrote a roadmap for this project:
   
   **M1: basic functionality, create table/read/write pipeline**
   
   **Scenario**: Users can create a simple table, write (insert/update) to it, and query it (possibly slowly), without persistence (data is lost after a BE restart).
   
   1. Create table support in FE, add memory storage medium
   2. Core: Make Tablet extendable
   3. Core: MemTable, MemSubTablet, HashIndex
   4. Core: Column, ColumnDelta, Column Reader/Writer
   5. Core: Tablet Reader/Writer
   6. Write Pipeline: partial row update Kafka record/file format definition, reader/writer
   7. Write Pipeline: extend DeltaWriter and other write pipeline related code change
   8. Metadata: new MemTablet support in TabletMeta and meta
   9. TableManager support for MemTablet
   10. TxManager support for MemTablet
   11. Read Pipeline: basic ScanNode for MemTablet (full scan only)
   12. Core: delta compaction implementation
   13. Core: old version GC implementation
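   To illustrate how the M1 core pieces (HashIndex, Column/ColumnDelta, delta compaction, old version GC) might fit together, here is a Python sketch. It is one possible reading of the roadmap in the spirit of Kudu's base data plus delta stores, assuming writes arrive in ascending version order; the classes `ColumnSketch` and `MemTabletSketch` and all their methods are hypothetical, not the design doc's API.

```python
class ColumnSketch:
    """Hypothetical column: a base array plus per-version deltas.

    Each delta maps row_id -> value for one write version (a stand-in for
    ColumnDelta). Reading at version v overlays every delta <= v on the base.
    Writes are assumed to arrive in ascending version order.
    """

    def __init__(self) -> None:
        self.base: list[object] = []
        self.base_version = 0
        self.deltas: list[tuple[int, dict[int, object]]] = []  # ascending versions

    def write(self, version: int, updates: dict[int, object]) -> None:
        self.deltas.append((version, dict(updates)))

    def read(self, row_id: int, version: int) -> object:
        if version < self.base_version:
            raise ValueError(f"version {version} was already compacted away")
        value = self.base[row_id] if row_id < len(self.base) else None
        for v, delta in self.deltas:
            if v > version:
                break
            if row_id in delta:
                value = delta[row_id]
        return value

    def compact(self, up_to_version: int) -> None:
        """Delta compaction: fold deltas <= up_to_version into the base.

        Versions older than up_to_version become unreadable, which doubles
        as a simple form of old-version GC.
        """
        remaining = []
        for v, delta in self.deltas:
            if v > up_to_version:
                remaining.append((v, delta))
                continue
            for row_id, value in delta.items():
                if row_id >= len(self.base):
                    self.base.extend([None] * (row_id + 1 - len(self.base)))
                self.base[row_id] = value
        self.deltas = remaining
        self.base_version = max(self.base_version, up_to_version)


class MemTabletSketch:
    """Hypothetical tablet: a hash index (row key -> row id) plus columns."""

    def __init__(self, column_names: list[str]) -> None:
        self.index: dict[object, int] = {}
        self.columns = {name: ColumnSketch() for name in column_names}

    def upsert(self, version: int, rows: dict[object, dict[str, object]]) -> None:
        """Apply (possibly partial) rows; only the written columns get a delta."""
        per_column: dict[str, dict[int, object]] = {n: {} for n in self.columns}
        for key, cols in rows.items():
            row_id = self.index.setdefault(key, len(self.index))  # new keys get the next id
            for name, value in cols.items():
                per_column[name][row_id] = value
        for name, updates in per_column.items():
            if updates:
                self.columns[name].write(version, updates)

    def get(self, key: object, column: str, version: int) -> object:
        return self.columns[column].read(self.index[key], version)
```

   For example, after `upsert(1, {"o1": {"status": "new", "qty": 5}})` and a partial row update `upsert(2, {"o1": {"qty": 7}})`, only `qty` receives a delta at version 2. After `columns["qty"].compact(2)`, reading `qty` at version 2 still returns 7, while a read at version 1 raises `ValueError` because that version has been folded away.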
   
   **M2: complete core functionality and better performance**
   
   **Scenario**: support multi-column rowkeys, string datatypes, etc.; support simple predicate pushdown; resolve potential performance issues.
   
   1. Core: add string/binary datatype support
   2. Core: add multi-column RowKey support
   3. Core: add delete support
   4. Read: ScanNode supports simple predicate pushdown (>, <, =, etc.)
   5. Transaction: optimize publish version related code path
   6. Performance: resolve potential major performance issues.
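   A simple predicate pushdown of the kind listed in item 4 can be sketched in Python as follows: the scan evaluates `column op literal` itself and materializes only the requested columns for matching rows. The columnar `dict` layout and the `pushdown_scan` function are assumptions for illustration, not the ScanNode interface.

```python
import operator
from collections.abc import Sequence

# Comparison operators named in the roadmap item (plus <= and >=).
_OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
}


def pushdown_scan(columns: dict[str, Sequence[object]],
                  predicate: tuple[str, str, object],
                  output: list[str]) -> list[tuple[object, ...]]:
    """Scan with a pushed-down `column op literal` predicate.

    The predicate column is evaluated first to get the matching row ids;
    output columns are then read only for those rows, instead of returning
    every row to a filter above the scan.
    """
    column, op, literal = predicate
    compare = _OPS[op]
    selected = [i for i, v in enumerate(columns[column]) if compare(v, literal)]
    return [tuple(columns[name][i] for name in output) for i in selected]
```

   With `columns = {"id": [1, 2, 3, 4], "qty": [5, 0, 9, 2], "status": ["a", "b", "c", "d"]}`, the call `pushdown_scan(columns, ("qty", ">", 3), ["id", "status"])` returns `[(1, "a"), (3, "c")]`.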
   
   **M3: data persistence and fault tolerance**
   
   **Scenario**: no data loss after a BE restart or BE failure
   
   1. Persistence: incremental snapshot
   2. Persistence: Tablet load (from a snapshot and WAL)
   3. Persistence: outdated data GC
   4. MemTablet copy
   5. Old storage engine supports partial row update
   6. MemTablet migration to SSD/HDD
    

