Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/12/25 07:59:04 UTC

[GitHub] [incubator-doris] blackfox1983 opened a new issue #5153: [Proposal] support scan data from partitions of table

blackfox1983 opened a new issue #5153:
URL: https://github.com/apache/incubator-doris/issues/5153


   **Is your feature request related to a problem? Please describe.**
   At present, I use the broker to export data from Doris and save the query outfile results to a filesystem. I have run into several inconvenient and inefficient scenarios.
   
   1. Trial and Development
   
   I first try this feature in my development environment, where we don't have distributed storage systems such as BOS, S3, or HDFS. Building these systems is expensive, and setting them up and debugging them can take a long time. What we need is a stand-alone environment in which, for example, the broker can write data to the local disk. (Test data is generally very small.)
   
   As it stands, we have to do a lot of debugging work against the current broker and our own system. For example, we have to debug against COS (Tencent Cloud), which takes a long time, and this is only the trial phase.
   
   Another problem is that the interfaces of the various cloud environments may not be uniform; that is, they may not fully inherit from the base class org.apache.hadoop.fs.FileSystem, which is the base class for the file systems managed by the current broker. As a result, the framework's compatibility is limited.
   
   2. Prod
   In the production environment, we want data to be exported reliably. We specify the column separator, row separator, and so on, but our data may contain the same characters as the separators, which leads to incorrectly parsed output, so we have to take care that no data field contains a separator. With the current mode, maintaining this stability has been a terrible experience.
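
To make the separator problem concrete, here is a minimal Python sketch. It does not use the Doris broker or its export format; the helper names (`naive_export`, `naive_parse`, `quoted_export`, `quoted_parse`) and the sample rows are hypothetical. It only shows how an unquoted export breaks when a field contains the column or row separator, and how a quoting format (Python's standard `csv` module here) round-trips the same rows.

```python
import csv
import io

# Hypothetical sample rows: one field contains the column separator,
# another contains the row separator.
ROWS = [["1", "hello, world"], ["2", "plain"], ["3", "line1\nline2"]]


def naive_export(rows, col_sep=",", row_sep="\n"):
    """Join fields and rows with bare separators, with no quoting or escaping."""
    return row_sep.join(col_sep.join(fields) for fields in rows)


def naive_parse(text, col_sep=",", row_sep="\n"):
    """Split on the separators, which is what a simple consumer would do."""
    return [line.split(col_sep) for line in text.split(row_sep)]


def quoted_export(rows):
    """Write rows with csv quoting so separators inside fields are protected."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=",", quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def quoted_parse(text):
    """Read the quoted text back; quoted separators stay inside their field."""
    return list(csv.reader(io.StringIO(text), delimiter=","))


if __name__ == "__main__":
    broken = naive_parse(naive_export(ROWS))
    print(len(broken), "rows after naive round trip")    # more rows than were written
    intact = quoted_parse(quoted_export(ROWS))
    print(intact == ROWS, "quoted round trip preserved the rows")
```

With the naive path, the three input rows come back as four rows and the first row has three fields instead of two, which is the kind of silent corruption described above.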
   
   **Describe the solution you'd like**
   It would be better to use an open design rather than binding export to the broker.
   
   Doris would support a SCAN feature: users can scan the partitions of a table, as in HBase or ES.
   
   Data returned by a scan can then be stored in a variety of systems, in whatever format the user wants.
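
As a rough illustration of this open design, the sketch below assumes the client receives scan results as batches of rows and decides for itself where and how to store them. Everything in it is hypothetical (the `Sink` protocol, `LocalJsonLinesSink`, `MemorySink`, and `drain`); none of these names are Doris APIs. The point is only that a local-disk sink for a development environment becomes a few lines of client code, with no broker or Hadoop `FileSystem` implementation involved.

```python
import json
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


class Sink(Protocol):
    """Hypothetical client-side interface: anything that can accept a batch."""

    def write(self, batch: List[Row]) -> None: ...


class LocalJsonLinesSink:
    """Appends each row as one JSON line to a file on the local disk."""

    def __init__(self, path: str) -> None:
        self.path = path

    def write(self, batch: List[Row]) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            for row in batch:
                f.write(json.dumps(row) + "\n")


class MemorySink:
    """Keeps rows in memory, which is handy for tests."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write(self, batch: List[Row]) -> None:
        self.rows.extend(batch)


def drain(batches: Iterable[List[Row]], sink: Sink) -> int:
    """Send every batch to the sink and return the number of rows written."""
    total = 0
    for batch in batches:
        sink.write(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    # Stand-in for batches that a scan would return.
    batches = [[{"a": 1, "b": 2}], [{"a": 1, "b": 2, "note": "x,y"}]]
    sink = MemorySink()
    print(drain(batches, sink), "rows written")
```

Because each row is serialized as a JSON line, field values that contain commas or newlines need no separator configuration at all.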
   
   **Describe alternatives you've considered**
   ```
   scan [partitions] from table where a = 1 and b = 2
   with properties (
       "scan_thread_in_doris" = "1",
       "..."
   )
   ```
   This returns a cursor and a TTL (e.g. 5 minutes).
   
   ```
   scan by cursor
   ```
   
   In this way, a scan can resume from its breakpoint after the application restarts.
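
The cursor-and-TTL flow can be sketched as a small in-memory simulation in Python. This is not a Doris implementation: `ScanServer`, `open_scan`, `scan_by_cursor`, the explicit `offset` argument, and the choice to renew the TTL on every fetch are all assumptions made for illustration. The sketch shows the one property the proposal relies on, namely that a client which persists the cursor and its offset can continue after a restart as long as the TTL has not elapsed.

```python
import time
import uuid


class ScanServer:
    """Hypothetical in-memory stand-in for the server side of the proposed SCAN."""

    def __init__(self, rows, ttl_seconds=300.0, clock=time.monotonic):
        self._rows = list(rows)
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._cursors = {}           # cursor id -> scan state

    def open_scan(self, predicate=None):
        """Start a scan and return (cursor, ttl_seconds)."""
        matching = [r for r in self._rows if predicate is None or predicate(r)]
        cursor = uuid.uuid4().hex
        self._cursors[cursor] = {
            "rows": matching,
            "offset": 0,
            "expires": self._clock() + self._ttl,
        }
        return cursor, self._ttl

    def scan_by_cursor(self, cursor, offset=None, limit=2):
        """Return (batch, next_offset); raise KeyError if the cursor is gone."""
        state = self._cursors.get(cursor)
        if state is None or self._clock() > state["expires"]:
            self._cursors.pop(cursor, None)
            raise KeyError("cursor is unknown or has expired")
        # A restarted client passes the offset it persisted; otherwise
        # continue from where the previous call stopped.
        start = state["offset"] if offset is None else offset
        batch = state["rows"][start:start + limit]
        state["offset"] = start + len(batch)
        # Assumption: each successful fetch renews the TTL.
        state["expires"] = self._clock() + self._ttl
        return batch, state["offset"]


if __name__ == "__main__":
    table = [{"id": i, "a": 1, "b": 2} for i in range(5)]
    server = ScanServer(table)
    cursor, ttl = server.open_scan(lambda r: r["a"] == 1 and r["b"] == 2)
    batch, offset = server.scan_by_cursor(cursor)
    # The client would persist (cursor, offset) here. After a restart:
    resumed, offset = server.scan_by_cursor(cursor, offset=offset)
    print([r["id"] for r in batch], [r["id"] for r in resumed])
```

Passing the offset explicitly keeps the resume step idempotent: if the application crashes after fetching a batch but before storing it, it can request the same range again instead of losing rows.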
   
   

