Posted to dev@doris.apache.org by GitBox <gi...@apache.org> on 2019/08/06 06:12:27 UTC

[GitHub] [incubator-doris] yuanlihan edited a comment on issue #1582: Enable Partition Discovery for Broker Load

URL: https://github.com/apache/incubator-doris/issues/1582#issuecomment-518519112
 
 
   > > > I think we can support listing path like `"base_dir/*/*/*"`
   > > 
   > > 
   > > This syntax seems a little weird. What about supporting recursive listing of files under a path (e.g., "base_dir/" or "base_dir/*") iff users specify _**columns_from_path**_ by _**[COLUMNS FROM PATH AS (columns_from_path)]**_, which would rarely conflict with previous usage?
   > 
   > If you think the wildcard is weird, I think we can keep `DATA INDIR` and remove the `[PATH START WITH "base_path"]` clause. If users specify the `columns_from_path` clause, we can recurse into directories according to it; if they don't, we only traverse one level.
   > 
   
   @imay 
   The _base_path_ property is required, as the following example shows.
   Suppose we want to **reload only the data for beijing on 2019-06-26**:
   
   - **base path**: hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir/
   - **partition columns to be extracted**: city and utc_date
   - **input path (dir)**: hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir/city=beijing/utc_date=2019-06-26
   - **detail files**: 
   [hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir/city=beijing/utc_date=2019-06-26/0000.csv, hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir/city=beijing/utc_date=2019-06-26/0001.csv, ...]
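
To illustrate why the base path matters here, below is a minimal Python sketch (not Doris code). It assumes Hive-style `key=value` directory names, and `extract_partition_columns` is a hypothetical helper introduced only for this example. It parses the directory segments that lie between the base path and the file name.

```python
from typing import Dict, List


def extract_partition_columns(base_path: str, file_path: str,
                              columns: List[str]) -> Dict[str, str]:
    """Parse key=value directory segments found between base_path and the file name."""
    base = base_path.rstrip("/") + "/"
    if not file_path.startswith(base):
        raise ValueError(f"{file_path} is not under {base_path}")
    # Drop the base path and the trailing file name; keep only directory segments.
    segments = file_path[len(base):].split("/")[:-1]
    found = {}
    for segment in segments:
        key, sep, value = segment.partition("=")
        if sep:  # only segments that look like key=value
            found[key] = value
    missing = [c for c in columns if c not in found]
    if missing:
        raise ValueError(f"partition columns not found in path: {missing}")
    return {c: found[c] for c in columns}


# Paths taken from the example above.
base = "hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir/"
input_dir = base + "city=beijing/utc_date=2019-06-26/"
path = input_dir + "0000.csv"

result = extract_partition_columns(base, path, ["city", "utc_date"])
print(result)
```

In this sketch, passing the input dir itself as the base leaves no `key=value` segments to parse, so the call raises `ValueError`. That is the case the separate base path is meant to cover.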
   
   Also, could you kindly explain the main concerns with recursively listing files in the following way?
   
   > support recursive listing of files under a path (e.g., "base_dir/" or "base_dir/*") iff users specify _**columns_from_path**_ by _**[COLUMNS FROM PATH AS (columns_from_path)]**_, which would rarely conflict with previous usage.
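
A rough Python sketch of the listing behaviour described in that quote, under stated assumptions: the local filesystem stands in for the broker/HDFS listing, `list_input_files` is a hypothetical name, and recursion here is unbounded rather than limited by the number of partition columns.

```python
import os
import tempfile
from typing import List


def list_input_files(input_dir: str, columns_from_path: List[str]) -> List[str]:
    """List files one level deep, or recursively when columns_from_path is given."""
    if not columns_from_path:
        # Previous behaviour: only files directly under input_dir.
        return sorted(
            os.path.join(input_dir, name)
            for name in os.listdir(input_dir)
            if os.path.isfile(os.path.join(input_dir, name))
        )
    # columns_from_path specified: descend into partition directories.
    files = []
    for root, _dirs, names in os.walk(input_dir):
        files.extend(os.path.join(root, name) for name in names)
    return sorted(files)


# Build a small directory tree that mimics the layout in the example.
demo_dir = tempfile.mkdtemp()
partition_dir = os.path.join(demo_dir, "city=beijing", "utc_date=2019-06-26")
os.makedirs(partition_dir)
for file_name in (os.path.join(demo_dir, "top.csv"),
                  os.path.join(partition_dir, "0000.csv")):
    with open(file_name, "w") as handle:
        handle.write("a,b\n")

flat = list_input_files(demo_dir, [])
recursive = list_input_files(demo_dir, ["city", "utc_date"])
print(len(flat), len(recursive))
```

Without `columns_from_path` the call only sees `top.csv`, so existing load statements would keep their current behaviour in this sketch.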
