Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/04/23 20:36:40 UTC

[GitHub] [incubator-druid] dclim opened a new pull request #7531: Data loader (sampler component)

URL: https://github.com/apache/incubator-druid/pull/7531
 
 
   Implementation of the sampler component of #7502. 
   
   The sampler runs on the overlord and exposes a `POST /druid/indexer/v1/sampler` endpoint that returns sampled data for use by the data loader GUI. It is currently intended as an internal-only endpoint and is intentionally left undocumented.
   
   Changes are as minimally invasive as possible, and most of the code is confined to the `org.apache.druid.indexing.overlord.sampler` package. To improve the sampling experience, additional methods were added to `Firehose` (to return the raw rows) and to `FirehoseFactory` (a `connectForSampler` method that signals to the implementation that only a few rows are needed, so it can skip things like prefetching and caching). The default implementations do 'the right thing' when an implementation does not override them.
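   The backward-compatible approach above fits Java 8 default methods. The following is a minimal sketch under that assumption. `SketchFirehose`, `SketchFirehoseFactory`, `RowWithRaw`, `nextRowWithRaw`, `ListFirehoseFactory`, `SamplerDefaults` and the `String`-typed rows are simplified, hypothetical stand-ins, not Druid's actual interfaces. Only the name `connectForSampler` comes from the description. The sketch shows a legacy factory that overrides neither new method still working with the sampler.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical pairing of a parsed row with the raw input it came from.
final class RowWithRaw {
  final String row; // parsed row (a String here for simplicity)
  final String raw; // raw input, or null if the firehose cannot supply it

  RowWithRaw(String row, String raw) {
    this.row = row;
    this.raw = raw;
  }
}

// Hypothetical, simplified stand-in for a firehose that iterates over rows.
interface SketchFirehose {
  boolean hasMore();

  String nextRow();

  // Added method with a default: older implementations still compile and
  // simply report that no raw data is available.
  default RowWithRaw nextRowWithRaw() {
    return new RowWithRaw(nextRow(), null);
  }
}

// Hypothetical, simplified stand-in for a firehose factory.
interface SketchFirehoseFactory {
  SketchFirehose connect();

  // Signals that only a few rows are needed, so an implementation may skip
  // prefetching and caching. By default it falls back to a normal connect().
  default SketchFirehose connectForSampler() {
    return connect();
  }
}

// A "legacy" factory that overrides neither of the new methods.
final class ListFirehoseFactory implements SketchFirehoseFactory {
  private final List<String> rows;

  ListFirehoseFactory(List<String> rows) {
    this.rows = rows;
  }

  @Override
  public SketchFirehose connect() {
    Iterator<String> it = rows.iterator();
    return new SketchFirehose() {
      @Override
      public boolean hasMore() {
        return it.hasNext();
      }

      @Override
      public String nextRow() {
        return it.next();
      }
    };
  }
}

final class SamplerDefaults {
  // Reads at most maxRows rows through the sampler-oriented entry points.
  static List<RowWithRaw> sample(SketchFirehoseFactory factory, int maxRows) {
    List<RowWithRaw> out = new ArrayList<>();
    SketchFirehose firehose = factory.connectForSampler();
    while (out.size() < maxRows && firehose.hasMore()) {
      out.add(firehose.nextRowWithRaw());
    }
    return out;
  }
}
```

   A factory that can do better, for example one that reads only the first file without prefetching, would override `connectForSampler`. Every other implementation keeps working unchanged.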
   
   A few 'hacks' were added to make the API a bit nicer, e.g. allowing the sampler to work when no `dataSchema` is provided. In that case it returns the raw rows if possible and marks every row as unparseable, since no parser was provided.
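   The no-`dataSchema` fallback can be sketched as follows. This is a hypothetical illustration, not the PR's code. `SampledRow`, `SamplerFallback`, and modelling the parser as an `Optional<Function<String, Map<String, Object>>>` are all assumptions. When no parser is present, each raw row is returned and flagged unparseable. When a parser is present, rows that fail to parse keep their raw text and are also flagged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical result of sampling one input row.
final class SampledRow {
  final String raw;                 // the raw input row
  final Map<String, Object> parsed; // null when the row was not parsed
  final boolean unparseable;

  SampledRow(String raw, Map<String, Object> parsed, boolean unparseable) {
    this.raw = raw;
    this.parsed = parsed;
    this.unparseable = unparseable;
  }
}

final class SamplerFallback {
  // parser is empty when no dataSchema (and therefore no parser) was supplied.
  static List<SampledRow> sample(
      List<String> rawRows, Optional<Function<String, Map<String, Object>>> parser) {
    List<SampledRow> out = new ArrayList<>();
    for (String raw : rawRows) {
      if (!parser.isPresent()) {
        // No parser: return the raw row and mark it unparseable.
        out.add(new SampledRow(raw, null, true));
        continue;
      }
      try {
        out.add(new SampledRow(raw, parser.get().apply(raw), false));
      } catch (RuntimeException e) {
        // Parse failure: keep the raw row so the GUI can still display it.
        out.add(new SampledRow(raw, null, true));
      }
    }
    return out;
  }
}
```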

----------------------------------------------------------------
This is an automated message from the Apache Git Service.

With regards,
Apache Git Services