Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/02/06 00:31:47 UTC

[GitHub] glasser edited a comment on issue #6989: Behavior of index_parallel with appendToExisting=false and no bucketIntervals in GranularitySpec is surprising

glasser edited a comment on issue #6989: Behavior of index_parallel with appendToExisting=false and no bucketIntervals in GranularitySpec is surprising
URL: https://github.com/apache/incubator-druid/issues/6989#issuecomment-460859550
 
 
   What I was missing here is that native batch parallel ingestion effectively acts as if appendToExisting is true unless you specify explicit intervals in the GranularitySpec.  This seems to be different from both Hadoop batch ingestion and the Local Index Task (including `index_parallel` with a non-splittable FirehoseFactory): all of these (if I understand correctly) run an additional phase to calculate the intervals if they are not provided.
   
   I think this is confusing and I'd like to help fix it.
   
   My honest instinct is that we should consider the current behavior a bug and we should make the following combination into an error in the top-level parallel index task:
   - Running `index_parallel`
   - `FirehoseFactory.isSplittable()`
   - `appendToExisting == false`
   - granularitySpec does not specify intervals
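
To make the proposed check concrete, here is a minimal sketch of the condition as a standalone predicate. It assumes the rejected case is the one from the issue title (`appendToExisting` is false and no intervals are given). The class and method names (`ParallelIndexSpecCheck`, `isAmbiguousOverwrite`, `validate`) are hypothetical and are not Druid's actual classes; the three boolean parameters stand in for `FirehoseFactory.isSplittable()`, `appendToExisting`, and whether the granularitySpec specifies intervals.

```java
/**
 * Hypothetical sketch of the proposed validation; not Druid's real API.
 * The caller is assumed to have already extracted the three flags from the task spec.
 */
final class ParallelIndexSpecCheck {
    private ParallelIndexSpecCheck() {}

    /**
     * Returns true for the combination proposed as an error: a splittable
     * firehose (so the task really runs in parallel), appendToExisting set
     * to false, and no explicit intervals in the granularitySpec.
     */
    static boolean isAmbiguousOverwrite(
            boolean firehoseSplittable,
            boolean appendToExisting,
            boolean intervalsSpecified) {
        return firehoseSplittable && !appendToExisting && !intervalsSpecified;
    }

    /** Throws if the spec hits the ambiguous combination; otherwise does nothing. */
    static void validate(
            boolean firehoseSplittable,
            boolean appendToExisting,
            boolean intervalsSpecified) {
        if (isAmbiguousOverwrite(firehoseSplittable, appendToExisting, intervalsSpecified)) {
            throw new IllegalArgumentException(
                "index_parallel with a splittable firehose and appendToExisting=false "
                + "requires explicit intervals in the granularitySpec; "
                + "add intervals or set appendToExisting=true");
        }
    }
}
```

With this shape, a spec that sets `appendToExisting` to true or supplies intervals passes unchanged, and only the surprising combination is rejected up front.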
   
   While this would be a backwards-incompatible change in 0.14, native batch ingestion is still a very new feature and this behavior is very surprising — and there's a trivial workaround of setting appendToExisting to true if you like the current behavior.
   
   If that's not the right change, we could fix the docs instead. I'd update the doc of appendToExisting in native_tasks.md to mention that it is effectively true if intervals aren't specified, and the docs of `intervals` in ingestion_spec should mention that native parallel tasks care about them more.
   
   (I suppose one could also make parallel indexing do two scans in this case, but I certainly would have been happier being asked to add one line to my spec than having my ingestion take twice as long, and the two-scan approach is more complex to implement.)
   
   I'm happy to implement either the new error or the docs update, whichever is best.
   Thoughts (@jihoonson ?)
