Posted to commits@pinot.apache.org by "snleee (via GitHub)" <gi...@apache.org> on 2023/06/07 08:00:39 UTC

[GitHub] [pinot] snleee commented on pull request #10856: Add the support for filling the default header if the header is missing

snleee commented on PR #10856:
URL: https://github.com/apache/pinot/pull/10856#issuecomment-1580148040

   > Can we also document the user experience when this feature is enabled?
   > 
   > 1. What happens if proper headers are there already ? Here the algorithm should not get this wrong and override the proper headers. How do we safeguard against this?
   > 2. What happens when there are no headers and we assign default headers?
   > 3. What happens when there are no headers and we fail to detect and assign default?
   
   Good point. 
   
   I added more comments on the user experience. For reference, the current logic is as follows:
   
   - Check whether the input has a header.
   - If a header is found, keep the existing behavior.
   - If no header is found, fill in a default header (`col_0, col_1, ...`).
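   The three steps above could be sketched roughly as follows. This is a hypothetical illustration, not the code in this PR: the class `DefaultHeaderSketch`, the method `resolveHeader`, and the pluggable `looksLikeHeader` detector are made-up names, and it assumes (the comment does not say so explicitly) that when the default header is used, the first row is kept as data rather than consumed as a header. The "header found" branch corresponds to today's `format.withHeader()` behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the "detect header, else fill a default header" flow.
// Class, method, and field names are illustrative; this is not Pinot's code.
public class DefaultHeaderSketch {

    /** The header to use plus the rows that should be ingested as data. */
    public static final class Resolved {
        public final List<String> header;
        public final List<String[]> dataRows;

        Resolved(List<String> header, List<String[]> dataRows) {
            this.header = header;
            this.dataRows = dataRows;
        }
    }

    /**
     * Decides the header for already-split CSV rows.
     * looksLikeHeader is a pluggable (hypothetical) detector applied to the first row.
     */
    public static Resolved resolveHeader(List<String[]> rows, Predicate<String[]> looksLikeHeader) {
        if (rows.isEmpty()) {
            return new Resolved(Collections.emptyList(), Collections.emptyList());
        }
        String[] first = rows.get(0);
        if (looksLikeHeader.test(first)) {
            // Header found: keep the existing behavior, i.e. the first row is the header.
            return new Resolved(Arrays.asList(first), rows.subList(1, rows.size()));
        }
        // No header found: generate col_0, col_1, ... (one name per column of the first row)
        // and keep the first row as data instead of consuming it as a header.
        List<String> defaults = new ArrayList<>();
        for (int i = 0; i < first.length; i++) {
            defaults.add("col_" + i);
        }
        return new Resolved(defaults, rows);
    }
}
```

   For example, with two-column rows and a detector that always returns false, `resolveHeader` yields the header `[col_0, col_1]` and keeps every row as data. Keeping the detector as a separate predicate means the detection logic can be refined later without touching the fallback.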
   
   There are two ways the logic can go wrong:
   1. False negative: we detect "no header" when there is one. In this case, we replace the existing header with `col_0, col_1, ...` instead of honoring it. I have seen some reports of this: https://github.com/python/cpython/issues/104380
   2. False positive: we detect a header when there is none. In this case, the end behavior is the same as today, because whenever a header is detected we fall back to the original behavior (`format.withHeader()`), which treats the first row as the header. So this would not cause any degradation.
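   To make the two failure modes concrete, here is a deliberately naive, hypothetical detector (not the heuristic used in this PR): it treats the first row as a header only if none of its fields parses as a number. `HeaderHeuristicDemo` and `looksLikeHeader` are illustrative names.

```java
import java.util.Arrays;

// Hypothetical, deliberately naive header detector, used only to illustrate
// false negatives and false positives. Not the heuristic used in Pinot.
public class HeaderHeuristicDemo {

    // Treat a row as a header if none of its fields parses as a number.
    public static boolean looksLikeHeader(String[] firstRow) {
        return Arrays.stream(firstRow).noneMatch(HeaderHeuristicDemo::isNumeric);
    }

    // True if the field parses as a double (e.g. "2021", "3.5").
    private static boolean isNumeric(String field) {
        try {
            Double.parseDouble(field.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

   With this detector, `looksLikeHeader(new String[]{"2021", "2022"})` returns false: if that row is really a header naming yearly columns, this is a false negative, and the feature would replace it with `col_0, col_1`. Conversely, `looksLikeHeader(new String[]{"alice", "seattle"})` returns true: on a header-less file whose first record is all text, this is a false positive, which simply falls back to today's behavior.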
   
   I think false-negative cases will cause new issues that don't exist today when this feature is turned on. However, since CSV header detection cannot be perfect, I think we need to improve the logic incrementally as we encounter more edge cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: commits-help@pinot.apache.org