Posted to commits@druid.apache.org by "LakshSingla (via GitHub)" <gi...@apache.org> on 2023/02/13 04:14:54 UTC

[GitHub] [druid] LakshSingla opened a new pull request, #13794: Improve the wording around the `InvalidNullByteException`

LakshSingla opened a new pull request, #13794:
URL: https://github.com/apache/druid/pull/13794

   ### Description
   
   This PR improves the wording of the otherwise cryptic `InvalidNullByteException`. The exception is thrown when a string added to a frame contains a null (0x00) byte, which frames use internally as a delimiter for string columns. Frames are currently used only by MSQ, so the error arises only when the ingested external data contains hidden null bytes, which in most cases can safely be sanitized.
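
   As an illustration of the sanitization mentioned above, here is a minimal sketch of detecting and stripping null bytes from a string value before it is ingested. The class and method names (`NullByteSanitizer`, `containsNullByte`, `stripNullBytes`) are hypothetical and are not part of Druid or of this PR; the sketch only shows the general idea.

```java
// Hypothetical helper, not part of Druid: shows one way external data could be
// cleaned of null bytes before it reaches the frame writer.
final class NullByteSanitizer {
    private NullByteSanitizer() {
    }

    // Returns true if the value contains at least one null (0x00) character.
    static boolean containsNullByte(String value) {
        return value != null && value.indexOf('\0') >= 0;
    }

    // Returns the value with every null (0x00) character removed.
    // A null input is passed through unchanged.
    static String stripNullBytes(String value) {
        if (value == null) {
            return null;
        }
        return value.replace("\0", "");
    }
}
```

   Whether dropping the bytes is acceptable depends on the data; in some cases replacing them with a visible placeholder may be preferable to removing them silently.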
   
   
   This PR has:
   
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on a diff in pull request #13794: Improve the wording around the `InvalidNullByteException`

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers commented on code in PR #13794:
URL: https://github.com/apache/druid/pull/13794#discussion_r1105044774


##########
processing/src/main/java/org/apache/druid/frame/write/FrameWriterUtils.java:
##########
@@ -229,7 +229,11 @@ public static void copyByteBufferToMemory(
       final byte b = src.get(p);
 
       if (!allowNullBytes && b == 0) {
-        throw new InvalidNullByteException();
+        throw new InvalidNullByteException(
+            "Unable to add the frame because it contains null bytes. This usually happens when the added string columns "

Review Comment:
   Thanks for the improved message. The user, however, knows nothing about frames. Can we word this from the user's perspective?
   
   `Druid does not support null (0x00) bytes in strings. File %s, row %d, column %s contains null bytes: [%s].`
   
   The string would be encoded so that control characters appear in escaped form, such as `\u0000`, so the user can see the position of the null bytes.
   
   We may not know the row number in a form that is useful to the user. If not, just list the column.
   
   This is a case of unparsable data. Should we have caught it at the time we _read_ the data rather than when _writing_ to a frame? Should we invoke our bad-row logic to skip this row? That logic should log the bad row for later re-ingestion, but I don't think we've added that ability.
   
   If we catch the problem on read, then the check here is more of an assertion. Though, perhaps the data was created by an expression, so it is still worth validating.
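
   To make the suggestion above concrete, here is a rough sketch of how such a message could be built, with control characters rendered in escaped form so the null bytes are visible. The class `NullByteMessage` and its methods are hypothetical, the row number is left out on the assumption that it may not be available, and none of this reflects the actual `InvalidNullByteException` API.

```java
// Hypothetical sketch, not Druid code: builds a user-facing message that shows
// where the null bytes are by escaping control characters in the offending value.
final class NullByteMessage {
    private NullByteMessage() {
    }

    // Replaces each ISO control character with a backslash, the letter u, and
    // four hex digits, leaving all other characters unchanged.
    static String escapeControlChars(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isISOControl(c)) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Formats the message from the user's perspective: source file, column,
    // and the escaped value. The source and column names are supplied by the caller.
    static String format(String source, String column, String value) {
        return String.format(
            "Druid does not support null (0x00) bytes in strings. "
                + "File %s, column %s contains null bytes: [%s].",
            source,
            column,
            escapeControlChars(value)
        );
    }
}
```

   The caller would need to know the source and column at the point where the check runs, which is one argument for validating on read rather than when writing to a frame.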





[GitHub] [druid] LakshSingla closed pull request #13794: Improve the wording around the `InvalidNullByteException`

Posted by "LakshSingla (via GitHub)" <gi...@apache.org>.
LakshSingla closed pull request #13794: Improve the wording around the `InvalidNullByteException`
URL: https://github.com/apache/druid/pull/13794




[GitHub] [druid] LakshSingla commented on pull request #13794: Improve the wording around the `InvalidNullByteException`

Posted by "LakshSingla (via GitHub)" <gi...@apache.org>.
LakshSingla commented on PR #13794:
URL: https://github.com/apache/druid/pull/13794#issuecomment-1594122337

   Raised an improved PR tackling this issue




[GitHub] [druid] github-actions[bot] commented on pull request #13794: Improve the wording around the `InvalidNullByteException`

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #13794:
URL: https://github.com/apache/druid/pull/13794#issuecomment-1544921438

   This pull request has been marked as stale due to 60 days of inactivity.
   It will be closed in 4 weeks if no further activity occurs. If you think
   that's incorrect or this pull request should instead be reviewed, please simply
   write any comment. Even if closed, you can still revive the PR at any time or
   discuss it on the dev@druid.apache.org list.
   Thank you for your contributions.

