You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/01/25 00:20:31 UTC

[GitHub] [druid] vogievetsky opened a new pull request #10795: Web console: allow encoding of ASCII control chars

vogievetsky opened a new pull request #10795:
URL: https://github.com/apache/druid/pull/10795


   This PR makes it possible to input ASCII control chars through the web console UI by writing them as `\x??` notation just like you can do in many languages inside of string literals.
   
   ![image](https://user-images.githubusercontent.com/177816/105648370-f0376b80-5e5f-11eb-9dfe-46b83d3107f6.png)
   
   This allows the `delimiter` and `listDelimiter` inputs to actually be useful in the parse data screen.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
vogievetsky commented on pull request #10795:
URL: https://github.com/apache/druid/pull/10795#issuecomment-767210535


   Are there chars outside of 0 to 0x1f that should be encoded if we do `\uNNNN` ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
vogievetsky commented on pull request #10795:
URL: https://github.com/apache/druid/pull/10795#issuecomment-767725200


   Ok that seems like a plan. I could actually use the JSON stringier / parser itself and it would be super robust and guaranteed to follow the JSON spec by definition. I would make only one exception to the "no more & no less" rule:
   
   Since this input would always be parsed in a string context I would allow a single unescaped `"`.
   
   Meaning `"` would parse as itself instead of throwing an error (like an actual JSON parse would). `\"` would also parse to `"`.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10795:
URL: https://github.com/apache/druid/pull/10795#issuecomment-767030067


   How did you pick this one? Neither JSON nor Java support `\xNN`. Both do support `\uNNNN` (Unicode style). Java additionally supports `\NNN` (octal style).
   
   References:
   
   - https://www.json.org/json-en.html
   - https://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.10.6
   
   IMO matching what JSON does would make the most sense to people, since the underlying API is a JSON one.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
vogievetsky commented on pull request #10795:
URL: https://github.com/apache/druid/pull/10795#issuecomment-866365701


   @gianm I updated the PR per your suggestions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10795:
URL: https://github.com/apache/druid/pull/10795#issuecomment-767030067


   How did you pick this one? Neither JSON nor Java support `\xNN`. Both do support `\uNNNN` (Unicode style). Java additionally supports `\NNN` (octal style).
   
   References:
   
   - https://www.json.org/json-en.html
   - https://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.10.6
   
   IMO matching what JSON does would make the most sense to people, since the underlying API is a JSON one.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky merged pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
vogievetsky merged pull request #10795:
URL: https://github.com/apache/druid/pull/10795


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
vogievetsky commented on pull request #10795:
URL: https://github.com/apache/druid/pull/10795#issuecomment-767209141


   I am only escaping char codes 0 to 0x1f so `\uNNNN` seemed like overkill. I have not considered (or ever used) `\NNN`. I use `\xNN` all the time in JS string literals. But I am down to do `\uNNNN`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10795:
URL: https://github.com/apache/druid/pull/10795#issuecomment-767743878


   > Since this input would always be parsed in a string context I would allow a single unescaped `"`.
   
   I think that would be OK, it makes sense to me. It breaks the 'freely copy back and forth' principle but in a very reasonable way. It will probably decrease the overall level of surprise for users.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm edited a comment on pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
gianm edited a comment on pull request #10795:
URL: https://github.com/apache/druid/pull/10795#issuecomment-767699168


   > I am only escaping char codes 0 to 0x1f so `\uNNNN` seemed like overkill. I have not considered (or ever used) `\NNN`. I use `\xNN` all the time in JS string literals. But I am down to do `\uNNNN`
   
   I don't think it's about overkill / underkill but more what people will expect. The underlying APIs are JSON so IMO it is best if the web console accepts JSON style escapes.
   
   > I have not considered (or ever used) `\NNN`.
   
   It's a C thing. I think we can skip it. It's only supported by Java, anyway, not JSON, and I think it makes more sense to stick to JSON.
   
   > Are there chars outside of 0 to 0x1f that should be encoded if we do `\uNNNN` ?
   
   I would accept exactly what JSON does, no more & no less:
   
   - `\"` -> `"`
   - `\\` -> `\`
   - `\/` -> `/`
   - `\b` -> ascii 8
   - `\t` -> ascii 9
   - `\n` -> ascii 10
   - `\f` -> ascii 12
   - `\r` -> ascii 13
   - `\uABCD` -> unicode code point ABCD for character in basic multilingual plane (test with non-latin languages)
   - `\uABCD\uEFGH` -> utf-16 surrogate pair ABCD EFGH (test with emojis like 🔥)
   
   That way, people can copy stuff freely back and forth between JSON specs and the web console fields.
   
   Note: the way unicode characters outside BMP are encoded here is not universal; for example Python would encode 🙂 as `\U0001F642` but JSON would encode it as `\uD83D\uDE42`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
vogievetsky commented on pull request #10795:
URL: https://github.com/apache/druid/pull/10795#issuecomment-767209141






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10795: Web console: allow encoding of ASCII control chars

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10795:
URL: https://github.com/apache/druid/pull/10795#issuecomment-767699168


   > I am only escaping char codes 0 to 0x1f so `\uNNNN` seemed like overkill. I have not considered (or ever used) `\NNN`. I use `\xNN` all the time in JS string literals. But I am down to do `\uNNNN`
   
   I don't think it's about overkill / underkill but more what people will expect. The underlying APIs are JSON so IMO it is best if the web console accepts JSON style escapes.
   
   > I have not considered (or ever used) `\NNN`.
   
   It's a C thing. I think we can skip it. It's only supported by Java, anyway, not JSON, and I think it makes more sense to stick to JSON.
   
   > Are there chars outside of 0 to 0x1f that should be encoded if we do `\uNNNN` ?
   
   I would accept exactly what JSON does, no more & no less:
   
   - `\"` -> `"`
   - `\\` -> `\`
   - `\/` -> `/`
   - `\b` -> ascii 8
   - `\t` -> ascii 9
   - `\n` -> ascii 10
   - `\f` -> ascii 12
   - `\r` -> ascii 13
   - `\uABCD` -> unicode code point ABCD for character in basic multilingual plane (test with non-latin languages)
   - `\uABCD\uEFGH -> utf-16 surrogate pair ABCD EFGH (test with emojis like 🔥)
   
   That way, people can copy stuff freely back and forth between JSON specs and the web console fields.
   
   Note: the way unicode characters outside BMP are encoded here is not universal; for example Python would encode 🙂 as `\U0001F642` but JSON would encode it as `\uD83D\uDE42`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org