Posted to yarn-issues@hadoop.apache.org by "Joep Rottinghuis (JIRA)" <ji...@apache.org> on 2016/05/27 00:46:12 UTC

[jira] [Commented] (YARN-5167) Escaping occurences of encodedValues

    [ https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303270#comment-15303270 ] 

Joep Rottinghuis commented on YARN-5167:
----------------------------------------

In https://issues.apache.org/jira/browse/YARN-5109?focusedCommentId=15302672&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15302672
[~varun_saxena] pointed out:
"In Separator#encode, we are using String#replace which in turn uses Pattern. Why dont we use StringUtils#replace instead ?
I think former would be slower.
StringUtils#replace uses indexOf and would return the passed string if indexOf returns -1(which would be most of the cases)"

This is a good point, but it may become moot once this jira is done: we will probably have to roll our own indexOf-based replace anyway, because we need to check whether a backslash precedes the sequence we are about to replace.
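
To make that concrete, here is a minimal sketch of what such a hand-rolled replace might look like. The class and method names (EscapeAwareReplace, replaceUnescaped) are hypothetical and are not part of the existing Separator class. It keeps the property noted for StringUtils#replace: when indexOf finds nothing, the input string is returned as-is.

```java
final class EscapeAwareReplace {
  private EscapeAwareReplace() {}

  /**
   * Replaces every occurrence of target in source with replacement,
   * skipping occurrences that are immediately preceded by a backslash.
   * Hypothetical helper, for illustration only.
   */
  static String replaceUnescaped(String source, String target,
      String replacement) {
    if (source == null || target == null || target.isEmpty()) {
      return source;
    }
    int match = source.indexOf(target);
    if (match < 0) {
      // Common case: nothing to replace, so no new string is built.
      return source;
    }
    StringBuilder out = new StringBuilder(source.length());
    int copiedUpTo = 0;
    while (match >= 0) {
      boolean escaped = match > 0 && source.charAt(match - 1) == '\\';
      if (!escaped) {
        // Copy the untouched text before the match, then the replacement.
        out.append(source, copiedUpTo, match).append(replacement);
        copiedUpTo = match + target.length();
      }
      // An escaped match is left in place; it is copied by a later append.
      match = source.indexOf(target, match + target.length());
    }
    out.append(source, copiedUpTo, source.length());
    return out.toString();
  }
}
```

For example, replaceUnescaped("a%2$b", "%2$", " ") gives "a b", while the same call on a string where %2$ directly follows a backslash leaves that occurrence alone.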

> Escaping occurences of encodedValues
> ------------------------------------
>
>                 Key: YARN-5167
>                 URL: https://issues.apache.org/jira/browse/YARN-5167
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Joep Rottinghuis
>            Assignee: Sangjin Lee
>            Priority: Critical
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding and then decoding "String with %2$ in it" would yield "String with   in it".
> We thought we should first escape existing occurrences of encoded strings by prefixing a backslash (this should be fine even if a backslash is already there). Then we should replace all unencoded strings with their encoded form.
> On the way out, we should replace all occurrences of our encoded string with the original, except when it is prefixed by an escape character. Lastly, we should strip off the one additional backslash in front of each remaining (escaped) sequence.
> Adding the following entry to TestSeparator#testEncodeDecode() demonstrates what this jira should accomplish:
> {code}
>     testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  \\\\%2$ = no problem!", Separator.QUALIFIERS,
>         Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}
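
The scheme in the description can be sketched end to end for a single separator. This is only an illustration under stated assumptions: the class name EscapingSeparatorSketch is hypothetical, it handles just the space/%2$ pair mentioned above, and its decode folds the two "on the way out" steps into one indexOf pass rather than reproducing the actual Separator code.

```java
final class EscapingSeparatorSketch {
  private static final String SPACE = " ";
  private static final String ENCODED_SPACE = "%2$";
  private static final char ESCAPE = '\\';

  private EscapingSeparatorSketch() {}

  static String encode(String raw) {
    // Step 1: protect literal occurrences of the encoded form by
    // prefixing one backslash, even if a backslash is already there.
    String escaped = raw.replace(ENCODED_SPACE, ESCAPE + ENCODED_SPACE);
    // Step 2: encode the separator itself.
    return escaped.replace(SPACE, ENCODED_SPACE);
  }

  static String decode(String encoded) {
    StringBuilder out = new StringBuilder(encoded.length());
    int copiedUpTo = 0;
    int match = encoded.indexOf(ENCODED_SPACE);
    while (match >= 0) {
      if (match > 0 && encoded.charAt(match - 1) == ESCAPE) {
        // Escaped: drop the one added backslash, keep the sequence as text.
        out.append(encoded, copiedUpTo, match - 1).append(ENCODED_SPACE);
      } else {
        // Unescaped: this is an encoded separator, so restore it.
        out.append(encoded, copiedUpTo, match).append(SPACE);
      }
      copiedUpTo = match + ENCODED_SPACE.length();
      match = encoded.indexOf(ENCODED_SPACE, copiedUpTo);
    }
    out.append(encoded, copiedUpTo, encoded.length());
    return out.toString();
  }
}
```

With this sketch, "String with %2$ in it" survives an encode/decode round trip, as does the test string above when only the space separator is considered. One case the sketch does not handle: an original string in which a backslash directly precedes a space encodes to a backslash followed by %2$, which decode then treats as an escaped literal.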


