Posted to yarn-issues@hadoop.apache.org by "Sangjin Lee (JIRA)" <ji...@apache.org> on 2016/06/02 23:29:59 UTC
[jira] [Commented] (YARN-5167) Escaping occurences of encodedValues

    [ https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313290#comment-15313290 ] 

Sangjin Lee commented on YARN-5167:
-----------------------------------

OK, I've hit a snag with this idea.

Initially we thought that we could always handle values safely if we
# escape a naturally occurring encoded value sequence (by adding a preceding backslash for example)
# and encode naked values
and
# decode encoded values *only if* the encoded value sequence is NOT escaped (i.e. not preceded by a backslash)
# and finally de-escape the backslash (remove the backslash if it is followed by the encoded value sequence) to get back the original naturally occurring encoded value sequence
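As a rough illustration of the four steps above, here is a minimal Java sketch for a single separator ({{=}} encoded as {{%1$}}). The class and method names are hypothetical and this is not the actual Separator code; it only exercises a simple input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the backslash-escape scheme, not the real Separator class.
class BackslashEscapeSketch {
    static final String RAW = "=";
    static final String ENCODED = "%1$";

    // Matches the encoded sequence only when it is NOT preceded by a backslash.
    private static final Pattern UNESCAPED =
        Pattern.compile("(?<!\\\\)" + Pattern.quote(ENCODED));

    public static String encode(String s) {
        // Step 1: escape naturally occurring encoded sequences.
        String escaped = s.replace(ENCODED, "\\" + ENCODED);
        // Step 2: encode the naked raw values.
        return escaped.replace(RAW, ENCODED);
    }

    public static String decode(String s) {
        // Step 3: decode only the unescaped encoded sequences.
        String decoded = UNESCAPED.matcher(s).replaceAll(Matcher.quoteReplacement(RAW));
        // Step 4: strip the backslash in front of the remaining (escaped) sequences.
        return decoded.replace("\\" + ENCODED, ENCODED);
    }

    public static void main(String[] args) {
        String original = "key=value with %1$ inside";
        String encoded = encode(original);
        System.out.println(encoded);          // key%1$value with \%1$ inside
        System.out.println(decode(encoded));  // key=value with %1$ inside
    }
}
```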

I implemented this fairly easily, but I realized that we still have a pretty challenging ambiguity. The problem arises when *the raw value is preceded by a backslash*. For example, suppose the following is the original string:
{noformat}
\=%1$
{noformat}

Note that {{=}} is a value we want to encode, and {{%1$}} is the encoded equivalent. In this case, the user input contains both the raw value and a naturally occurring encoded value. If we put this through the above scheme, first we escape the naturally occurring encoded value:
{noformat}
\=\%1$
{noformat}

The next step is to encode the raw value ({{=}}). Then it becomes
{noformat}
\%1$\%1$
{noformat}

Note that now we have two identical parts. For either part, it is not possible to determine whether it was an encoded value that happened to be preceded by the escape character, or a naturally occurring encoded value that was escaped.
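The collision can be reproduced with a few lines of Java. This is only a sketch with hypothetical names: it applies the escape-then-encode steps to the string above and to a second input, {{%1$%1$}}, and shows that both produce the same output, so no decoder can tell them apart.

```java
// Hypothetical sketch: two different inputs collide under the backslash-escape scheme.
class BackslashEscapeAmbiguity {
    static final String RAW = "=";
    static final String ENCODED = "%1$";

    public static String encode(String s) {
        // Escape naturally occurring encoded sequences, then encode raw values.
        String escaped = s.replace(ENCODED, "\\" + ENCODED);
        return escaped.replace(RAW, ENCODED);
    }

    public static void main(String[] args) {
        String a = encode("\\=%1$");   // raw value preceded by a backslash
        String b = encode("%1$%1$");   // two naturally occurring encoded sequences
        System.out.println(a);            // \%1$\%1$
        System.out.println(b);            // \%1$\%1$
        System.out.println(a.equals(b));  // true
    }
}
```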

It's not clear how we can handle this issue without adding a whole lot more complexity. We can get increasingly sophisticated in trying to figure out these combinations, but I am afraid we would hit the point of diminishing returns.

I am now thinking of a different idea, which is basically similar to how URL encoding works. We could consider {{%}} an implicit reserved character, as it starts all the encoded values. The idea is
# encode {{%}} before encoding a series of separator values
# proceed to encode other values
# on decoding, decode all values except {{%}}
# finally decode {{%}}

Suppose the original string is
{noformat}
%=%1$
{noformat}

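If we follow the new idea, with {{%}} itself encoded as {{%9$}}, we first encode this to {{%9$=%9$1$}} (encoding {{%}}) and then to {{%9$%1$%9$1$}} (encoding {{=}}). Conversely, decoding first yields {{%9$=%9$1$}} (decoding {{=}}) and finally {{%=%1$}} (decoding {{%}}).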
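A minimal Java sketch of this percent-first scheme follows. The names are hypothetical, the code {{%9$}} for {{%}} is taken from the example above, and {{%2$}} for space comes from the issue description. It is an illustration under those assumptions, not the final Separator implementation.

```java
// Hypothetical sketch of the percent-first scheme, not the real Separator class.
class PercentFirstSketch {
    private static final String PERCENT = "%";
    private static final String PERCENT_ENCODED = "%9$";  // code assumed from the example

    // {raw, encoded} pairs for the ordinary separators.
    private static final String[][] SEPARATORS = { {"=", "%1$"}, {" ", "%2$"} };

    public static String encode(String s) {
        // Encode '%' first so every surviving '%' belongs to an encoded sequence.
        String out = s.replace(PERCENT, PERCENT_ENCODED);
        for (String[] sep : SEPARATORS) {
            out = out.replace(sep[0], sep[1]);
        }
        return out;
    }

    public static String decode(String s) {
        // Decode everything except '%' first, then decode '%' last.
        String out = s;
        for (String[] sep : SEPARATORS) {
            out = out.replace(sep[1], sep[0]);
        }
        return out.replace(PERCENT_ENCODED, PERCENT);
    }

    public static void main(String[] args) {
        String encoded = encode("%=%1$");
        System.out.println(encoded);          // %9$%1$%9$1$
        System.out.println(decode(encoded));  // %=%1$
    }
}
```

The order matters: once {{%}} has been encoded, the only way {{%1$}} or {{%2$}} can appear in the encoded string is as the encoding of a separator, which is why decoding {{%}} has to come last.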

I believe this scheme would work in all cases, but I'd like you to poke holes in this idea to see if it stands up.

> Escaping occurences of encodedValues
> ------------------------------------
>
>                 Key: YARN-5167
>                 URL: https://issues.apache.org/jira/browse/YARN-5167
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Joep Rottinghuis
>            Assignee: Sangjin Lee
>            Priority: Critical
>              Labels: yarn-2928-1st-milestone
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding "String with %2$ in it" would decode to "String with   in it".
> We thought we should first escape existing occurrences of encoded strings by prefixing a backslash (even if there is already a backslash that should be ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string to the original except when it is prefixed by an escape character. Lastly we should strip off the one additional backslash in front of each remaining (escaped) sequence.
> Adding the following entry to TestSeparator#testEncodeDecode() demonstrates what this JIRA should accomplish:
> {code}
>     testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor  \\\\%2$ = no problem!", Separator.QUALIFIERS,
>         Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}


