Posted to yarn-issues@hadoop.apache.org by "Sangjin Lee (JIRA)" <ji...@apache.org> on 2016/05/18 05:06:12 UTC
[jira] [Commented] (YARN-5109) timestamps are stored unencoded causing parse errors

    [ https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288444#comment-15288444 ] 

Sangjin Lee commented on YARN-5109:
-----------------------------------

For example, consider the code that creates the column name for events:
{code}
byte[] eventTs =
    Bytes.toBytes(TimelineStorageUtils.invertLong(eventTimestamp));
...
byte[] compoundColumnQualifierBytes =
    EntityColumnPrefix.EVENT.
        getCompoundColQualBytes(eventId, eventTs, null);
...
public static byte[] getCompoundColumnQualifierBytes(String qualifier,
    byte[]...components) {
  byte[] colQualBytes = Bytes.toBytes(Separator.VALUES.encode(qualifier));
  for (int i = 0; i < components.length; i++) {
    colQualBytes = Separator.VALUES.join(colQualBytes, components[i]);
  }
  return colQualBytes;
}
{code}
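The {{getCompoundColumnQualifierBytes()}} method joins the timestamp bytes as is, without encoding them against the VALUES separator ({{\x3d}}, i.e. "="). If any byte of the timestamp happens to be {{\x3d}}, the column name can no longer be split correctly.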
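
To make the failure concrete, here is a minimal standalone sketch that reproduces it without HBase on the classpath. It rests on assumptions: {{invertLong()}} here stands in for {{TimelineStorageUtils.invertLong()}} and is assumed to compute {{Long.MAX_VALUE - value}}, {{toBytes()}} is assumed to match the big-endian layout of {{Bytes.toBytes(long)}}, and {{join()}} and {{split()}} are hypothetical simplifications rather than the real {{Separator}} methods. The timestamp bytes are the ones from the column name quoted in the issue description.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class UnencodedTimestampDemo {
  // Byte value of the VALUES separator, "=".
  static final byte VALUES_SEPARATOR = 0x3d;

  // Assumption: stands in for TimelineStorageUtils.invertLong().
  static long invertLong(long value) {
    return Long.MAX_VALUE - value;
  }

  // Big-endian 8-byte encoding of a long (assumed to match Bytes.toBytes(long)).
  static byte[] toBytes(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  // Hypothetical join: concatenates components with the separator, no encoding.
  static byte[] join(byte[]... components) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < components.length; i++) {
      if (i > 0) {
        out.write(VALUES_SEPARATOR);
      }
      out.write(components[i], 0, components[i].length);
    }
    return out.toByteArray();
  }

  // Hypothetical naive split: cuts at every separator byte it sees.
  static List<byte[]> split(byte[] source) {
    List<byte[]> parts = new ArrayList<>();
    ByteArrayOutputStream current = new ByteArrayOutputStream();
    for (byte b : source) {
      if (b == VALUES_SEPARATOR) {
        parts.add(current.toByteArray());
        current.reset();
      } else {
        current.write(b);
      }
    }
    parts.add(current.toByteArray());
    return parts;
  }

  public static void main(String[] args) {
    // Timestamp bytes copied from the column name in the issue description.
    byte[] fromIssue =
        {0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB, 'D', 'Y', '=', (byte) 0x99};
    // Inverting twice recovers the original event timestamp.
    long eventTimestamp = invertLong(ByteBuffer.wrap(fromIssue).getLong());
    byte[] eventTs = toBytes(invertLong(eventTimestamp));
    byte[] column = join(
        "YARN_RM_CONTAINER_CREATED".getBytes(StandardCharsets.UTF_8),
        eventTs,
        "YARN_CONTAINER_ALLOCATED_HOST".getBytes(StandardCharsets.UTF_8));
    System.out.println("event timestamp: " + eventTimestamp);
    System.out.println("parts after split: " + split(column).size()
        + " (3 were written)");
  }
}
```

The seventh byte of the encoded timestamp is {{\x3d}}, so the naive split sees one more component than was written, which is consistent with the "incorrectly formatted column name" warning.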

I believe a similar issue exists with row keys. In most cases, longs are written into the row key without any encoding against the QUALIFIERS separator ({{\x21}}, i.e. "!"). If any byte of such a value happens to be {{\x21}}, parsing the row key will fail.
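
As an illustration of the row key case, the sketch below uses a hypothetical, simplified layout of (cluster)!(8-byte long); the real row keys have more components. It shows that counting separators breaks when the long contains {{\x21}}, while reading the long by its fixed width does not. This is only one possible direction, not necessarily the fix that should be adopted, and every class and method name here is made up for the example. It also assumes the string component never contains a raw separator byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FixedWidthRowKeyDemo {
  // Byte value of the QUALIFIERS separator, "!".
  static final byte QUALIFIERS_SEPARATOR = 0x21;

  // Hypothetical layout: (cluster)!(8-byte big-endian long), long stored as is.
  static byte[] buildRowKey(String cluster, long value) {
    byte[] clusterBytes = cluster.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(clusterBytes.length + 1 + Long.BYTES)
        .put(clusterBytes)
        .put(QUALIFIERS_SEPARATOR)
        .putLong(value)
        .array();
  }

  // Naive approach: every separator byte starts a new component.
  static int countNaiveParts(byte[] rowKey) {
    int parts = 1;
    for (byte b : rowKey) {
      if (b == QUALIFIERS_SEPARATOR) {
        parts++;
      }
    }
    return parts;
  }

  // Fixed-width approach: the long always occupies the last 8 bytes.
  static long readTrailingLong(byte[] rowKey) {
    int offset = rowKey.length - Long.BYTES;
    if (offset < 1 || rowKey[offset - 1] != QUALIFIERS_SEPARATOR) {
      throw new IllegalArgumentException("malformed row key");
    }
    return ByteBuffer.wrap(rowKey, offset, Long.BYTES).getLong();
  }

  // Everything before the separator that precedes the long is the cluster.
  static String readCluster(byte[] rowKey) {
    int length = rowKey.length - Long.BYTES - 1;
    return new String(rowKey, 0, length, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // 0x2121 encodes to 00 00 00 00 00 00 21 21: two bytes equal to "!".
    byte[] rowKey = buildRowKey("cluster1", 0x2121L);
    System.out.println("naive parts: " + countNaiveParts(rowKey)
        + " (2 were written)");
    System.out.println("cluster: " + readCluster(rowKey));
    System.out.println("long: " + readTrailingLong(rowKey));
  }
}
```

The alternative to fixed-width reads would be to encode the long's bytes so that they can never contain the separator, at the cost of a variable-length value.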

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>
> When we store timestamps (for example, as part of the row key or as part of the column name for an event), the bytes are used as is without any encoding. If any of those bytes happens to equal a separator character we use (e.g. "!" or "="), parsing fails when we read the value back.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils: incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.


