Posted to yarn-issues@hadoop.apache.org by "Vrushali C (JIRA)" <ji...@apache.org> on 2019/03/22 06:01:00 UTC
[jira] [Commented] (YARN-9395) Short Names for repeated Hbase Column names

    [ https://issues.apache.org/jira/browse/YARN-9395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798738#comment-16798738 ] 

Vrushali C commented on YARN-9395:
----------------------------------

Very good JIRA, Prabhu. I agree that the majority of the counter names are going to be repeated over and over again across jobs. Please do give it some thought, and let's discuss potential solutions.

With Phoenix, since a table has a predefined schema, mapping each column name to a number is an option.


> Short Names for repeated Hbase Column names
> -------------------------------------------
>
>                 Key: YARN-9395
>                 URL: https://issues.apache.org/jira/browse/YARN-9395
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: ATSv2
>    Affects Versions: 3.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>
> Currently, ATS HBase tables store the config name / metric name as column names, which are long. These names repeat for every row and consume a lot of storage space. We have seen customers' HBase tables already consume more than 1.5 TB in a few days.
> {code}
> Example Configs:
> c:yarn.timeline-service.webapp.rest-csrf.methods-to-ignore
> c:yarn.timeline-service.entity-group-fs-store.active-dir
> c:yarn.scheduler.configuration.zk-store.parent-path
> Example Metrics:
> m:REDUCE:org.apache.hadoop.mapreduce.FileSystemCounter:HDFS_READ_OPS
> m:REDUCE:org.apache.hadoop.mapreduce.TaskCounter:COMBINE_INPUT_RECORDS
> m:REDUCE:org.apache.hadoop.mapreduce.TaskCounter:PHYSICAL_MEMORY_BYTES
> {code}
> We need to use short column names as per HBase best practice - http://moi.vonos.net/bigdata/avro-hbase-colnames/ The challenge is that ATS does not know the column names until the rows are inserted. We can provide a mapping file that maps the repeated configs / metrics / info from different applications to unique numbers, which customers can configure upfront to save storage space. This is similar to what Phoenix does:
> https://blogs.apache.org/phoenix/entry/column-mapping-and-immutable-data
> https://phoenix.apache.org/columnencoding.html
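
To make the mapping-file idea in the description concrete, here is a minimal sketch of a lookup that swaps long column names for short codes on write and restores them on read. Everything in it is an assumption for illustration: the class name ColumnNameCodec, the "#" marker, and the mapping contents are hypothetical and are not part of ATSv2, HBase, or Phoenix. The long names are copied from the examples in the description.

```python
from typing import Dict

# Hypothetical contents of a user-supplied mapping file:
# long column name -> unique number. Names taken from the issue description.
EXAMPLE_MAPPING: Dict[str, int] = {
    "c:yarn.timeline-service.webapp.rest-csrf.methods-to-ignore": 1,
    "c:yarn.scheduler.configuration.zk-store.parent-path": 2,
    "m:REDUCE:org.apache.hadoop.mapreduce.FileSystemCounter:HDFS_READ_OPS": 3,
}


class ColumnNameCodec:
    """Illustrative codec; not an ATSv2 or Phoenix API."""

    # Assumed marker that distinguishes encoded names from unmapped ones.
    # A real design would need to guarantee no raw name starts with it.
    MARKER = "#"

    def __init__(self, mapping: Dict[str, int]) -> None:
        codes = list(mapping.values())
        if len(set(codes)) != len(codes):
            raise ValueError("each long column name needs a unique number")
        self._short_by_long = {
            long_name: f"{self.MARKER}{code}" for long_name, code in mapping.items()
        }
        self._long_by_short = {
            short: long_name for long_name, short in self._short_by_long.items()
        }

    def encode(self, column_name: str) -> str:
        # Names missing from the mapping are stored unchanged, since the
        # writer cannot know every column name upfront.
        return self._short_by_long.get(column_name, column_name)

    def decode(self, stored_name: str) -> str:
        # Stored names that are not known codes are returned as-is.
        return self._long_by_short.get(stored_name, stored_name)


if __name__ == "__main__":
    codec = ColumnNameCodec(EXAMPLE_MAPPING)
    for long_name in EXAMPLE_MAPPING:
        short = codec.encode(long_name)
        print(f"{len(long_name):3d} bytes -> {len(short)} bytes: {short}")
```

Passing unmapped names through unchanged keeps writes working for columns the mapping file does not list, at the cost of no savings for those columns. The mapping would also have to stay stable once data is written, since changing a number would make existing cells decode to the wrong name.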


