You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2022/11/04 07:08:00 UTC

[jira] [Commented] (HBASE-27457) Tags cannot be written to the hfile during bulkload

    [ https://issues.apache.org/jira/browse/HBASE-27457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628731#comment-17628731 ] 

Duo Zhang commented on HBASE-27457:
-----------------------------------

[~anoopsamjohn] PTAL?

Thanks.

> Tags cannot be written to the hfile during bulkload
> ---------------------------------------------------
>
>                 Key: HBASE-27457
>                 URL: https://issues.apache.org/jira/browse/HBASE-27457
>             Project: HBase
>          Issue Type: Bug
>            Reporter: zhengyang jia
>            Priority: Major
>
> When I store hdfs file data into hbase through bulkload, I want to give different data different ttl, so I use tags in mapper to create per cell ttl for each cell, but I found that this ttl does not take effect.
> I have the bulkload driver
>  
> {code:java}
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.setInputPaths(job, inputFile);
> job.setOutputFormatClass(HFileOutputFormat2.class);
> HFileOutputFormat2.setOutputPath(job, outputPath);
> HFileOutputFormat2.configureIncrementalLoad(job, table, conn.getRegionLocator(TableName.valueOf(tableName)));
> job.waitForCompletion(true);
> LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
> loader.doBulkLoad(outputPath, admin, table, conn.getRegionLocator(TableName.valueOf(tableName))); {code}
>  
> and the mapper
>  
> {code:java}
> public class BulkloadTTLTestMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>   public void map(LongWritable key, Text value, Mapper.Context context) throws IOException, InterruptedException {
>     String values = value.toString();
>     String[] lines = values.split(",");
>     byte[] rowkey = Bytes.toBytes(lines[0]);
>     ImmutableBytesWritable putRowKey = new ImmutableBytesWritable(rowkey);
>     Put put = new Put(rowkey);
>     Cell cell1 = new KeyValue(Bytes.toBytes(lines[0]), Bytes.toBytes("d"), Bytes.toBytes("field1"), HConstants.LATEST_TIMESTAMP, Bytes.toBytes(lines[1]),new Tag[]{new Tag(TagType.TTL_TAG_TYPE, Bytes.toBytes(100000L))});
>     put.add(cell1);
>     Cell cell2 = new KeyValue(Bytes.toBytes(lines[0]), Bytes.toBytes("d"), Bytes.toBytes("field2"), HConstants.LATEST_TIMESTAMP, generateValue(lines[2]),new Tag[]{new Tag(TagType.TTL_TAG_TYPE, Bytes.toBytes(200000L))});
>     put.add(cell2);
>     context.write(putRowKey, put);
>   }
> } {code}
>  
> I checked the execution steps through debug, and I found that the tags were not written into hfile. By reading the source code, I found that in org.apache.hadoop.hbase.protobuf.ProtobufUtil.toMutation()
>  
> {code:java}
> if (cell.getTagsLength() > 0) {
>    valueBuilder.setTags(ByteStringer.wrap(cell.getTagsArray(),
>        cell.getTagsOffset(), cell.getTagsLength()));
> }{code}
>  
> This code was deleted at [issue15198|https://issues.apache.org/jira/browse/HBASE-15198], which made it impossible for me to write tags into hfile.
> If i just add 
> {code:java}
> conf.set("hbase.client.rpc.codec", "org.apache.hadoop.hbase.codec.CellCodecWithTags"){code}
> It is Useless, because bulkload does not use these Codecs.
> Only when I add back those two lines of code that were deleted at issue15198, I could write tags into hfile and make per cell ttl take effect.
> So I want to know is there any other way to help me carry tags in bulkload besides adding these two lines of code back.
> Thanks.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)