You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Doug Meil (JIRA)" <ji...@apache.org> on 2013/05/17 15:59:15 UTC
[jira] [Updated] (HBASE-8571) CopyTable and RowCounter don't seem to use setCaching setting

     [ https://issues.apache.org/jira/browse/HBASE-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Meil updated HBASE-8571:
-----------------------------

    Description: 
Maybe it's just me, but I've been looking on trunk and I don't see where either RowCounter or CopyTable MapReduce can adjust the setCaching setting on the Scan instance.

Example from RowCounter...
{code}
   Job job = new Job(conf, NAME + "_" + tableName);
    job.setJarByClass(RowCounter.class);
    Scan scan = new Scan();
    scan.setCacheBlocks(false);
    Set<byte []> qualifiers = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    if (startKey != null && !startKey.equals("")) {
      scan.setStartRow(Bytes.toBytes(startKey));
    }
    if (endKey != null && !endKey.equals("")) {
      scan.setStopRow(Bytes.toBytes(endKey));
    }
    scan.setFilter(new FirstKeyOnlyFilter());
    if (sb.length() > 0) {
      for (String columnName : sb.toString().trim().split(" ")) {
        String [] fields = columnName.split(":");
        if(fields.length == 1) {
          scan.addFamily(Bytes.toBytes(fields[0]));
        } else {
          byte[] qualifier = Bytes.toBytes(fields[1]);
          qualifiers.add(qualifier);
          scan.addColumn(Bytes.toBytes(fields[0]), qualifier);
        }
      }
    }
    // specified column may or may not be part of first key value for the row.
    // Hence do not use FirstKeyOnlyFilter if scan has columns, instead use
    // FirstKeyValueMatchingQualifiersFilter.
    if (qualifiers.size() == 0) {
      scan.setFilter(new FirstKeyOnlyFilter());
    } else {
      scan.setFilter(new FirstKeyValueMatchingQualifiersFilter(qualifiers));
    }
    job.setOutputFormatClass(NullOutputFormat.class);
    TableMapReduceUtil.initTableMapperJob(tableName, scan,
      RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);
    return job;

{code}

TableMapReduceUtil only serializes the Scan into the job, it doesn't adjust any of the settings.

Maybe I'm missing something, but this seems like a problem.

  was:
Maybe it's just me, but I've been looking on trunk and I don't see where either RowCounter or CopyTable MapReduce can adjust the setCaching setting.

Example from RowCounter...
{code}
   Job job = new Job(conf, NAME + "_" + tableName);
    job.setJarByClass(RowCounter.class);
    Scan scan = new Scan();
    scan.setCacheBlocks(false);
    Set<byte []> qualifiers = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    if (startKey != null && !startKey.equals("")) {
      scan.setStartRow(Bytes.toBytes(startKey));
    }
    if (endKey != null && !endKey.equals("")) {
      scan.setStopRow(Bytes.toBytes(endKey));
    }
    scan.setFilter(new FirstKeyOnlyFilter());
    if (sb.length() > 0) {
      for (String columnName : sb.toString().trim().split(" ")) {
        String [] fields = columnName.split(":");
        if(fields.length == 1) {
          scan.addFamily(Bytes.toBytes(fields[0]));
        } else {
          byte[] qualifier = Bytes.toBytes(fields[1]);
          qualifiers.add(qualifier);
          scan.addColumn(Bytes.toBytes(fields[0]), qualifier);
        }
      }
    }
    // specified column may or may not be part of first key value for the row.
    // Hence do not use FirstKeyOnlyFilter if scan has columns, instead use
    // FirstKeyValueMatchingQualifiersFilter.
    if (qualifiers.size() == 0) {
      scan.setFilter(new FirstKeyOnlyFilter());
    } else {
      scan.setFilter(new FirstKeyValueMatchingQualifiersFilter(qualifiers));
    }
    job.setOutputFormatClass(NullOutputFormat.class);
    TableMapReduceUtil.initTableMapperJob(tableName, scan,
      RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);
    return job;

{code}

TableMapReduceUtil only serializes the Scan into the job, it doesn't adjust any of the settings.

Maybe I'm missing something, but this seems like a problem.

    
> CopyTable and RowCounter don't seem to use setCaching setting
> -------------------------------------------------------------
>
>                 Key: HBASE-8571
>                 URL: https://issues.apache.org/jira/browse/HBASE-8571
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Doug Meil
>
> Maybe it's just me, but I've been looking on trunk and I don't see where either RowCounter or CopyTable MapReduce can adjust the setCaching setting on the Scan instance.
> Example from RowCounter...
> {code}
>    Job job = new Job(conf, NAME + "_" + tableName);
>     job.setJarByClass(RowCounter.class);
>     Scan scan = new Scan();
>     scan.setCacheBlocks(false);
>     Set<byte []> qualifiers = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
>     if (startKey != null && !startKey.equals("")) {
>       scan.setStartRow(Bytes.toBytes(startKey));
>     }
>     if (endKey != null && !endKey.equals("")) {
>       scan.setStopRow(Bytes.toBytes(endKey));
>     }
>     scan.setFilter(new FirstKeyOnlyFilter());
>     if (sb.length() > 0) {
>       for (String columnName : sb.toString().trim().split(" ")) {
>         String [] fields = columnName.split(":");
>         if(fields.length == 1) {
>           scan.addFamily(Bytes.toBytes(fields[0]));
>         } else {
>           byte[] qualifier = Bytes.toBytes(fields[1]);
>           qualifiers.add(qualifier);
>           scan.addColumn(Bytes.toBytes(fields[0]), qualifier);
>         }
>       }
>     }
>     // specified column may or may not be part of first key value for the row.
>     // Hence do not use FirstKeyOnlyFilter if scan has columns, instead use
>     // FirstKeyValueMatchingQualifiersFilter.
>     if (qualifiers.size() == 0) {
>       scan.setFilter(new FirstKeyOnlyFilter());
>     } else {
>       scan.setFilter(new FirstKeyValueMatchingQualifiersFilter(qualifiers));
>     }
>     job.setOutputFormatClass(NullOutputFormat.class);
>     TableMapReduceUtil.initTableMapperJob(tableName, scan,
>       RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
>     job.setNumReduceTasks(0);
>     return job;
> {code}
> TableMapReduceUtil only serializes the Scan into the job, it doesn't adjust any of the settings.
> Maybe I'm missing something, but this seems like a problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira