Posted to user@hive.apache.org by Elliot West <te...@gmail.com> on 2015/03/20 22:50:21 UTC

Updates/deletes with OrcRecordUpdater

Hi,

I'm trying to use the insert, update and delete methods on OrcRecordUpdater
to programmatically mutate an ORC based Hive table (1.0.0). I've got
inserts working correctly, but I'm hitting a problem with deletes and
updates: I get an NPE which I've traced back to what seems like a missing
recIdField(?):

java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:103)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addEvent(OrcRecordUpdater.java:296)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.delete(OrcRecordUpdater.java:330)


I've tried specifying a location for the field using
AcidOutputFormat.Options.recordIdColumn(0), but this fails due to an
ObjectInspector mismatch. I'm not sure if I should be creating this field
as part of my table definition or not. Currently I'm constructing the table
with some code based on that found in the storm-hive project:

      Table tbl = new Table();
      tbl.setDbName(databaseName);
      tbl.setTableName(tableName);
      tbl.setTableType(TableType.MANAGED_TABLE.toString());

      StorageDescriptor sd = new StorageDescriptor();
      sd.setCols(getTableColumns(colNames, colTypes));
      sd.setNumBuckets(1);
      sd.setLocation(dbLocation + Path.SEPARATOR + tableName);
      if (partNames != null && partNames.length != 0) {
        tbl.setPartitionKeys(getPartitionKeys(partNames));
      }
      tbl.setSd(sd);

      sd.setBucketCols(new ArrayList<String>(2));
      sd.setSerdeInfo(new SerDeInfo());
      sd.getSerdeInfo().setName(tbl.getTableName());
      sd.getSerdeInfo().setParameters(new HashMap<String, String>());
      sd.getSerdeInfo().getParameters().put(serdeConstants.SERIALIZATION_FORMAT, "1");
      // Not sure if this does anything?
      sd.getSerdeInfo().getParameters().put("transactional", Boolean.TRUE.toString());
      sd.getSerdeInfo().setSerializationLib(OrcSerde.class.getName());
      sd.setInputFormat(OrcInputFormat.class.getName());
      sd.setOutputFormat(OrcOutputFormat.class.getName());

      Map<String, String> tableParams = new HashMap<String, String>();
      // Not sure if this does anything?
      tableParams.put("transactional", Boolean.TRUE.toString());
      tbl.setParameters(tableParams);

      client.createTable(tbl);
      try {
        if (partVals != null && partVals.size() > 0) {
          addPartition(client, tbl, partVals);
        }
      } catch (AlreadyExistsException e) {
        // Partition already exists; nothing to do.
      }

I don't really know enough about Hive and ORCFile internals to work out
where I'm going wrong, so any help would be appreciated.

Thanks - Elliot.

Re: Updates/deletes with OrcRecordUpdater

Posted by Alan Gates <al...@gmail.com>.
Your table definition looks fine, and no, you shouldn't surface the
recIdField in the table itself.

Without seeing your writing code it's hard to know why you're hitting
this, but here's some info that may be of use.  Hive itself uses a pseudo
column to store the record id when it reads an ACID row, so that it has
it available when it writes back for an update or delete.  I'm guessing
you don't have this pseudo column set up correctly.  You can take a look
at FileSinkOperator (look for ACID or UPDATE) and
OrcInputFormat.getRecordReader to get an idea of how this works.
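
To make the shape concrete, here's a rough, untested sketch of the kind of
setup I mean for the delete path.  Treat everything in it as illustrative:
the field names on the record id struct are my best guess at what
OrcRecordUpdater looks up by name (check its constructor in your version),
and "msg" just stands in for your real table columns.

  import java.io.IOException;
  import java.util.Arrays;
  import java.util.List;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
  import org.apache.hadoop.hive.ql.io.RecordUpdater;
  import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
  import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
  import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
  import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

  public class AcidDeleteSketch {
    public static void deleteRow(Configuration conf, Path partitionDir,
        long currentTxnId, long originalTxnId, int bucket, long rowId)
        throws IOException {
      // The record id is itself a struct.  I believe the updater resolves
      // these fields by name, so verify them against OrcRecordUpdater.
      ObjectInspector recIdOI =
          ObjectInspectorFactory.getStandardStructObjectInspector(
              Arrays.asList("originalTransaction", "bucket", "rowId"),
              Arrays.<ObjectInspector>asList(
                  PrimitiveObjectInspectorFactory.javaLongObjectInspector,
                  PrimitiveObjectInspectorFactory.javaIntObjectInspector,
                  PrimitiveObjectInspectorFactory.javaLongObjectInspector));

      // Row layout: [recId, msg].  The record id sits at column 0 to match
      // recordIdColumn(0) below; "msg" stands in for the table columns.
      ObjectInspector rowOI =
          ObjectInspectorFactory.getStandardStructObjectInspector(
              Arrays.asList("recId", "msg"),
              Arrays.asList(recIdOI,
                  (ObjectInspector) PrimitiveObjectInspectorFactory.javaStringObjectInspector));

      AcidOutputFormat.Options options = new AcidOutputFormat.Options(conf)
          .inspector(rowOI)
          .recordIdColumn(0)
          .bucket(bucket)
          .minimumTransactionId(currentTxnId)
          .maximumTransactionId(currentTxnId);

      RecordUpdater updater =
          new OrcOutputFormat().getRecordUpdater(partitionDir, options);
      try {
        // For a delete, only the record id portion of the row matters; it
        // is what you captured via the pseudo column when the row was read.
        List<Object> recId = Arrays.<Object>asList(originalTxnId, bucket, rowId);
        updater.delete(currentTxnId, Arrays.asList(recId, null));
      } finally {
        updater.close(false);
      }
    }
  }

The key points are that the inspector you hand to the Options describes a
row whose column 0 is the record id struct (matching recordIdColumn(0)),
and that the rows you pass to delete/update actually carry that struct,
populated from the values you saw when the row was read.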

Alan.
