Posted to user@hive.apache.org by Elliot West <te...@gmail.com> on 2015/03/20 22:50:21 UTC
Updates/deletes with OrcRecordUpdater
Hi,
I'm trying to use the insert, update, and delete methods on OrcRecordUpdater
to programmatically mutate an ORC-based Hive table (1.0.0). I've got
inserts working correctly, but I'm hitting a problem with deletes and
updates: I get an NPE which I have traced back to what seems to be a missing
recIdField(?).
java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:103)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addEvent(OrcRecordUpdater.java:296)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.delete(OrcRecordUpdater.java:330)
I've tried specifying a location for the field using
AcidOutputFormat.Options.recordIdColumn(0), but this fails due to an
ObjectInspector mismatch. I'm not sure whether I should be creating this field
as part of my table definition or not. Currently I'm constructing the table
with some code based on that found in the storm-hive project:
Table tbl = new Table();
tbl.setDbName(databaseName);
tbl.setTableName(tableName);
tbl.setTableType(TableType.MANAGED_TABLE.toString());

StorageDescriptor sd = new StorageDescriptor();
sd.setCols(getTableColumns(colNames, colTypes));
sd.setNumBuckets(1);
sd.setLocation(dbLocation + Path.SEPARATOR + tableName);
if (partNames != null && partNames.length != 0) {
  tbl.setPartitionKeys(getPartitionKeys(partNames));
}
tbl.setSd(sd);

sd.setBucketCols(new ArrayList<String>(2));
sd.setSerdeInfo(new SerDeInfo());
sd.getSerdeInfo().setName(tbl.getTableName());
sd.getSerdeInfo().setParameters(new HashMap<String, String>());
sd.getSerdeInfo().getParameters().put(serdeConstants.SERIALIZATION_FORMAT, "1");
// Not sure if this does anything?
sd.getSerdeInfo().getParameters().put("transactional", Boolean.TRUE.toString());
sd.getSerdeInfo().setSerializationLib(OrcSerde.class.getName());
sd.setInputFormat(OrcInputFormat.class.getName());
sd.setOutputFormat(OrcOutputFormat.class.getName());

Map<String, String> tableParams = new HashMap<String, String>();
// Not sure if this does anything?
tableParams.put("transactional", Boolean.TRUE.toString());
tbl.setParameters(tableParams);
client.createTable(tbl);

try {
  if (partVals != null && partVals.size() > 0) {
    addPartition(client, tbl, partVals);
  }
} catch (AlreadyExistsException e) {
  // Partition already exists; nothing to do.
}
I don't really know enough about Hive and ORC file internals to work out
where I'm going wrong, so any help would be appreciated.
Thanks - Elliot.
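[Editor's note: for reference, the write path being attempted can be sketched roughly as below. This is a hedged illustration only, not verified code: the record-identifier field names ("transactionId", "bucketId", "rowId"), the row layout with the identifier at column 0, and the exact Options calls are assumptions based on the 1.0.0 API surface discussed in this thread, and the snippet needs Hive 1.0.0 on the classpath to compile.]

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
import org.apache.hadoop.hive.ql.io.RecordUpdater;
import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class UpdaterSketch {
  public static void main(String[] args) throws Exception {
    // Record-identifier struct: (transactionId, bucketId, rowId).
    // Field names here are assumptions, not confirmed against 1.0.0 source.
    StructObjectInspector recIdOI =
        ObjectInspectorFactory.getStandardStructObjectInspector(
            Arrays.asList("transactionId", "bucketId", "rowId"),
            Arrays.<ObjectInspector>asList(
                PrimitiveObjectInspectorFactory.javaLongObjectInspector,
                PrimitiveObjectInspectorFactory.javaIntObjectInspector,
                PrimitiveObjectInspectorFactory.javaLongObjectInspector));

    // Whole-row inspector: record identifier as field 0, then data columns.
    // recordIdColumn(0) below must point at that field, and the inspector
    // must be a standard (not lazy) struct inspector matching the row objects.
    StructObjectInspector rowOI =
        ObjectInspectorFactory.getStandardStructObjectInspector(
            Arrays.asList("recId", "msg"),
            Arrays.<ObjectInspector>asList(
                recIdOI,
                PrimitiveObjectInspectorFactory.javaStringObjectInspector));

    Configuration conf = new Configuration();
    AcidOutputFormat.Options options = new AcidOutputFormat.Options(conf)
        .inspector(rowOI)
        .bucket(0)
        .minimumTransactionId(100L)
        .maximumTransactionId(100L)
        .recordIdColumn(0);

    RecordUpdater updater = new OrcOutputFormat()
        .getRecordUpdater(new Path("/tmp/example-table"), options);

    // Delete the row originally written by txn 42 at rowId 7 in bucket 0.
    Object[] row = { Arrays.asList(42L, 0, 7L), null };
    updater.delete(100L, row);
    updater.close(false);
  }
}
```

The NPE in the thread is consistent with the row inspector and the actual row objects disagreeing (e.g. a LazySimpleStructObjectInspector being asked to inspect plain Java objects), which a standard struct inspector like the above would avoid.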
Re: Updates/deletes with OrcRecordUpdater
Posted by Alan Gates <al...@gmail.com>.
Your table definition looks fine, and no, you shouldn't surface the
recIdField in the table itself.
Without seeing your writing code it's hard to know why you're hitting
this, but here's some info that may be of use. Hive itself uses a pseudo
column to store the record identifier when it reads an ACID row, so that it
has it available when it writes the row back for an update or delete. I'm
guessing you don't have this pseudo column set up correctly. You can take a
look at FileSinkOperator (look for ACID or UPDATE) and
OrcInputFormat.getRecordReader to get an idea of how this works.
Alan.
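[Editor's note: to make the read-modify-write flow Alan describes concrete, here is a dependency-free toy model of the pseudo column. This is purely illustrative, not Hive code: the RecId class merely mirrors the shape of Hive's RecordIdentifier (transaction, bucket, row).]

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of Hive's ACID pseudo column: when rows are read, each carries a
// (transactionId, bucketId, rowId) identifier attached by the reader; a
// delete is issued by handing that identifier back to the writer.
public class PseudoColumnSketch {
  // Mirrors the shape of Hive's RecordIdentifier (illustrative only).
  public static final class RecId {
    public final long transactionId;
    public final int bucketId;
    public final long rowId;
    public RecId(long t, int b, long r) { transactionId = t; bucketId = b; rowId = r; }
  }

  public static final class Row {
    public final RecId id;       // the pseudo column, populated on read
    public final String value;   // an ordinary data column
    public Row(RecId id, String value) { this.id = id; this.value = value; }
  }

  // A delete event references the identifier, not the column values: this is
  // what OrcRecordUpdater needs to find in the row it is handed.
  public static List<RecId> collectDeletes(List<Row> scanned) {
    List<RecId> deletes = new ArrayList<>();
    for (Row r : scanned) {
      if ("drop".equals(r.value)) {
        deletes.add(r.id);
      }
    }
    return deletes;
  }

  public static void main(String[] args) {
    // "Read" side: the reader attaches an identifier to every row it returns.
    List<Row> scanned = new ArrayList<>();
    scanned.add(new Row(new RecId(42L, 0, 0L), "keep"));
    scanned.add(new Row(new RecId(42L, 0, 1L), "drop"));

    // "Write" side: only the identifiers of doomed rows flow back.
    List<RecId> deletes = collectDeletes(scanned);
    System.out.println(deletes.size() + " delete event(s), rowId=" + deletes.get(0).rowId);
  }
}
```

The point of the model: the delete path never needs the original column values, only the identifier the reader attached, which is why the real updater dereferences the recId field of the row and throws an NPE when that field was never wired up.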