Posted to user@orc.apache.org by Telco Phone <te...@yahoo.com> on 2017/02/25 17:39:37 UTC
Setting Null correctly
Given the code here, I am trying to find the correct way to set nulls in various vectors.
In the case of Long or Bytes vectors, how do you correctly set nulls?
The lines in question are:
col1.isNull[4] = Boolean.TRUE; <--- does not set to null but sets to 0 in output
col2.isNull[4] = Boolean.TRUE; <--- throws error on write
Thanks in advance
void example() {
  String s = "struct<_col0:bigint,_col1:string>";
  TypeDescription schema = TypeDescription.fromString(s);

  // Build col0
  LongColumnVector col1 = new LongColumnVector(5);
  col1.init();
  col1.vector[0] = 9L;
  col1.vector[1] = 9L;
  col1.vector[2] = 9L;
  col1.vector[3] = 9L;
  col1.isNull[4] = Boolean.TRUE;

  // Build col1
  BytesColumnVector col2 = new BytesColumnVector();
  col2.init();
  col2.initBuffer();

  byte[] byteString = "Test0".getBytes();
  col2.setVal(0, byteString, 0, byteString.length);
  byteString = "Test1".getBytes();
  col2.setVal(1, byteString, 0, byteString.length);
  byteString = "Test2".getBytes();
  col2.setVal(2, byteString, 0, byteString.length);
  byteString = "Test3".getBytes();
  col2.setVal(3, byteString, 0, byteString.length);
  byteString = null;
  col2.isNull[4] = Boolean.TRUE;

  VectorizedRowBatch batch = schema.createRowBatch();
  batch.cols[0] = col1;
  batch.cols[1] = col2;
  batch.size = 5;

  try {
    File f = new File("/tmp/my-file.orc");
    f.delete();

    Configuration conf = new Configuration();
    Writer writer = OrcFile.createWriter(new Path("/tmp/my-file.orc"),
        OrcFile.writerOptions(conf).setSchema(schema));
    writer.addRowBatch(batch);
    writer.close();
  } catch (Exception e) {
    e.printStackTrace();
  }
}
Re: Setting Null correctly
Posted by Telco Phone <te...@yahoo.com>.
Thanks that is what I was missing.
From: Owen O'Malley <om...@apache.org>
To: user@orc.apache.org; Telco Phone <te...@yahoo.com>
Sent: Saturday, February 25, 2017 2:02 PM
Subject: Re: Setting Null correctly
On Sat, Feb 25, 2017 at 9:39 AM, Telco Phone <te...@yahoo.com> wrote:
> Given the code here, I am trying to find the correct way to set nulls in various vectors.
> In the case of Long or Bytes vectors, how do you correctly set nulls?
> The lines in question are:
> col1.isNull[4] = Boolean.TRUE; <--- does not set to null but sets to 0 in output
> col2.isNull[4] = Boolean.TRUE; <--- throws error on write
It is easier to use "true" instead of "Boolean.TRUE":
col1.isNull[4] = true;
col2.isNull[4] = true;
You also need to set ColumnVector.noNulls
http://orc.apache.org/api/hive-storage-api/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.html#noNulls
to false:
col1.noNulls = false;
col2.noNulls = false;
.. Owen
Re: Setting Null correctly
Posted by Owen O'Malley <om...@apache.org>.
On Sat, Feb 25, 2017 at 9:39 AM, Telco Phone <te...@yahoo.com> wrote:
> Given the code here, I am trying to find the correct way to set nulls in
> various vectors
>
> In the case of Long or Bytes vectors, how do you correctly set nulls?
>
> The lines in question are
>
> col1.isNull[4] = Boolean.TRUE; <--- does not set to null but sets to 0
> in output
> col2.isNull[4] = Boolean.TRUE; <--- throws error on write
>
It is easier to use "true" instead of "Boolean.TRUE":
col1.isNull[4] = true;
col2.isNull[4] = true;
You also need to set ColumnVector.noNulls
http://orc.apache.org/api/hive-storage-api/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.html#noNulls
to false:
col1.noNulls = false;
col2.noNulls = false;
.. Owen
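For anyone wondering why setting isNull alone produced a 0 in the Long column, the sketch below may help. It is a simplified model, not ORC's actual writer code: SketchLongVector and emit are hypothetical names made up for illustration. They only mimic the vector, isNull and noNulls fields discussed above, under the assumption that a writer checks noNulls first and never looks at isNull while noNulls is still true.

```java
import java.util.ArrayList;
import java.util.List;

class NoNullsSketch {

    /**
     * Hypothetical stand-in for a long column vector. This is NOT the ORC or
     * Hive class; it only mirrors the three public fields from this thread.
     */
    static class SketchLongVector {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true; // assumed default: the vector claims "no nulls"

        SketchLongVector(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /**
     * Hypothetical writer loop: isNull is consulted only when noNulls is false.
     * Returns the values as they would be emitted, with null for a null row.
     */
    static List<Long> emit(SketchLongVector col, int size) {
        List<Long> out = new ArrayList<>();
        for (int row = 0; row < size; row++) {
            if (col.noNulls || !col.isNull[row]) {
                out.add(col.vector[row]); // unset slots hold the default 0
            } else {
                out.add(null);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SketchLongVector col = new SketchLongVector(5);
        for (int row = 0; row < 4; row++) {
            col.vector[row] = 9L;
        }

        // Only the per-row flag is set, as in the original question.
        col.isNull[4] = true;
        System.out.println(emit(col, 5)); // [9, 9, 9, 9, 0]

        // Clearing noNulls makes the per-row flag take effect.
        col.noNulls = false;
        System.out.println(emit(col, 5)); // [9, 9, 9, 9, null]
    }
}
```

In the code from the question, the corresponding change is the pair of lines given in the reply: set noNulls to false on each vector alongside isNull[4] = true.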