Posted to user@orc.apache.org by Telco Phone <te...@yahoo.com> on 2017/02/25 17:39:37 UTC

Setting Null correctly

Given the code here, I am trying to find the correct way to set nulls in various vectors.
In the case of Long or Bytes vectors, how do you correctly set nulls?
The lines in question are:
col1.isNull[4] = Boolean.TRUE;    <--- does not set to null but sets to 0 in output
col2.isNull[4] = Boolean.TRUE;    <--- throws error on write

Thanks in advance
void example() {
    String s = "struct<_col0:bigint,_col1:string>";
    TypeDescription schema = TypeDescription.fromString(s);

    // Build col0
    LongColumnVector col1 = new LongColumnVector(5);
    col1.init();
    col1.vector[0] = 9L;
    col1.vector[1] = 9L;
    col1.vector[2] = 9L;
    col1.vector[3] = 9L;
    col1.isNull[4] = Boolean.TRUE;

    // Build col1
    BytesColumnVector col2 = new BytesColumnVector();
    col2.init();
    col2.initBuffer();

    byte[] byteString = "Test0".getBytes();
    col2.setVal(0, byteString, 0, byteString.length);

    byteString = "Test1".getBytes();
    col2.setVal(1, byteString, 0, byteString.length);

    byteString = "Test2".getBytes();
    col2.setVal(2, byteString, 0, byteString.length);

    byteString = "Test3".getBytes();
    col2.setVal(3, byteString, 0, byteString.length);

    byteString = null;
    col2.isNull[4] = Boolean.TRUE;

    VectorizedRowBatch batch = schema.createRowBatch();
    batch.cols[0] = col1;
    batch.cols[1] = col2;
    batch.size = 5;

    try {
        File f = new File("/tmp/my-file.orc");
        f.delete();

        Configuration conf = new Configuration();
        Writer writer = OrcFile.createWriter(new Path("/tmp/my-file.orc"),
                OrcFile.writerOptions(conf).setSchema(schema));
        writer.addRowBatch(batch);
        writer.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Re: Setting Null correctly

Posted by Telco Phone <te...@yahoo.com>.
Thanks that is what I was missing.



From: Owen O'Malley <om...@apache.org>
To: user@orc.apache.org; Telco Phone <te...@yahoo.com>
Sent: Saturday, February 25, 2017 2:02 PM
Subject: Re: Setting Null correctly


On Sat, Feb 25, 2017 at 9:39 AM, Telco Phone <te...@yahoo.com> wrote:

> Given the code here, I am trying to find the correct way to set nulls in various vectors.
> In the case of Long or Bytes vectors, how do you correctly set nulls?
> The lines in question are:
> col1.isNull[4] = Boolean.TRUE;    <--- does not set to null but sets to 0 in output
> col2.isNull[4] = Boolean.TRUE;    <--- throws error on write

It is easier to use "true" instead of "Boolean.TRUE":
col1.isNull[4] = true;
col2.isNull[4] = true;

You also need to set ColumnVector.noNulls
http://orc.apache.org/api/hive-storage-api/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.html#noNulls
to false:

col1.noNulls = false;
col2.noNulls = false;

.. Owen



Re: Setting Null correctly

Posted by Owen O'Malley <om...@apache.org>.
On Sat, Feb 25, 2017 at 9:39 AM, Telco Phone <te...@yahoo.com> wrote:

> Given the code here, I am trying to find the correct way to set nulls in
> various vectors.
>
> In the case of Long or Bytes vectors, how do you correctly set nulls?
>
> Lines in question are
>
> col1.isNull[4] = Boolean.TRUE;    <--- does not set to null but sets to 0
> in output
> col2.isNull[4] = Boolean.TRUE;   <--- throws error on write
>

It is easier to use "true" instead of "Boolean.TRUE":
col1.isNull[4] = true;
col2.isNull[4] = true;

You also need to set ColumnVector.noNulls
http://orc.apache.org/api/hive-storage-api/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.html#noNulls
to false:

col1.noNulls = false;
col2.noNulls = false;

.. Owen
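
To see why setting isNull alone was not enough, here is a minimal, self-contained sketch of the convention described above. It assumes a writer that only consults isNull when noNulls is false, which is consistent with the long column coming out as 0. ToyLongColumn, write and build are hypothetical stand-ins invented for illustration; they are not the real hive-storage-api or ORC classes, and the real writer is more involved.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the isNull / noNulls convention. These classes are simplified,
 * hypothetical stand-ins, NOT the real hive-storage-api or ORC classes.
 */
final class NullFlagSketch {

    /** Stand-in for a long column: values, per-row null flags, column-wide hint. */
    static final class ToyLongColumn {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true; // starts out claiming "this column has no nulls"

        ToyLongColumn(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Toy writer (assumption): isNull is only consulted when noNulls is false. */
    static List<Long> write(ToyLongColumn col, int size) {
        List<Long> out = new ArrayList<>();
        for (int row = 0; row < size; row++) {
            if (col.noNulls || !col.isNull[row]) {
                // With noNulls == true the isNull flag is never looked at,
                // so an unset slot is written as its default value 0.
                out.add(col.vector[row]);
            } else {
                out.add(null);
            }
        }
        return out;
    }

    /** Builds the column from the question; optionally clears noNulls as well. */
    static ToyLongColumn build(boolean clearNoNulls) {
        ToyLongColumn col = new ToyLongColumn(5);
        for (int row = 0; row < 4; row++) {
            col.vector[row] = 9L;
        }
        col.isNull[4] = true;
        if (clearNoNulls) {
            col.noNulls = false;
        }
        return col;
    }

    public static void main(String[] args) {
        System.out.println(write(build(false), 5)); // prints [9, 9, 9, 9, 0]
        System.out.println(write(build(true), 5));  // prints [9, 9, 9, 9, null]
    }
}
```

The same reasoning likely explains the error on the bytes column: with noNulls still true, a writer would try to read row 4 even though no byte array was ever set for it.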


>
>
> Thanks in advance
>
> void example() {
>
> String s = "struct<_col0:bigint,_col1:string>";
>
>         TypeDescription schema = TypeDescription.fromString(s);
>
>
>         // Build col0
>
>         LongColumnVector col1 = new LongColumnVector(5);
>         col1.init();
>         col1.vector[0] = 9L;
>         col1.vector[1] = 9L;
>         col1.vector[2] = 9L;
>         col1.vector[3] = 9L;
>         col1.isNull[4] = Boolean.TRUE;
>
>
>         // Build col1
>
>         BytesColumnVector col2 = new BytesColumnVector();
>         col2.init();
>         col2.initBuffer();
>
>         byte[] byteString = "Test0".getBytes();
>         col2.setVal(0, byteString, 0, byteString.length);
>
>         byteString = "Test1".getBytes();
>         col2.setVal(1, byteString, 0, byteString.length);
>
>         byteString = "Test2".getBytes();
>         col2.setVal(2, byteString, 0, byteString.length);
>
>         byteString = "Test3".getBytes();
>         col2.setVal(3, byteString, 0, byteString.length);
>
>         byteString = null;
>
>         col2.isNull[4] = Boolean.TRUE;
>
>
>         VectorizedRowBatch batch = schema.createRowBatch();
>         batch.cols[0] = col1;
>         batch.cols[1] = col2;
>
>         batch.size=5;
>
>
>         try {
>             File f = new File("/tmp/my-file.orc");
>             f.delete();
>
>             Configuration conf = new Configuration();
>             Writer writer = OrcFile.createWriter(new
> Path("/tmp/my-file.orc"), OrcFile.writerOptions(conf).setSchema(schema));
>             writer.addRowBatch(batch);
>             writer.close();
>
>
>
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
> }
>