Posted to user@orc.apache.org by Ravi Tatapudi <ra...@in.ibm.com> on 2015/11/23 16:34:30 UTC

Facing issues while writing ORC files

Hello,

I am Ravi Tatapudi, from IBM India. I am working on a simple tool that 
writes data to an ORC file. I am new to the ORC/Hive world, and I based my 
test application primarily on the example code at:
https://github.com/cloudera/hive/blob/cdh5.4.0-release/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java#L487-L508

I can write the data to the ORC file successfully when the column 
definition is hard-coded in the class (orcw.java.sample1). However, when I 
define an array of objects and assign the values at run time 
(orcw.java.sample2), no data is written to the ORC file.

Please find the sample programs attached:

 

Could you please take a look and let me know why "orcw.java.sample2" is 
not writing data?

Thanks,
 Ravi


Re: Facing issues while writing ORC files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello,

I was able to prepare test cases for ORC writes (with a dynamic schema), 
using the example provided at: https://gist.github.com/omalley/ccabae7cccac28f64812

Now I am trying to read the data back from the ORC file created by the 
above example, using the attached test program, but I get the following 
exception: 
"java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector incompatible 
with OrcReader$OrcRowInspector":



Could you please take a look and let me know why it is not reading the 
data? Alternatively, if there is a corresponding reader example that reads 
the data written by the above writer example, please share it.

Thanks,
 Ravi





From:   "Owen O'Malley" <om...@apache.org>
To:     user@orc.apache.org
Date:   11/25/2015 01:01 AM
Subject:        Re: Facing issues while writing ORC files



Ok, the problem is that you need to create an ObjectInspector that 
specifies the types of the columns. With a generic record, the 
reflection-based ObjectInspector doesn't have enough information.

Unfortunately, it is kind of ugly, because there is a lot of boilerplate 
code dealing with ObjectInspectors. The following code works by building 
an ObjectInspector dynamically:

https://gist.github.com/omalley/ccabae7cccac28f64812

On the positive side, we've been working on updating the API as part of 
separating ORC out to a separate project. In Hive 2.0 it should look like 
the much simpler:

https://gist.github.com/omalley/7a53cb3ae91fa4c22023

.. Owen



Re: Facing issues while writing ORC files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Lefty:

Thank you very much for the info.

Regards,
Ravi



From:   Lefty Leverenz <le...@gmail.com>
To:     user@orc.apache.org
Cc:     Eric Jacobson <ej...@us.ibm.com>, Sumit Kumar6/India/IBM@IBMIN
Date:   11/26/2015 11:22 AM
Subject:        Re: Facing issues while writing ORC files



Preparations for the release of Hive 2.0.0 have begun.  If all goes well, 
the release should be available in December or possibly January.

-- Lefty

On Wed, Nov 25, 2015 at 12:58 AM, Ravi Tatapudi <ra...@in.ibm.com> 
wrote:
Hello Owen:

Many thanks for the details and the sample program. I tried it on my 
test box and it wrote the data correctly. The example for Hive 2.0 looks 
very nice. 

Do you have a tentative idea of when Hive 2.0 will be released (e.g. Q1 or 
Q2 of 2016, or later), or is there a link that shows the planned release 
date for Hive 2.0? 

If you have any information on this, could you please let me know?

Thanks,
 Ravi





Re: Facing issues while writing ORC files

Posted by Lefty Leverenz <le...@gmail.com>.
Preparations for the release of Hive 2.0.0 have begun.  If all goes well,
the release should be available in December or possibly January.

-- Lefty


Re: Facing issues while writing ORC files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Owen:

Many thanks for the details and the sample program. I tried it on my 
test box and it wrote the data correctly. The example for Hive 2.0 looks 
very nice. 

Do you have a tentative idea of when Hive 2.0 will be released (e.g. Q1 or 
Q2 of 2016, or later), or is there a link that shows the planned release 
date for Hive 2.0? 

If you have any information on this, could you please let me know?

Thanks,
 Ravi





Re: Facing issues while writing ORC files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello,

Using the attached simple program (orcrd.java), I am able to read the 
column types and column names from the attached ORC file ("orcfile1") and 
print the data row, which is printed as: "{1, hello, orcFile}".
However, when I try to get the individual field values from this row (by 
reading the row as "OrcRow"), I get the exception: 
java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct 
incompatible with orcrd$OrcRow

 

Could you please look at the sample program that reads from the attached 
file ("orcfile1") and let me know how to read the individual field values 
from each row?

Thanks,
 Ravi
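
Note on the exception above: a ClassCastException of this kind is what Java 
raises when an object of one class is cast to an unrelated class, even if 
the two classes have the same shape. The sketch below is a plain-Java 
analogy, not Hive or ORC code. LibraryRow, MyRow, RowInspector and the 
column names col0..col2 are hypothetical stand-ins invented for 
illustration. It assumes the reader hands back its own row class together 
with an inspector-style object that can extract fields, which is how the 
ObjectInspector classes named in this thread appear to be intended.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy only; LibraryRow, MyRow and RowInspector are
// hypothetical stand-ins, not Hive or ORC classes.
class CastVersusInspector {

    // Stand-in for the row class that a reader library hands back.
    static class LibraryRow {
        private final Object[] fields;
        LibraryRow(Object... fields) { this.fields = fields; }
    }

    // Stand-in for a look-alike row class declared in the application.
    static class MyRow {
        int id;
        String text;
    }

    // Stand-in for an inspector supplied alongside the rows: it pulls field
    // values out of a LibraryRow so the caller never casts the row itself.
    static class RowInspector {
        private final List<String> names;
        RowInspector(String... names) { this.names = Arrays.asList(names); }

        Object getFieldData(Object row, String name) {
            int index = names.indexOf(name);
            if (index < 0) throw new IllegalArgumentException("no field " + name);
            return ((LibraryRow) row).fields[index];
        }
    }

    // Returns true if casting the row to the application's own class fails.
    static boolean castToMyRowFails(Object row) {
        try {
            MyRow ignored = (MyRow) row;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object row = new LibraryRow(1, "hello", "orcFile");
        RowInspector inspector = new RowInspector("col0", "col1", "col2");
        System.out.println(castToMyRowFails(row));             // prints "true"
        System.out.println(inspector.getFieldData(row, "col1")); // prints "hello"
    }
}
```

The design point is that the application asks the inspector for each field 
instead of assuming it knows the concrete class of the row.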



From:   Ravi Tatapudi/India/IBM
To:     user@orc.apache.org
Cc:     Eric Jacobson/Worcester/IBM@IBMUS, Sumit Kumar6/India/IBM@IBMIN
Date:   12/01/2015 06:16 PM
Subject:        Re: Facing issues while writing ORC files


Hello,

I was able to prepare test cases for ORC writes (with a dynamic schema), 
using the example provided at: https://gist.github.com/omalley/ccabae7cccac28f64812

Now I am trying to read the data back from the ORC file created by the 
above example, using the attached test program, but I get the following 
exception: 
"java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector incompatible 
with OrcReader$OrcRowInspector":

[attachment "OrcReader.java" deleted by Ravi Tatapudi/India/IBM] 

Could you please take a look and let me know why it is not reading the 
data? Alternatively, if there is a corresponding reader example that reads 
the data written by the above writer example, please share it.

Thanks,
 Ravi







Re: Facing issues while writing ORC files

Posted by Owen O'Malley <om...@apache.org>.
Ok, the problem is that you need to create an ObjectInspector that
specifies the types of the columns. With a generic record, the
reflection-based ObjectInspector doesn't have enough information.

Unfortunately, it is kind of ugly, because there is a lot of boilerplate
code dealing with ObjectInspectors. The following code works by building an
ObjectInspector dynamically:

https://gist.github.com/omalley/ccabae7cccac28f64812

On the positive side, we've been working on updating the API as part of
separating ORC out to a separate project. In Hive 2.0 it should look like
the much simpler:

https://gist.github.com/omalley/7a53cb3ae91fa4c22023

.. Owen
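
Note on the explanation above: the point about reflection not having enough 
information for a generic record can be seen in plain Java, without any 
Hive classes. The sketch below is an illustration only and does not call 
Hive's ObjectInspector API. TypedRow and the helper method names are 
hypothetical. It shows that reflection recovers column types from a class 
with declared fields, that an Object[] only reports Object for its slots, 
and that an explicitly supplied list of column types fills the gap.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; it does not call Hive's ObjectInspector API.
class ReflectionTypeInfo {

    // Hypothetical row class with a hard-coded column definition.
    static class TypedRow {
        int id;
        String name;
    }

    // Column name -> type, recovered purely by reflection on a class.
    static Map<String, String> declaredColumnTypes(Class<?> rowClass) {
        Map<String, String> types = new TreeMap<>();
        for (Field f : rowClass.getDeclaredFields()) {
            if (!f.isSynthetic()) {
                types.put(f.getName(), f.getType().getSimpleName());
            }
        }
        return types;
    }

    // The only type reflection can report for the slots of an Object[].
    static String declaredElementType(Object[] row) {
        return row.getClass().getComponentType().getSimpleName();
    }

    // Explicit schema: the caller states each column's type up front.
    static boolean matchesSchema(Object[] row, Class<?>[] columnTypes) {
        if (row.length != columnTypes.length) return false;
        for (int i = 0; i < row.length; i++) {
            if (!columnTypes[i].isInstance(row[i])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Declared fields carry their types: {id=int, name=String}
        System.out.println(declaredColumnTypes(TypedRow.class));

        // A generic record only declares Object for every column.
        Object[] genericRow = {1, "hello"};
        System.out.println(declaredElementType(genericRow));

        // With the column types stated explicitly, the row can be checked.
        System.out.println(matchesSchema(genericRow,
                new Class<?>[] {Integer.class, String.class}));
    }
}
```

The explicit list of column types plays the role that, in the gist linked 
above, is played by the dynamically built ObjectInspector.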

