Posted to user@crunch.apache.org by "Robinson, Landon - Landon" <la...@lowes.com> on 2016/01/28 19:20:29 UTC

Reading Hive Tables into PCollection

Crunch Gurus,

What is the Crunch-convenient or recommended way to read the contents of a Hive table into a PCollection?
Thanks!
Best,
Landon
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
---------------------------------------------------------------------------

NOTICE: All information in and attached to the e-mails below may be proprietary, confidential, privileged and otherwise protected from improper or erroneous disclosure. If you are not the sender's intended recipient, you are not authorized to intercept, read, print, retain, copy, forward, or disseminate this message. If you have erroneously received this communication, please notify the sender immediately by phone (704-758-1000) or by e-mail and destroy all copies of this message electronic, paper, or otherwise. 

By transmitting documents via this email: Users, Customers, Suppliers and Vendors collectively acknowledge and agree the transmittal of information via email is voluntary, is offered as a convenience, and is not a secured method of communication; Not to transmit any payment information E.G. credit card, debit card, checking account, wire transfer information, passwords, or sensitive and personal information E.G. Driver's license, DOB, social security, or any other information the user wishes to remain confidential; To transmit only non-confidential information such as plans, pictures and drawings and to assume all risk and liability for and indemnify Lowe's from any claims, losses or damages that may arise from the transmittal of documents or including non-confidential information in the body of an email transmittal. Thank you.

Re: Reading Hive Tables into PCollection

Posted by David Ortiz <dp...@gmail.com>.
You might be able to use the Hive JDBC client and then parse the "show create
table x" result. The Metastore API might have something too, but I am not
familiar with it. I have used the JDBC driver within a Crunch driver for
other things, though, so I know that works.
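
As a rough illustration of the parsing half of that idea, here is a minimal
sketch that pulls the LOCATION clause out of DDL text. It assumes the DDL has
already been fetched (for example over JDBC, which is not shown) and that the
location is printed in single quotes after the LOCATION keyword. The class
name HiveDdlLocation, the table name, and the sample DDL are all made up for
the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HiveDdlLocation {

    // Matches a LOCATION clause whose path is in single quotes. The path may
    // sit on the line after the keyword, so \s+ is allowed to span newlines.
    private static final Pattern LOCATION =
            Pattern.compile("\\bLOCATION\\s+'([^']+)'", Pattern.CASE_INSENSITIVE);

    /** Returns the first quoted LOCATION path found in the DDL text, if any. */
    public static Optional<String> extractLocation(String ddl) {
        Matcher m = LOCATION.matcher(ddl);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical DDL text, shaped like "show create table" output.
        String ddl = String.join("\n",
                "CREATE TABLE `verint_1978`(",
                "  `lct_nbr` string,",
                "  `acl_idc` char(1))",
                "STORED AS ORC",
                "LOCATION",
                "  'hdfs://namenode:8020/apps/hive/warehouse/verint_1978'");
        System.out.println(extractLocation(ddl).orElse("<no location found>"));
    }
}
```

The returned string could then be wrapped in a Path and handed to the ORC
source. A plain regex like this can be fooled by a column comment that
happens to contain the same pattern, so treat it as a starting point only.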

On Fri, Jan 29, 2016 at 1:16 PM Robinson, Landon - Landon <
landon.t.robinson@lowes.com> wrote:

> On this same note, I still have a similar problem to solve.
> I can point Crunch at an HDFS location and it will ingest/read the ORC
> file just fine.
>
> But is there a way (maybe leveraging HCat/Hive APIs) to get the file
> locations dynamically from Hive? Can I ask HCat/Hive about a table and its
> partitions, and have it tell me the file locations on HDFS (which I can then
> pass to Crunch to consume the files into the pipeline)?
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
> From: <Robinson>, LCI <la...@lowes.com>
> Date: Friday, January 29, 2016 at 10:41 AM
> To: LCI <la...@lowes.com>, Apache Crunch Mailing List <
> user@crunch.apache.org>, David Ortiz <dp...@gmail.com>
>
> Subject: Re: Reading Hive Tables into PCollection
>
> *Solved:*
>
> It turns out you can declare the field as a HiveChar:
>
> 	private HiveChar acl_idc;
>
> That class comes from this package: org.apache.hadoop.hive.common.type
> (import org.apache.hadoop.hive.common.type.HiveChar;).
>
> Sorry for all the emails, but I hope the findings help someone else!
>
> From: <Robinson>, LCI <la...@lowes.com>
> Date: Friday, January 29, 2016 at 10:36 AM
> To: Apache Crunch Mailing List <us...@crunch.apache.org>, LCI <
> landon.t.robinson@lowes.com>, David Ortiz <dp...@gmail.com>
> Subject: Re: Reading Hive Tables into PCollection
>
> Additionally, we tried declaring those char fields as Strings, but we get
> the error below. The real issue is getting the ORC ‘char’ type to cast to
> something we can use in the ORC structure.
>
> Exception in thread "main" org.apache.crunch.CrunchRuntimeException: Error while reading local file: file:/tmp/crunch-test/000000_0
>     at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:110)
>     at org.apache.crunch.io.CompositePathIterable$2.next(CompositePathIterable.java:99)
>     at com.google.common.collect.Iterators$5.next(Iterators.java:607)
>     at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:266)
>     at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
>     at org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:79)
>     at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:165)
>     at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:156)
>     at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.run(DataQualityDriverTest.java:57)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.main(DataQualityDriverTest.java:36)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> *Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.io.Text*
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
>     at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:169)
>     at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:222)
>     at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:190)
>     at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:168)
>     at org.apache.crunch.fn.CompositeMapFn.map(CompositeMapFn.java:63)
>     at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:108)
>     ... 15 more
>
> *Verint1978Record*
>
> public class Verint1978Record {
>
>    private String lct_nbr;
>    private String vid_caa_id;
>    private Integer hrs_nbr;
>    private Integer mte_nbr;
>    private String acl_idc;
>    private Integer sec_dur;
>    private Integer sec_to_pcs;
>    private Integer sec_pcd;
>    private String use_for_rpr_idc;
>    private Integer grp_cnt;
>    private Integer sng_cnt;
>    private String upd_dt;
>    private String upd_id;
>    private String cal_dt;
>
> }
>
> From: <Robinson>, LCI <la...@lowes.com>
> Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
> Date: Friday, January 29, 2016 at 10:33 AM
> To: David Ortiz <dp...@gmail.com>, Apache Crunch Mailing List <
> user@crunch.apache.org>
> Subject: Re: Reading Hive Tables into PCollection
>
> Right, we’ve been trying this with little luck — largely because I get the
> error:
>
> Caused by: java.lang.ClassCastException:
> org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to
> org.apache.hadoop.hive.ql.io.orc.OrcStruct
>
> *Code:*
>
> OrcFileSource<Verint1978Record> source = new OrcFileSource<Verint1978Record>(new Path(inputPath), Orcs.reflects(Verint1978Record.class));
> PCollection<Verint1978Record> persons = pipeline.read(source);
>
> *Verint1978Record*
>
> public class Verint1978Record {
>
>    private String lct_nbr;
>    private String vid_caa_id;
>    private Integer hrs_nbr;
>    private Integer mte_nbr;
>    private Character acl_idc;
>    private Integer sec_dur;
>    private Integer sec_to_pcs;
>    private Integer sec_pcd;
>    private Character use_for_rpr_idc;
>    private Integer grp_cnt;
>    private Integer sng_cnt;
>    private String upd_dt;
>    private String upd_id;
>    private String cal_dt;
>
> }
>
> From: David Ortiz <dp...@gmail.com>
> Date: Friday, January 29, 2016 at 10:19 AM
> To: LCI <la...@lowes.com>, Apache Crunch Mailing List <
> user@crunch.apache.org>
> Subject: Re: Reading Hive Tables into PCollection
>
> http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/
>
> Here's the Java excerpt from that article; it reads an ORC file into a
> Java class using reflection (I'm assuming).
>
> // Read an ORCFile using reflection-based serialization (slowest):
> OrcFileSource<Person> source =
>     new OrcFileSource<Person>(new Path(inputPath), Orcs.reflects(Person.class));
> PCollection<Person> persons = pipeline.read(source);
>
> On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <
> landon.t.robinson@lowes.com> wrote:
>
>> Orc format.
>>
>> From: David Ortiz <dp...@gmail.com>
>> Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
>> Date: Thursday, January 28, 2016 at 1:22 PM
>> To: Apache Crunch Mailing List <us...@crunch.apache.org>
>> Subject: Re: Reading Hive Tables into PCollection
>>
>> What format are they stored as?

Re: Reading Hive Tables into PCollection

Posted by Micah Whitacre <mk...@gmail.com>.
As a rough guess: using the metastore client [1], you can get the Table and,
from there, its StorageDescriptor [2], which holds the location.

Something like:
Path path = new Path(client.getTable(namespace, name).getSd().getLocation());

[1] - https://hive.apache.org/javadocs/r0.13.1/api/metastore/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.html#getTable(java.lang.String, java.lang.String)
[2] - https://hive.apache.org/javadocs/r0.12.0/api/org/apache/hadoop/hive/metastore/api/StorageDescriptor.html

On Fri, Jan 29, 2016 at 12:19 PM, Josh Wills <jo...@gmail.com> wrote:

> I am sure there is a way to do it using the HS2 thrift APIs, but I've
> never done it myself.
>
> On Fri, Jan 29, 2016 at 10:16 AM, Robinson, Landon - Landon <
> landon.t.robinson@lowes.com> wrote:
>
>> On this same note, I still have a similar problem to solve.
>> I can point Crunch at an HDFS location and it will ingest/read the ORC
>> file just fine.
>>
>> But is there a way (maybe leveraging HCat/Hive APIs) to get the file
>> locations dynamically from Hive? Can I ask HCat/Hive about a table and its
>> partitions, and have it tell me the file locations on HDFS (which I can then
>> pass to Crunch to consume the files into the pipeline)?

Re: Reading Hive Tables into PCollection

Posted by Josh Wills <jo...@gmail.com>.
I am sure there is a way to do it using the HS2 thrift APIs, but I've never
done it myself.

On Fri, Jan 29, 2016 at 10:16 AM, Robinson, Landon - Landon <
landon.t.robinson@lowes.com> wrote:

> On this same note, I still have a similar problem to solve.
> I can point Crunch at an HDFS location and it will ingest/read the ORC
> file just fine.
>
> But is there a way (maybe leveraging HCat/Hive APIs) to get the file
> locations dynamically from Hive? Can I ask HCat/Hive about a table and its
> partitions, and have it tell me the file locations on HDFS (which I can then
> pass to Crunch to consume the files into the pipeline)?
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
>
> From: <Robinson>, LCI <la...@lowes.com>
> Date: Friday, January 29, 2016 at 10:41 AM
> To: LCI <la...@lowes.com>, Apache Crunch Mailing List <
> user@crunch.apache.org>, David Ortiz <dp...@gmail.com>
>
> Subject: Re: Reading Hive Tables into PCollection
>
> *Solved:*
>
> Turns out you can use this:
>
> 	private HiveChar acl_idc;
>
> That class is org.apache.hadoop.hive.common.type.HiveChar.
>
> Sorry for all the emails, but hope the findings help someone else!
>
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
>
> From: <Robinson>, LCI <la...@lowes.com>
> Date: Friday, January 29, 2016 at 10:36 AM
> To: Apache Crunch Mailing List <us...@crunch.apache.org>, LCI <
> landon.t.robinson@lowes.com>, David Ortiz <dp...@gmail.com>
> Subject: Re: Reading Hive Tables into PCollection
>
> Additionally, we tried declaring those char fields as Strings, but got the
> error below. The real issue is getting the ORC ‘char’ type to map to
> something we can use in the ORC structure.
>
> Exception in thread "main" org.apache.crunch.CrunchRuntimeException: Error
> while reading local file: file:/tmp/crunch-test/000000_0
> at
> org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:110)
> at
> org.apache.crunch.io.CompositePathIterable$2.next(CompositePathIterable.java:99)
> at com.google.common.collect.Iterators$5.next(Iterators.java:607)
> at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:266)
> at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
> at
> org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:79)
> at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:165)
> at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:156)
> at
> com.lowes.bigdata.closerate.verint.DataQualityDriverTest.run(DataQualityDriverTest.java:57)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at
> com.lowes.bigdata.closerate.verint.DataQualityDriverTest.main(DataQualityDriverTest.java:36)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> *Caused by: java.lang.ClassCastException:
> org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to
> org.apache.hadoop.io.Text*
> at
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
> at
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
> at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:169)
> at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:222)
> at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:190)
> at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:168)
> at org.apache.crunch.fn.CompositeMapFn.map(CompositeMapFn.java:63)
> at
> org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:108)
> ... 15 more
>
> *Verint1978Record*
>
> public class Verint1978Record {
>
>    private String lct_nbr;
>    private String vid_caa_id;
>    private Integer hrs_nbr;
>    private Integer mte_nbr;
>       private String acl_idc;
>    private Integer sec_dur;
>    private Integer sec_to_pcs;
>    private Integer sec_pcd;
>       private String use_for_rpr_idc;
>    private Integer grp_cnt;
>    private Integer sng_cnt;
>    private String upd_dt;
>    private String upd_id;
>    private String cal_dt;
>
> }
>
>
>
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
>
> From: <Robinson>, LCI <la...@lowes.com>
> Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
> Date: Friday, January 29, 2016 at 10:33 AM
> To: David Ortiz <dp...@gmail.com>, Apache Crunch Mailing List <
> user@crunch.apache.org>
> Subject: Re: Reading Hive Tables into PCollection
>
> Right, we’ve been trying this with little luck — largely because I get the
> error:
>
> Caused by: java.lang.ClassCastException:
> org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to
> org.apache.hadoop.hive.ql.io.orc.OrcStruct
>
> *Code:*
>
> OrcFileSource<Verint1978Record> source = new OrcFileSource<Verint1978Record>(new Path(inputPath), Orcs.reflects(Verint1978Record.class));
> PCollection<Verint1978Record> persons = pipeline.read(source);
>
> *Verint1978Record*
>
> public class Verint1978Record {
>
>    private String lct_nbr;
>    private String vid_caa_id;
>    private Integer hrs_nbr;
>    private Integer mte_nbr;
>    private Character acl_idc;
>    private Integer sec_dur;
>    private Integer sec_to_pcs;
>    private Integer sec_pcd;
>    private Character use_for_rpr_idc;
>    private Integer grp_cnt;
>    private Integer sng_cnt;
>    private String upd_dt;
>    private String upd_id;
>    private String cal_dt;
>
> }
>
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
>
> From: David Ortiz <dp...@gmail.com>
> Date: Friday, January 29, 2016 at 10:19 AM
> To: LCI <la...@lowes.com>, Apache Crunch Mailing List <
> user@crunch.apache.org>
> Subject: Re: Reading Hive Tables into PCollection
>
> http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/
>
> Here's the Java excerpt from that article to read into an Avro class (I'm
> assuming).
>
> // Read an ORC file using reflection-based serialization (slowest):
> OrcFileSource<Person> source = new OrcFileSource<Person>(
>     new Path(inputPath), Orcs.reflects(Person.class));
> PCollection<Person> persons = pipeline.read(source);
>
> On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <
> landon.t.robinson@lowes.com> wrote:
>
>> Orc format.
>>
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data & Hadoop Engineer
>> IT Business Intelligence, Lowe’s Companies Inc.
>>
>> ---------------------------------------------------------------------------
>>
>> From: David Ortiz <dp...@gmail.com>
>> Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
>> Date: Thursday, January 28, 2016 at 1:22 PM
>> To: Apache Crunch Mailing List <us...@crunch.apache.org>
>> Subject: Re: Reading Hive Tables into PCollection
>>
>> What format are they stored as?
>>
>> On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <
>> landon.t.robinson@lowes.com> wrote:
>>
>>> Crunch Gurus,
>>>
>>> What is the Crunch-convenient or recommended way to read the contents of
>>> a Hive table into a PCollection?
>>> Thanks!
>>> Best,
>>> Landon
>>>
>>> ---------------------------------------------------------------------------
>>> Landon Robinson
>>> Big Data & Hadoop Engineer
>>>
>>> ---------------------------------------------------------------------------

Re: Reading Hive Tables into PCollection

Posted by "Robinson, Landon - Landon" <la...@lowes.com>.
On this same note, I still have a similar problem to solve.
I can point Crunch at an HDFS location and it will ingest/read the ORC file just fine.

But is there a way (maybe leveraging HCat/Hive APIs) to get the file locations dynamically from Hive? Can I ask HCat/Hive about a table and its partitions and have it tell me the file locations on HDFS (which I can then pass to Crunch to consume the files into the pipeline)?
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------
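On the question of where a table's files live: the Hive metastore records a location for the table and for each partition, and it is the authoritative source (I believe HiveMetaStoreClient exposes these through getTable and listPartitions, though I have not run that against Crunch). As a rough sketch only, and assuming the table uses Hive's default layout of key=value subdirectories under the table location, the partition directory can be derived like this. The class name, the table location, and the partition column below are hypothetical, and the sketch does not handle partition values that Hive escapes or partitions that were given a custom location.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPaths {

    /**
     * Builds the directory for one partition, assuming the default Hive layout:
     * <table location>/<col1>=<val1>/<col2>=<val2>/...
     * The map must iterate in partition-column order (hence LinkedHashMap).
     */
    public static String partitionPath(String tableLocation, Map<String, String> partitionSpec) {
        // Drop any trailing slashes so we do not produce "//" in the result.
        StringBuilder path = new StringBuilder(tableLocation.replaceAll("/+$", ""));
        for (Map.Entry<String, String> column : partitionSpec.entrySet()) {
            path.append('/').append(column.getKey()).append('=').append(column.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Hypothetical table location and partition column, for illustration only.
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("cal_dt", "2016-01-29");
        System.out.println(partitionPath("hdfs://namenode/warehouse/verint/", spec));
    }
}
```

The resulting string could then be wrapped in a Path and handed to OrcFileSource in the same way as a hard-coded input path.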

From: <Robinson>, LCI <la...@lowes.com>
Date: Friday, January 29, 2016 at 10:41 AM
To: LCI <la...@lowes.com>, Apache Crunch Mailing List <us...@crunch.apache.org>, David Ortiz <dp...@gmail.com>
Subject: Re: Reading Hive Tables into PCollection

Solved:

Turns out you can use this:

        private HiveChar acl_idc;

That class is org.apache.hadoop.hive.common.type.HiveChar.

Sorry for all the emails, but hope the findings help someone else!

---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: <Robinson>, LCI <la...@lowes.com>
Date: Friday, January 29, 2016 at 10:36 AM
To: Apache Crunch Mailing List <us...@crunch.apache.org>, LCI <la...@lowes.com>, David Ortiz <dp...@gmail.com>
Subject: Re: Reading Hive Tables into PCollection

Additionally, we tried declaring those char fields as Strings, but got the error below. The real issue is getting the ORC ‘char’ type to map to something we can use in the ORC structure.

Exception in thread "main" org.apache.crunch.CrunchRuntimeException: Error while reading local file: file:/tmp/crunch-test/000000_0
at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:110)
at org.apache.crunch.io.CompositePathIterable$2.next(CompositePathIterable.java:99)
at com.google.common.collect.Iterators$5.next(Iterators.java:607)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:266)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
at org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:79)
at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:165)
at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:156)
at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.run(DataQualityDriverTest.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.main(DataQualityDriverTest.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:169)
at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:222)
at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:190)
at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:168)
at org.apache.crunch.fn.CompositeMapFn.map(CompositeMapFn.java:63)
at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:108)
... 15 more


Verint1978Record

public class Verint1978Record {

   private String lct_nbr;
   private String vid_caa_id;
   private Integer hrs_nbr;
   private Integer mte_nbr;
      private String acl_idc;
   private Integer sec_dur;
   private Integer sec_to_pcs;
   private Integer sec_pcd;
      private String use_for_rpr_idc;
   private Integer grp_cnt;
   private Integer sng_cnt;
   private String upd_dt;
   private String upd_id;
   private String cal_dt;

}


---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: <Robinson>, LCI <la...@lowes.com>
Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
Date: Friday, January 29, 2016 at 10:33 AM
To: David Ortiz <dp...@gmail.com>, Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

Right, we’ve been trying this with little luck — largely because I get the error:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcStruct

Code:

OrcFileSource<Verint1978Record> source = new OrcFileSource<Verint1978Record>(new Path(inputPath), Orcs.reflects(Verint1978Record.class));
PCollection<Verint1978Record> persons = pipeline.read(source);

Verint1978Record

public class Verint1978Record {

   private String lct_nbr;
   private String vid_caa_id;
   private Integer hrs_nbr;
   private Integer mte_nbr;
   private Character acl_idc;
   private Integer sec_dur;
   private Integer sec_to_pcs;
   private Integer sec_pcd;
   private Character use_for_rpr_idc;
   private Integer grp_cnt;
   private Integer sng_cnt;
   private String upd_dt;
   private String upd_id;
   private String cal_dt;

}

---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dp...@gmail.com>
Date: Friday, January 29, 2016 at 10:19 AM
To: LCI <la...@lowes.com>, Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/

Here's the Java excerpt from that article to read into an Avro class (I'm assuming).

// Read an ORC file using reflection-based serialization (slowest):
OrcFileSource<Person> source = new OrcFileSource<Person>(
    new Path(inputPath), Orcs.reflects(Person.class));
PCollection<Person> persons = pipeline.read(source);

On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <la...@lowes.com> wrote:
Orc format.
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dp...@gmail.com>
Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
Date: Thursday, January 28, 2016 at 1:22 PM
To: Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

What format are they stored as?

On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <la...@lowes.com> wrote:
Crunch Gurus,

What is the Crunch-convenient or recommended way to read the contents of a Hive table into a PCollection?
Thanks!
Best,
Landon
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
---------------------------------------------------------------------------

Re: Reading Hive Tables into PCollection

Posted by "Robinson, Landon - Landon" <la...@lowes.com>.
Solved:

Turns out you can use this:

        private HiveChar acl_idc;

That class is org.apache.hadoop.hive.common.type.HiveChar.

Sorry for all the emails, but hope the findings help someone else!
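
For anyone hitting the same ClassCastException: the failure comes from a record field whose Java type does not match the column type in the ORC file (a char column read into a String or Character field). Below is a minimal sketch of a pre-flight check that compares a record class against the table's column types before the pipeline runs. Everything in it is an assumption for illustration: the class names, the type lookup table (which only encodes what this thread found, that char needs HiveChar, alongside String and Integer), and the nested HiveChar stand-in, which exists only so the sketch compiles without Hive on the classpath. In real code the field would use org.apache.hadoop.hive.common.type.HiveChar.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrcFieldTypeCheck {

    // Hypothetical lookup: Hive column type -> simple name of the Java field type
    // assumed to work with reflection-based reads. Not an exhaustive or official list.
    private static final Map<String, String> EXPECTED_FIELD_TYPE = new LinkedHashMap<>();
    static {
        EXPECTED_FIELD_TYPE.put("string", "String");
        EXPECTED_FIELD_TYPE.put("int", "Integer");
        EXPECTED_FIELD_TYPE.put("char", "HiveChar");
    }

    /** Returns one message per field whose declared type differs from the expected one. */
    public static List<String> mismatches(Class<?> recordClass, Map<String, String> columnTypes) {
        List<String> problems = new ArrayList<>();
        for (Field field : recordClass.getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            String hiveType = columnTypes.get(field.getName());
            if (hiveType == null) {
                continue; // no column information for this field
            }
            // Strip type parameters, e.g. "char(1)" -> "char".
            String baseType = hiveType.replaceAll("\\(.*\\)$", "").toLowerCase();
            String expected = EXPECTED_FIELD_TYPE.get(baseType);
            String actual = field.getType().getSimpleName();
            if (expected != null && !expected.equals(actual)) {
                problems.add(field.getName() + ": column is " + hiveType
                        + ", field is " + actual + ", expected " + expected);
            }
        }
        return problems;
    }

    // Stand-in for org.apache.hadoop.hive.common.type.HiveChar (illustration only).
    public static class HiveChar { }

    // Mirrors the failing attempt: the char column declared as String.
    public static class StringRecord {
        private String lct_nbr;
        private Integer hrs_nbr;
        private String acl_idc;
    }

    // Mirrors the fix: the char column declared as HiveChar.
    public static class HiveCharRecord {
        private String lct_nbr;
        private Integer hrs_nbr;
        private HiveChar acl_idc;
    }
}
```

Calling mismatches(StringRecord.class, columns) with acl_idc mapped to a char type should report that one field, while HiveCharRecord should come back clean.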

---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: <Robinson>, LCI <la...@lowes.com>
Date: Friday, January 29, 2016 at 10:36 AM
To: Apache Crunch Mailing List <us...@crunch.apache.org>, LCI <la...@lowes.com>, David Ortiz <dp...@gmail.com>
Subject: Re: Reading Hive Tables into PCollection

Additionally, we tried declaring those char fields as Strings, but got the error below. The real issue is getting the ORC ‘char’ type to map to something we can use in the ORC structure.

Exception in thread "main" org.apache.crunch.CrunchRuntimeException: Error while reading local file: file:/tmp/crunch-test/000000_0
at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:110)
at org.apache.crunch.io.CompositePathIterable$2.next(CompositePathIterable.java:99)
at com.google.common.collect.Iterators$5.next(Iterators.java:607)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:266)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
at org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:79)
at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:165)
at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:156)
at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.run(DataQualityDriverTest.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.main(DataQualityDriverTest.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:169)
at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:222)
at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:190)
at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:168)
at org.apache.crunch.fn.CompositeMapFn.map(CompositeMapFn.java:63)
at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:108)
... 15 more


Verint1978Record

public class Verint1978Record {

   private String lct_nbr;
   private String vid_caa_id;
   private Integer hrs_nbr;
   private Integer mte_nbr;
      private String acl_idc;
   private Integer sec_dur;
   private Integer sec_to_pcs;
   private Integer sec_pcd;
      private String use_for_rpr_idc;
   private Integer grp_cnt;
   private Integer sng_cnt;
   private String upd_dt;
   private String upd_id;
   private String cal_dt;

}


---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: <Robinson>, LCI <la...@lowes.com>
Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
Date: Friday, January 29, 2016 at 10:33 AM
To: David Ortiz <dp...@gmail.com>, Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

Right, we’ve been trying this with little luck — largely because I get the error:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcStruct

Code:

OrcFileSource<Verint1978Record> source = new OrcFileSource<Verint1978Record>(new Path(inputPath), Orcs.reflects(Verint1978Record.class));
PCollection<Verint1978Record> persons = pipeline.read(source);

Verint1978Record

public class Verint1978Record {

   private String lct_nbr;
   private String vid_caa_id;
   private Integer hrs_nbr;
   private Integer mte_nbr;
   private Character acl_idc;
   private Integer sec_dur;
   private Integer sec_to_pcs;
   private Integer sec_pcd;
   private Character use_for_rpr_idc;
   private Integer grp_cnt;
   private Integer sng_cnt;
   private String upd_dt;
   private String upd_id;
   private String cal_dt;

}

---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dp...@gmail.com>
Date: Friday, January 29, 2016 at 10:19 AM
To: LCI <la...@lowes.com>, Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/

Here's the Java excerpt from that article to read into an Avro class (I'm assuming).

// Read an ORCFile using reflection-based serialization (slowest):
OrcFileSource<Person> source = new OrcFileSource<Person>(new Path(inputPath),
    Orcs.reflects(Person.class));
PCollection<Person> persons = pipeline.read(source);

On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <la...@lowes.com> wrote:
Orc format.
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dp...@gmail.com>
Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
Date: Thursday, January 28, 2016 at 1:22 PM
To: Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

What format are they stored as?

On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <la...@lowes.com> wrote:
Crunch Gurus,

What is the Crunch-convenient or recommended way to read the contents of a Hive table into a PCollection?
Thanks!
Best,
Landon
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
---------------------------------------------------------------------------

Re: Reading Hive Tables into PCollection

Posted by "Robinson, Landon - Landon" <la...@lowes.com>.
Additionally, we tried declaring those char fields as Strings, but we get the error below. The real issue is getting the ORC ‘char’ type to cast to something we can use in the ORC structure.

Exception in thread "main" org.apache.crunch.CrunchRuntimeException: Error while reading local file: file:/tmp/crunch-test/000000_0
at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:110)
at org.apache.crunch.io.CompositePathIterable$2.next(CompositePathIterable.java:99)
at com.google.common.collect.Iterators$5.next(Iterators.java:607)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:266)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
at org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:79)
at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:165)
at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:156)
at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.run(DataQualityDriverTest.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.main(DataQualityDriverTest.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:169)
at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:222)
at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:190)
at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:168)
at org.apache.crunch.fn.CompositeMapFn.map(CompositeMapFn.java:63)
at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:108)
... 15 more


Verint1978Record

public class Verint1978Record {

   private String lct_nbr;
   private String vid_caa_id;
   private Integer hrs_nbr;
   private Integer mte_nbr;
   private String acl_idc;
   private Integer sec_dur;
   private Integer sec_to_pcs;
   private Integer sec_pcd;
   private String use_for_rpr_idc;
   private Integer grp_cnt;
   private Integer sng_cnt;
   private String upd_dt;
   private String upd_id;
   private String cal_dt;

}
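
A minimal sketch of why the trace above ends in a ClassCastException. It assumes, going by the WritableStringObjectInspector frames, that a String field makes the reader use a string inspector that casts every value to Text, while a Hive char column actually yields a HiveCharWritable. FakeText, FakeHiveChar, strictToString and lenientToString are hypothetical stand-ins invented for this illustration so that it runs without Hadoop or Hive on the classpath; they are not Crunch or Hive APIs.

```java
// Hypothetical stand-ins only: none of these are the real Hadoop/Hive classes.
final class CharCastSketch {

    /** Stand-in for org.apache.hadoop.io.Text. */
    public static final class FakeText {
        private final String value;
        public FakeText(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    /** Stand-in for org.apache.hadoop.hive.serde2.io.HiveCharWritable. */
    public static final class FakeHiveChar {
        private final String value;
        public FakeHiveChar(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    /** Assumed behaviour of a string-only inspector: an unconditional cast. */
    public static String strictToString(Object writable) {
        // Throws ClassCastException when handed a FakeHiveChar.
        return ((FakeText) writable).toString();
    }

    /** Type-aware alternative: checks the runtime type before converting. */
    public static String lenientToString(Object writable) {
        if (writable == null) {
            return null;
        }
        if (writable instanceof FakeText || writable instanceof FakeHiveChar) {
            return writable.toString();
        }
        throw new IllegalArgumentException(
            "Unsupported type: " + writable.getClass().getName());
    }

    public static void main(String[] args) {
        // A value as it might come back from a char column.
        Object fromCharColumn = new FakeHiveChar("Y");
        try {
            strictToString(fromCharColumn);
        } catch (ClassCastException e) {
            System.out.println("strict conversion failed with ClassCastException");
        }
        System.out.println("lenient conversion: " + lenientToString(fromCharColumn));
    }
}
```

If that reading of the trace is right, one possible workaround (untested here) is to store or cast the char columns as string on the Hive side, so the ORC file carries a type that a String field maps to.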


---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: <Robinson>, LCI <la...@lowes.com>
Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
Date: Friday, January 29, 2016 at 10:33 AM
To: David Ortiz <dp...@gmail.com>, Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

Right, we’ve been trying this with little luck — largely because I get the error:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcStruct

Code:

OrcFileSource<Verint1978Record> source = new OrcFileSource<Verint1978Record>(new Path(inputPath), Orcs.reflects(Verint1978Record.class));
PCollection<Verint1978Record> persons = pipeline.read(source);

Verint1978Record

public class Verint1978Record {

   private String lct_nbr;
   private String vid_caa_id;
   private Integer hrs_nbr;
   private Integer mte_nbr;
   private Character acl_idc;
   private Integer sec_dur;
   private Integer sec_to_pcs;
   private Integer sec_pcd;
   private Character use_for_rpr_idc;
   private Integer grp_cnt;
   private Integer sng_cnt;
   private String upd_dt;
   private String upd_id;
   private String cal_dt;

}

---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dp...@gmail.com>
Date: Friday, January 29, 2016 at 10:19 AM
To: LCI <la...@lowes.com>, Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/

Here's the Java excerpt from that article to read into an Avro class (I'm assuming).

// Read an ORCFile using reflection-based serialization (slowest):
OrcFileSource<Person> source = new OrcFileSource<Person>(new Path(inputPath),
    Orcs.reflects(Person.class));
PCollection<Person> persons = pipeline.read(source);

On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <la...@lowes.com> wrote:
Orc format.
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dp...@gmail.com>
Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
Date: Thursday, January 28, 2016 at 1:22 PM
To: Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

What format are they stored as?

On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <la...@lowes.com> wrote:
Crunch Gurus,

What is the Crunch-convenient or recommended way to read the contents of a Hive table into a PCollection?
Thanks!
Best,
Landon
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
---------------------------------------------------------------------------

Re: Reading Hive Tables into PCollection

Posted by "Robinson, Landon - Landon" <la...@lowes.com>.
Right, we’ve been trying this with little luck — largely because I get the error:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcStruct

Code:

OrcFileSource<Verint1978Record> source = new OrcFileSource<Verint1978Record>(new Path(inputPath), Orcs.reflects(Verint1978Record.class));
PCollection<Verint1978Record> persons = pipeline.read(source);

Verint1978Record

public class Verint1978Record {

   private String lct_nbr;
   private String vid_caa_id;
   private Integer hrs_nbr;
   private Integer mte_nbr;
   private Character acl_idc;
   private Integer sec_dur;
   private Integer sec_to_pcs;
   private Integer sec_pcd;
   private Character use_for_rpr_idc;
   private Integer grp_cnt;
   private Integer sng_cnt;
   private String upd_dt;
   private String upd_id;
   private String cal_dt;

}

---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dp...@gmail.com>
Date: Friday, January 29, 2016 at 10:19 AM
To: LCI <la...@lowes.com>, Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/

Here's the Java excerpt from that article to read into an Avro class (I'm assuming).

// Read an ORCFile using reflection-based serialization (slowest):
OrcFileSource<Person> source = new OrcFileSource<Person>(new Path(inputPath),
    Orcs.reflects(Person.class));
PCollection<Person> persons = pipeline.read(source);

On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <la...@lowes.com> wrote:
Orc format.
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dp...@gmail.com>
Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
Date: Thursday, January 28, 2016 at 1:22 PM
To: Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

What format are they stored as?

On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <la...@lowes.com> wrote:
Crunch Gurus,

What is the Crunch-convenient or recommended way to read the contents of a Hive table into a PCollection?
Thanks!
Best,
Landon
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
---------------------------------------------------------------------------

Re: Reading Hive Tables into PCollection

Posted by David Ortiz <dp...@gmail.com>.
http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/

Here's the Java excerpt from that article to read into an Avro class (I'm
assuming).

// Read an ORCFile using reflection-based serialization (slowest):
OrcFileSource<Person> source = new OrcFileSource<Person>(
    new Path(inputPath), Orcs.reflects(Person.class));
PCollection<Person> persons = pipeline.read(source);

On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <
landon.t.robinson@lowes.com> wrote:

> Orc format.
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
>
> From: David Ortiz <dp...@gmail.com>
> Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
> Date: Thursday, January 28, 2016 at 1:22 PM
> To: Apache Crunch Mailing List <us...@crunch.apache.org>
> Subject: Re: Reading Hive Tables into PCollection
>
> What format are they stored as?
>
> On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <
> landon.t.robinson@lowes.com> wrote:
>
>> Crunch Gurus,
>>
>> What is the Crunch-convenient or recommended way to read the contents of
>> a Hive table into a PCollection?
>> Thanks!
>> Best,
>> Landon
>>
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data & Hadoop Engineer
>>
>> ---------------------------------------------------------------------------
>>
>

Re: Reading Hive Tables into PCollection

Posted by "Robinson, Landon - Landon" <la...@lowes.com>.
Orc format.
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dp...@gmail.com>
Reply-To: Apache Crunch Mailing List <us...@crunch.apache.org>
Date: Thursday, January 28, 2016 at 1:22 PM
To: Apache Crunch Mailing List <us...@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

What format are they stored as?

On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <la...@lowes.com> wrote:
Crunch Gurus,

What is the Crunch-convenient or recommended way to read the contents of a Hive table into a PCollection?
Thanks!
Best,
Landon
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
---------------------------------------------------------------------------

Re: Reading Hive Tables into PCollection

Posted by David Ortiz <dp...@gmail.com>.
What format are they stored as?

On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <
landon.t.robinson@lowes.com> wrote:

> Crunch Gurus,
>
> What is the Crunch-convenient or recommended way to read the contents of a
> Hive table into a PCollection?
> Thanks!
> Best,
> Landon
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> ---------------------------------------------------------------------------