Posted to user@hive.apache.org by Timothy Potter <th...@gmail.com> on 2013/10/09 21:05:57 UTC

Trouble doing the most basic thing in Pig with HCatalog (v. 0.11.0)

Hi,

I'm a long-time user of HCatalog 0.4 and am testing an upgrade to Hive /
HCatalog 0.11.0, as we need windowing functions and ORC.

I'm testing the HCatLoader from Pig and am getting the exceptions below
using this simple Pig script:

sigs_in = load 'signals' using org.apache.hcatalog.pig.HCatLoader();
describe sigs_in;
sigs = filter sigs_in by datetime_partition == '2013-10-07_0000';
...

The exceptions (see below) occur in the Pig front-end processing, while
trying to get the input paths. Pig's describe command does return the
schema, so I know there's some communication happening between the
LoadFunc and the metastore. Also, if I run: hcat -e "show partitions
signals;" I get the expected list of partitions for that table.

Any ideas on where to start troubleshooting this issue? I'm using Pig 0.10
with Hive / HCatalog 0.11.0 running on Hadoop 2.0.0-cdh4.1.2.

I built Hive/HCatalog from source using: ant clean package
-Dmvn.hadoop.profile=hadoop23 -Dhadoop.mr.rev=23
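
[Editor's note, not part of the original mail: one knob worth checking for a metastore-side SocketTimeoutException is the client's socket timeout in hive-site.xml. A sketch of raising it on the client; the property exists in Hive 0.11, the value is in seconds, and 300 is an arbitrary example value:]

```xml
<!-- hive-site.xml on the client: read timeout for metastore Thrift calls.
     300 (seconds) is an example value, not a recommendation. -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>300</value>
</property>
```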

Exception:

Caused by: java.io.IOException: org.shaded.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:87)
    at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:63)
    at org.apache.hcatalog.pig.HCatLoader.setLocation(HCatLoader.java:119)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:380)
    ... 17 more
Caused by: org.shaded.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.shaded.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.shaded.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.shaded.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.shaded.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
    at org.shaded.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
    at org.shaded.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_filter(ThriftHiveMetastore.java:1738)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_filter(ThriftHiveMetastore.java:1722)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(HiveMetaStoreClient.java:780)
    at org.apache.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:112)
    at org.apache.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:85)
    at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:85)
    ... 20 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.shaded.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)


NOTE: Don't worry about the org.shaded.thrift package names; I had to
build a shaded JAR for my HCatalog clients to work around Thrift version
conflicts on my classpath. I tested the same script without the shading
and got the same error.

Re: Trouble doing the most basic thing in Pig with HCatalog (v. 0.11.0)

Posted by Timothy Potter <th...@gmail.com>.
Having déjà vu from when I did our HCatalog 0.4 install ... the issue was
out-of-date DataNucleus JARs. After upgrading them I'm past this error;
the new issue is:

2013-10-09 20:20:36,411 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.hcatalog.common.HCatException : 2004 : HCatOutputFormat not initialized, setOutput has to be called
    at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:111)
    at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:97)
    at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:85)
    at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:75)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:935)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
    at java.lang.Thread.run(Thread.java:662)
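
[Editor's note, not part of the original mail: HCatException 2004 is thrown when HCatOutputFormat is consulted before setOutput has run, which in Pig means the HCatStorer side of the script. A minimal store sketch for orientation; the output table name 'signals_out' and the partition spec are hypothetical, and the partition key passed to HCatStorer must match a partition column of the target table:]

```pig
-- hypothetical: write the filtered relation to a partitioned HCatalog
-- table, pinning the target partition via HCatStorer's constructor arg
store sigs into 'signals_out'
    using org.apache.hcatalog.pig.HCatStorer('datetime_partition=2013-10-07_0000');
```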


