Posted to user@hive.apache.org by Bill Graham <bi...@gmail.com> on 2009/07/29 02:53:32 UTC

partitions not being created

Hi,

I'm trying to create a partitioned table and the partition is not appearing
for some reason. Am I doing something wrong, or is this a bug? Below are the
commands I'm executing with their output. Note that the 'show partitions'
command is not returning anything. If I were to try to load data into this
table I'd get a 'get_partition failed' error. I'm using bleeding-edge Hive,
built from the trunk.

hive> create table partTable (a string, b int) partitioned by (dt int);
OK
Time taken: 0.308 seconds
hive> show partitions partTable;
OK
Time taken: 0.329 seconds
hive> describe partTable;
OK
a       string
b       int
dt      int
Time taken: 0.181 seconds
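
A partitioned table starts out with zero partitions, so an empty 'show partitions' right after CREATE TABLE is expected; a partition only exists once data is loaded into it or it is added explicitly. A minimal sketch to exercise the metastore path by itself, assuming ALTER TABLE ... ADD PARTITION is available in this build:

ALTER TABLE partTable ADD PARTITION (dt=20090518);
SHOW PARTITIONS partTable;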

thanks,
Bill

Re: partitions not being created

Posted by Bill Graham <bi...@gmail.com>.
It's really strange to me too that this isn't working. Your description of
our setup is correct. Using the client I connect to the Hive server, which
has a local metastore. On the client I see errors in hive.log, but the
server doesn't show any for the LOAD DATA INPATH call. Below are both of
their outputs:

- On the hive client I see this in /tmp/$USER/hive.log:

2009-07-31 15:48:13,148 ERROR metadata.Hive (Hive.java:getPartition(588)) -
org.apache.thrift.TApplicationException: get_partition failed: unknown
result
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
        at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
        at
org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
        at
org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
        at
org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:357)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:263)
        at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:173)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:266)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

2009-07-31 15:48:13,149 ERROR exec.MoveTask
(SessionState.java:printError(279)) - Failed with exception
org.apache.thrift.TApplicationException: get_partition failed: unknown
result
org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.thrift.TApplicationException: get_partition failed: unknown
result
        at
org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:589)
        at
org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
        at
org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:357)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:263)
        at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:173)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:266)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: org.apache.thrift.TApplicationException: get_partition failed:
unknown result
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
        at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
        at
org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
        ... 16 more

2009-07-31 15:48:13,150 ERROR ql.Driver (SessionState.java:printError(279))
- FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask

- In the Hive Server I don't see an exception for the LOAD DATA INPATH
command, just this:

Hive history file=/tmp/app/hive_job_log_app_200907311548_1601573169.txt
09/07/31 15:48:11 INFO exec.HiveHistory: Hive history
file=/tmp/app/hive_job_log_app_200907311548_1601573169.txt
09/07/31 15:48:11 INFO metastore.HiveMetaStore: 0: get_table : db=default
tbl=ApiUsage
09/07/31 15:48:11 INFO metastore.HiveMetaStore: 0: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
09/07/31 15:48:11 INFO metastore.ObjectStore: ObjectStore, initialize called
09/07/31 15:48:11 INFO metastore.ObjectStore: Initialized ObjectStore
Hive history file=/tmp/app/hive_job_log_app_200907311548_1472642154.txt
09/07/31 15:48:12 INFO exec.HiveHistory: Hive history
file=/tmp/app/hive_job_log_app_200907311548_1472642154.txt
09/07/31 15:48:12 INFO metastore.HiveMetaStore: 3: get_table : db=default
tbl=apiusage
09/07/31 15:48:12 INFO metastore.HiveMetaStore: 3: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
09/07/31 15:48:12 INFO metastore.ObjectStore: ObjectStore, initialize called
09/07/31 15:48:12 INFO metastore.ObjectStore: Initialized ObjectStore
Hive history file=/tmp/app/hive_job_log_app_200907311548_157169269.txt
09/07/31 15:48:12 INFO exec.HiveHistory: Hive history
file=/tmp/app/hive_job_log_app_200907311548_157169269.txt
09/07/31 15:48:12 INFO metastore.HiveMetaStore: 4: get_table : db=default
tbl=apiusage
09/07/31 15:48:12 INFO metastore.HiveMetaStore: 4: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
09/07/31 15:48:12 INFO metastore.ObjectStore: ObjectStore, initialize called
09/07/31 15:48:12 INFO metastore.ObjectStore: Initialized ObjectStore
Hive history file=/tmp/app/hive_job_log_app_200907311548_491779462.txt
09/07/31 15:48:13 INFO exec.HiveHistory: Hive history
file=/tmp/app/hive_job_log_app_200907311548_491779462.txt
09/07/31 15:48:13 INFO metastore.HiveMetaStore: 2: get_partition :
db=default tbl=apiusage
09/07/31 15:48:13 INFO metastore.HiveMetaStore: 2: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
09/07/31 15:48:13 INFO metastore.ObjectStore: ObjectStore, initialize called
09/07/31 15:48:13 INFO metastore.ObjectStore: Initialized ObjectStore
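
For context, a client-to-remote-metastore setup like the one described above is usually wired with something along these lines in the client's hive-site.xml (a sketch; property names from Hive configs of this era, host and port assumed):

<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hive-server-host:9083</value>
</property>

Incidentally, an "unknown result" TApplicationException is what generated thrift clients raise when the server replies without a result struct, typically because the handler returned null for a call whose IDL declares a non-null return, so the client-side trace above points at the server's get_partition handler rather than at the CLI itself.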


On Fri, Jul 31, 2009 at 3:15 PM, Prasad Chakka <pc...@facebook.com> wrote:

>  This is a very common op that we do every day. So I am surprised that this
> is happening. This time you are executing commands from the CLI, which
> connects to a thrift Metastore Server. Correct?
> Can you print the stack trace from Hive server (or the metastore server)
> logs?
>
> Prasad
> ------------------------------
> From: Bill Graham <bi...@gmail.com>
> Reply-To: <bi...@gmail.com>
> Date: Fri, 31 Jul 2009 14:00:20 -0700
> To: Prasad Chakka <pc...@facebook.com>
> Cc: <hi...@hadoop.apache.org>
> Subject: Re: partitions not being created
>
> I just completely removed all of my Hive tables and folders in HDFS, as
> well as metadata_db. I then re-built Hive from the latest trunk.
> After replacing my Hive server with the contents of build/dist, and doing
> the same for my client, I created new tables from scratch and again tried to
> migrate from ApiUsageTemp --> ApiUsage. I got the same "get_partition
> failed: unknown result" error.
>
> I decided to skip the table migration and just load data directly into a
> partitioned table. That also gives the same error. Below is what I tried.
> Any ideas?
>
> hive> CREATE TABLE ApiUsage
>     >     (user STRING, restResource STRING, statusCode INT, requestDate
> STRING, requestHour INT, numRequests STRING, responseTime STRING,
> numSlowRequests STRING)
>     >     PARTITIONED BY (dt STRING)
>     >     ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
> OK
> Time taken: 0.27 seconds
>
> hive> describe extended
> ApiUsage;
>
> OK
> user    string
> restresource    string
> statuscode      int
> requestdate     string
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
> dt      string
>
> Detailed Table Information      Table(tableName:apiusage, dbName:default,
> owner:grahamb, createTime:1249073147, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requestdate, type:string, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt,
> type:string, comment:null)], parameters:{})
> Time taken: 0.276 seconds
>
> hive> LOAD DATA INPATH "sample_data/apilogs/summary-small/2009/05/18" INTO
> TABLE ApiUsage PARTITION (dt = "20090518" );
> Loading data to table apiusage partition {dt=20090518}
> Failed with exception org.apache.thrift.TApplicationException:
> get_partition failed: unknown result
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
> On Thu, Jul 30, 2009 at 1:24 PM, Prasad Chakka <pc...@facebook.com>
> wrote:
>
> This is not a backward compatibility issue. Check HIVE-592 for details.
> Before this patch, a rename doesn’t change the name of the hdfs directory
> and if you create a new table with the old name of the renamed  table then
> both tables will be pointing to the same directory thus causing problems.
> HIVE-592 fixes this to rename directories correctly. So if you have created
> all tables after HIVE-592 patch went in, you should be fine.
>
>
>
> ------------------------------
> From: Bill Graham <billgraham@gmail.com>
> Reply-To: <billgraham@gmail.com>
> Date: Thu, 30 Jul 2009 13:09:03 -0700
>
> To: Prasad Chakka <pchakka@facebook.com>
> Cc: <hive-user@hadoop.apache.org>
> Subject: Re: partitions not being created
>
> I sent my last reply before seeing your last email.
>
> Thanks, that seems possible. I did initially create ApiUsageTemp using the
> most recent Hive release. Then while working on a JIRA I updated my Hive
> client and server to the more recent builds from the trunk.
>
> If that could cause such a problem, this is troubling though, since it
> implies that we can't upgrade Hive without possibly corrupting our metadata
> store.
>
> I'll try again from scratch though and see if it works, thanks.
>
>
> On Thu, Jul 30, 2009 at 1:04 PM, Bill Graham <billgraham@gmail.com> wrote:
>
> Prasad,
>
> My setup is Hive client -> Hive Server (with local metastore) -> Hadoop. I
> was also suspecting metastore issues, so I've tried multiple times with
> newly created destination tables and I see the same thing happening.
>
> All of the log info I've been able to find I've included already in this
> thread. Let me know if there's anywhere else I could look for clues.
>
> I've included from the client:
> - /tmp/$USER/hive.log
>
> And from the hive server:
> - Stdout/err logs
>
> - /tmp/$USER/hive_job_log*.txt
>
> Is there anything else I should be looking at? All of the M/R logs don't
> show any exceptions or anything suspect.
>
> Thanks for your time and insights on this issue, I appreciate it.
>
> thanks,
> Bill
>
>
> On Thu, Jul 30, 2009 at 11:57 AM, Prasad Chakka <pchakka@facebook.com> wrote:
>
> Bill,
>
> The real error is happening on the Hive Metastore Server or Hive Server
>  (depending on the setup you are using). Error logs on it must have a
> different stack trace. From the information below I am guessing that the
> destination table's HDFS directories got created with some problems.
> Can you drop that table (and make sure that there is no corresponding HDFS
> directory for both integer and string type partitions that you created) and
> retry the query.
>
> If you don’t want to drop the destination table then send me the logs on
> Hive Server.
>
> Prasad
>
>
>
> ------------------------------
> From: Bill Graham <billgraham@gmail.com>
> Reply-To: <billgraham@gmail.com>
> Date: Thu, 30 Jul 2009 11:47:41 -0700
>
> To: Prasad Chakka <pchakka@facebook.com>
> Cc: <hive-user@hadoop.apache.org>
> Subject: Re: partitions not being created
>
> That file contains a similar error to the one in the Hive Server logs:
>
> 2009-07-30 11:44:21,095 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-07-30 11:44:48,070 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-07-30 11:45:27,796 ERROR metadata.Hive (Hive.java:getPartition(588)) -
> org.apache.thrift.TApplicationException: get_partition failed: unknown
> result
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
>         at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
>         at
> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> 2009-07-30 11:45:27,797 ERROR exec.MoveTask
> (SessionState.java:printError(279)) - Failed with exception
> org.apache.thrift.TApplicationException: get_partition failed: unknown
> result
> org.apache.hadoop.hive.ql.metadata.HiveException:
> org.apache.thrift.TApplicationException: get_partition failed: unknown
> result
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:589)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
>         at
> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> Caused by: org.apache.thrift.TApplicationException: get_partition failed:
> unknown result
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
>         at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
>         ... 16 more
>
> 2009-07-30 11:45:27,798 ERROR ql.Driver (SessionState.java:printError(279))
> - FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
> On Thu, Jul 30, 2009 at 11:33 AM, Prasad Chakka <pchakka@facebook.com> wrote:
>
>
> The hive logs go into /tmp/$USER/hive.log not hive_job_log*.txt.
>
>
> ------------------------------
> From: Bill Graham <billgraham@gmail.com>
> Reply-To: <billgraham@gmail.com>
>
>
> Date: Thu, 30 Jul 2009 10:52:06 -0700
> To: Prasad Chakka <pchakka@facebook.com>
> Cc: <hive-user@hadoop.apache.org>, Zheng Shao <zshao9@gmail.com>
>
>
>
> Subject: Re: partitions not being created
>
> I'm trying to set a string to a string and I'm seeing this error. I also
> had an attempt where it was a string to an int, and I also saw the same
> error.
>
> The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but
> I've included its output below. Only the Hive server logs show the
> exceptions listed above. (Note that the table I'm loading from in this log
> output is ApiUsageSmall, which is identical to ApiUsageTemp. For some reason
> the data from ApiUsageTemp is now gone.)
>
> QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt =
> "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate
> = '2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
> TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
> TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%,
> reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
> TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local
> map tasks:1" TASK_ID="Stage-1" QUERY_ID="app_20090730104242"
> TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
> TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30
> 10:42:43,031 map = 40%,  reduce =0%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:302,Map-Reduce Framework.Map input
> bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975763033"
> TaskProgress ROWS_INSERTED="apiusage~1471" TASK_HADOOP_PROGRESS="2009-07-30
> 10:42:44,068 map = 100%,  reduce =100%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:1498,Map-Reduce Framework.Map input
> bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975764071"
> TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0"
> TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:1498,Map-Reduce Framework.Map input
> bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975764199"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask"
> TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
> TaskEnd TASK_RET_CODE="0"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4"
> QUERY_ID="app_20090730104242" TIME="1248975782277"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask"
> TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782277"
> TaskEnd TASK_RET_CODE="1"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0"
> QUERY_ID="app_20090730104242" TIME="1248975782473"
> QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE TABLE
> ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM
> ApiUsageSmall WHERE requestDate = '2009/05/18'"
> QUERY_ID="app_20090730104242" QUERY_NUM_TASKS="2" TIME="1248975782474"
>
>
>
> On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <pchakka@facebook.com> wrote:
>
> Are you sure you are getting the same error even with the schema below
> (i.e. trying to set a string to an int column?). Can you give the full stack
> trace that you might see in /tmp/$USER/hive.log?
>
>
> ------------------------------
> From: Bill Graham <billgraham@gmail.com>
> Reply-To: <hive-user@hadoop.apache.org>, <billgraham@gmail.com>
>
>
>
> Date: Thu, 30 Jul 2009 10:02:54 -0700
> To: Zheng Shao <zshao9@gmail.com>
> Cc: <hive-user@hadoop.apache.org>
>
>
>
> Subject: Re: partitions not being created
>
>
> Based on these describe statements, is what I'm trying to do feasible? I'm
> basically trying to load the contents of ApiUsageTemp into ApiUsage, with
> the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
>
>
> On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <billgraham@gmail.com> wrote:
>
> Sure. The only difference I see is that ApiUsage has a dt partition,
> instead of the requestdate column:
>
> hive> describe extended
> ApiUsage;
> OK
> user    string
> restresource    string
> statuscode      int
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
> dt      string
>
> Detailed Table Information      Table(tableName:apiusage, dbName:default,
> owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt,
> type:string, comment:null)], parameters:{})
>
> Time taken: 0.277 seconds
> hive> describe extended ApiUsageTemp;
> OK
> user    string
> restresource    string
> statuscode      int
> requestdate     string
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
>
> Detailed Table Information      Table(tableName:apiusagetemp,
> dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0,
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requestdate, type:string, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[],
> parameters:{last_modified_time=1248826696, last_modified_by=app})
>
> Time taken: 0.235 seconds
>
>
>
> On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zshao9@gmail.com> wrote:
>
> Can you send the output of these 2 commands?
>
> describe extended ApiUsage;
> describe extended ApiUsageTemp;
>
>
> Zheng
>
> On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <billgraham@gmail.com> wrote:
> > Thanks for the tip, but it fails in the same way when I use a string.
> >
> > On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dlerman@videoegg.com> wrote:
> >>
> >> >> hive> create table partTable (a string, b int) partitioned by (dt
> int);
> >>
> >> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
> >> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
> >> > '2009/05/18'
> >>
> >> The table has an int partition column (dt), but you're trying to set a
> >> string value (dt = "20090518").
> >>
> >> Try :
> >>
> >> create table partTable (a string, b int) partitioned by (dt string);
> >>
> >> and then do your insert.
> >>
> >
> >
>
>
>
> --
> Yours,
> Zheng
>

Re: partitions not being created

Posted by Prasad Chakka <pc...@facebook.com>.
This is a very common op that we do every day. So I am surprised that this is happening. This time you are executing commands from the CLI, which connects to a thrift Metastore Server. Correct?
Can you print the stack trace from Hive server (or the metastore server) logs?
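
For reference, the server-side trace would normally land in the same default location on the server host. A sketch, assuming this era's HiveServer launch command and log4j defaults (paths assumed):

hive --service hiveserver
tail -f /tmp/$USER/hive.log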

Prasad
________________________________
From: Bill Graham <bi...@gmail.com>
Reply-To: <bi...@gmail.com>
Date: Fri, 31 Jul 2009 14:00:20 -0700
To: Prasad Chakka <pc...@facebook.com>
Cc: <hi...@hadoop.apache.org>
Subject: Re: partitions not being created

I just completely removed all of my Hive tables and folders in HDFS, as well as metadata_db. I then re-built Hive from the latest trunk. After replacing my Hive server with the contents of build/dist, and doing the same for my client, I created new tables from scratch and again tried to migrate from ApiUsageTemp --> ApiUsage. I got the same "get_partition failed: unknown result" error.

I decided to skip the table migration and just load data directly into a partitioned table. That also gives the same error. Below is what I tried. Any ideas?

hive> CREATE TABLE ApiUsage
    >     (user STRING, restResource STRING, statusCode INT, requestDate STRING, requestHour INT, numRequests STRING, responseTime STRING, numSlowRequests STRING)
    >     PARTITIONED BY (dt STRING)
    >     ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
OK
Time taken: 0.27 seconds

hive> describe extended ApiUsage;
OK
user    string
restresource    string
statuscode      int
requestdate     string
requesthour     int
numrequests     string
responsetime    string
numslowrequests string
dt      string

Detailed Table Information      Table(tableName:apiusage, dbName:default, owner:grahamb, createTime:1249073147, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requestdate, type:string, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt, type:string, comment:null)], parameters:{})
Time taken: 0.276 seconds

hive> LOAD DATA INPATH "sample_data/apilogs/summary-small/2009/05/18" INTO TABLE ApiUsage PARTITION (dt = "20090518" );
Loading data to table apiusage partition {dt=20090518}
Failed with exception org.apache.thrift.TApplicationException: get_partition failed: unknown result
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
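
One way to split that failing step in two, so the metastore write and the file move happen separately (a sketch, assuming ALTER TABLE ... ADD PARTITION is supported by this build):

ALTER TABLE ApiUsage ADD PARTITION (dt = '20090518');
SHOW PARTITIONS ApiUsage;
LOAD DATA INPATH "sample_data/apilogs/summary-small/2009/05/18" INTO TABLE ApiUsage PARTITION (dt = "20090518");

If the ALTER succeeds and the LOAD then goes through, the failure is specific to get_partition on a partition that does not exist yet.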

On Thu, Jul 30, 2009 at 1:24 PM, Prasad Chakka <pc...@facebook.com> wrote:
This is not a backward compatibility issue. Check HIVE-592 for details. Before this patch, a rename doesn't change the name of the hdfs directory, and if you create a new table with the old name of the renamed table then both tables will be pointing to the same directory, thus causing problems. HIVE-592 fixes this to rename directories correctly. So if you have created all tables after the HIVE-592 patch went in, you should be fine.
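
A sketch of the pre-HIVE-592 pitfall being described, using the table names and warehouse path from this thread (the exact rename history here is an assumption):

CREATE TABLE ApiUsage (...);                     -- directory: /user/hive/warehouse/apiusage
ALTER TABLE ApiUsage RENAME TO ApiUsageTemp;     -- pre-HIVE-592: still points at .../apiusage
CREATE TABLE ApiUsage (...);                     -- new table reuses .../apiusage

After that sequence both tables read and write the same directory, which would also explain data in one table "disappearing" when the other is overwritten.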



________________________________
From: Bill Graham <billgraham@gmail.com>
Reply-To: <billgraham@gmail.com>
Date: Thu, 30 Jul 2009 13:09:03 -0700

To: Prasad Chakka <pchakka@facebook.com>
Cc: <hive-user@hadoop.apache.org>
Subject: Re: partitions not being created

I sent my last reply before seeing your last email.

Thanks, that seems possible. I did initially create ApiUsageTemp using the most recent Hive release. Then while working on a JIRA I updated my Hive client and server to the more recent builds from the trunk.

If that could cause such a problem, this is troubling though, since it implies that we can't upgrade Hive without possibly corrupting our metadata store.

I'll try again from scratch though and see if it works, thanks.


On Thu, Jul 30, 2009 at 1:04 PM, Bill Graham <billgraham@gmail.com> wrote:
Prasad,

My setup is Hive client -> Hive Server (with local metastore) -> Hadoop. I was also suspecting metastore issues, so I've tried multiple times with newly created destination tables and I see the same thing happening.

All of the log info I've been able to find I've included already in this thread. Let me know if there's anywhere else I could look for clues.

I've included from the client:
- /tmp/$USER/hive.log

And from the hive server:
- Stdout/err logs

- /tmp/$USER/hive_job_log*.txt

Is there anything else I should be looking at? All of the M/R logs don't show any exceptions or anything suspect.

Thanks for your time and insights on this issue, I appreciate it.

thanks,
Bill


On Thu, Jul 30, 2009 at 11:57 AM, Prasad Chakka <pchakka@facebook.com> wrote:
Bill,

The real error is happening on the Hive Metastore Server or Hive Server (depending on the setup you are using). Error logs on it must have a different stack trace. From the information below I am guessing that the destination table's HDFS directories got created with some problems. Can you drop that table (and make sure that there is no corresponding HDFS directory for both the integer and string type partitions that you created) and retry the query.

If you don't want to drop the destination table then send me the logs on Hive Server.
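
Concretely, that cleanup might look like the following (a sketch; the warehouse path is taken from the describe output earlier in the thread, and the fs shell flags are from Hadoop of this era):

DROP TABLE ApiUsage;

-- then, from a shell on a machine with HDFS access:
-- hadoop fs -ls /user/hive/warehouse | grep -i apiusage
-- hadoop fs -rmr /user/hive/warehouse/apiusage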

Prasad



________________________________
From: Bill Graham <billgraham@gmail.com>
Reply-To: <billgraham@gmail.com>
Date: Thu, 30 Jul 2009 11:47:41 -0700

To: Prasad Chakka <pchakka@facebook.com>
Cc: <hive-user@hadoop.apache.org>
Subject: Re: partitions not being created

That file contains a similar error to the one in the Hive Server logs:

2009-07-30 11:44:21,095 WARN  mapred.JobClient (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-07-30 11:44:48,070 WARN  mapred.JobClient (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-07-30 11:45:27,796 ERROR metadata.Hive (Hive.java:getPartition(588)) - org.apache.thrift.TApplicationException: get_partition failed: unknown result
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
        at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
        at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
        at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

2009-07-30 11:45:27,797 ERROR exec.MoveTask (SessionState.java:printError(279)) - Failed with exception org.apache.thrift.TApplicationException: get_partition failed: unknown result
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: get_partition failed: unknown result
        at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:589)
        at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
        at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: org.apache.thrift.TApplicationException: get_partition failed: unknown result
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
        at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
        ... 16 more

2009-07-30 11:45:27,798 ERROR ql.Driver (SessionState.java:printError(279)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

On Thu, Jul 30, 2009 at 11:33 AM, Prasad Chakka <pchakka@facebook.com> wrote:

The hive logs go into /tmp/$USER/hive.log not hive_job_log*.txt.


________________________________
From: Bill Graham <billgraham@gmail.com>
Reply-To: <billgraham@gmail.com>


Date: Thu, 30 Jul 2009 10:52:06 -0700
To: Prasad Chakka <pchakka@facebook.com>
Cc: <hive-user@hadoop.apache.org>, Zheng Shao <zshao9@gmail.com>



Subject: Re: partitions not being created

I'm trying to set a string to a string and I'm seeing this error. I also had an attempt where it was a string to an int, and I also saw the same error.

The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but I've included its output below. Only the Hive server logs show the exceptions listed above. (Note that the table I'm loading from in this log output is ApiUsageSmall, which is identical to ApiUsageTemp. For some reason the data from ApiUsageTemp is now gone.)

QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate = '2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%,  reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local map tasks:1" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30 10:42:43,031 map = 40%,  reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:302,Map-Reduce Framework.Map input bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975763033"
TaskProgress ROWS_INSERTED="apiusage~1471" TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:1498,Map-Reduce Framework.Map input bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975764071"
TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0" TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:1498,Map-Reduce Framework.Map input bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975764199"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
TaskEnd TASK_RET_CODE="0" TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975782277"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782277"
TaskEnd TASK_RET_CODE="1" TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782473"
QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate = '2009/05/18'" QUERY_ID="app_20090730104242" QUERY_NUM_TASKS="2" TIME="1248975782474"
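
Two details in that query string are worth unpacking. The backquoted `(requestDate)?+.+` is Hive's regex column specification: it selects every column whose name does not match requestDate, which is supplied as the partition value instead. Spelled out with explicit columns, the same insert would read roughly (a sketch, column list taken from the describe output above):

INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = '20090518')
SELECT user, restResource, statusCode, requestHour,
       numRequests, responseTime, numSlowRequests
FROM ApiUsageSmall
WHERE requestDate = '2009/05/18';

Note also that the job log shows the map stage succeeding (TASK_RET_CODE="0", 1471 rows inserted) and only the final MoveTask failing (TASK_RET_CODE="1"), which again points at the get_partition metastore call rather than the query itself.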



On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <pchakka@facebook.com> wrote:
Are you sure you are getting the same error even with the schema below (i.e. trying to set a string to an int column?). Can you give the full stack trace that you might see in /tmp/$USER/hive.log?


________________________________
From: Bill Graham <billgraham@gmail.com>
Reply-To: <hive-user@hadoop.apache.org>, <billgraham@gmail.com>



Date: Thu, 30 Jul 2009 10:02:54 -0700
To: Zheng Shao <zshao9@gmail.com>
Cc: <hive-user@hadoop.apache.org>



Subject: Re: partitions not being created


Based on these describe statements, is what I'm trying to do feasible? I'm basically trying to load the contents of ApiUsageTemp into ApiUsage, with the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
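
The approach itself is workable in this era of Hive, with one caveat: the partition value in an INSERT must be a constant (dynamic partition inserts only arrived in later releases), so populating dt from requestdate takes one insert per distinct date, along the lines of (a sketch):

INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = '20090518')
SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate = '2009/05/18';

INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = '20090519')
SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate = '2009/05/19';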


On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <billgraham@gmail.com> wrote:
Sure. The only difference I see is that ApiUsage has a dt partition, instead of the requestdate column:

hive> describe extended ApiUsage;
OK
user    string
restresource    string
statuscode      int
requesthour     int
numrequests     string
responsetime    string
numslowrequests string
dt      string

Detailed Table Information      Table(tableName:apiusage, dbName:default, owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt, type:string, comment:null)], parameters:{})

Time taken: 0.277 seconds
hive> describe extended ApiUsageTemp;
OK
user    string
restresource    string
statuscode      int
requestdate     string
requesthour     int
numrequests     string
responsetime    string
numslowrequests string

Detailed Table Information      Table(tableName:apiusagetemp, dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requestdate, type:string, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{last_modified_time=1248826696, last_modified_by=app})

Time taken: 0.235 seconds
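
Worth noting: both describe outputs report the same location, hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, for apiusage and apiusagetemp alike — consistent with the pre-HIVE-592 rename behavior discussed elsewhere in the thread, where a renamed table keeps its old directory. A quick way to confirm what is actually on disk (a sketch; path from the describe output):

hadoop fs -ls /user/hive/warehouse/apiusage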



On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zshao9@gmail.com> wrote:
Can you send the output of these 2 commands?

describe extended ApiUsage;
describe extended ApiUsageTemp;


Zheng

On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <billgraham@gmail.com> wrote:
> Thanks for the tip, but it fails in the same way when I use a string.
>
> On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dlerman@videoegg.com> wrote:
>>
>> >> hive> create table partTable (a string, b int) partitioned by (dt int);
>>
>> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
>> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
>> > '2009/05/18'
>>
>> The table has an int partition column (dt), but you're trying to set a
>> string value (dt = "20090518").
>>
>> Try :
>>
>> create table partTable (a string, b int) partitioned by (dt string);
>>
>> and then do your insert.
>>
>
>



--
Yours,
Zheng













Re: partitions not being created

Posted by Bill Graham <bi...@gmail.com>.
I just completely removed all of my Hive tables and folders in HDFS, as
well as metadata_db. I then re-built Hive from the latest trunk.
After replacing my Hive server with the contents of build/dist, and doing
the same for my client, I created new tables from scratch and again tried to
migrate from ApiUsageTemp --> ApiUsage. I got the same "get_partition
failed: unknown result" error.

I decided to skip the table migration and just load data directly into a
partitioned table. That also gives the same error. Below is what I tried.
Any ideas?

hive> CREATE TABLE ApiUsage
    >     (user STRING, restResource STRING, statusCode INT, requestDate
STRING, requestHour INT, numRequests STRING, responseTime STRING,
numSlowRequests STRING)
    >     PARTITIONED BY (dt STRING)
    >     ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
OK
Time taken: 0.27 seconds

hive> describe extended
ApiUsage;

OK
user    string
restresource    string
statuscode      int
requestdate     string
requesthour     int
numrequests     string
responsetime    string
numslowrequests string
dt      string

Detailed Table Information      Table(tableName:apiusage, dbName:default,
owner:grahamb, createTime:1249073147, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
comment:null), FieldSchema(name:restresource, type:string, comment:null),
FieldSchema(name:statuscode, type:int, comment:null),
FieldSchema(name:requestdate, type:string, comment:null),
FieldSchema(name:requesthour, type:int, comment:null),
FieldSchema(name:numrequests, type:string, comment:null),
FieldSchema(name:responsetime, type:string, comment:null),
FieldSchema(name:numslowrequests, type:string, comment:null)],
location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{field.delim= , serialization.format= }), bucketCols:[],
sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt,
type:string, comment:null)], parameters:{})
Time taken: 0.276 seconds

hive> LOAD DATA INPATH "sample_data/apilogs/summary-small/2009/05/18" INTO
TABLE ApiUsage PARTITION (dt = "20090518" );
Loading data to table apiusage partition {dt=20090518}
Failed with exception org.apache.thrift.TApplicationException: get_partition
failed: unknown result
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask

On Thu, Jul 30, 2009 at 1:24 PM, Prasad Chakka <pc...@facebook.com> wrote:

>  This is not a backward compatibility issue. Check HIVE-592 for details.
> Before this patch, a rename doesn’t change the name of the hdfs directory
> and if you create a new table with the old name of the renamed  table then
> both tables will be pointing to the same directory thus causing problems.
> HIVE-592 fixes this to rename directories correctly. So if you have created
> all tables after HIVE-592 patch went in, you should be fine.
>
>
> ------------------------------
> From: Bill Graham <bi...@gmail.com>
> Reply-To: <bi...@gmail.com>
> Date: Thu, 30 Jul 2009 13:09:03 -0700
> To: Prasad Chakka <pc...@facebook.com>
> Cc: <hi...@hadoop.apache.org>
> Subject: Re: partitions not being created
>
> I sent my last reply before seeing your last email.
>
> Thanks, that seems possible. I did initially create ApiUsageTemp using the
> most recent Hive release. Then while working on a JIRA I updated my Hive
> client and server to the more recent builds from the trunk.
>
> If that could cause such a problem, this is troubling though, since it
> implies that we can't upgrade Hive without possibly corrupting our metadata
> store.
>
> I'll try again from scratch though and see if it works, thanks.
>
>
> On Thu, Jul 30, 2009 at 1:04 PM, Bill Graham <bi...@gmail.com> wrote:
>
> Prasad,
>
> My setup is Hive client -> Hive Server (with local metastore) -> Hadoop. I
> was also suspecting metastore issues, so I've tried multiple times with
> newly created destination tables and I see the same thing happening.
>
> All of the log info I've been able to find I've included already in this
> thread. Let me know if there's anywhere else I could look for clues.
>
> I've included from the client:
> - /tmp/$USER/hive.log
>
> And from the hive server:
> - Stdout/err logs
>
> - /tmp/$USER/hive_job_log*.txt
>
> Is there anything else I should be looking at? None of the M/R logs show
> any exceptions or anything suspect.
>
> Thanks for your time and insights on this issue, I appreciate it.
>
> thanks,
> Bill
>
>
> On Thu, Jul 30, 2009 at 11:57 AM, Prasad Chakka <pc...@facebook.com>
> wrote:
>
> Bill,
>
> The real error is happening on the Hive Metastore Server or Hive Server
> (depending on the setup you are using). Error logs on it must have a
> different stack trace. From the information below I am guessing that the
> destination table's HDFS directories were created with some problems.
> Can you drop that table (and make sure that there is no corresponding HDFS
> directory for both integer and string type partitions that you created) and
> retry the query.
>
> If you don't want to drop the destination table then send me the logs from
> the Hive Server.
>
> Prasad
>
>
>
> ------------------------------
> From: Bill Graham <billgraham@gmail.com>
> Reply-To: <billgraham@gmail.com>
> Date: Thu, 30 Jul 2009 11:47:41 -0700
>
> To: Prasad Chakka <pchakka@facebook.com>
> Cc: <hive-user@hadoop.apache.org>
> Subject: Re: partitions not being created
>
> That file contains a similar error to the one in the Hive Server logs:
>
> 2009-07-30 11:44:21,095 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-07-30 11:44:48,070 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-07-30 11:45:27,796 ERROR metadata.Hive (Hive.java:getPartition(588)) -
> org.apache.thrift.TApplicationException: get_partition failed: unknown
> result
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
>         at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
>         at
> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> 2009-07-30 11:45:27,797 ERROR exec.MoveTask
> (SessionState.java:printError(279)) - Failed with exception
> org.apache.thrift.TApplicationException: get_partition failed: unknown
> result
> org.apache.hadoop.hive.ql.metadata.HiveException:
> org.apache.thrift.TApplicationException: get_partition failed: unknown
> result
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:589)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
>         at
> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> Caused by: org.apache.thrift.TApplicationException: get_partition failed:
> unknown result
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
>         at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
>         ... 16 more
>
> 2009-07-30 11:45:27,798 ERROR ql.Driver (SessionState.java:printError(279))
> - FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
> On Thu, Jul 30, 2009 at 11:33 AM, Prasad Chakka <pchakka@facebook.com> wrote:
>
>
> The hive logs go into /tmp/$USER/hive.log, not hive_job_log*.txt.
>
>
> ------------------------------
> From: Bill Graham <billgraham@gmail.com>
> Reply-To: <billgraham@gmail.com>
>
> Date: Thu, 30 Jul 2009 10:52:06 -0700
> To: Prasad Chakka <pchakka@facebook.com>
> Cc: <hive-user@hadoop.apache.org>, Zheng Shao <zshao9@gmail.com>
>
>
> Subject: Re: partitions not being created
>
> I'm trying to set a string to a string and I'm seeing this error. I also
> had an attempt where it was a string to an int, and I also saw the same
> error.
>
> The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but
> I've included its output below. Only the Hive server logs show the
> exceptions listed above. (Note that the table I'm loading from in this log
> output is ApiUsageSmall, which is identical to ApiUsageTemp. For some reason
> the data from ApiUsageTemp is now gone.)
>
> QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt =
> "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate
> = '2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
> TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
> TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%,
> reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
> TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local
> map tasks:1" TASK_ID="Stage-1" QUERY_ID="app_20090730104242"
> TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
> TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30
> 10:42:43,031 map = 40%,  reduce =0%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:302,Map-Reduce Framework.Map input
> bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975763033"
> TaskProgress ROWS_INSERTED="apiusage~1471" TASK_HADOOP_PROGRESS="2009-07-30
> 10:42:44,068 map = 100%,  reduce =100%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:1498,Map-Reduce Framework.Map input
> bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975764071"
> TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0"
> TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:1498,Map-Reduce Framework.Map input
> bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975764199"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask"
> TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
> TaskEnd TASK_RET_CODE="0"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4"
> QUERY_ID="app_20090730104242" TIME="1248975782277"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask"
> TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782277"
> TaskEnd TASK_RET_CODE="1"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0"
> QUERY_ID="app_20090730104242" TIME="1248975782473"
> QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE TABLE
> ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM
> ApiUsageSmall WHERE requestDate = '2009/05/18'"
> QUERY_ID="app_20090730104242" QUERY_NUM_TASKS="2" TIME="1248975782474"
>
>
>
> On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <pchakka@facebook.com> wrote:
>
> Are you sure you are getting the same error even with the schema below
> (i.e., trying to set a string to an int column)? Can you give the full stack
> trace that you might see in /tmp/$USER/hive.log?
>
>
> ------------------------------
> From: Bill Graham <billgraham@gmail.com>
> Reply-To: <hive-user@hadoop.apache.org>, <billgraham@gmail.com>
>
>
> Date: Thu, 30 Jul 2009 10:02:54 -0700
> To: Zheng Shao <zshao9@gmail.com>
> Cc: <hive-user@hadoop.apache.org>
>
>
> Subject: Re: partitions not being created
>
>
> Based on these describe statements, is what I'm trying to do feasible? I'm
> basically trying to load the contents of ApiUsageTemp into ApiUsage, with
> the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
>
>
> On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <billgraham@gmail.com> wrote:
>
> Sure. The only difference I see is that ApiUsage has a dt partition
> instead of the requestdate column:
>
> hive> describe extended ApiUsage;
> OK
> user    string
> restresource    string
> statuscode      int
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
> dt      string
>
> Detailed Table Information      Table(tableName:apiusage, dbName:default,
> owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt,
> type:string, comment:null)], parameters:{})
>
> Time taken: 0.277 seconds
> hive> describe extended ApiUsageTemp;
> OK
> user    string
> restresource    string
> statuscode      int
> requestdate     string
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
>
> Detailed Table Information      Table(tableName:apiusagetemp,
> dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0,
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requestdate, type:string, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[],
> parameters:{last_modified_time=1248826696, last_modified_by=app})
>
> Time taken: 0.235 seconds
>
>
>
> On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zshao9@gmail.com> wrote:
>
> Can you send the output of these 2 commands?
>
> describe extended ApiUsage;
> describe extended ApiUsageTemp;
>
>
> Zheng
>
> On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <billgraham@gmail.com> wrote:
> > Thanks for the tip, but it fails in the same way when I use a string.
> >
> > On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dlerman@videoegg.com> wrote:
> >>
> >> >> hive> create table partTable (a string, b int) partitioned by (dt int);
> >>
> >> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
> >> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
> >> > '2009/05/18'
> >>
> >> The table has an int partition column (dt), but you're trying to set a
> >> string value (dt = "20090518").
> >>
> >> Try:
> >>
> >> create table partTable (a string, b int) partitioned by (dt string);
> >>
> >> and then do your insert.
> >>
> >
> >
>
>
>
> --
> Yours,
> Zheng
>
>
>
>
>
>
>
>
>
>
>
>

Re: partitions not being created

Posted by Prasad Chakka <pc...@facebook.com>.
This is not a backward compatibility issue. Check HIVE-592 for details. Before this patch, a rename doesn't change the name of the HDFS directory, and if you create a new table with the old name of the renamed table then both tables will be pointing to the same directory, thus causing problems. HIVE-592 fixes this to rename directories correctly. So if you have created all tables after the HIVE-592 patch went in, you should be fine.
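
In other words, before the fix, a sequence like the following (hypothetical table name t1) would leave two tables silently sharing one warehouse directory:

hive> CREATE TABLE t1 (a STRING);     -- creates .../warehouse/t1
hive> ALTER TABLE t1 RENAME TO t2;    -- pre-HIVE-592, the directory stays .../t1
hive> CREATE TABLE t1 (a STRING);     -- new t1 also points at .../t1, now shared with t2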


________________________________
From: Bill Graham <bi...@gmail.com>
Reply-To: <bi...@gmail.com>
Date: Thu, 30 Jul 2009 13:09:03 -0700
To: Prasad Chakka <pc...@facebook.com>
Cc: <hi...@hadoop.apache.org>
Subject: Re: partitions not being created

I sent my last reply before seeing your last email.

Thanks, that seems possible. I did initially create ApiUsageTemp using the most recent Hive release. Then while working on a JIRA I updated my Hive client and server to the more recent builds from the trunk.

If that could cause such a problem, this is troubling though, since it implies that we can't upgrade Hive without possibly corrupting our metadata store.

I'll try again from scratch though and see if it works, thanks.


On Thu, Jul 30, 2009 at 1:04 PM, Bill Graham <bi...@gmail.com> wrote:
Prasad,

My setup is Hive client -> Hive Server (with local metastore) -> Hadoop. I was also suspecting metastore issues, so I've tried multiple times with newly created destination tables and I see the same thing happening.

All of the log info I've been able to find I've included already in this thread. Let me know if there's anywhere else I could look for clues.

I've included from the client:
- /tmp/$USER/hive.log

And from the hive server:
- Stdout/err logs

- /tmp/$USER/hive_job_log*.txt

Is there anything else I should be looking at? None of the M/R logs show any exceptions or anything suspect.

Thanks for your time and insights on this issue, I appreciate it.

thanks,
Bill


On Thu, Jul 30, 2009 at 11:57 AM, Prasad Chakka <pc...@facebook.com> wrote:
Bill,

The real error is happening on the Hive Metastore Server or Hive Server (depending on the setup you are using). Error logs on it must have a different stack trace. From the information below I am guessing that the destination table's HDFS directories were created with some problems. Can you drop that table (and make sure that there is no corresponding HDFS directory for both integer and string type partitions that you created) and retry the query.
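
Roughly, the cleanup would look something like this (warehouse path taken from the describe output earlier in this thread; hadoop fs -rmr is destructive, so double-check first that no other table points at the same location):

hive> DROP TABLE ApiUsage;
$ hadoop fs -rmr /user/hive/warehouse/apiusage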

If you don't want to drop the destination table then send me the logs from the Hive Server.

Prasad



________________________________
From: Bill Graham <billgraham@gmail.com>
Reply-To: <billgraham@gmail.com>
Date: Thu, 30 Jul 2009 11:47:41 -0700

To: Prasad Chakka <pchakka@facebook.com>
Cc: <hive-user@hadoop.apache.org>
Subject: Re: partitions not being created

That file contains a similar error to the one in the Hive Server logs:

2009-07-30 11:44:21,095 WARN  mapred.JobClient (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-07-30 11:44:48,070 WARN  mapred.JobClient (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-07-30 11:45:27,796 ERROR metadata.Hive (Hive.java:getPartition(588)) - org.apache.thrift.TApplicationException: get_partition failed: unknown result
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
        at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
        at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
        at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

2009-07-30 11:45:27,797 ERROR exec.MoveTask (SessionState.java:printError(279)) - Failed with exception org.apache.thrift.TApplicationException: get_partition failed: unknown result
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: get_partition failed: unknown result
        at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:589)
        at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
        at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: org.apache.thrift.TApplicationException: get_partition failed: unknown result
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
        at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
        ... 16 more

2009-07-30 11:45:27,798 ERROR ql.Driver (SessionState.java:printError(279)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

On Thu, Jul 30, 2009 at 11:33 AM, Prasad Chakka <pchakka@facebook.com> wrote:

The hive logs go into /tmp/$USER/hive.log, not hive_job_log*.txt.


________________________________
From: Bill Graham <billgraham@gmail.com>
Reply-To: <billgraham@gmail.com>

Date: Thu, 30 Jul 2009 10:52:06 -0700
To: Prasad Chakka <pchakka@facebook.com>
Cc: <hive-user@hadoop.apache.org>, Zheng Shao <zshao9@gmail.com>


Subject: Re: partitions not being created

I'm trying to set a string to a string and I'm seeing this error. I also had an attempt where it was a string to an int, and I also saw the same error.

The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but I've included its output below. Only the Hive server logs show the exceptions listed above. (Note that the table I'm loading from in this log output is ApiUsageSmall, which is identical to ApiUsageTemp. For some reason the data from ApiUsageTemp is now gone.)

QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate = '2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%,  reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local map tasks:1" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30 10:42:43,031 map = 40%,  reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:302,Map-Reduce Framework.Map input bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975763033"
TaskProgress ROWS_INSERTED="apiusage~1471" TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:1498,Map-Reduce Framework.Map input bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975764071"
TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0" TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:1498,Map-Reduce Framework.Map input bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975764199"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
TaskEnd TASK_RET_CODE="0" TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975782277"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782277"
TaskEnd TASK_RET_CODE="1" TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782473"
QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate = '2009/05/18'" QUERY_ID="app_20090730104242" QUERY_NUM_TASKS="2" TIME="1248975782474"



On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <pchakka@facebook.com> wrote:
Are you sure you are getting the same error even with the schema below (i.e., trying to set a string to an int column)? Can you give the full stack trace that you might see in /tmp/$USER/hive.log?


________________________________
From: Bill Graham <billgraham@gmail.com>
Reply-To: <hive-user@hadoop.apache.org>, <billgraham@gmail.com>


Date: Thu, 30 Jul 2009 10:02:54 -0700
To: Zheng Shao <zshao9@gmail.com>
Cc: <hive-user@hadoop.apache.org>


Subject: Re: partitions not being created


Based on these describe statements, is what I'm trying to do feasible? I'm basically trying to load the contents of ApiUsageTemp into ApiUsage, with the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
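
Concretely, that would mean one INSERT per day along the lines of the query from earlier in this thread (as I understand it, the backquoted regex selects every column except requestDate):

hive> INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = '20090518')
    > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate = '2009/05/18';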


On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <billgraham@gmail.com> wrote:
Sure. The only difference I see is that ApiUsage has a dt partition instead of the requestdate column:

hive> describe extended ApiUsage;
OK
user    string
restresource    string
statuscode      int
requesthour     int
numrequests     string
responsetime    string
numslowrequests string
dt      string

Detailed Table Information      Table(tableName:apiusage, dbName:default, owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt, type:string, comment:null)], parameters:{})

Time taken: 0.277 seconds
hive> describe extended ApiUsageTemp;
OK
user    string
restresource    string
statuscode      int
requestdate     string
requesthour     int
numrequests     string
responsetime    string
numslowrequests string

Detailed Table Information      Table(tableName:apiusagetemp, dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requestdate, type:string, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{last_modified_time=1248826696, last_modified_by=app})

Time taken: 0.235 seconds



On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zshao9@gmail.com> wrote:
Can you send the output of these 2 commands?

describe extended ApiUsage;
describe extended ApiUsageTemp;


Zheng

On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <billgraham@gmail.com> wrote:
> Thanks for the tip, but it fails in the same way when I use a string.
>
> On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dlerman@videoegg.com> wrote:
>>
>> >> hive> create table partTable (a string, b int) partitioned by (dt int);
>>
>> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
>> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
>> > '2009/05/18'
>>
>> The table has an int partition column (dt), but you're trying to set a
>> string value (dt = "20090518").
>>
>> Try:
>>
>> create table partTable (a string, b int) partitioned by (dt string);
>>
>> and then do your insert.
>>
>
>



--
Yours,
Zheng











Re: partitions not being created

Posted by Bill Graham <bi...@gmail.com>.
I sent my last reply before seeing your last email.

Thanks, that seems possible. I did initially create ApiUsageTemp using the
most recent Hive release. Then while working on a JIRA I updated my Hive
client and server to the more recent builds from the trunk.

If that could cause such a problem, this is troubling though, since it
implies that we can't upgrade Hive without possibly corrupting our metadata
store.

I'll try again from scratch though and see if it works, thanks.


On Thu, Jul 30, 2009 at 1:04 PM, Bill Graham <bi...@gmail.com> wrote:

> Prasad,
>
> My setup is Hive client -> Hive Server (with local metastore) -> Hadoop. I
> was also suspecting metastore issues, so I've tried multiple times with
> newly created destination tables and I see the same thing happening.
>
> All of the log info I've been able to find I've included already in this
> thread. Let me know if there's anywhere else I could look for clues.
>
> I've included from the client:
> - /tmp/$USER/hive.log
>
> And from the hive server:
> - Stdout/err logs
> - /tmp/$USER/hive_job_log*.txt
>
> Is there anything else I should be looking at? None of the M/R logs show
> any exceptions or anything suspect.
>
> Thanks for your time and insights on this issue, I appreciate it.
>
> thanks,
> Bill
>
>
> On Thu, Jul 30, 2009 at 11:57 AM, Prasad Chakka <pc...@facebook.com> wrote:
>
>>  Bill,
>>
>> The real error is happening on the Hive Metastore Server or Hive Server
>> (depending on the setup you are using). Error logs on it must have a
>> different stack trace. From the information below I am guessing that the
>> destination table's HDFS directories were created with some problems.
>> Can you drop that table (and make sure that there is no corresponding HDFS
>> directory for both integer and string type partitions that you created) and
>> retry the query.
>>
>> If you don't want to drop the destination table then send me the logs from
>> the Hive Server.
>>
>> Prasad
>>
>>
>> ------------------------------
>> From: Bill Graham <bi...@gmail.com>
>> Reply-To: <bi...@gmail.com>
>> Date: Thu, 30 Jul 2009 11:47:41 -0700
>> To: Prasad Chakka <pc...@facebook.com>
>> Cc: <hi...@hadoop.apache.org>
>> Subject: Re: partitions not being created
>>
>> That file contains a similar error to the one in the Hive Server logs:
>>
>> 2009-07-30 11:44:21,095 WARN  mapred.JobClient
>> (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
>> for parsing the arguments. Applications should implement Tool for the same.
>> 2009-07-30 11:44:48,070 WARN  mapred.JobClient
>> (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
>> for parsing the arguments. Applications should implement Tool for the same.
>> 2009-07-30 11:45:27,796 ERROR metadata.Hive (Hive.java:getPartition(588))
>> - org.apache.thrift.TApplicationException: get_partition failed: unknown
>> result
>>         at
>> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
>>         at
>> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
>>         at
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
>>         at
>> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
>>         at
>> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
>>         at
>> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
>>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
>>         at
>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
>>         at
>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>
>> 2009-07-30 11:45:27,797 ERROR exec.MoveTask
>> (SessionState.java:printError(279)) - Failed with exception
>> org.apache.thrift.TApplicationException: get_partition failed: unknown
>> result
>> org.apache.hadoop.hive.ql.metadata.HiveException:
>> org.apache.thrift.TApplicationException: get_partition failed: unknown
>> result
>>         at
>> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:589)
>>         at
>> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
>>         at
>> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
>>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
>>         at
>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
>>         at
>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>> Caused by: org.apache.thrift.TApplicationException: get_partition failed:
>> unknown result
>>         at
>> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
>>         at
>> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
>>         at
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
>>         at
>> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
>>         ... 16 more
>>
>> 2009-07-30 11:45:27,798 ERROR ql.Driver
>> (SessionState.java:printError(279)) - FAILED: Execution Error, return code 1
>> from org.apache.hadoop.hive.ql.exec.MoveTask
>>
>> On Thu, Jul 30, 2009 at 11:33 AM, Prasad Chakka <pc...@facebook.com>
>> wrote:
>>
>>
>> The hive logs go into /tmp/$USER/hive.log not hive_job_log*.txt.
>>
>>
>> ------------------------------
>> From: Bill Graham <billgraham@gmail.com>
>> Reply-To: <billgraham@gmail.com>
>> Date: Thu, 30 Jul 2009 10:52:06 -0700
>> To: Prasad Chakka <pchakka@facebook.com>
>> Cc: <hive-user@hadoop.apache.org>, Zheng Shao <zshao9@gmail.com>
>>
>> Subject: Re: partitions not being created
>>
>> I'm trying to set a string to a string and I'm seeing this error. I also
>> had an attempt where it was a string to an int, and I also saw the same
>> error.
>>
>> The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but
>> I've included its output below. Only the Hive server logs show the
>> exceptions listed above. (Note that the table I'm loading from in this log
>> output is ApiUsageSmall, which is identical to ApiUsageTemp. For some reason
>> the data from ApiUsageTemp is now gone.)
>>
>> QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt =
>> "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate
>> = '2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
>> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
>> TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
>> TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%,
>> reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
>> TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local
>> map tasks:1" TASK_ID="Stage-1" QUERY_ID="app_20090730104242"
>> TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
>> TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30
>> 10:42:43,031 map = 40%,  reduce =0%"
>> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
>> Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job
>> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
>> Counters .Data-local map
>> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
>> Framework.Map input records:302,Map-Reduce Framework.Map input
>> bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
>> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
>> TIME="1248975763033"
>> TaskProgress ROWS_INSERTED="apiusage~1471"
>> TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%"
>> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
>> Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
>> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
>> Counters .Data-local map
>> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
>> Framework.Map input records:1498,Map-Reduce Framework.Map input
>> bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
>> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
>> TIME="1248975764071"
>> TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0"
>> TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%"
>> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
>> Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
>> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
>> Counters .Data-local map
>> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
>> Framework.Map input records:1498,Map-Reduce Framework.Map input
>> bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
>> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
>> TIME="1248975764199"
>> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask"
>> TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
>> TaskEnd TASK_RET_CODE="0"
>> TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4"
>> QUERY_ID="app_20090730104242" TIME="1248975782277"
>> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask"
>> TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782277"
>> TaskEnd TASK_RET_CODE="1"
>> TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0"
>> QUERY_ID="app_20090730104242" TIME="1248975782473"
>> QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE
>> TABLE ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM
>> ApiUsageSmall WHERE requestDate = '2009/05/18'"
>> QUERY_ID="app_20090730104242" QUERY_NUM_TASKS="2" TIME="1248975782474"
>>
>>
>>
>> On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <pchakka@facebook.com> wrote:
>>
>> Are you sure you are getting the same error even with the schema below
>> (i.e., trying to set a string to an int column)? Can you give the full stack
>> trace that you might see in /tmp/$USER/hive.log?
>>
>>
>> ------------------------------
>> From: Bill Graham <billgraham@gmail.com>
>> Reply-To: <hive-user@hadoop.apache.org>, <billgraham@gmail.com>
>>
>> Date: Thu, 30 Jul 2009 10:02:54 -0700
>> To: Zheng Shao <zshao9@gmail.com>
>> Cc: <hive-user@hadoop.apache.org>
>>
>> Subject: Re: partitions not being created
>>
>>
>> Based on these describe statements, is what I'm trying to do feasible? I'm
>> basically trying to load the contents of ApiUsageTemp into ApiUsage, with
>> the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
>>
>>
>> On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <billgraham@gmail.com> wrote:
>>
>> Sure. The only difference I see is that ApiUsage has a dt partition
>> instead of the requestdate column:
>>
>> hive> describe extended ApiUsage;
>> OK
>> user    string
>> restresource    string
>> statuscode      int
>> requesthour     int
>> numrequests     string
>> responsetime    string
>> numslowrequests string
>> dt      string
>>
>> Detailed Table Information      Table(tableName:apiusage, dbName:default,
>> owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0,
>> sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
>> comment:null), FieldSchema(name:restresource, type:string, comment:null),
>> FieldSchema(name:statuscode, type:int, comment:null),
>> FieldSchema(name:requesthour, type:int, comment:null),
>> FieldSchema(name:numrequests, type:string, comment:null),
>> FieldSchema(name:responsetime, type:string, comment:null),
>> FieldSchema(name:numslowrequests, type:string, comment:null)],
>> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
>> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
>> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
>> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
>> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
>> parameters:{field.delim= , serialization.format= }), bucketCols:[],
>> sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt,
>> type:string, comment:null)], parameters:{})
>>
>> Time taken: 0.277 seconds
>> hive> describe extended ApiUsageTemp;
>> OK
>> user    string
>> restresource    string
>> statuscode      int
>> requestdate     string
>> requesthour     int
>> numrequests     string
>> responsetime    string
>> numslowrequests string
>>
>> Detailed Table Information      Table(tableName:apiusagetemp,
>> dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0,
>> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
>> comment:null), FieldSchema(name:restresource, type:string, comment:null),
>> FieldSchema(name:statuscode, type:int, comment:null),
>> FieldSchema(name:requestdate, type:string, comment:null),
>> FieldSchema(name:requesthour, type:int, comment:null),
>> FieldSchema(name:numrequests, type:string, comment:null),
>> FieldSchema(name:responsetime, type:string, comment:null),
>> FieldSchema(name:numslowrequests, type:string, comment:null)],
>> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
>> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
>> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
>> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
>> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
>> parameters:{field.delim= , serialization.format= }), bucketCols:[],
>> sortCols:[], parameters:{}), partitionKeys:[],
>> parameters:{last_modified_time=1248826696, last_modified_by=app})
>>
>> Time taken: 0.235 seconds
>>
>>
>>
>> On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zshao9@gmail.com> wrote:
>>
>> Can you send the output of these 2 commands?
>>
>> describe extended ApiUsage;
>> describe extended ApiUsageTemp;
>>
>>
>> Zheng
>>
>> On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <billgraham@gmail.com> wrote:
>> > Thanks for the tip, but it fails in the same way when I use a string.
>> >
>> > On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dlerman@videoegg.com> wrote:
>> >>
>> >> >> hive> create table partTable (a string, b int) partitioned by (dt int);
>> >>
>> >> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
>> >> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
>> >> > '2009/05/18'
>> >>
>> >> The table has an int partition column (dt), but you're trying to set a
>> >> string value (dt = "20090518").
>> >>
>> >> Try:
>> >>
>> >> create table partTable (a string, b int) partitioned by (dt string);
>> >>
>> >> and then do your insert.
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>>
>>
>>
>>
>>
>>
>>
>>
>

Re: partitions not being created

Posted by Bill Graham <bi...@gmail.com>.
Prasad,

My setup is Hive client -> Hive Server (with local metastore) -> Hadoop. I
was also suspecting metastore issues, so I've tried multiple times with
newly created destination tables and I see the same thing happening.

All of the log info I've been able to find I've included already in this
thread. Let me know if there's anywhere else I could look for clues.

I've included from the client:
- /tmp/$USER/hive.log

And from the hive server:
- Stdout/err logs
- /tmp/$USER/hive_job_log*.txt

Is there anything else I should be looking at? None of the M/R logs show
any exceptions or anything suspect.

Thanks for your time and insights on this issue, I appreciate it.

thanks,
Bill

On Thu, Jul 30, 2009 at 11:57 AM, Prasad Chakka <pc...@facebook.com> wrote:

>  Bill,
>
> The real error is happening on the Hive Metastore Server or Hive Server
> (depending on the setup you are using). Error logs on it must have a
> different stack trace. From the information below I am guessing that the
> destination table's HDFS directories were created with some problems.
> Can you drop that table (and make sure that there is no corresponding HDFS
> directory for both integer and string type partitions that you created) and
> retry the query.
>
> If you don't want to drop the destination table then send me the logs from
> the Hive Server.
>
> Prasad
>
>
> ------------------------------
> From: Bill Graham <bi...@gmail.com>
> Reply-To: <bi...@gmail.com>
> Date: Thu, 30 Jul 2009 11:47:41 -0700
> To: Prasad Chakka <pc...@facebook.com>
> Cc: <hi...@hadoop.apache.org>
> Subject: Re: partitions not being created
>
> That file contains a similar error to the one in the Hive Server logs:
>
> 2009-07-30 11:44:21,095 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-07-30 11:44:48,070 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-07-30 11:45:27,796 ERROR metadata.Hive (Hive.java:getPartition(588)) -
> org.apache.thrift.TApplicationException: get_partition failed: unknown
> result
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
>         at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
>         at
> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> 2009-07-30 11:45:27,797 ERROR exec.MoveTask
> (SessionState.java:printError(279)) - Failed with exception
> org.apache.thrift.TApplicationException: get_partition failed: unknown
> result
> org.apache.hadoop.hive.ql.metadata.HiveException:
> org.apache.thrift.TApplicationException: get_partition failed: unknown
> result
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:589)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
>         at
> org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
>         at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> Caused by: org.apache.thrift.TApplicationException: get_partition failed:
> unknown result
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
>         at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
>         at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
>         at
> org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
>         ... 16 more
>
> 2009-07-30 11:45:27,798 ERROR ql.Driver (SessionState.java:printError(279))
> - FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
> On Thu, Jul 30, 2009 at 11:33 AM, Prasad Chakka <pc...@facebook.com> wrote:
>
>
> The hive logs go into /tmp/$USER/hive.log not hive_job_log*.txt.
>
>
> ------------------------------
> From: Bill Graham <billgraham@gmail.com>
> Reply-To: <billgraham@gmail.com>
> Date: Thu, 30 Jul 2009 10:52:06 -0700
> To: Prasad Chakka <pchakka@facebook.com>
> Cc: <hive-user@hadoop.apache.org>, Zheng Shao <zshao9@gmail.com>
>
> Subject: Re: partitions not being created
>
> I'm trying to set a string to a string and I'm seeing this error. I also
> had an attempt where it was a string to an int, and I also saw the same
> error.
>
> The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but
> I've included its output below. Only the Hive server logs show the
> exceptions listed above. (Note that the table I'm loading from in this log
> output is ApiUsageSmall, which is identical to ApiUsageTemp. For some reason
> the data from ApiUsageTemp is now gone.)
>
> QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt =
> "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate
> = '2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
> TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
> TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%,
> reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
> TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local
> map tasks:1" TASK_ID="Stage-1" QUERY_ID="app_20090730104242"
> TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
> TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30
> 10:42:43,031 map = 40%,  reduce =0%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:302,Map-Reduce Framework.Map input
> bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975763033"
> TaskProgress ROWS_INSERTED="apiusage~1471" TASK_HADOOP_PROGRESS="2009-07-30
> 10:42:44,068 map = 100%,  reduce =100%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:1498,Map-Reduce Framework.Map input
> bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975764071"
> TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0"
> TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:1498,Map-Reduce Framework.Map input
> bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975764199"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask"
> TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
> TaskEnd TASK_RET_CODE="0"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4"
> QUERY_ID="app_20090730104242" TIME="1248975782277"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask"
> TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782277"
> TaskEnd TASK_RET_CODE="1"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0"
> QUERY_ID="app_20090730104242" TIME="1248975782473"
> QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE TABLE
> ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM
> ApiUsageSmall WHERE requestDate = '2009/05/18'"
> QUERY_ID="app_20090730104242" QUERY_NUM_TASKS="2" TIME="1248975782474"
>
>
>
> On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <pchakka@facebook.com> wrote:
>
> Are you sure you are getting the same error even with the schema below
> (i.e. trying to set a string to an int column?). Can you give the full stack
> trace that you might see in /tmp/$USER/hive.log?
>
>
> ------------------------------
> From: Bill Graham <billgraham@gmail.com>
> Reply-To: <hive-user@hadoop.apache.org>, <billgraham@gmail.com>
>
> Date: Thu, 30 Jul 2009 10:02:54 -0700
> To: Zheng Shao <zshao9@gmail.com>
> Cc: <hive-user@hadoop.apache.org>
>
> Subject: Re: partitions not being created
>
>
> Based on these describe statements, is what I'm trying to do feasible? I'm
> basically trying to load the contents of ApiUsageTemp into ApiUsage, with
> the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
>
>
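
A note on feasibility, for the archive: in Hive builds of this vintage the
PARTITION clause of an INSERT takes a constant, so turning the requestDate
column into the dt partition means one INSERT per distinct date; as far as I
know there is no dynamic-partition insert yet. A sketch reusing the query from
this thread (the second date is made up for illustration):

hive> INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = '20090518')
    > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate = '2009/05/18';
hive> INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = '20090519')
    > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate = '2009/05/19';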
> On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <billgraham@gmail.com> wrote:
>
> Sure. The only difference I see is that ApiUsage has a dt partition
> instead of the requestdate column:
>
> hive> describe extended ApiUsage;
> OK
> user    string
> restresource    string
> statuscode      int
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
> dt      string
>
> Detailed Table Information      Table(tableName:apiusage, dbName:default,
> owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt,
> type:string, comment:null)], parameters:{})
>
> Time taken: 0.277 seconds
> hive> describe extended ApiUsageTemp;
> OK
> user    string
> restresource    string
> statuscode      int
> requestdate     string
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
>
> Detailed Table Information      Table(tableName:apiusagetemp,
> dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0,
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requestdate, type:string, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[],
> parameters:{last_modified_time=1248826696, last_modified_by=app})
>
> Time taken: 0.235 seconds
>
> [...]

Re: partitions not being created

Posted by Prasad Chakka <pc...@facebook.com>.
Bill,

The real error is happening on the Hive Metastore Server or Hive Server (depending on the setup you are using). The error logs there must have a different stack trace. From the information below I am guessing that the destination table's HDFS directories got created with some problems. Can you drop that table (and make sure that there is no corresponding HDFS directory for either the integer or the string type partitions that you created) and retry the query?

If you don't want to drop the destination table then send me the logs on Hive Server.

Prasad
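
A minimal sketch of that clean-up cycle, using the table name, warehouse path,
and query quoted elsewhere in this thread; adjust the paths to your setup
(hadoop fs -rmr is the form current at the time):

hive> drop table ApiUsage;

  $ hadoop fs -rmr /user/hive/warehouse/apiusage    # make sure the old directory is really gone

hive> create table ApiUsage (user string, restresource string, statuscode int,
    >   requesthour int, numrequests string, responsetime string,
    >   numslowrequests string)
    > partitioned by (dt string);

hive> INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = '20090518')
    > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate = '2009/05/18';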


________________________________
From: Bill Graham <bi...@gmail.com>
Reply-To: <bi...@gmail.com>
Date: Thu, 30 Jul 2009 11:47:41 -0700
To: Prasad Chakka <pc...@facebook.com>
Cc: <hi...@hadoop.apache.org>
Subject: Re: partitions not being created

That file contains an error similar to the one in the Hive Server logs:

[...]

Re: partitions not being created

Posted by Prasad Chakka <pc...@facebook.com>.
The describe extended information below shows the same HDFS directory for both tables (ApiUsageTemp & ApiUsage). This may be because you are using an older version of Hive and did a rename. You may want to restart the whole process from scratch (delete the tables and directories) and create them afresh, or use totally different table names (do not rename with the older version) if you don't want to drop the tables.
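
A sketch of that from-scratch route, with fresh names so the two tables can no
longer share a warehouse directory. Column lists are taken from the describe
output in this thread; the names ApiUsageStage and ApiUsageByDay are made up:

hive> drop table ApiUsageTemp;
hive> drop table ApiUsage;

  $ hadoop fs -rmr /user/hive/warehouse/apiusage    # remove the directory both tables pointed at

hive> create table ApiUsageStage (user string, restresource string,
    >   statuscode int, requestdate string, requesthour int,
    >   numrequests string, responsetime string, numslowrequests string);

hive> create table ApiUsageByDay (user string, restresource string,
    >   statuscode int, requesthour int, numrequests string,
    >   responsetime string, numslowrequests string)
    > partitioned by (dt string);

Each table then gets its own location (e.g. .../warehouse/apiusagestage), which
describe extended on each should confirm before reloading the data.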


________________________________
From: Bill Graham <bi...@gmail.com>
Reply-To: <bi...@gmail.com>
Date: Thu, 30 Jul 2009 11:47:41 -0700
To: Prasad Chakka <pc...@facebook.com>
Cc: <hi...@hadoop.apache.org>
Subject: Re: partitions not being created

That file contains an error similar to the one in the Hive Server logs:

[...]

Re: partitions not being created

Posted by Bill Graham <bi...@gmail.com>.
That file contains an error similar to the one in the Hive Server logs:

2009-07-30 11:44:21,095 WARN  mapred.JobClient
(JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the same.
2009-07-30 11:44:48,070 WARN  mapred.JobClient
(JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the same.
2009-07-30 11:45:27,796 ERROR metadata.Hive (Hive.java:getPartition(588)) -
org.apache.thrift.TApplicationException: get_partition failed: unknown
result
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
        at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
        at
org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
        at
org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
        at
org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
        at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

2009-07-30 11:45:27,797 ERROR exec.MoveTask
(SessionState.java:printError(279)) - Failed with exception
org.apache.thrift.TApplicationException: get_partition failed: unknown
result
org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.thrift.TApplicationException: get_partition failed: unknown
result
        at
org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:589)
        at
org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
        at
org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
        at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:122)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:165)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:258)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: org.apache.thrift.TApplicationException: get_partition failed:
unknown result
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
        at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
        at
org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
        ... 16 more

2009-07-30 11:45:27,798 ERROR ql.Driver (SessionState.java:printError(279))
- FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask

On Thu, Jul 30, 2009 at 11:33 AM, Prasad Chakka <pc...@facebook.com>wrote:

>
> The hive logs go into /tmp/$USER/hive.log not hive_job_log*.txt.
>
> ------------------------------
> *From: *Bill Graham <bi...@gmail.com>
> *Reply-To: *<bi...@gmail.com>
> *Date: *Thu, 30 Jul 2009 10:52:06 -0700
> *To: *Prasad Chakka <pc...@facebook.com>
> *Cc: *<hi...@hadoop.apache.org>, Zheng Shao <zs...@gmail.com>
> *Subject: *Re: partitions not being created
>
> I'm trying to set a string to a string and I'm seeing this error. I also
> had an attempt where it was a string to an int, and I also saw the same
> error.
>
> The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but
> I've included it's output below. Only the Hive server logs show the
> exceptions listed above. (Note that the table I'm loading from in this log
> output is ApiUsageSmall, which is identical to ApiUsageTemp. For some reason
> the data from ApiUsageTemp is now gone.)
>
> QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt =
> "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate
> = '2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
> TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
> TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%,
> reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
> TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local
> map tasks:1" TASK_ID="Stage-1" QUERY_ID="app_20090730104242"
> TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
> TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30
> 10:42:43,031 map = 40%,  reduce =0%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:302,Map-Reduce Framework.Map input
> bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975763033"
> TaskProgress ROWS_INSERTED="apiusage~1471" TASK_HADOOP_PROGRESS="2009-07-30
> 10:42:44,068 map = 100%,  reduce =100%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:1498,Map-Reduce Framework.Map input
> bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975764071"
> TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0"
> TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
> Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
> Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
> Counters .Data-local map
> tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
> Framework.Map input records:1498,Map-Reduce Framework.Map input
> bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
> QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
> TIME="1248975764199"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask"
> TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
> TaskEnd TASK_RET_CODE="0"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4"
> QUERY_ID="app_20090730104242" TIME="1248975782277"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask"
> TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782277"
> TaskEnd TASK_RET_CODE="1"
> TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0"
> QUERY_ID="app_20090730104242" TIME="1248975782473"
> QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE TABLE
> ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM
> ApiUsageSmall WHERE requestDate = '2009/05/18'"
> QUERY_ID="app_20090730104242" QUERY_NUM_TASKS="2" TIME="1248975782474"
>
>
>
> On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <pc...@facebook.com>
> wrote:
>
> Are you sure you are getting the same error even with the schema below
> (i.e. trying to set a string to an int column?). Can you give the full stack
> trace that you might see in /tmp/$USER/hive.log?
>
>
> ------------------------------
> *From: *Bill Graham <billgraham@gmail.com <ht...@gmail.com> >
> *Reply-To: *<hive-user@hadoop.apache.org <
> http://hive-user@hadoop.apache.org> >, <billgraham@gmail.com <
> http://billgraham@gmail.com> >
> *Date: *Thu, 30 Jul 2009 10:02:54 -0700
> *To: *Zheng Shao <zshao9@gmail.com <ht...@gmail.com> >
> *Cc: *<hive-user@hadoop.apache.org <ht...@hadoop.apache.org> >
> *Subject: *Re: partitions not being created
>
>
> Based on these describe statements, is what I'm trying to do feasable? I'm
> basically trying to load the contents of ApiUsageTemp into ApiUsage, with
> the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
>
>
> On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <billgraham@gmail.com> wrote:
>
> Sure. The only difference I see is that ApiUsage has a dt partition,
> instead of the requestdate column:
>
> hive> describe extended
> ApiUsage;
> OK
> user    string
> restresource    string
> statuscode      int
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
> dt      string
>
> Detailed Table Information      Table(tableName:apiusage, dbName:default,
> owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt,
> type:string, comment:null)], parameters:{})
>
> Time taken: 0.277 seconds
> hive> describe extended ApiUsageTemp;
> OK
> user    string
> restresource    string
> statuscode      int
> requestdate     string
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
>
> Detailed Table Information      Table(tableName:apiusagetemp,
> dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0,
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requestdate, type:string, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[],
> parameters:{last_modified_time=1248826696, last_modified_by=app})
>
> Time taken: 0.235 seconds
>
>
>
> On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zshao9@gmail.com> wrote:
>
> Can you send the output of these 2 commands?
>
> describe extended ApiUsage;
> describe extended ApiUsageTemp;
>
>
> Zheng
>
> On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <billgraham@gmail.com> wrote:
> > Thanks for the tip, but it fails in the same way when I use a string.
> >
> > On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dlerman@videoegg.com> wrote:
> >>
> >> >> hive> create table partTable (a string, b int) partitioned by (dt
> int);
> >>
> >> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
> >> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
> >> > '2009/05/18'
> >>
> >> The table has an int partition column (dt), but you're trying to set a
> >> string value (dt = "20090518").
> >>
> >> Try :
> >>
> >> create table partTable (a string, b int) partitioned by (dt string);
> >>
> >> and then do your insert.
> >>
> >
> >
>
>
>
> --
> Yours,
> Zheng
>
>
>
>
>
>
>

Re: partitions not being created

Posted by Prasad Chakka <pc...@facebook.com>.
The Hive logs go into /tmp/$USER/hive.log, not hive_job_log*.txt.

________________________________
From: Bill Graham <bi...@gmail.com>
Reply-To: <bi...@gmail.com>
Date: Thu, 30 Jul 2009 10:52:06 -0700
To: Prasad Chakka <pc...@facebook.com>
Cc: <hi...@hadoop.apache.org>, Zheng Shao <zs...@gmail.com>
Subject: Re: partitions not being created

I'm trying to set a string to a string and I'm seeing this error. I saw the same error in an earlier attempt where it was a string to an int.

The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but I've included its output below. Only the Hive server logs show the exceptions listed above. (Note that the table I'm loading from in this log output is ApiUsageSmall, which is identical to ApiUsageTemp. For some reason the data from ApiUsageTemp is now gone.)

QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate = '2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%,  reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local map tasks:1" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30 10:42:43,031 map = 40%,  reduce =0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:302,Map-Reduce Framework.Map input bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975763033"
TaskProgress ROWS_INSERTED="apiusage~1471" TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:1498,Map-Reduce Framework.Map input bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975764071"
TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0" TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job Counters .Data-local map tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map input records:1498,Map-Reduce Framework.Map input bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975764199"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
TaskEnd TASK_RET_CODE="0" TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975782277"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782277"
TaskEnd TASK_RET_CODE="1" TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782473"
QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate = '2009/05/18'" QUERY_ID="app_20090730104242" QUERY_NUM_TASKS="2" TIME="1248975782474"



On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <pc...@facebook.com> wrote:
Are you sure you are getting the same error even with the schema below (i.e. trying to set a string to an int column?). Can you give the full stack trace that you might see in /tmp/$USER/hive.log?


________________________________
From: Bill Graham <billgraham@gmail.com>
Reply-To: <hive-user@hadoop.apache.org>, <billgraham@gmail.com>
Date: Thu, 30 Jul 2009 10:02:54 -0700
To: Zheng Shao <zshao9@gmail.com>
Cc: <hive-user@hadoop.apache.org>
Subject: Re: partitions not being created


Based on these describe statements, is what I'm trying to do feasible? I'm basically trying to load the contents of ApiUsageTemp into ApiUsage, with the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.


On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <billgraham@gmail.com> wrote:
Sure. The only difference I see is that ApiUsage has a dt partition, instead of the requestdate column:

hive> describe extended ApiUsage;
OK
user    string
restresource    string
statuscode      int
requesthour     int
numrequests     string
responsetime    string
numslowrequests string
dt      string

Detailed Table Information      Table(tableName:apiusage, dbName:default, owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt, type:string, comment:null)], parameters:{})

Time taken: 0.277 seconds
hive> describe extended ApiUsageTemp;
OK
user    string
restresource    string
statuscode      int
requestdate     string
requesthour     int
numrequests     string
responsetime    string
numslowrequests string

Detailed Table Information      Table(tableName:apiusagetemp, dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requestdate, type:string, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{last_modified_time=1248826696, last_modified_by=app})

Time taken: 0.235 seconds



On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zshao9@gmail.com> wrote:
Can you send the output of these 2 commands?

describe extended ApiUsage;
describe extended ApiUsageTemp;


Zheng

On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <billgraham@gmail.com> wrote:
> Thanks for the tip, but it fails in the same way when I use a string.
>
> On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dlerman@videoegg.com> wrote:
>>
>> >> hive> create table partTable (a string, b int) partitioned by (dt int);
>>
>> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
>> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
>> > '2009/05/18'
>>
>> The table has an int partition column (dt), but you're trying to set a
>> string value (dt = "20090518").
>>
>> Try :
>>
>> create table partTable (a string, b int) partitioned by (dt string);
>>
>> and then do your insert.
>>
>
>



--
Yours,
Zheng






Re: partitions not being created

Posted by Bill Graham <bi...@gmail.com>.
I'm trying to set a string to a string and I'm seeing this error. I saw the
same error in an earlier attempt where it was a string to an int.

The /tmp/$USER/hive_job_log*.txt file doesn't contain any exceptions, but
I've included its output below. Only the Hive server logs show the
exceptions listed above. (Note that the table I'm loading from in this log
output is ApiUsageSmall, which is identical to ApiUsageTemp. For some reason
the data from ApiUsageTemp is now gone.)

QueryStart QUERY_STRING="INSERT OVERWRITE TABLE ApiUsage PARTITION (dt =
"20090518") SELECT `(requestDate)?+.+` FROM ApiUsageSmall WHERE requestDate
= '2009/05/18'" QUERY_ID="app_20090730104242" TIME="1248975752235"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
TASK_ID="Stage-1" QUERY_ID="app_20090730104242" TIME="1248975752235"
TaskProgress TASK_HADOOP_PROGRESS="2009-07-30 10:42:34,783 map = 0%,  reduce
=0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver"
TASK_COUNTERS="Job Counters .Launched map tasks:1,Job Counters .Data-local
map tasks:1" TASK_ID="Stage-1" QUERY_ID="app_20090730104242"
TASK_HADOOP_ID="job_200906301559_0409" TIME="1248975754785"
TaskProgress ROWS_INSERTED="apiusage~296" TASK_HADOOP_PROGRESS="2009-07-30
10:42:43,031 map = 40%,  reduce =0%"
TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
Systems.HDFS bytes read:23019,File Systems.HDFS bytes written:19178,Job
Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
Counters .Data-local map
tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:592,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:6,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:296,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
Framework.Map input records:302,Map-Reduce Framework.Map input
bytes:23019,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
TIME="1248975763033"
TaskProgress ROWS_INSERTED="apiusage~1471" TASK_HADOOP_PROGRESS="2009-07-30
10:42:44,068 map = 100%,  reduce =100%"
TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
Counters .Data-local map
tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
Framework.Map input records:1498,Map-Reduce Framework.Map input
bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
TIME="1248975764071"
TaskEnd ROWS_INSERTED="apiusage~1471" TASK_RET_CODE="0"
TASK_HADOOP_PROGRESS="2009-07-30 10:42:44,068 map = 100%,  reduce =100%"
TASK_NAME="org.apache.hadoop.hive.ql.exec.ExecDriver" TASK_COUNTERS="File
Systems.HDFS bytes read:114068,File Systems.HDFS bytes written:95275,Job
Counters .Rack-local map tasks:2,Job Counters .Launched map tasks:5,Job
Counters .Data-local map
tasks:3,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.PASSED:2942,org.apache.hadoop.hive.ql.exec.FilterOperator$Counter.FILTERED:27,org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum.TABLE_ID_1_ROWCOUNT:1471,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce
Framework.Map input records:1498,Map-Reduce Framework.Map input
bytes:114068,Map-Reduce Framework.Map output records:0" TASK_ID="Stage-1"
QUERY_ID="app_20090730104242" TASK_HADOOP_ID="job_200906301559_0409"
TIME="1248975764199"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask"
TASK_ID="Stage-4" QUERY_ID="app_20090730104242" TIME="1248975764199"
TaskEnd TASK_RET_CODE="0"
TASK_NAME="org.apache.hadoop.hive.ql.exec.ConditionalTask" TASK_ID="Stage-4"
QUERY_ID="app_20090730104242" TIME="1248975782277"
TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask"
TASK_ID="Stage-0" QUERY_ID="app_20090730104242" TIME="1248975782277"
TaskEnd TASK_RET_CODE="1"
TASK_NAME="org.apache.hadoop.hive.ql.exec.MoveTask" TASK_ID="Stage-0"
QUERY_ID="app_20090730104242" TIME="1248975782473"
QueryEnd ROWS_INSERTED="apiusage~1471" QUERY_STRING="INSERT OVERWRITE TABLE
ApiUsage PARTITION (dt = "20090518") SELECT `(requestDate)?+.+` FROM
ApiUsageSmall WHERE requestDate = '2009/05/18'"
QUERY_ID="app_20090730104242" QUERY_NUM_TASKS="2" TIME="1248975782474"



On Thu, Jul 30, 2009 at 10:09 AM, Prasad Chakka <pc...@facebook.com> wrote:

>  Are you sure you are getting the same error even with the schema below
> (i.e. trying to set a string to an int column?). Can you give the full stack
> trace that you might see in /tmp/$USER/hive.log?
>
>
> ------------------------------
> *From: *Bill Graham <bi...@gmail.com>
> *Reply-To: *<hi...@hadoop.apache.org>, <bi...@gmail.com>
> *Date: *Thu, 30 Jul 2009 10:02:54 -0700
> *To: *Zheng Shao <zs...@gmail.com>
> *Cc: *<hi...@hadoop.apache.org>
> *Subject: *Re: partitions not being created
>
> Based on these describe statements, is what I'm trying to do feasible? I'm
> basically trying to load the contents of ApiUsageTemp into ApiUsage, with
> the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
>
>
> On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <bi...@gmail.com> wrote:
>
> Sure. The only difference I see is that ApiUsage has a dt partition,
> instead of the requestdate column:
>
> hive> describe extended
> ApiUsage;
> OK
> user    string
> restresource    string
> statuscode      int
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
> dt      string
>
> Detailed Table Information      Table(tableName:apiusage, dbName:default,
> owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt,
> type:string, comment:null)], parameters:{})
> Time taken: 0.277 seconds
> hive> describe extended ApiUsageTemp;
> OK
> user    string
> restresource    string
> statuscode      int
> requestdate     string
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
>
> Detailed Table Information      Table(tableName:apiusagetemp,
> dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0,
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requestdate, type:string, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[],
> parameters:{last_modified_time=1248826696, last_modified_by=app})
> Time taken: 0.235 seconds
>
>
>
> On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zs...@gmail.com> wrote:
>
> Can you send the output of these 2 commands?
>
> describe extended ApiUsage;
> describe extended ApiUsageTemp;
>
>
> Zheng
>
> On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <bi...@gmail.com> wrote:
> > Thanks for the tip, but it fails in the same way when I use a string.
> >
> > On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dl...@videoegg.com>
> wrote:
> >>
> >> >> hive> create table partTable (a string, b int) partitioned by (dt
> int);
> >>
> >> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
> >> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
> >> > '2009/05/18'
> >>
> >> The table has an int partition column (dt), but you're trying to set a
> >> string value (dt = "20090518").
> >>
> >> Try :
> >>
> >> create table partTable (a string, b int) partitioned by (dt string);
> >>
> >> and then do your insert.
> >>
> >
> >
>
>
>
> --
> Yours,
> Zheng
>
>
>
>
>

Re: partitions not being created

Posted by Prasad Chakka <pc...@facebook.com>.
Are you sure you are getting the same error even with the schema below (i.e. trying to set a string to an int column?). Can you give the full stack trace that you might see in /tmp/$USER/hive.log?


________________________________
From: Bill Graham <bi...@gmail.com>
Reply-To: <hi...@hadoop.apache.org>, <bi...@gmail.com>
Date: Thu, 30 Jul 2009 10:02:54 -0700
To: Zheng Shao <zs...@gmail.com>
Cc: <hi...@hadoop.apache.org>
Subject: Re: partitions not being created

Based on these describe statements, is what I'm trying to do feasible? I'm basically trying to load the contents of ApiUsageTemp into ApiUsage, with the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.


On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <bi...@gmail.com> wrote:
Sure. The only difference I see is that ApiUsage has a dt partition, instead of the requestdate column:

hive> describe extended ApiUsage;
OK
user    string
restresource    string
statuscode      int
requesthour     int
numrequests     string
responsetime    string
numslowrequests string
dt      string

Detailed Table Information      Table(tableName:apiusage, dbName:default, owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt, type:string, comment:null)], parameters:{})
Time taken: 0.277 seconds
hive> describe extended ApiUsageTemp;
OK
user    string
restresource    string
statuscode      int
requestdate     string
requesthour     int
numrequests     string
responsetime    string
numslowrequests string

Detailed Table Information      Table(tableName:apiusagetemp, dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string, comment:null), FieldSchema(name:restresource, type:string, comment:null), FieldSchema(name:statuscode, type:int, comment:null), FieldSchema(name:requestdate, type:string, comment:null), FieldSchema(name:requesthour, type:int, comment:null), FieldSchema(name:numrequests, type:string, comment:null), FieldSchema(name:responsetime, type:string, comment:null), FieldSchema(name:numslowrequests, type:string, comment:null)], location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format= }), bucketCols:[], sortCols:[], parameters:{last_modified_time=1248826696, last_modified_by=app})
Time taken: 0.235 seconds



On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zs...@gmail.com> wrote:
Can you send the output of these 2 commands?

describe extended ApiUsage;
describe extended ApiUsageTemp;


Zheng

On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <bi...@gmail.com> wrote:
> Thanks for the tip, but it fails in the same way when I use a string.
>
> On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dl...@videoegg.com> wrote:
>>
>> >> hive> create table partTable (a string, b int) partitioned by (dt int);
>>
>> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
>> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
>> > '2009/05/18'
>>
>> The table has an int partition column (dt), but you're trying to set a
>> string value (dt = "20090518").
>>
>> Try :
>>
>> create table partTable (a string, b int) partitioned by (dt string);
>>
>> and then do your insert.
>>
>
>



--
Yours,
Zheng




Re: partitions not being created

Posted by Bill Graham <bi...@gmail.com>.
Based on these describe statements, is what I'm trying to do feasible? I'm
basically trying to load the contents of ApiUsageTemp into ApiUsage, with
the ApiUsageTemp.requestdate column becoming the ApiUsage.dt partition.
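
In other words, a minimal sketch of the intent, one insert per requestDate
value (the first statement is my actual query; the second date is only
illustrative):

INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate = '2009/05/18'

-- illustrative: repeat for each remaining requestDate value
INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090519")
SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate = '2009/05/19'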


On Wed, Jul 29, 2009 at 9:28 AM, Bill Graham <bi...@gmail.com> wrote:

> Sure. The only difference I see is that ApiUsage has a dt partition,
> instead of the requestdate column:
>
> hive> describe extended
> ApiUsage;
> OK
> user    string
> restresource    string
> statuscode      int
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
> dt      string
>
> Detailed Table Information      Table(tableName:apiusage, dbName:default,
> owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt,
> type:string, comment:null)], parameters:{})
> Time taken: 0.277 seconds
> hive> describe extended ApiUsageTemp;
> OK
> user    string
> restresource    string
> statuscode      int
> requestdate     string
> requesthour     int
> numrequests     string
> responsetime    string
> numslowrequests string
>
> Detailed Table Information      Table(tableName:apiusagetemp,
> dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0,
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
> comment:null), FieldSchema(name:restresource, type:string, comment:null),
> FieldSchema(name:statuscode, type:int, comment:null),
> FieldSchema(name:requestdate, type:string, comment:null),
> FieldSchema(name:requesthour, type:int, comment:null),
> FieldSchema(name:numrequests, type:string, comment:null),
> FieldSchema(name:responsetime, type:string, comment:null),
> FieldSchema(name:numslowrequests, type:string, comment:null)],
> location:hdfs://xxxxxxx:9000/user/hive/warehouse/apiusage,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{field.delim= , serialization.format= }), bucketCols:[],
> sortCols:[], parameters:{}), partitionKeys:[],
> parameters:{last_modified_time=1248826696, last_modified_by=app})
> Time taken: 0.235 seconds
>
>
>
> On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zs...@gmail.com> wrote:
>
>> Can you send the output of these 2 commands?
>>
>> describe extended ApiUsage;
>> describe extended ApiUsageTemp;
>>
>>
>> Zheng
>>
>> On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <bi...@gmail.com> wrote:
>> > Thanks for the tip, but it fails in the same way when I use a string.
>> >
>> > On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dl...@videoegg.com>
>> wrote:
>> >>
>> >> >> hive> create table partTable (a string, b int) partitioned by (dt
>> int);
>> >>
>> >> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
>> >> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
>> >> > '2009/05/18'
>> >>
>> >> The table has an int partition column (dt), but you're trying to set a
>> >> string value (dt = "20090518").
>> >>
>> >> Try :
>> >>
>> >> create table partTable (a string, b int) partitioned by (dt string);
>> >>
>> >> and then do your insert.
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>

Re: partitions not being created

Posted by Bill Graham <bi...@gmail.com>.
Sure. The only difference I see is that ApiUsage has a dt partition,
instead of the requestdate column:

hive> describe extended
ApiUsage;
OK
user    string
restresource    string
statuscode      int
requesthour     int
numrequests     string
responsetime    string
numslowrequests string
dt      string

Detailed Table Information      Table(tableName:apiusage, dbName:default,
owner:grahamb, createTime:1248884801, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
comment:null), FieldSchema(name:restresource, type:string, comment:null),
FieldSchema(name:statuscode, type:int, comment:null),
FieldSchema(name:requesthour, type:int, comment:null),
FieldSchema(name:numrequests, type:string, comment:null),
FieldSchema(name:responsetime, type:string, comment:null),
FieldSchema(name:numslowrequests, type:string, comment:null)],
location:hdfs://c18-ssa-dev40-so-qry1.cnet.com:9000/user/hive/warehouse/apiusage,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{field.delim= , serialization.format= }), bucketCols:[],
sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:dt,
type:string, comment:null)], parameters:{})
Time taken: 0.277 seconds
hive> describe extended ApiUsageTemp;
OK
user    string
restresource    string
statuscode      int
requestdate     string
requesthour     int
numrequests     string
responsetime    string
numslowrequests string

Detailed Table Information      Table(tableName:apiusagetemp,
dbName:default, owner:grahamb, createTime:1248466925, lastAccessTime:0,
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:user, type:string,
comment:null), FieldSchema(name:restresource, type:string, comment:null),
FieldSchema(name:statuscode, type:int, comment:null),
FieldSchema(name:requestdate, type:string, comment:null),
FieldSchema(name:requesthour, type:int, comment:null),
FieldSchema(name:numrequests, type:string, comment:null),
FieldSchema(name:responsetime, type:string, comment:null),
FieldSchema(name:numslowrequests, type:string, comment:null)],
location:hdfs://c18-ssa-dev40-so-qry1.cnet.com:9000/user/hive/warehouse/apiusage,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{field.delim= , serialization.format= }), bucketCols:[],
sortCols:[], parameters:{}), partitionKeys:[],
parameters:{last_modified_time=1248826696, last_modified_by=app})
Time taken: 0.235 seconds


On Tue, Jul 28, 2009 at 9:03 PM, Zheng Shao <zs...@gmail.com> wrote:

> Can you send the output of these 2 commands?
>
> describe extended ApiUsage;
> describe extended ApiUsageTemp;
>
>
> Zheng
>
> On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <bi...@gmail.com> wrote:
> > Thanks for the tip, but it fails in the same way when I use a string.
> >
> > On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dl...@videoegg.com>
> wrote:
> >>
> >> >> hive> create table partTable (a string, b int) partitioned by (dt
> int);
> >>
> >> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
> >> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
> >> > '2009/05/18'
> >>
> >> The table has an int partition column (dt), but you're trying to set a
> >> string value (dt = "20090518").
> >>
> >> Try :
> >>
> >> create table partTable (a string, b int) partitioned by (dt string);
> >>
> >> and then do your insert.
> >>
> >
> >
>
>
>
> --
> Yours,
> Zheng
>

Re: partitions not being created

Posted by Zheng Shao <zs...@gmail.com>.
Can you send the output of these 2 commands?

describe extended ApiUsage;
describe extended ApiUsageTemp;


Zheng

On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham <bi...@gmail.com> wrote:
> Thanks for the tip, but it fails in the same way when I use a string.
>
> On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dl...@videoegg.com> wrote:
>>
>> >> hive> create table partTable (a string, b int) partitioned by (dt int);
>>
>> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
>> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
>> > '2009/05/18'
>>
>> The table has an int partition column (dt), but you're trying to set a
>> string value (dt = "20090518").
>>
>> Try :
>>
>> create table partTable (a string, b int) partitioned by (dt string);
>>
>> and then do your insert.
>>
>
>



-- 
Yours,
Zheng

Re: partitions not being created

Posted by Bill Graham <bi...@gmail.com>.
Thanks for the tip, but it fails in the same way when I use a string.

On Tue, Jul 28, 2009 at 6:21 PM, David Lerman <dl...@videoegg.com> wrote:

> >> hive> create table partTable (a string, b int) partitioned by (dt int);
>
> > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
> > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
> '2009/05/18'
>
> The table has an int partition column (dt), but you're trying to set a
> string value (dt = "20090518").
>
> Try :
>
> create table partTable (a string, b int) partitioned by (dt string);
>
> and then do your insert.
>
>

Re: partitions not being created

Posted by David Lerman <dl...@videoegg.com>.
>> hive> create table partTable (a string, b int) partitioned by (dt int);

> INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
> SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate = '2009/05/18'

The table has an int partition column (dt), but you're trying to set a
string value (dt = "20090518").

Try :

create table partTable (a string, b int) partitioned by (dt string);

and then do your insert.
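
For example, a minimal sketch of the whole sequence (the source table and
column names are placeholders, not your real schema):

-- drop the earlier version with the int partition column, if it exists
drop table partTable;

-- recreate with a string partition column
create table partTable (a string, b int) partitioned by (dt string);

-- now "20090518" matches the type of the partition column
INSERT OVERWRITE TABLE partTable PARTITION (dt = "20090518")
SELECT a, b FROM srcTable;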


Re: partitions not being created

Posted by Bill Graham <bi...@gmail.com>.
I see now. Show partitions shows the partitions loaded into the table, not
the metadata about what columns are partitions. That makes sense.
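
So once a load succeeds, I'd expect 'show partitions' to return one row per
loaded partition, along these lines (illustrative, not actual output):

hive> show partitions ApiUsage;
OK
dt=20090518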

I'm trying to load the data using a select from an un-partitioned table into
a partitioned table, which I suspect could be my problem. Is this not
supported? Here's my actual query, inspired by a helpful post earlier
today about moving tables. (My initial 'partTable' example was contrived.)

INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518")
SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate =
'2009/05/18'

And here's the Hive server log output starting just after the successful MR
job:

09/07/28 18:06:15 INFO exec.FileSinkOperator: Moving tmp dir:
hdfs://XXXXXXX:9000/tmp/hive-app/380904979/_tmp.10002 to:
hdfs://XXXXXXX:9000/tmp/hive-app/380904979/_tmp.10002.intermediate
09/07/28 18:06:15 INFO exec.FileSinkOperator: Moving tmp dir:
hdfs://XXXXXXX:9000/tmp/hive-app/380904979/_tmp.10002.intermediate to:
hdfs://XXXXXXX:9000/tmp/hive-app/380904979/10002
Moving data to: hdfs://XXXXXXX:9000/tmp/hive-app/1178771800/10000
09/07/28 18:06:15 INFO exec.MoveTask: Moving data to:
hdfs://XXXXXXX:9000/tmp/hive-app/1178771800/10000 from
hdfs://XXXXXXX:9000/tmp/hive-app/380904979/10002
Loading data to table apiusage partition {dt=20090518}
09/07/28 18:06:15 INFO exec.MoveTask: Loading data to table apiusage
partition {dt=20090518} from
hdfs://XXXXXXX:9000/tmp/hive-app/1178771800/10000
09/07/28 18:06:15 INFO exec.MoveTask: Partition is: {dt=20090518}
09/07/28 18:06:15 INFO hive.metastore: Trying to connect to metastore with
URI thrift://XXXXXXX:10000
09/07/28 18:06:15 INFO hive.metastore: Connected to metastore.
Hive history file=/tmp/app/hive_job_log_app_200907281806_1309448245.txt
09/07/28 18:06:15 INFO exec.HiveHistory: Hive history
file=/tmp/app/hive_job_log_app_200907281806_1309448245.txt
09/07/28 18:06:15 INFO metastore.HiveMetaStore: 25: get_table : db=default
tbl=apiusage
09/07/28 18:06:15 INFO metastore.HiveMetaStore: 25: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
09/07/28 18:06:15 INFO metastore.ObjectStore: ObjectStore, initialize called
09/07/28 18:06:15 INFO metastore.ObjectStore: Initialized ObjectStore
09/07/28 18:06:15 INFO hive.log: DDL: struct apiusage { string user, string
restresource, i32 requesthour, i32 statuscode, string numrequests, string
responsetime, string numslowrequests}
09/07/28 18:06:15 INFO hive.metastore: Trying to connect to metastore with
URI thrift://XXXXXXX:10000
09/07/28 18:06:15 INFO hive.metastore: Connected to metastore.
Hive history file=/tmp/app/hive_job_log_app_200907281806_46989039.txt
09/07/28 18:06:15 INFO exec.HiveHistory: Hive history
file=/tmp/app/hive_job_log_app_200907281806_46989039.txt
09/07/28 18:06:15 INFO metastore.HiveMetaStore: 24: get_partition :
db=default tbl=apiusage
09/07/28 18:06:15 INFO metastore.HiveMetaStore: 24: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
09/07/28 18:06:15 INFO metastore.ObjectStore: ObjectStore, initialize called
09/07/28 18:06:15 INFO metastore.ObjectStore: Initialized ObjectStore
09/07/28 18:06:15 ERROR metadata.Hive:
org.apache.thrift.TApplicationException: get_partition failed: unknown
result
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
        at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
        at
org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
        at
org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
        at
org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
        at
org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:105)
        at
org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:264)
        at
org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:252)
        at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Failed with exception org.apache.thrift.TApplicationException: get_partition
failed: unknown result
09/07/28 18:06:15 ERROR exec.MoveTask: Failed with exception
org.apache.thrift.TApplicationException: get_partition failed: unknown
result
org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.thrift.TApplicationException: get_partition failed: unknown
result
        at
org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:589)
        at
org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:466)
        at
org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:135)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:335)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:241)
        at
org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:105)
        at
org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:264)
        at
org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:252)
        at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:252)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.thrift.TApplicationException: get_partition failed:
unknown result
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:784)
        at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:752)
        at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:415)
        at
org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:579)
        ... 11 more

FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask
09/07/28 18:06:15 ERROR ql.Driver: FAILED: Execution Error, return code 1
from org.apache.hadoop.hive.ql.exec.MoveTask




On Tue, Jul 28, 2009 at 5:57 PM, Namit Jain <nj...@facebook.com> wrote:

>  There are no partitions in the table –
>
>
>
> Can you post the output you get while loading the data?
>
>
>
> *From:* Bill Graham [mailto:billgraham@gmail.com]
> *Sent:* Tuesday, July 28, 2009 5:54 PM
> *To:* hive-user@hadoop.apache.org
> *Subject:* partitions not being created
>
>
>
> Hi,
>
> I'm trying to create a partitioned table and the partition is not appearing
> for some reason. Am I doing something wrong, or is this a bug? Below are the
> commands I'm executing with their output. Note that the 'show partitions'
> command is not returning anything. If I were to try to load data into this
> table I'd get a 'get_partition failed' error. I'm using bleeding-edge Hive,
> built from the trunk.
>
> hive> create table partTable (a string, b int) partitioned by (dt int);
> OK
> Time taken: 0.308 seconds
> hive> show partitions partTable;
> OK
> Time taken: 0.329 seconds
> hive> describe partTable;
> OK
> a       string
> b       int
> dt      int
> Time taken: 0.181 seconds
>
> thanks,
> Bill
>

RE: partitions not being created

Posted by Namit Jain <nj...@facebook.com>.
There are no partitions in the table -

Can you post the output you get while loading the data?

From: Bill Graham [mailto:billgraham@gmail.com]
Sent: Tuesday, July 28, 2009 5:54 PM
To: hive-user@hadoop.apache.org
Subject: partitions not being created

Hi,

I'm trying to create a partitioned table and the partition is not appearing for some reason. Am I doing something wrong, or is this a bug? Below are the commands I'm executing with their output. Note that the 'show partitions' command is not returning anything. If I were to try to load data into this table I'd get a 'get_partition failed' error. I'm using bleeding-edge Hive, built from the trunk.

hive> create table partTable (a string, b int) partitioned by (dt int);
OK
Time taken: 0.308 seconds
hive> show partitions partTable;
OK
Time taken: 0.329 seconds
hive> describe partTable;
OK
a       string
b       int
dt      int
Time taken: 0.181 seconds

thanks,
Bill