Posted to user@hive.apache.org by Josh Ferguson <jo...@besquared.net> on 2008/11/26 03:18:09 UTC

Trouble Loading Into External Table

Ok so I'm trying to create an external table, load a delimited file into it, and then just do a basic select out of it. Here is a description of my scenario along with the steps I took and their results. Hopefully someone can help me figure out what I'm doing wrong.

# Sample.tab

1227422134|2|1|paid:44519,tax:2120,value:42399

# CREATE TABLE

hive> CREATE EXTERNAL TABLE activity_test
     > (occurred_at INT, actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
     > ROW FORMAT DELIMITED
     > FIELDS TERMINATED BY "|"
     > COLLECTION ITEMS TERMINATED BY ","
     > MAP KEYS TERMINATED BY ":"
     > LOCATION '/data/sample';
OK

# LOAD DATA

hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE activity_test;
Copying data from file:/Users/josh/Hive/sample.tab
Loading data to table activity_test
OK

# SELECT OVERWRITE DIRECTORY

hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output' SELECT activity_test.occurred_at, activity_test.actor_id, activity_test.actee_id, activity_test.properties;
Total MapReduce jobs = 1
Starting Job = job_200811250653_0018, Tracking URL = http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0018
Kill Command = /Users/josh/Hadoop/bin/hadoop job  -Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
  map = 0%,  reduce =0%
  map = 50%,  reduce =0%
  map = 100%,  reduce =0%
Ended Job = job_200811250653_0018
Moving data to: /data/output
OK
Time taken: 72.329 seconds

$ hadoop fs -cat /data/output/*
012{}

This obviously isn't the correct output; these look like just the default values for those columns. What am I doing wrong?

Thanks

Josh Ferguson
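
One quick way to confirm which delimiter bytes are actually in the file is to dump it with od; a minimal sketch against the local path used in the LOAD above:

$ od -c /Users/josh/Hive/sample.tab | head -1
0000000   1   2   2   7   4   2   2   1   3   4   |   2   |   1   |   p

The fields really are separated by the literal "|" character (ASCII 124), so whatever delimiter the table declares has to match that byte.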

Re: Trouble Loading Into External Table

Posted by Josh Ferguson <jo...@besquared.net>.
Table(tableName:activity_test,dbName:default,owner:josh,createTime:1227667482,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:occurred_at,type:int,comment:null), FieldSchema(name:actor_id,type:int,comment:null), FieldSchema(name:actee_id,type:int,comment:null), FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/data/sample,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol,line.delim=10,field.delim=124}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{EXTERNAL=TRUE})

colelction.delim is spelled wrong (it should be collection.delim), most likely just a typo somewhere.
The rest looks like what I specified; you guys will know better.

Thanks for all your help guys.

Josh Ferguson
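
The misspelled parameter name can be traced in a Hive source checkout; a hedged sketch, assuming the ql/ and serde/ source trees of that era:

$ grep -rn 'colelction' ql/ serde/

Any hit shows where the collection delimiter is written (or read) under the misspelled key, which is what a JIRA for this would point at.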

On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:

>
> Can you please send the output of 'describe extended  
> activity_test'. This will help us understand what's happening with  
> all the create table parameters.
>
> Also - as a sanity check - can you please check hadoop dfs -cat  / 
> data/sample/* (to make sure data got loaded/moved into that dir)
>
> -----Original Message-----
> From: Josh Ferguson [mailto:josh@besquared.net]
> Sent: Tuesday, November 25, 2008 7:03 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Trouble Loading Into External Table
>
> hive> CREATE EXTERNAL TABLE activity_test
>> (occurred_at INT, actor_id INT, actee_id INT, properties
> MAP<STRING, STRING>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '124'
>> COLLECTION ITEMS TERMINATED BY '44'
>> MAP KEYS TERMINATED BY '58'
>> LINES TERMINATED BY '10'
>> STORED AS TEXTFILE
>> LOCATION '/data/sample';
> OK
>
> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
> activity_test;
> Copying data from file:/Users/josh/Hive/sample.tab
> Loading data to table activity_test
> OK
>
> $ hadoop fs -cat /data/sample/sample.tab
> 1227422134|2|1|paid:44519,tax:2120,value:42399
>
> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
> SELECT activity_test.occurred_at, activity_test.actor_id,
> activity_test.actee_id, activity_test.properties;
> Total MapReduce jobs = 1
> Starting Job = job_200811250653_0022, Tracking URL = http://{clipped}:
> 50030/jobdetails.jsp?jobid=job_200811250653_0022
> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
>   map = 0%,  reduce =0%
>   map = 50%,  reduce =0%
>   map = 100%,  reduce =0%
> Ended Job = job_200811250653_0022
> Moving data to: /data/output2
> OK
>
> $ hadoop fs -cat /data/output2/*
> 012{}
>
> Still getting incorrect results, is there anything else I could try?
>
> Josh Ferguson
>
> On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
>
>> Can you try putting the ascii value within quotes, so for example
>> FIELDS TERMINATED BY '124' etc...
>>
>> You can also look at the following file in the source to see an
>> example of how this is done
>>
>> ql/src/test/queries/clientpositive/input_dynamicserde.q
>>
>> Ashish
>>
>> -----Original Message-----
>> From: Josh Ferguson [mailto:josh@besquared.net]
>> Sent: Tuesday, November 25, 2008 6:18 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Trouble Loading Into External Table
>>
>> Ok so I'm trying to create an external table and load a delimited
>> file into it, then just do a basic select out of it, here is a
>> description of my scenario along with steps and results I took.
>> Hopefully someone can help me figure out what I'm doing wrong.
>>
>> # Sample.tab
>>
>> 1227422134|2|1|paid:44519,tax:2120,value:42399
>>
>> # CREATE TABLE
>>
>> hive> CREATE EXTERNAL TABLE activity_test
>>> (occurred_at INT, actor_id INT, actee_id INT, properties
>>> MAP<STRING, STRING>)
>>> ROW FORMAT DELIMITED
>>> FIELDS TERMINATED BY "|"
>>> COLLECTION ITEMS TERMINATED BY ","
>>> MAP KEYS TERMINATED BY ":"
>>> LOCATION '/data/sample';
>> OK
>>
>> # LOAD DATA
>>
>> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
>> activity_test;
>> Copying data from file:/Users/josh/Hive/sample.tab Loading data to
>> table activity_test OK
>>
>> # SELECT OVERWRITE DIRECTORY
>>
>> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
>> SELECT activity_test.occurred_at, activity_test.actor_id,
>> activity_test.actee_id, activity_test.properties; Total MapReduce
>> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =
>> http://{clipped}:
>> 50030/jobdetails.jsp?jobid=job_200811250653_0018
>> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
>> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
>>   map = 0%,  reduce =0%
>>   map = 50%,  reduce =0%
>>   map = 100%,  reduce =0%
>> Ended Job = job_200811250653_0018
>> Moving data to: /data/output
>> OK
>> Time taken: 72.329 seconds
>>
>> $ hadoop fs -cat /data/output/*
>> 012{}
>>
>> This obviously isn't the correct output, and are just some default
>> values for those columns, what am I doing wrong?
>>
>> Thanks
>>
>> Josh Ferguson
>


Re: Trouble Loading Into External Table

Posted by Josh Ferguson <jo...@besquared.net>.
Well, in theory, yes; I didn't test that. It could be an interaction  
between having a default and not having a default, or an interaction  
with which non-default value you choose. Making a create table that  
uses a field separator of 1 would let you check that (a sketch follows below).

Josh
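
A minimal sketch of that check, in the same style as the other create statements in this thread; the table name and location are hypothetical, and 1 is the decimal ASCII code of the control-A default:

CREATE EXTERNAL TABLE delim_check
(occurred_at INT, actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '1'
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE
LOCATION '/data/sample3';

If this table misbehaves too, the problem is with specifying FIELDS TERMINATED BY at all; if it works, the problem is specific to non-default field delimiters.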

On Nov 26, 2008, at 1:19 PM, Pete Wyckoff wrote:

> So, the data then was not field delimited by ‘124’ and must be  
> delimited by ‘1’ ?
>
> So, if you had specified ‘1’ as the field delim instead of ‘124’ in  
> the initial create, it should have worked then??
>
>
> Thanks, pete
>
>
> On 11/26/08 12:33 PM, "Josh Ferguson" <jo...@besquared.net> wrote:
>
> This is the describe extended for the properly working table
>
> hive> DESCRIBE EXTENDED basic;
> OK
> actor_id        int
> actee_id        int
> properties      map<string,string>
> Detailed Table Information:
> Table(tableName:basic,dbName:default,owner:josh,createTime: 
> 1227688761,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols: 
> [FieldSchema(name:actor_id,type:int,comment:null), FieldSchema 
> (name:actee_id,type:int,comment:null), FieldSchema 
> (name:properties,type:map<string,string>,comment:null)],location:/ 
> data/ 
> sample2,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFor 
> mat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed: 
> false,numBuckets:-1,serdeInfo:SerDeInfo 
> (name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type 
> .DynamicSerDe,parameters: 
> {colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.h 
> adoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols: 
> [],sortCols:[],parameters:{}),partitionKeys:[],parameters: 
> {EXTERNAL=TRUE})
>
> Josh Ferguson
>
> On Nov 26, 2008, at 11:25 AM, Pete Wyckoff wrote:
>
>
>  Can you send the new output of describe extended?
>
>  Thanks, pete
>
>
>  On 11/26/08 6:46 AM, "Ashish Thusoo" <at...@facebook.com> wrote:
>
>
> congrats!! :)
>
>  please do file a JIRA for this. We will fix this as soon as  
> possible..
>
>  Ashish
>
>  ________________________________________
>  From: Josh Ferguson [josh@besquared.net]
>  Sent: Wednesday, November 26, 2008 12:44 AM
>  To: hive-user@hadoop.apache.org
>  Subject: Re: Trouble Loading Into External Table
>
>  I got it! The combination I got to work was the following:
>
>  CREATE EXTERNAL TABLE basic
>  (actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
>  ROW FORMAT DELIMITED
>  COLLECTION ITEMS TERMINATED BY '44'
>  MAP KEYS TERMINATED BY '58'
>  STORED AS TEXTFILE
>  LOCATION '/data/sample2';
>
>  This forced a DynamicSerde/TCTLSeperatedProtocol combo which didn't
>  work properly when I specified the field delimiter but works fine
>  when you only specify the other two for some reason and leave the
>  field delimiter as the default, I should probably file it in JIRA.
>
>  Josh Ferguson
>
>  On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:
>
>  >
>  > Can you please send the output of 'describe extended
>  > activity_test'. This will help us understand what's happening with
>  > all the create table parameters.
>  >
>  > Also - as a sanity check - can you please check hadoop dfs -cat  /
>  > data/sample/* (to make sure data got loaded/moved into that dir)
>  >
>  > -----Original Message-----
>  > From: Josh Ferguson [mailto:josh@besquared.net]
>  > Sent: Tuesday, November 25, 2008 7:03 PM
>  > To: hive-user@hadoop.apache.org
>  > Subject: Re: Trouble Loading Into External Table
>  >
>  > hive> CREATE EXTERNAL TABLE activity_test
>  >> (occurred_at INT, actor_id INT, actee_id INT, properties
>  > MAP<STRING, STRING>)
>  >> ROW FORMAT DELIMITED
>  >> FIELDS TERMINATED BY '124'
>  >> COLLECTION ITEMS TERMINATED BY '44'
>  >> MAP KEYS TERMINATED BY '58'
>  >> LINES TERMINATED BY '10'
>  >> STORED AS TEXTFILE
>  >> LOCATION '/data/sample';
>  > OK
>  >
>  > hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO  
> TABLE
>  > activity_test;
>  > Copying data from file:/Users/josh/Hive/sample.tab
>  > Loading data to table activity_test
>  > OK
>  >
>  > $ hadoop fs -cat /data/sample/sample.tab
>  > 1227422134|2|1|paid:44519,tax:2120,value:42399
>  >
>  > hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
>  > SELECT activity_test.occurred_at, activity_test.actor_id,
>  > activity_test.actee_id, activity_test.properties;
>  > Total MapReduce jobs = 1
>  > Starting Job = job_200811250653_0022, Tracking URL = http:// 
> {clipped}:
>  > 50030/jobdetails.jsp?jobid=job_200811250653_0022
>  > Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
>  > Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
>  >   map = 0%,  reduce =0%
>  >   map = 50%,  reduce =0%
>  >   map = 100%,  reduce =0%
>  > Ended Job = job_200811250653_0022
>  > Moving data to: /data/output2
>  > OK
>  >
>  > $ hadoop fs -cat /data/output2/*
>  > 012{}
>  >
>  > Still getting incorrect results, is there anything else I could  
> try?
>  >
>  > Josh Ferguson
>  >
>  > On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
>  >
>  >> Can you try putting the ascii value within quotes, so for example
>  >> FIELDS TERMINATED BY '124' etc...
>  >>
>  >> You can also look at the following file in the source to see an
>  >> example of how this is done
>  >>
>  >> ql/src/test/queries/clientpositive/input_dynamicserde.q
>  >>
>  >> Ashish
>  >>
>  >> -----Original Message-----
>  >> From: Josh Ferguson [mailto:josh@besquared.net]
>  >> Sent: Tuesday, November 25, 2008 6:18 PM
>  >> To: hive-user@hadoop.apache.org
>  >> Subject: Trouble Loading Into External Table
>  >>
>  >> Ok so I'm trying to create an external table and load a delimited
>  >> file into it, then just do a basic select out of it, here is a
>  >> description of my scenario along with steps and results I took.
>  >> Hopefully someone can help me figure out what I'm doing wrong.
>  >>
>  >> # Sample.tab
>  >>
>  >> 1227422134|2|1|paid:44519,tax:2120,value:42399
>  >>
>  >> # CREATE TABLE
>  >>
>  >> hive> CREATE EXTERNAL TABLE activity_test
>  >>> (occurred_at INT, actor_id INT, actee_id INT, properties
>  >>> MAP<STRING, STRING>)
>  >>> ROW FORMAT DELIMITED
>  >>> FIELDS TERMINATED BY "|"
>  >>> COLLECTION ITEMS TERMINATED BY ","
>  >>> MAP KEYS TERMINATED BY ":"
>  >>> LOCATION '/data/sample';
>  >> OK
>  >>
>  >> # LOAD DATA
>  >>
>  >> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO  
> TABLE
>  >> activity_test;
>  >> Copying data from file:/Users/josh/Hive/sample.tab Loading data to
>  >> table activity_test OK
>  >>
>  >> # SELECT OVERWRITE DIRECTORY
>  >>
>  >> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
>  >> SELECT activity_test.occurred_at, activity_test.actor_id,
>  >> activity_test.actee_id, activity_test.properties; Total MapReduce
>  >> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =
>  >> http://{clipped}:
>  >> 50030/jobdetails.jsp?jobid=job_200811250653_0018
>  >> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
>  >> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
>  >>   map = 0%,  reduce =0%
>  >>   map = 50%,  reduce =0%
>  >>   map = 100%,  reduce =0%
>  >> Ended Job = job_200811250653_0018
>  >> Moving data to: /data/output
>  >> OK
>  >> Time taken: 72.329 seconds
>  >>
>  >> $ hadoop fs -cat /data/output/*
>  >> 012{}
>  >>
>  >> This obviously isn't the correct output, and are just some default
>  >> values for those columns, what am I doing wrong?
>  >>
>  >> Thanks
>  >>
>  >> Josh Ferguson
>  >
>
>
>
>
>
>
>
>


Re: Trouble Loading Into External Table

Posted by Pete Wyckoff <pw...@facebook.com>.
So, the data then was not field delimited by '124' and must be delimited by '1' ?

So, if you had specified '1' as the field delim instead of '124' in the initial create, it should have worked then??


Thanks, pete


On 11/26/08 12:33 PM, "Josh Ferguson" <jo...@besquared.net> wrote:

This is the describe extended for the properly working table

hive> DESCRIBE EXTENDED basic;
OK
actor_id        int
actee_id        int
properties      map<string,string>
Detailed Table Information:
Table(tableName:basic,dbName:default,owner:josh,createTime:1227688761,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:int,comment:null), FieldSchema(name:actee_id,type:int,comment:null), FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/data/sample2,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{EXTERNAL=TRUE})

Josh Ferguson

On Nov 26, 2008, at 11:25 AM, Pete Wyckoff wrote:


 Can you send the new output of describe extended?

 Thanks, pete


 On 11/26/08 6:46 AM, "Ashish Thusoo" <at...@facebook.com> wrote:


congrats!! :)

 please do file a JIRA for this. We will fix this as soon as possible..

 Ashish

 ________________________________________
 From: Josh Ferguson [josh@besquared.net]
 Sent: Wednesday, November 26, 2008 12:44 AM
 To: hive-user@hadoop.apache.org
 Subject: Re: Trouble Loading Into External Table

 I got it! The combination I got to work was the following:

 CREATE EXTERNAL TABLE basic
 (actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
 ROW FORMAT DELIMITED
 COLLECTION ITEMS TERMINATED BY '44'
 MAP KEYS TERMINATED BY '58'
 STORED AS TEXTFILE
 LOCATION '/data/sample2';

 This forced a DynamicSerde/TCTLSeperatedProtocol combo which didn't
 work properly when I specified the field delimiter but works fine
 when you only specify the other two for some reason and leave the
 field delimiter as the default, I should probably file it in JIRA.

 Josh Ferguson

 On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:

 >
 > Can you please send the output of 'describe extended
 > activity_test'. This will help us understand what's happening with
 > all the create table parameters.
 >
 > Also - as a sanity check - can you please check hadoop dfs -cat  /
 > data/sample/* (to make sure data got loaded/moved into that dir)
 >
 > -----Original Message-----
 > From: Josh Ferguson [mailto:josh@besquared.net]
 > Sent: Tuesday, November 25, 2008 7:03 PM
 > To: hive-user@hadoop.apache.org
 > Subject: Re: Trouble Loading Into External Table
 >
 > hive> CREATE EXTERNAL TABLE activity_test
 >> (occurred_at INT, actor_id INT, actee_id INT, properties
 > MAP<STRING, STRING>)
 >> ROW FORMAT DELIMITED
 >> FIELDS TERMINATED BY '124'
 >> COLLECTION ITEMS TERMINATED BY '44'
 >> MAP KEYS TERMINATED BY '58'
 >> LINES TERMINATED BY '10'
 >> STORED AS TEXTFILE
 >> LOCATION '/data/sample';
 > OK
 >
 > hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
 > activity_test;
 > Copying data from file:/Users/josh/Hive/sample.tab
 > Loading data to table activity_test
 > OK
 >
 > $ hadoop fs -cat /data/sample/sample.tab
 > 1227422134|2|1|paid:44519,tax:2120,value:42399
 >
 > hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
 > SELECT activity_test.occurred_at, activity_test.actor_id,
 > activity_test.actee_id, activity_test.properties;
 > Total MapReduce jobs = 1
 > Starting Job = job_200811250653_0022, Tracking URL = http://{clipped}:
 > 50030/jobdetails.jsp?jobid=job_200811250653_0022
 > Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
 > Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
 >   map = 0%,  reduce =0%
 >   map = 50%,  reduce =0%
 >   map = 100%,  reduce =0%
 > Ended Job = job_200811250653_0022
 > Moving data to: /data/output2
 > OK
 >
 > $ hadoop fs -cat /data/output2/*
 > 012{}
 >
 > Still getting incorrect results, is there anything else I could try?
 >
 > Josh Ferguson
 >
 > On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
 >
 >> Can you try putting the ascii value within quotes, so for example
 >> FIELDS TERMINATED BY '124' etc...
 >>
 >> You can also look at the following file in the source to see an
 >> example of how this is done
 >>
 >> ql/src/test/queries/clientpositive/input_dynamicserde.q
 >>
 >> Ashish
 >>
 >> -----Original Message-----
 >> From: Josh Ferguson [mailto:josh@besquared.net]
 >> Sent: Tuesday, November 25, 2008 6:18 PM
 >> To: hive-user@hadoop.apache.org
 >> Subject: Trouble Loading Into External Table
 >>
 >> Ok so I'm trying to create an external table and load a delimited
 >> file into it, then just do a basic select out of it, here is a
 >> description of my scenario along with steps and results I took.
 >> Hopefully someone can help me figure out what I'm doing wrong.
 >>
 >> # Sample.tab
 >>
 >> 1227422134|2|1|paid:44519,tax:2120,value:42399
 >>
 >> # CREATE TABLE
 >>
 >> hive> CREATE EXTERNAL TABLE activity_test
 >>> (occurred_at INT, actor_id INT, actee_id INT, properties
 >>> MAP<STRING, STRING>)
 >>> ROW FORMAT DELIMITED
 >>> FIELDS TERMINATED BY "|"
 >>> COLLECTION ITEMS TERMINATED BY ","
 >>> MAP KEYS TERMINATED BY ":"
 >>> LOCATION '/data/sample';
 >> OK
 >>
 >> # LOAD DATA
 >>
 >> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
 >> activity_test;
 >> Copying data from file:/Users/josh/Hive/sample.tab Loading data to
 >> table activity_test OK
 >>
 >> # SELECT OVERWRITE DIRECTORY
 >>
 >> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
 >> SELECT activity_test.occurred_at, activity_test.actor_id,
 >> activity_test.actee_id, activity_test.properties; Total MapReduce
 >> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =
 >> http://{clipped}:
 >> 50030/jobdetails.jsp?jobid=job_200811250653_0018
 >> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
 >> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
 >>   map = 0%,  reduce =0%
 >>   map = 50%,  reduce =0%
 >>   map = 100%,  reduce =0%
 >> Ended Job = job_200811250653_0018
 >> Moving data to: /data/output
 >> OK
 >> Time taken: 72.329 seconds
 >>
 >> $ hadoop fs -cat /data/output/*
 >> 012{}
 >>
 >> This obviously isn't the correct output, and are just some default
 >> values for those columns, what am I doing wrong?
 >>
 >> Thanks
 >>
 >> Josh Ferguson
 >









Re: Trouble Loading Into External Table

Posted by Josh Ferguson <jo...@besquared.net>.
This is the describe extended for the properly working table

hive> DESCRIBE EXTENDED basic;
OK
actor_id        int
actee_id        int
properties      map<string,string>
Detailed Table Information:
Table(tableName:basic,dbName:default,owner:josh,createTime:1227688761,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:int,comment:null), FieldSchema(name:actee_id,type:int,comment:null), FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/data/sample2,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{EXTERNAL=TRUE})

Josh Ferguson

On Nov 26, 2008, at 11:25 AM, Pete Wyckoff wrote:

>
> Can you send the new output of describe extended?
>
> Thanks, pete
>
>
> On 11/26/08 6:46 AM, "Ashish Thusoo" <at...@facebook.com> wrote:
>
> congrats!! :)
>
> please do file a JIRA for this. We will fix this as soon as possible..
>
> Ashish
>
> ________________________________________
> From: Josh Ferguson [josh@besquared.net]
> Sent: Wednesday, November 26, 2008 12:44 AM
> To: hive-user@hadoop.apache.org
> Subject: Re: Trouble Loading Into External Table
>
> I got it! The combination I got to work was the following:
>
> CREATE EXTERNAL TABLE basic
> (actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '44'
> MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE
> LOCATION '/data/sample2';
>
> This forced a DynamicSerde/TCTLSeperatedProtocol combo which didn't
> work properly when I specified the field delimiter but works fine
> when you only specify the other two for some reason and leave the
> field delimiter as the default, I should probably file it in JIRA.
>
> Josh Ferguson
>
> On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:
>
> >
> > Can you please send the output of 'describe extended
> > activity_test'. This will help us understand what's happening with
> > all the create table parameters.
> >
> > Also - as a sanity check - can you please check hadoop dfs -cat  /
> > data/sample/* (to make sure data got loaded/moved into that dir)
> >
> > -----Original Message-----
> > From: Josh Ferguson [mailto:josh@besquared.net]
> > Sent: Tuesday, November 25, 2008 7:03 PM
> > To: hive-user@hadoop.apache.org
> > Subject: Re: Trouble Loading Into External Table
> >
> > hive> CREATE EXTERNAL TABLE activity_test
> >> (occurred_at INT, actor_id INT, actee_id INT, properties
> > MAP<STRING, STRING>)
> >> ROW FORMAT DELIMITED
> >> FIELDS TERMINATED BY '124'
> >> COLLECTION ITEMS TERMINATED BY '44'
> >> MAP KEYS TERMINATED BY '58'
> >> LINES TERMINATED BY '10'
> >> STORED AS TEXTFILE
> >> LOCATION '/data/sample';
> > OK
> >
> > hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO  
> TABLE
> > activity_test;
> > Copying data from file:/Users/josh/Hive/sample.tab
> > Loading data to table activity_test
> > OK
> >
> > $ hadoop fs -cat /data/sample/sample.tab
> > 1227422134|2|1|paid:44519,tax:2120,value:42399
> >
> > hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
> > SELECT activity_test.occurred_at, activity_test.actor_id,
> > activity_test.actee_id, activity_test.properties;
> > Total MapReduce jobs = 1
> > Starting Job = job_200811250653_0022, Tracking URL = http:// 
> {clipped}:
> > 50030/jobdetails.jsp?jobid=job_200811250653_0022
> > Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
> > Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
> >   map = 0%,  reduce =0%
> >   map = 50%,  reduce =0%
> >   map = 100%,  reduce =0%
> > Ended Job = job_200811250653_0022
> > Moving data to: /data/output2
> > OK
> >
> > $ hadoop fs -cat /data/output2/*
> > 012{}
> >
> > Still getting incorrect results, is there anything else I could try?
> >
> > Josh Ferguson
> >
> > On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
> >
> >> Can you try putting the ascii value within quotes, so for example
> >> FIELDS TERMINATED BY '124' etc...
> >>
> >> You can also look at the following file in the source to see an
> >> example of how this is done
> >>
> >> ql/src/test/queries/clientpositive/input_dynamicserde.q
> >>
> >> Ashish
> >>
> >> -----Original Message-----
> >> From: Josh Ferguson [mailto:josh@besquared.net]
> >> Sent: Tuesday, November 25, 2008 6:18 PM
> >> To: hive-user@hadoop.apache.org
> >> Subject: Trouble Loading Into External Table
> >>
> >> Ok so I'm trying to create an external table and load a delimited
> >> file into it, then just do a basic select out of it, here is a
> >> description of my scenario along with steps and results I took.
> >> Hopefully someone can help me figure out what I'm doing wrong.
> >>
> >> # Sample.tab
> >>
> >> 1227422134|2|1|paid:44519,tax:2120,value:42399
> >>
> >> # CREATE TABLE
> >>
> >> hive> CREATE EXTERNAL TABLE activity_test
> >>> (occurred_at INT, actor_id INT, actee_id INT, properties
> >>> MAP<STRING, STRING>)
> >>> ROW FORMAT DELIMITED
> >>> FIELDS TERMINATED BY "|"
> >>> COLLECTION ITEMS TERMINATED BY ","
> >>> MAP KEYS TERMINATED BY ":"
> >>> LOCATION '/data/sample';
> >> OK
> >>
> >> # LOAD DATA
> >>
> >> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO  
> TABLE
> >> activity_test;
> >> Copying data from file:/Users/josh/Hive/sample.tab Loading data to
> >> table activity_test OK
> >>
> >> # SELECT OVERWRITE DIRECTORY
> >>
> >> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
> >> SELECT activity_test.occurred_at, activity_test.actor_id,
> >> activity_test.actee_id, activity_test.properties; Total MapReduce
> >> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =
> >> http://{clipped}:
> >> 50030/jobdetails.jsp?jobid=job_200811250653_0018
> >> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
> >> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
> >>   map = 0%,  reduce =0%
> >>   map = 50%,  reduce =0%
> >>   map = 100%,  reduce =0%
> >> Ended Job = job_200811250653_0018
> >> Moving data to: /data/output
> >> OK
> >> Time taken: 72.329 seconds
> >>
> >> $ hadoop fs -cat /data/output/*
> >> 012{}
> >>
> >> This obviously isn't the correct output, and are just some default
> >> values for those columns, what am I doing wrong?
> >>
> >> Thanks
> >>
> >> Josh Ferguson
> >
>
>
>


Re: Trouble Loading Into External Table

Posted by Pete Wyckoff <pw...@facebook.com>.
Can you send the new output of describe extended?

Thanks, pete


On 11/26/08 6:46 AM, "Ashish Thusoo" <at...@facebook.com> wrote:

congrats!! :)

please do file a JIRA for this. We will fix this as soon as possible..

Ashish

________________________________________
From: Josh Ferguson [josh@besquared.net]
Sent: Wednesday, November 26, 2008 12:44 AM
To: hive-user@hadoop.apache.org
Subject: Re: Trouble Loading Into External Table

I got it! The combination I got to work was the following:

CREATE EXTERNAL TABLE basic
(actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE
LOCATION '/data/sample2';

This forced a DynamicSerde/TCTLSeperatedProtocol combo which didn't
work properly when I specified the field delimiter but works fine
when you only specify the other two for some reason and leave the
field delimiter as the default, I should probably file it in JIRA.

Josh Ferguson

On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:

>
> Can you please send the output of 'describe extended
> activity_test'. This will help us understand what's happening with
> all the create table parameters.
>
> Also - as a sanity check - can you please check hadoop dfs -cat  /
> data/sample/* (to make sure data got loaded/moved into that dir)
>
> -----Original Message-----
> From: Josh Ferguson [mailto:josh@besquared.net]
> Sent: Tuesday, November 25, 2008 7:03 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Trouble Loading Into External Table
>
> hive> CREATE EXTERNAL TABLE activity_test
>> (occurred_at INT, actor_id INT, actee_id INT, properties
> MAP<STRING, STRING>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '124'
>> COLLECTION ITEMS TERMINATED BY '44'
>> MAP KEYS TERMINATED BY '58'
>> LINES TERMINATED BY '10'
>> STORED AS TEXTFILE
>> LOCATION '/data/sample';
> OK
>
> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
> activity_test;
> Copying data from file:/Users/josh/Hive/sample.tab
> Loading data to table activity_test
> OK
>
> $ hadoop fs -cat /data/sample/sample.tab
> 1227422134|2|1|paid:44519,tax:2120,value:42399
>
> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
> SELECT activity_test.occurred_at, activity_test.actor_id,
> activity_test.actee_id, activity_test.properties;
> Total MapReduce jobs = 1
> Starting Job = job_200811250653_0022, Tracking URL = http://{clipped}:
> 50030/jobdetails.jsp?jobid=job_200811250653_0022
> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
>   map = 0%,  reduce =0%
>   map = 50%,  reduce =0%
>   map = 100%,  reduce =0%
> Ended Job = job_200811250653_0022
> Moving data to: /data/output2
> OK
>
> $ hadoop fs -cat /data/output2/*
> 012{}
>
> Still getting incorrect results, is there anything else I could try?
>
> Josh Ferguson
>
> On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
>
>> Can you try putting the ascii value within quotes, so for example
>> FIELDS TERMINATED BY '124' etc...
>>
>> You can also look at the following file in the source to see an
>> example of how this is done
>>
>> ql/src/test/queries/clientpositive/input_dynamicserde.q
>>
>> Ashish
>>
>> -----Original Message-----
>> From: Josh Ferguson [mailto:josh@besquared.net]
>> Sent: Tuesday, November 25, 2008 6:18 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Trouble Loading Into External Table
>>
>> Ok so I'm trying to create an external table and load a delimited
>> file into it, then just do a basic select out of it, here is a
>> description of my scenario along with steps and results I took.
>> Hopefully someone can help me figure out what I'm doing wrong.
>>
>> # Sample.tab
>>
>> 1227422134|2|1|paid:44519,tax:2120,value:42399
>>
>> # CREATE TABLE
>>
>> hive> CREATE EXTERNAL TABLE activity_test
>>> (occurred_at INT, actor_id INT, actee_id INT, properties
>>> MAP<STRING, STRING>)
>>> ROW FORMAT DELIMITED
>>> FIELDS TERMINATED BY "|"
>>> COLLECTION ITEMS TERMINATED BY ","
>>> MAP KEYS TERMINATED BY ":"
>>> LOCATION '/data/sample';
>> OK
>>
>> # LOAD DATA
>>
>> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
>> activity_test;
>> Copying data from file:/Users/josh/Hive/sample.tab Loading data to
>> table activity_test OK
>>
>> # SELECT OVERWRITE DIRECTORY
>>
>> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
>> SELECT activity_test.occurred_at, activity_test.actor_id,
>> activity_test.actee_id, activity_test.properties; Total MapReduce
>> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =
>> http://{clipped}:
>> 50030/jobdetails.jsp?jobid=job_200811250653_0018
>> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
>> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
>>   map = 0%,  reduce =0%
>>   map = 50%,  reduce =0%
>>   map = 100%,  reduce =0%
>> Ended Job = job_200811250653_0018
>> Moving data to: /data/output
>> OK
>> Time taken: 72.329 seconds
>>
>> $ hadoop fs -cat /data/output/*
>> 012{}
>>
>> This obviously isn't the correct output, and are just some default
>> values for those columns, what am I doing wrong?
>>
>> Thanks
>>
>> Josh Ferguson
>




RE: Trouble Loading Into External Table

Posted by Ashish Thusoo <at...@facebook.com>.
congrats!! :)

please do file a JIRA for this. We will fix this as soon as possible.

Ashish

________________________________________
From: Josh Ferguson [josh@besquared.net]
Sent: Wednesday, November 26, 2008 12:44 AM
To: hive-user@hadoop.apache.org
Subject: Re: Trouble Loading Into External Table

I got it! The combination I got to work was the following:

CREATE EXTERNAL TABLE basic
(actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE
LOCATION '/data/sample2';

This forced a DynamicSerde/TCTLSeperatedProtocol combo which didn't
work properly when I specified the field delimiter but works fine
when you only specify the other two for some reason and leave the
field delimiter as the default, I should probably file it in JIRA.

Josh Ferguson

On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:

>
> Can you please send the output of 'describe extended
> activity_test'. This will help us understand what's happening with
> all the create table parameters.
>
> Also - as a sanity check - can you please check hadoop dfs -cat  /
> data/sample/* (to make sure data got loaded/moved into that dir)
>
> -----Original Message-----
> From: Josh Ferguson [mailto:josh@besquared.net]
> Sent: Tuesday, November 25, 2008 7:03 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Trouble Loading Into External Table
>
> hive> CREATE EXTERNAL TABLE activity_test
>> (occurred_at INT, actor_id INT, actee_id INT, properties
> MAP<STRING, STRING>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '124'
>> COLLECTION ITEMS TERMINATED BY '44'
>> MAP KEYS TERMINATED BY '58'
>> LINES TERMINATED BY '10'
>> STORED AS TEXTFILE
>> LOCATION '/data/sample';
> OK
>
> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
> activity_test;
> Copying data from file:/Users/josh/Hive/sample.tab
> Loading data to table activity_test
> OK
>
> $ hadoop fs -cat /data/sample/sample.tab
> 1227422134|2|1|paid:44519,tax:2120,value:42399
>
> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
> SELECT activity_test.occurred_at, activity_test.actor_id,
> activity_test.actee_id, activity_test.properties;
> Total MapReduce jobs = 1
> Starting Job = job_200811250653_0022, Tracking URL = http://{clipped}:
> 50030/jobdetails.jsp?jobid=job_200811250653_0022
> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
>   map = 0%,  reduce =0%
>   map = 50%,  reduce =0%
>   map = 100%,  reduce =0%
> Ended Job = job_200811250653_0022
> Moving data to: /data/output2
> OK
>
> $ hadoop fs -cat /data/output2/*
> 012{}
>
> Still getting incorrect results, is there anything else I could try?
>
> Josh Ferguson
>
> On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
>
>> Can you try putting the ascii value within quotes, so for example
>> FIELDS TERMINATED BY '124' etc...
>>
>> You can also look at the following file in the source to see an
>> example of how this is done
>>
>> ql/src/test/queries/clientpositive/input_dynamicserde.q
>>
>> Ashish
>>
>> -----Original Message-----
>> From: Josh Ferguson [mailto:josh@besquared.net]
>> Sent: Tuesday, November 25, 2008 6:18 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Trouble Loading Into External Table
>>
>> Ok so I'm trying to create an external table and load a delimited
>> file into it, then just do a basic select out of it, here is a
>> description of my scenario along with steps and results I took.
>> Hopefully someone can help me figure out what I'm doing wrong.
>>
>> # Sample.tab
>>
>> 1227422134|2|1|paid:44519,tax:2120,value:42399
>>
>> # CREATE TABLE
>>
>> hive> CREATE EXTERNAL TABLE activity_test
>>> (occurred_at INT, actor_id INT, actee_id INT, properties
>>> MAP<STRING, STRING>)
>>> ROW FORMAT DELIMITED
>>> FIELDS TERMINATED BY "|"
>>> COLLECTION ITEMS TERMINATED BY ","
>>> MAP KEYS TERMINATED BY ":"
>>> LOCATION '/data/sample';
>> OK
>>
>> # LOAD DATA
>>
>> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
>> activity_test;
>> Copying data from file:/Users/josh/Hive/sample.tab Loading data to
>> table activity_test OK
>>
>> # SELECT OVERWRITE DIRECTORY
>>
>> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
>> SELECT activity_test.occurred_at, activity_test.actor_id,
>> activity_test.actee_id, activity_test.properties; Total MapReduce
>> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =
>> http://{clipped}:
>> 50030/jobdetails.jsp?jobid=job_200811250653_0018
>> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
>> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
>>   map = 0%,  reduce =0%
>>   map = 50%,  reduce =0%
>>   map = 100%,  reduce =0%
>> Ended Job = job_200811250653_0018
>> Moving data to: /data/output
>> OK
>> Time taken: 72.329 seconds
>>
>> $ hadoop fs -cat /data/output/*
>> 012{}
>>
>> This obviously isn't the correct output, and are just some default
>> values for those columns, what am I doing wrong?
>>
>> Thanks
>>
>> Josh Ferguson
>


Re: Trouble Loading Into External Table

Posted by Josh Ferguson <jo...@besquared.net>.
I got it! The combination I got to work was the following:

CREATE EXTERNAL TABLE basic
(actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE
LOCATION '/data/sample2';

This forced a DynamicSerDe/TCTLSeparatedProtocol combo, which didn't  
work properly when I specified the field delimiter but, for some reason,  
works fine when you only specify the other two and leave the field  
delimiter as the default. I should probably file it in JIRA.

Josh Ferguson
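
A quick end-to-end check of the fix, mirroring the earlier queries; the output directory is hypothetical:

hive> FROM basic INSERT OVERWRITE DIRECTORY '/data/output3'
     > SELECT basic.actor_id, basic.actee_id, basic.properties;
$ hadoop fs -cat /data/output3/*

With the default field delimiter in play, the properties map should come back populated instead of the empty {} seen before.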

On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:

>
> Can you please send the output of 'describe extended  
> activity_test'. This will help us understand what's happening with  
> all the create table parameters.
>
> Also - as a sanity check - can you please check hadoop dfs -cat  / 
> data/sample/* (to make sure data got loaded/moved into that dir)
>
> -----Original Message-----
> From: Josh Ferguson [mailto:josh@besquared.net]
> Sent: Tuesday, November 25, 2008 7:03 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Trouble Loading Into External Table
>
> hive> CREATE EXTERNAL TABLE activity_test
>> (occurred_at INT, actor_id INT, actee_id INT, properties
> MAP<STRING, STRING>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '124'
>> COLLECTION ITEMS TERMINATED BY '44'
>> MAP KEYS TERMINATED BY '58'
>> LINES TERMINATED BY '10'
>> STORED AS TEXTFILE
>> LOCATION '/data/sample';
> OK
>
> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
> activity_test;
> Copying data from file:/Users/josh/Hive/sample.tab
> Loading data to table activity_test
> OK
>
> $ hadoop fs -cat /data/sample/sample.tab
> 1227422134|2|1|paid:44519,tax:2120,value:42399
>
> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
> SELECT activity_test.occurred_at, activity_test.actor_id,
> activity_test.actee_id, activity_test.properties;
> Total MapReduce jobs = 1
> Starting Job = job_200811250653_0022, Tracking URL = http://{clipped}:
> 50030/jobdetails.jsp?jobid=job_200811250653_0022
> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
>   map = 0%,  reduce =0%
>   map = 50%,  reduce =0%
>   map = 100%,  reduce =0%
> Ended Job = job_200811250653_0022
> Moving data to: /data/output2
> OK
>
> $ hadoop fs -cat /data/output2/*
> 012{}
>
> Still getting incorrect results, is there anything else I could try?
>
> Josh Ferguson
>
> On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
>
>> Can you try putting the ascii value within quotes, so for example
>> FIELDS TERMINATED BY '124' etc...
>>
>> You can also look at the following file in the source to see an
>> example of how this is done
>>
>> ql/src/test/queries/clientpositive/input_dynamicserde.q
>>
>> Ashish
>>
>> -----Original Message-----
>> From: Josh Ferguson [mailto:josh@besquared.net]
>> Sent: Tuesday, November 25, 2008 6:18 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Trouble Loading Into External Table
>>
>> Ok so I'm trying to create an external table and load a delimited
>> file into it, then just do a basic select out of it, here is a
>> description of my scenario along with steps and results I took.
>> Hopefully someone can help me figure out what I'm doing wrong.
>>
>> # Sample.tab
>>
>> 1227422134|2|1|paid:44519,tax:2120,value:42399
>>
>> # CREATE TABLE
>>
>> hive> CREATE EXTERNAL TABLE activity_test
>>> (occurred_at INT, actor_id INT, actee_id INT, properties
>>> MAP<STRING, STRING>)
>>> ROW FORMAT DELIMITED
>>> FIELDS TERMINATED BY "|"
>>> COLLECTION ITEMS TERMINATED BY ","
>>> MAP KEYS TERMINATED BY ":"
>>> LOCATION '/data/sample';
>> OK
>>
>> # LOAD DATA
>>
>> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
>> activity_test;
>> Copying data from file:/Users/josh/Hive/sample.tab Loading data to
>> table activity_test OK
>>
>> # SELECT OVERWRITE DIRECTORY
>>
>> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
>> SELECT activity_test.occurred_at, activity_test.actor_id,
>> activity_test.actee_id, activity_test.properties; Total MapReduce
>> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =
>> http://{clipped}:
>> 50030/jobdetails.jsp?jobid=job_200811250653_0018
>> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
>> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
>>   map = 0%,  reduce =0%
>>   map = 50%,  reduce =0%
>>   map = 100%,  reduce =0%
>> Ended Job = job_200811250653_0018
>> Moving data to: /data/output
>> OK
>> Time taken: 72.329 seconds
>>
>> $ hadoop fs -cat /data/output/*
>> 012{}
>>
>> This obviously isn't the correct output, and are just some default
>> values for those columns, what am I doing wrong?
>>
>> Thanks
>>
>> Josh Ferguson
>


Re: Trouble Loading Into External Table

Posted by Josh Ferguson <jo...@besquared.net>.
Sorry, I missed that second part. I ran it:

$ hadoop dfs -cat  /data/sample/*
1227422134|2|1|paid:44519,tax:2120,value:42399


On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:

>
> Can you please send the output of 'describe extended  
> activity_test'. This will help us understand what's happening with  
> all the create table parameters.
>
> Also - as a sanity check - can you please check hadoop dfs -cat  / 
> data/sample/* (to make sure data got loaded/moved into that dir)
>
> -----Original Message-----
> From: Josh Ferguson [mailto:josh@besquared.net]
> Sent: Tuesday, November 25, 2008 7:03 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Trouble Loading Into External Table
>
> hive> CREATE EXTERNAL TABLE activity_test
>> (occurred_at INT, actor_id INT, actee_id INT, properties
> MAP<STRING, STRING>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '124'
>> COLLECTION ITEMS TERMINATED BY '44'
>> MAP KEYS TERMINATED BY '58'
>> LINES TERMINATED BY '10'
>> STORED AS TEXTFILE
>> LOCATION '/data/sample';
> OK
>
> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
> activity_test;
> Copying data from file:/Users/josh/Hive/sample.tab
> Loading data to table activity_test
> OK
>
> $ hadoop fs -cat /data/sample/sample.tab
> 1227422134|2|1|paid:44519,tax:2120,value:42399
>
> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
> SELECT activity_test.occurred_at, activity_test.actor_id,
> activity_test.actee_id, activity_test.properties;
> Total MapReduce jobs = 1
> Starting Job = job_200811250653_0022, Tracking URL = http://{clipped}:
> 50030/jobdetails.jsp?jobid=job_200811250653_0022
> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
>   map = 0%,  reduce =0%
>   map = 50%,  reduce =0%
>   map = 100%,  reduce =0%
> Ended Job = job_200811250653_0022
> Moving data to: /data/output2
> OK
>
> $ hadoop fs -cat /data/output2/*
> 012{}
>
> Still getting incorrect results, is there anything else I could try?
>
> Josh Ferguson
>
> On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
>
>> Can you try putting the ascii value within quotes, so for example
>> FIELDS TERMINATED BY '124' etc...
>>
>> You can also look at the following file in the source to see an
>> example of how this is done
>>
>> ql/src/test/queries/clientpositive/input_dynamicserde.q
>>
>> Ashish
>>
>> -----Original Message-----
>> From: Josh Ferguson [mailto:josh@besquared.net]
>> Sent: Tuesday, November 25, 2008 6:18 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Trouble Loading Into External Table
>>
>> Ok so I'm trying to create an external table and load a delimited
>> file into it, then just do a basic select out of it, here is a
>> description of my scenario along with steps and results I took.
>> Hopefully someone can help me figure out what I'm doing wrong.
>>
>> # Sample.tab
>>
>> 1227422134|2|1|paid:44519,tax:2120,value:42399
>>
>> # CREATE TABLE
>>
>> hive> CREATE EXTERNAL TABLE activity_test
>>> (occurred_at INT, actor_id INT, actee_id INT, properties
>>> MAP<STRING, STRING>)
>>> ROW FORMAT DELIMITED
>>> FIELDS TERMINATED BY "|"
>>> COLLECTION ITEMS TERMINATED BY ","
>>> MAP KEYS TERMINATED BY ":"
>>> LOCATION '/data/sample';
>> OK
>>
>> # LOAD DATA
>>
>> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
>> activity_test;
>> Copying data from file:/Users/josh/Hive/sample.tab Loading data to
>> table activity_test OK
>>
>> # SELECT OVERWRITE DIRECTORY
>>
>> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
>> SELECT activity_test.occurred_at, activity_test.actor_id,
>> activity_test.actee_id, activity_test.properties; Total MapReduce
>> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =
>> http://{clipped}:
>> 50030/jobdetails.jsp?jobid=job_200811250653_0018
>> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
>> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
>>   map = 0%,  reduce =0%
>>   map = 50%,  reduce =0%
>>   map = 100%,  reduce =0%
>> Ended Job = job_200811250653_0018
>> Moving data to: /data/output
>> OK
>> Time taken: 72.329 seconds
>>
>> $ hadoop fs -cat /data/output/*
>> 012{}
>>
>> This obviously isn't the correct output, and are just some default
>> values for those columns, what am I doing wrong?
>>
>> Thanks
>>
>> Josh Ferguson
>


Re: Trouble Loading Into External Table

Posted by Josh Ferguson <jo...@besquared.net>.
Ok, so what are the default separators for all of the delimited  
options available?

I already tested just modifying my data by putting ^A instead of "|",  
and it started working much better. I don't think DynamicSerDe is  
working as advertised, because I've tried every combination of  
possible create table statements and I can't get a single one to work  
except the default one. (A sketch of that data change follows below.)

Thanks guys, I'm super excited about getting to use and hopefully  
contribute to Hive.

Josh Ferguson
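
A minimal sketch of that data change; the output filename is hypothetical:

$ tr '|' '\001' < /Users/josh/Hive/sample.tab > /Users/josh/Hive/sample_ctrl_a.tab

tr rewrites every literal "|" as the control-A byte (ASCII 1) that the delimited format expects by default, so the converted file can be loaded into a table created without a FIELDS TERMINATED BY clause.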

On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:

>
> Can you please send the output of 'describe extended  
> activity_test'. This will help us understand what's happening with  
> all the create table parameters.
>
> Also - as a sanity check - can you please check hadoop dfs -cat  / 
> data/sample/* (to make sure data got loaded/moved into that dir)
>
> -----Original Message-----
> From: Josh Ferguson [mailto:josh@besquared.net]
> Sent: Tuesday, November 25, 2008 7:03 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Trouble Loading Into External Table
>
> hive> CREATE EXTERNAL TABLE activity_test
>> (occurred_at INT, actor_id INT, actee_id INT, properties
> MAP<STRING, STRING>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '124'
>> COLLECTION ITEMS TERMINATED BY '44'
>> MAP KEYS TERMINATED BY '58'
>> LINES TERMINATED BY '10'
>> STORED AS TEXTFILE
>> LOCATION '/data/sample';
> OK
>
> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
> activity_test;
> Copying data from file:/Users/josh/Hive/sample.tab
> Loading data to table activity_test
> OK
>
> $ hadoop fs -cat /data/sample/sample.tab
> 1227422134|2|1|paid:44519,tax:2120,value:42399
>
> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
> SELECT activity_test.occurred_at, activity_test.actor_id,
> activity_test.actee_id, activity_test.properties;
> Total MapReduce jobs = 1
> Starting Job = job_200811250653_0022, Tracking URL = http://{clipped}:
> 50030/jobdetails.jsp?jobid=job_200811250653_0022
> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
>   map = 0%,  reduce =0%
>   map = 50%,  reduce =0%
>   map = 100%,  reduce =0%
> Ended Job = job_200811250653_0022
> Moving data to: /data/output2
> OK
>
> $ hadoop fs -cat /data/output2/*
> 012{}
>
> Still getting incorrect results, is there anything else I could try?
>
> Josh Ferguson
>
> On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
>
>> Can you try putting the ascii value within quotes, so for example
>> FIELDS TERMINATED BY '124' etc...
>>
>> You can also look at the following file in the source to see an
>> example of how this is done
>>
>> ql/src/test/queries/clientpositive/input_dynamicserde.q
>>
>> Ashish
>>
>> -----Original Message-----
>> From: Josh Ferguson [mailto:josh@besquared.net]
>> Sent: Tuesday, November 25, 2008 6:18 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Trouble Loading Into External Table
>>
>> Ok so I'm trying to create an external table and load a delimited
>> file into it, then just do a basic select out of it, here is a
>> description of my scenario along with steps and results I took.
>> Hopefully someone can help me figure out what I'm doing wrong.
>>
>> # Sample.tab
>>
>> 1227422134|2|1|paid:44519,tax:2120,value:42399
>>
>> # CREATE TABLE
>>
>> hive> CREATE EXTERNAL TABLE activity_test
>>> (occurred_at INT, actor_id INT, actee_id INT, properties
>>> MAP<STRING, STRING>)
>>> ROW FORMAT DELIMITED
>>> FIELDS TERMINATED BY "|"
>>> COLLECTION ITEMS TERMINATED BY ","
>>> MAP KEYS TERMINATED BY ":"
>>> LOCATION '/data/sample';
>> OK
>>
>> # LOAD DATA
>>
>> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
>> activity_test;
>> Copying data from file:/Users/josh/Hive/sample.tab Loading data to
>> table activity_test OK
>>
>> # SELECT OVERWRITE DIRECTORY
>>
>> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
>> SELECT activity_test.occurred_at, activity_test.actor_id,
>> activity_test.actee_id, activity_test.properties; Total MapReduce
>> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =
>> http://{clipped}:
>> 50030/jobdetails.jsp?jobid=job_200811250653_0018
>> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
>> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
>>   map = 0%,  reduce =0%
>>   map = 50%,  reduce =0%
>>   map = 100%,  reduce =0%
>> Ended Job = job_200811250653_0018
>> Moving data to: /data/output
>> OK
>> Time taken: 72.329 seconds
>>
>> $ hadoop fs -cat /data/output/*
>> 012{}
>>
>> This obviously isn't the correct output, and are just some default
>> values for those columns, what am I doing wrong?
>>
>> Thanks
>>
>> Josh Ferguson
>


RE: Trouble Loading Into External Table

Posted by Joydeep Sen Sarma <js...@facebook.com>.
Can you please send the output of 'describe extended activity_test'. This will help us understand what's happening with all the create table parameters.

Also - as a sanity check - can you please check hadoop dfs -cat  /data/sample/* (to make sure data got loaded/moved into that dir)

-----Original Message-----
From: Josh Ferguson [mailto:josh@besquared.net] 
Sent: Tuesday, November 25, 2008 7:03 PM
To: hive-user@hadoop.apache.org
Subject: Re: Trouble Loading Into External Table

hive> CREATE EXTERNAL TABLE activity_test
     > (occurred_at INT, actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
     > ROW FORMAT DELIMITED
     > FIELDS TERMINATED BY '124'
     > COLLECTION ITEMS TERMINATED BY '44'
     > MAP KEYS TERMINATED BY '58'
     > LINES TERMINATED BY '10'
     > STORED AS TEXTFILE
     > LOCATION '/data/sample';
OK

hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE activity_test;
Copying data from file:/Users/josh/Hive/sample.tab
Loading data to table activity_test
OK

$ hadoop fs -cat /data/sample/sample.tab
1227422134|2|1|paid:44519,tax:2120,value:42399

hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2' SELECT activity_test.occurred_at, activity_test.actor_id, activity_test.actee_id, activity_test.properties;
Total MapReduce jobs = 1
Starting Job = job_200811250653_0022, Tracking URL = http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0022
Kill Command = /Users/josh/Hadoop/bin/hadoop job  -Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
  map = 0%,  reduce =0%
  map = 50%,  reduce =0%
  map = 100%,  reduce =0%
Ended Job = job_200811250653_0022
Moving data to: /data/output2
OK

$ hadoop fs -cat /data/output2/*
012{}

Still getting incorrect results, is there anything else I could try?

Josh Ferguson

On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:

> Can you try putting the ascii value within quotes, so for example  
> FIELDS TERMINATED BY '124' etc...
>
> You can also look at the following file in the source to see an  
> example of how this is done
>
> ql/src/test/queries/clientpositive/input_dynamicserde.q
>
> Ashish
>
> -----Original Message-----
> From: Josh Ferguson [mailto:josh@besquared.net]
> Sent: Tuesday, November 25, 2008 6:18 PM
> To: hive-user@hadoop.apache.org
> Subject: Trouble Loading Into External Table
>
> Ok so I'm trying to create an external table and load a delimited  
> file into it, then just do a basic select out of it, here is a  
> description of my scenario along with steps and results I took.
> Hopefully someone can help me figure out what I'm doing wrong.
>
> # Sample.tab
>
> 1227422134|2|1|paid:44519,tax:2120,value:42399
>
> # CREATE TABLE
>
> hive> CREATE EXTERNAL TABLE activity_test
>> (occurred_at INT, actor_id INT, actee_id INT, properties  
>> MAP<STRING, STRING>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY "|"
>> COLLECTION ITEMS TERMINATED BY ","
>> MAP KEYS TERMINATED BY ":"
>> LOCATION '/data/sample';
> OK
>
> # LOAD DATA
>
> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
> activity_test;
> Copying data from file:/Users/josh/Hive/sample.tab Loading data to  
> table activity_test OK
>
> # SELECT OVERWRITE DIRECTORY
>
> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
> SELECT activity_test.occurred_at, activity_test.actor_id,  
> activity_test.actee_id, activity_test.properties; Total MapReduce  
> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =  
> http://{clipped}:
> 50030/jobdetails.jsp?jobid=job_200811250653_0018
> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
>   map = 0%,  reduce =0%
>   map = 50%,  reduce =0%
>   map = 100%,  reduce =0%
> Ended Job = job_200811250653_0018
> Moving data to: /data/output
> OK
> Time taken: 72.329 seconds
>
> $ hadoop fs -cat /data/output/*
> 012{}
>
> This obviously isn't the correct output, and are just some default  
> values for those columns, what am I doing wrong?
>
> Thanks
>
> Josh Ferguson


Re: Trouble Loading Into External Table

Posted by Josh Ferguson <jo...@besquared.net>.
hive> CREATE EXTERNAL TABLE activity_test
     > (occurred_at INT, actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
     > ROW FORMAT DELIMITED
     > FIELDS TERMINATED BY '124'
     > COLLECTION ITEMS TERMINATED BY '44'
     > MAP KEYS TERMINATED BY '58'
     > LINES TERMINATED BY '10'
     > STORED AS TEXTFILE
     > LOCATION '/data/sample';
OK

hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE activity_test;
Copying data from file:/Users/josh/Hive/sample.tab
Loading data to table activity_test
OK

$ hadoop fs -cat /data/sample/sample.tab
1227422134|2|1|paid:44519,tax:2120,value:42399

hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2' SELECT activity_test.occurred_at, activity_test.actor_id, activity_test.actee_id, activity_test.properties;
Total MapReduce jobs = 1
Starting Job = job_200811250653_0022, Tracking URL = http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0022
Kill Command = /Users/josh/Hadoop/bin/hadoop job  -Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
  map = 0%,  reduce =0%
  map = 50%,  reduce =0%
  map = 100%,  reduce =0%
Ended Job = job_200811250653_0022
Moving data to: /data/output2
OK

$ hadoop fs -cat /data/output2/*
012{}

Still getting incorrect results, is there anything else I could try?

Josh Ferguson

On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:

> Can you try putting the ascii value within quotes, so for example  
> FIELDS TERMINATED BY '124' etc...
>
> You can also look at the following file in the source to see an  
> example of how this is done
>
> ql/src/test/queries/clientpositive/input_dynamicserde.q
>
> Ashish
>
> -----Original Message-----
> From: Josh Ferguson [mailto:josh@besquared.net]
> Sent: Tuesday, November 25, 2008 6:18 PM
> To: hive-user@hadoop.apache.org
> Subject: Trouble Loading Into External Table
>
> Ok so I'm trying to create an external table and load a delimited  
> file into it, then just do a basic select out of it, here is a  
> description of my scenario along with steps and results I took.
> Hopefully someone can help me figure out what I'm doing wrong.
>
> # Sample.tab
>
> 1227422134|2|1|paid:44519,tax:2120,value:42399
>
> # CREATE TABLE
>
> hive> CREATE EXTERNAL TABLE activity_test
>> (occurred_at INT, actor_id INT, actee_id INT, properties  
>> MAP<STRING, STRING>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY "|"
>> COLLECTION ITEMS TERMINATED BY ","
>> MAP KEYS TERMINATED BY ":"
>> LOCATION '/data/sample';
> OK
>
> # LOAD DATA
>
> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
> activity_test;
> Copying data from file:/Users/josh/Hive/sample.tab Loading data to  
> table activity_test OK
>
> # SELECT OVERWRITE DIRECTORY
>
> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
> SELECT activity_test.occurred_at, activity_test.actor_id,  
> activity_test.actee_id, activity_test.properties; Total MapReduce  
> jobs = 1 Starting Job = job_200811250653_0018, Tracking URL =  
> http://{clipped}:
> 50030/jobdetails.jsp?jobid=job_200811250653_0018
> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
>   map = 0%,  reduce =0%
>   map = 50%,  reduce =0%
>   map = 100%,  reduce =0%
> Ended Job = job_200811250653_0018
> Moving data to: /data/output
> OK
> Time taken: 72.329 seconds
>
> $ hadoop fs -cat /data/output/*
> 012{}
>
> This obviously isn't the correct output, and are just some default  
> values for those columns, what am I doing wrong?
>
> Thanks
>
> Josh Ferguson


RE: Trouble Loading Into External Table

Posted by Ashish Thusoo <at...@facebook.com>.
Can you try putting the ascii value within quotes, so for example FIELDS TERMINATED BY '124' etc...

You can also look at the following file in the source to see an example of how this is done

ql/src/test/queries/clientpositive/input_dynamicserde.q

Ashish 

-----Original Message-----
From: Josh Ferguson [mailto:josh@besquared.net] 
Sent: Tuesday, November 25, 2008 6:18 PM
To: hive-user@hadoop.apache.org
Subject: Trouble Loading Into External Table

Ok so I'm trying to create an external table and load a delimited file into it, then just do a basic select out of it, here is a description of my scenario along with steps and results I took.  
Hopefully someone can help me figure out what I'm doing wrong.

# Sample.tab

1227422134|2|1|paid:44519,tax:2120,value:42399

# CREATE TABLE

hive> CREATE EXTERNAL TABLE activity_test
     > (occurred_at INT, actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
     > ROW FORMAT DELIMITED
     > FIELDS TERMINATED BY "|"
     > COLLECTION ITEMS TERMINATED BY ","
     > MAP KEYS TERMINATED BY ":"
     > LOCATION '/data/sample';
OK

# LOAD DATA

hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE activity_test;
Copying data from file:/Users/josh/Hive/sample.tab
Loading data to table activity_test
OK

# SELECT OVERWRITE DIRECTORY

hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output' SELECT activity_test.occurred_at, activity_test.actor_id, activity_test.actee_id, activity_test.properties;
Total MapReduce jobs = 1
Starting Job = job_200811250653_0018, Tracking URL = http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0018
Kill Command = /Users/josh/Hadoop/bin/hadoop job  -Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
  map = 0%,  reduce =0%
  map = 50%,  reduce =0%
  map = 100%,  reduce =0%
Ended Job = job_200811250653_0018
Moving data to: /data/output
OK
Time taken: 72.329 seconds

$ hadoop fs -cat /data/output/*
012{}

This obviously isn't the correct output, and are just some default values for those columns, what am I doing wrong?

Thanks

Josh Ferguson