Posted to user@hadoop.apache.org by David Novogrodsky <da...@gmail.com> on 2014/11/12 20:52:25 UTC

ingesting unstructured data into Hadoop, problem creating tables using Hive

I am trying to ingest unstructured data into Hive so it can be queried.  I
am following the steps in Tutorial Exercise 3, but I am having some
problems: the table I created has no data in it.  Here is a sample of the
unstructured data:

560)211-5250 437)810-5830 04:35 21 May 2014 17:26:39
356)539-2237 889)650-7326 30:29 26 Feb 2014 11:56:08

The data is tab-delimited.
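One way to double-check the delimiter is to count the tab-separated fields on each line. The following is only a sketch: `count_tab_fields` is a hypothetical helper name, and the sample file is rebuilt with explicit tabs on the assumption that each row has five fields, with the date (for example `21 May 2014`) as a single field.

```shell
# Hypothetical helper: for each distinct field count, print how many lines have it.
count_tab_fields() {
  awk -F'\t' '{ print NF }' "$1" | sort -n | uniq -c
}

# Rebuild the two sample rows with explicit tabs (assumed layout: five fields per row).
sample=$(mktemp)
printf '560)211-5250\t437)810-5830\t04:35\t21 May 2014\t17:26:39\n'  > "$sample"
printf '356)539-2237\t889)650-7326\t30:29\t26 Feb 2014\t11:56:08\n' >> "$sample"

# Every line of a clean five-column file should report 5 fields.
count_tab_fields "$sample"
```

Running the same function against the real file (`count_tab_fields ~/Desktop/CDRecords.txt`) would show whether any lines have a different number of fields, for instance because spaces were used instead of tabs.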

Here are the steps I am following:

1. a. Make the destination folder:
sudo -u hdfs hadoop fs -mkdir /user/cloudera/vector/callRecords

b. Copy the data into the destination folder:
sudo -u hdfs hadoop fs -copyFromLocal ~/Desktop/CDRecords.txt /user/cloudera/vector/callRecords/



2. Create the Hive table using the command line:

CREATE EXTERNAL TABLE intermediate_call_records (
callFrom STRING,
callTo STRING,
callDuration STRING,
date STRING,
timeOfCall STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
LOCATION '/user/cloudera/vector/callRecords';
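
The pattern can also be tried outside Hive. The sketch below rests on an assumption that is not confirmed here: that the SerDe is handed one line at a time with the line terminator already stripped, and that the whole line has to match the pattern. It uses awk's regular expressions rather than Java's, `count_full_matches` is a hypothetical helper name, and the sample rows assume five tab-separated fields. Under that assumption, a pattern ending in \n would never match a line, which might explain why the table shows no data.

```shell
# Hypothetical helper: count the lines of file $2 that match pattern $1 in full.
# awk reads one line at a time without its newline; -v expands \t and \n in the pattern.
count_full_matches() {
  awk -v pat="^$1\$" '$0 ~ pat { n++ } END { print n + 0 }' "$2"
}

# Rebuild the two sample rows with explicit tabs (assumed layout: five fields per row).
sample=$(mktemp)
printf '560)211-5250\t437)810-5830\t04:35\t21 May 2014\t17:26:39\n'  > "$sample"
printf '356)539-2237\t889)650-7326\t30:29\t26 Feb 2014\t11:56:08\n' >> "$sample"

# The pattern from the table definition, and the same pattern without the trailing \n.
with_newline='([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n'
without_newline='([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)'

count_full_matches "$with_newline" "$sample"     # prints 0: no line ends in a newline character
count_full_matches "$without_newline" "$sample"  # prints 2: both sample rows match
```

If the second count matches the number of lines in the real file while the first stays at 0, the trailing \n in "input.regex" would be the first thing to try removing.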


David Novogrodsky
david.novogrodsky@gmail.com
http://www.linkedin.com/in/davidnovogrodsky