Posted to user@hadoop.apache.org by David Novogrodsky <da...@gmail.com> on 2014/11/12 20:52:25 UTC
ingesting unstructured data into Hadoop, problem creating tables using Hive
I am trying to ingest unstructured data into Hive so it can be queried. I
am following the steps in Tutorial Exercise 3, but I am having some
problems: the created table has no data in it. Here is a sample of the
unstructured data:
560)211-5250 437)810-5830 04:35 21 May 2014 17:26:39
356)539-2237 889)650-7326 30:29 26 Feb 2014 11:56:08
The data is tab-delimited.
Here are the steps I am following:
1. a. Make the destination folder:
sudo -u hdfs hadoop fs -mkdir /user/cloudera/vector/callRecords
b. Copy the data into the destination folder:
sudo -u hdfs hadoop fs -copyFromLocal ~/Desktop/CDRecords.txt /user/cloudera/vector/callRecords/
2. Create the Hive table from the command line:
CREATE EXTERNAL TABLE intermediate_call_records (
  callFrom STRING,
  callTo STRING,
  callDuration STRING,
  date STRING,
  timeOfCall STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
LOCATION '/user/cloudera/vector/callRecords';
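One check that can be done outside Hive is whether the input.regex above matches a single record at all. The sketch below is only an illustration and rests on several assumptions: it uses Python's re.fullmatch as a stand-in for a whole-line Java regex match, it assumes each record reaches the SerDe without its trailing newline, and it assumes the tabs in the first sample record fall between the five fields as shown. The names PATTERN_WITHOUT_NEWLINE, SAMPLE_LINE and full_match_fields are hypothetical and do not come from the tutorial.

```python
import re

# The pattern from the CREATE TABLE statement, including its trailing \n.
PATTERN_WITH_NEWLINE = r"([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n"

# Hypothetical variant for comparison: the same pattern without the trailing \n.
PATTERN_WITHOUT_NEWLINE = r"([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)"

# First sample record. The tab positions are an assumption, and the line
# terminator is assumed to have been stripped before the pattern is applied.
SAMPLE_LINE = "560)211-5250\t437)810-5830\t04:35\t21 May 2014\t17:26:39"


def full_match_fields(pattern, line):
    """Return the captured fields if the pattern matches the whole line, else None."""
    match = re.fullmatch(pattern, line)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    print("with trailing \\n:   ", full_match_fields(PATTERN_WITH_NEWLINE, SAMPLE_LINE))
    print("without trailing \\n:", full_match_fields(PATTERN_WITHOUT_NEWLINE, SAMPLE_LINE))
```

If the pattern with the trailing \n fails here while the variant succeeds, the newline in input.regex is worth examining as one possible reason the table looks empty. This does not rule out other causes, such as the file not being tab-delimited after all.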
David Novogrodsky
david.novogrodsky@gmail.com
http://www.linkedin.com/in/davidnovogrodsky