Posted to user@hive.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/06/09 09:03:09 UTC
Using Hive table for twitter data
Hi,
I am just exploring this.
Has anyone recently loaded Twitter data into a Hive table?
I have tried a few approaches.
This is the one I tried:
ADD JAR /home/hduser/jars/hive-serdes-1.0-SNAPSHOT.jar;
--SET hive.support.sql11.reserved.keywords=false;
use test;
drop table if exists tweets;
CREATE EXTERNAL TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user1:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user1 STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/twitter_data'
;
The table is created OK, but it returns no data.
I use Flume to populate that external directory:
hdfs dfs -ls /twitter_data
-rw-r--r-- 2 hduser supergroup 433868 2016-06-09 09:52 /twitter_data/FlumeData.1465462333430
-rw-r--r-- 2 hduser supergroup 438933 2016-06-09 09:53 /twitter_data/FlumeData.1465462365382
-rw-r--r-- 2 hduser supergroup 559724 2016-06-09 09:53 /twitter_data/FlumeData.1465462403606
-rw-r--r-- 2 hduser supergroup 455594 2016-06-09 09:54 /twitter_data/FlumeData.1465462435124
Thanks
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
Re: Using Hive table for twitter data
Posted by Gopal Vijayaraghavan <go...@apache.org>.
> Any reason why that table in Hive cannot read data in?
No idea how you're loading data with flume, but it isn't doing it right.
>> PARTITIONED BY (datehour INT)
...
>> -rw-r--r-- 2 hduser supergroup 433868 2016-06-09 09:52 /twitter_data/FlumeData.1465462333430
No ideas on how to get that to create partitions either.
Cheers,
Gopal
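The mismatch quoted above can be made concrete. The table is declared PARTITIONED BY (datehour INT), but the Flume files sit directly under /twitter_data rather than in a per-partition subdirectory, and no partition has been registered in the metastore, so Hive has nothing to scan. Below is a minimal sketch of one possible fix. It assumes the files for a given hour have first been moved into a subdirectory named after the partition; the directory name and the value 2016060909 are illustrative and do not come from the thread.

```sql
-- Assumption: the SerDe jar from the original post is needed in this session.
ADD JAR /home/hduser/jars/hive-serdes-1.0-SNAPSHOT.jar;
USE test;

-- Assumption: the Flume files for this hour were moved beforehand to
-- /twitter_data/datehour=2016060909 (hypothetical path and partition value).
ALTER TABLE tweets ADD IF NOT EXISTS PARTITION (datehour = 2016060909)
LOCATION '/twitter_data/datehour=2016060909';

-- Confirm the partition is registered, then check that rows are visible.
SHOW PARTITIONS tweets;
SELECT COUNT(*) FROM tweets WHERE datehour = 2016060909;
```

If many directories already follow the datehour=<value> naming, MSCK REPAIR TABLE tweets; can register them in one step instead of one ALTER TABLE per partition. Another option, if partitioning is not needed, is to recreate the table without the PARTITIONED BY clause so that it reads the files directly under /twitter_data.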
Re: Using Hive table for twitter data
Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks Gopal.
That link returns:
404 - OOPS!
Looks like you wandered too far from the herd!
LOL
Any reason why that table in Hive cannot read data in?
cheers
Dr Mich Talebzadeh
On 9 June 2016 at 10:09, Gopal Vijayaraghavan <go...@apache.org> wrote:
>
> > Has anyone done recent load of twitter data into Hive table.
>
> Not anytime recently, but the twitter corpus was heavily used to demo Hive.
>
> Here's the original post on auto-learning schemas from an arbitrary
> collection of JSON docs (like a MongoDB dump).
>
> http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/
>
>
> Cheers,
> Gopal
Re: Using Hive table for twitter data
Posted by Gopal Vijayaraghavan <go...@apache.org>.
> Has anyone done recent load of twitter data into Hive table.
Not anytime recently, but the twitter corpus was heavily used to demo Hive.
Here's the original post on auto-learning schemas from an arbitrary
collection of JSON docs (like a MongoDB dump).
http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/
Cheers,
Gopal