Posted to user@hive.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/06/09 09:03:09 UTC

Using Hive table for twitter data

Hi,

I am just exploring this.

Has anyone recently loaded Twitter data into a Hive table?

I have tried a few approaches.

This is one I tried:

ADD JAR /home/hduser/jars/hive-serdes-1.0-SNAPSHOT.jar;
--SET hive.support.sql11.reserved.keywords=false;
use test;
drop table if exists tweets;
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user1:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  text STRING,
  user1 STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>,
  in_reply_to_screen_name STRING
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/twitter_data'
;

The table is created OK, but there is no data in it.

I use Flume to populate that external directory:

hdfs dfs -ls /twitter_data
-rw-r--r--   2 hduser supergroup     433868 2016-06-09 09:52 /twitter_data/FlumeData.1465462333430
-rw-r--r--   2 hduser supergroup     438933 2016-06-09 09:53 /twitter_data/FlumeData.1465462365382
-rw-r--r--   2 hduser supergroup     559724 2016-06-09 09:53 /twitter_data/FlumeData.1465462403606
-rw-r--r--   2 hduser supergroup     455594 2016-06-09 09:54 /twitter_data/FlumeData.1465462435124

Thanks


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com

Re: Using Hive table for twitter data

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> Any reason why that table in Hive cannot read data in?

No idea how you're loading data with flume, but it isn't doing it right.

>> PARTITIONED BY (datehour INT)

...

>> -rw-r--r--   2 hduser supergroup     433868 2016-06-09 09:52 /twitter_data/FlumeData.1465462333430

No ideas on how to get that to create partitions either.

Cheers,
Gopal
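
The two quoted fragments above point at the likely cause: the table is declared PARTITIONED BY (datehour INT), yet Flume writes its files directly under /twitter_data. Hive reads a partitioned table only through partitions registered in the metastore, so files sitting in the table's root directory are not scanned. Below is a minimal sketch of one way to register a partition, assuming Flume is reconfigured to write into a per-hour subdirectory. The directory name and the partition value 2016060909 are hypothetical and do not come from the thread.

```sql
-- Sketch only: the partition value and its directory are assumptions,
-- not something shown in the original posts.
ADD JAR /home/hduser/jars/hive-serdes-1.0-SNAPSHOT.jar;
USE test;

-- Register one hour's directory as a partition of the existing table.
ALTER TABLE tweets ADD IF NOT EXISTS
  PARTITION (datehour = 2016060909)
  LOCATION '/twitter_data/datehour=2016060909';

-- Alternatively, if the subdirectories follow the datehour=<value>
-- naming convention, ask Hive to discover them all.
MSCK REPAIR TABLE tweets;

-- Check that the partition is visible and returns rows.
SHOW PARTITIONS tweets;
SELECT COUNT(*) FROM tweets WHERE datehour = 2016060909;
```

If partitioning is not needed, dropping the PARTITIONED BY clause from the CREATE EXTERNAL TABLE statement should let Hive read the FlumeData files where they currently are.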



Re: Using Hive table for twitter data

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks Gopal.

That link returns:

404 - OOPS!
Looks like you wandered too far from the herd!

LOL

Any reason why that table in Hive cannot read data in?

cheers


On 9 June 2016 at 10:09, Gopal Vijayaraghavan <go...@apache.org> wrote:

>
> > Has anyone done recent load of twitter data into Hive table.
>
> Not anytime recently, but the twitter corpus was heavily used to demo Hive.
>
> Here's the original post on auto-learning schemas from an arbitrary
> collection of JSON docs (like a MongoDB dump).
>
> http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/
>
>
> Cheers,
> Gopal
>
>
>

Re: Using Hive table for twitter data

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> Has anyone done recent load of twitter data into Hive table.

Not anytime recently, but the twitter corpus was heavily used to demo Hive.

Here's the original post on auto-learning schemas from an arbitrary
collection of JSON docs (like a MongoDB dump).

http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/


Cheers,
Gopal