You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "mohan (JIRA)" <ji...@apache.org> on 2016/09/01 10:52:20 UTC

[jira] [Updated] (PIG-5018) Mohan.V

     [ https://issues.apache.org/jira/browse/PIG-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mohan updated PIG-5018:
-----------------------
    Description: 
up vote
0
down vote
favorite
I am trying to write Hadoop Pig script which will take 2 files and filter based on string i.e

words.txt

google 
facebook 
twitter 
linkedin
tweets.json

{"created_time": "18:47:31 ", "text": "RT @Joey7Barton: ..give a facebook about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...", "user_id": 450990391, "id": 252479809098223616, "created_date": "Sun Sep 30 2012"}
SCRIPT

twitter  = LOAD 'Twitter.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');
    filtered = FILTER twitter BY (text MATCHES '.*facebook.*');
    extracted = FOREACH filtered GENERATE 'facebook' AS pattern,id, user_id, created_time, created_date, text;
    final = GROUP extracted BY pattern;
    dump final;
OUTPUT

(facebook,{(facebook,252545104890449921,291041644,23:06:59 ,Sun Sep 30 2012,RT @Joey7Barton: ..give a facebook about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...)})
the output that im getting is, without loading the words.txt file i.e by filtering the tweet directly.

I need to get the output as

(facebook)(complete tweet of that facebook word contained)
i.e it should read the words.txt and as words are reading according to that it should get all the tweets from tweets.json file

Any help

Mohan.V

> Mohan.V
> -------
>
>                 Key: PIG-5018
>                 URL: https://issues.apache.org/jira/browse/PIG-5018
>             Project: Pig
>          Issue Type: Bug
>            Reporter: mohan
>
> up vote
> 0
> down vote
> favorite
> I am trying to write Hadoop Pig script which will take 2 files and filter based on string i.e
> words.txt
> google 
> facebook 
> twitter 
> linkedin
> tweets.json
> {"created_time": "18:47:31 ", "text": "RT @Joey7Barton: ..give a facebook about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...", "user_id": 450990391, "id": 252479809098223616, "created_date": "Sun Sep 30 2012"}
> SCRIPT
> twitter  = LOAD 'Twitter.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');
>     filtered = FILTER twitter BY (text MATCHES '.*facebook.*');
>     extracted = FOREACH filtered GENERATE 'facebook' AS pattern,id, user_id, created_time, created_date, text;
>     final = GROUP extracted BY pattern;
>     dump final;
> OUTPUT
> (facebook,{(facebook,252545104890449921,291041644,23:06:59 ,Sun Sep 30 2012,RT @Joey7Barton: ..give a facebook about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...)})
> the output that im getting is, without loading the words.txt file i.e by filtering the tweet directly.
> I need to get the output as
> (facebook)(complete tweet of that facebook word contained)
> i.e it should read the words.txt and as words are reading according to that it should get all the tweets from tweets.json file
> Any help
> Mohan.V



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)