Posted to user@flink.apache.org by Tamara Mendt <ta...@gmail.com> on 2014/11/07 10:46:10 UTC

Using Flink to analyze GDELT

Hello!

I was wondering if anyone has tried to use Flink to analyze the GDELT
dataset (http://gdeltproject.org/). It is a structured (CSV) repository of
global events. It contains about 100 GB of data (approx. 250M events, 50
attributes for each event) and is updated with new events every day.
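
For a rough sense of scale, the approximate figures above work out to about
400 bytes per event. A minimal sketch of that arithmetic, assuming "100 GB"
means decimal gigabytes and taking the quoted counts at face value (the
constant and function names are illustrative only):

```python
# Back-of-envelope sizing from the approximate figures quoted in the post.
# Assumption: "100 GB" is read as 100 * 10**9 bytes (decimal gigabytes).
TOTAL_BYTES = 100 * 10**9
NUM_EVENTS = 250 * 10**6
ATTRIBUTES_PER_EVENT = 50


def bytes_per_event(total_bytes: int, num_events: int) -> float:
    """Average size of one event record in bytes."""
    return total_bytes / num_events


def bytes_per_attribute(total_bytes: int, num_events: int, attributes: int) -> float:
    """Average size of one attribute value in bytes, delimiters included."""
    return bytes_per_event(total_bytes, num_events) / attributes


if __name__ == "__main__":
    print("bytes per event:", bytes_per_event(TOTAL_BYTES, NUM_EVENTS))
    print(
        "bytes per attribute:",
        bytes_per_attribute(TOTAL_BYTES, NUM_EVENTS, ATTRIBUTES_PER_EVENT),
    )
```

These are averages over rounded inputs, so they only indicate the order of
magnitude of a record.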

I am a bit concerned that, since this is a structured database that is not
too big, Flink may not be the ideal tool to work with it. Any insight?

Thanks!

-- 
Tamara Mendt

Re: Using Flink to analyze GDELT

Posted by Kostas Tzoumas <kt...@apache.org>.
Hi Tamara!

I have not used GDELT, but it looks pretty cool!

You can certainly use Flink to analyze structured CSV files, and people
have worked with larger, as well as smaller, datasets using Flink.

So, you can certainly give Flink a spin. Whether Flink is the ideal tool
also depends on what kind of analysis you want to run on this data. Posting
some more details about your jobs would be helpful.

Kostas

