Posted to user@hama.apache.org by chris h <ch...@gmail.com> on 2012/08/28 19:34:57 UTC
Looking for help analyzing Hama for our use case
Hello there,
I'm architecting an analytics system and I'm having trouble building an
elegant data model in SQL or MongoDB. I THINK a distributed graph DB would
be more suited to the task, but I've never worked with one! :) I have a
quick description of my data and the issues I'm having below. I'd love to
get some input from someone who could tell me if this is a solid use case
for Hama, and/or point me in the right direction.
The system has users who click on links, and clients who get custom reports
on what happened.
Here are the data objects we're dealing with:
Clients: Up to a few thousand
- The clients of the service; they are the root of all the other data
Users: 0 - 10,000,000 per client
- The users whose actions are being tracked; they are owned by a client
Links: 0 - 10,000 per client
- The links that a user can click on; they are owned by a client
Clicks: 0 - 5 generated per client each day
- An instance where a user "clicks" on a link...
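To make the ownership relations concrete, here is a rough Python sketch of the four objects. The field names (client_id, user_id, link_id, clicked_at) are my own placeholders for illustration, not a settled schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Client:
    client_id: int  # root of all the other data


@dataclass(frozen=True)
class User:
    user_id: int
    client_id: int  # the client that owns this user


@dataclass(frozen=True)
class Link:
    link_id: int
    client_id: int  # the client that owns this link


@dataclass(frozen=True)
class Click:
    user_id: int          # who clicked
    link_id: int          # what they clicked on
    clicked_at: datetime  # when the click happened (assumed field)
```

A click only points at a user and a link; the owning client is reachable through either one.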
So far nothing is too crazy here, though the volume of records is pretty
high. The problem is with reporting the data. The app is VERY report
heavy: clients will run a ton of reports, and they will need the results
back in under a few seconds (few = roughly 3). I don't foresee an issue
running reports
like this:
- show total clicks on this link
- show unique clicks on this link
- show which users clicked on this link the most
- show which users did not click on this link
Those may take a map/reduce job but it should be very do-able in most
databases. The problem is when we add filters (time and demographics) to
the data like this:
- show unique clicks on this link between Oct 23rd 1000 and Oct 24th 0959
- show unique clicks on this link by females between the ages of 35 and
45
- show unique clicks on this link by females inside date range "A".
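The filtered reports above all reduce to one question: how many distinct users clicked a link, given optional time and demographic constraints? Here is a naive in-memory Python sketch of that logic. The gender and age fields, the function name unique_clicks, and its parameters are assumptions for illustration only. It scans every click, so it says nothing about how to hit the few-second target at full volume; it is just meant to pin down what the reports compute.

```python
from datetime import datetime
from typing import Iterable, Mapping, NamedTuple, Optional


class User(NamedTuple):
    user_id: int
    gender: str  # assumed demographic field, e.g. "F" or "M"
    age: int     # assumed demographic field


class Click(NamedTuple):
    user_id: int
    link_id: int
    clicked_at: datetime


def unique_clicks(
    clicks: Iterable[Click],
    users: Mapping[int, User],
    link_id: int,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    gender: Optional[str] = None,
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
) -> int:
    """Count distinct users who clicked link_id and pass every given filter."""
    seen = set()
    for click in clicks:
        if click.link_id != link_id:
            continue
        # Time filters (inclusive bounds); None means "no bound".
        if start is not None and click.clicked_at < start:
            continue
        if end is not None and click.clicked_at > end:
            continue
        # Demographic filters are looked up on the clicking user.
        user = users[click.user_id]
        if gender is not None and user.gender != gender:
            continue
        if min_age is not None and user.age < min_age:
            continue
        if max_age is not None and user.age > max_age:
            continue
        seen.add(click.user_id)
    return len(seen)
```

With every filter left as None this is the plain "unique clicks on this link" report; each additional filter only narrows the set of users counted.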
With basically an infinite number of time and demographic filters, a massive
amount of users and actions, and the necessity for results to be returned
back to the client in under a few seconds... is this something Hama is well
suited for?
Thanks for taking the time,
Chris.