Posted to user@accumulo.apache.org by "shweta.agrawal" <sh...@orkash.com> on 2015/09/24 15:03:44 UTC

Time-based aggregation problem with data stored in the D4M schema

Hi all,

I have stored Twitter graph data using the D4M schema.
In the D4M schema the tweet ID is the row ID, but I want to aggregate
fields on the basis of time. Applying a timestamp filter to this query
would make it slow, because the data set is large. I also want to check
an additional condition before aggregating.

I have 10 years of tweet data and want to run second level aggregations
on two months of data.
For example, I want to aggregate the location field of all tweets that
have the hashtag modi and fall within a given two-month period.
I can create a reverse index on time, but I cannot use that index to
apply additional conditions, such as the hashtag modi condition.
Can anyone tell me how to aggregate fields on D4M-style data on the
basis of time, subject to such a condition?
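
To make the query concrete, here is a minimal Python sketch of the result I am after. It only illustrates the semantics on a few in-memory entries and does a full pass over the data, which is exactly what I want to avoid on the real table. The sample rows, the "field|value" column layout, the "time" field name and the function name aggregate_locations are all assumptions made up for illustration, not my actual table contents.

```python
from collections import Counter
from datetime import datetime

# Hypothetical D4M-style edge entries: (row id = tweet id, "field|value" column, value).
EDGES = [
    ("t1", "hashtag|modi", "1"),
    ("t1", "location|Delhi", "1"),
    ("t1", "time|2015-08-10T09:00:00", "1"),
    ("t2", "hashtag|modi", "1"),
    ("t2", "location|Mumbai", "1"),
    ("t2", "time|2012-03-01T12:00:00", "1"),   # outside the time window
    ("t3", "hashtag|cricket", "1"),            # wrong hashtag
    ("t3", "location|Delhi", "1"),
    ("t3", "time|2015-08-15T18:30:00", "1"),
    ("t4", "hashtag|modi", "1"),
    ("t4", "location|Delhi", "1"),
    ("t4", "time|2015-09-02T07:45:00", "1"),
]


def aggregate_locations(edges, hashtag, start, end):
    """Count locations of tweets that carry `hashtag` and fall in [start, end)."""
    # Group the exploded columns back into one record per tweet id.
    rows = {}
    for row, col, _value in edges:
        field, _, value = col.partition("|")
        rows.setdefault(row, {}).setdefault(field, set()).add(value)

    counts = Counter()
    for fields in rows.values():
        # Condition check: the tweet must have the requested hashtag.
        if hashtag not in fields.get("hashtag", set()):
            continue
        # Time check: at least one timestamp must lie inside the window.
        times = [datetime.fromisoformat(t) for t in fields.get("time", set())]
        if not any(start <= t < end for t in times):
            continue
        # Aggregation: count each location value of the matching tweet.
        counts.update(fields.get("location", set()))
    return counts


result = aggregate_locations(
    EDGES, "modi", datetime(2015, 8, 1), datetime(2015, 10, 1)
)
```

Here only t1 and t4 satisfy both the hashtag condition and the two-month window, so only their locations are counted. What I am looking for is a way to get the same answer from Accumulo without scanning all 10 years of rows.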

Thanks and Regards
Shweta