Posted to java-user@lucene.apache.org by Gimantha Bandara <gi...@wso2.com> on 2015/03/06 10:43:55 UTC

Sampled Hit counts using Lucene Facets.

Hi,

I am trying to build some APIs on top of the Lucene facets APIs. First I will
explain my requirement with an example. Let's say I am keeping track of the
count of people who enter through a certain door, and the time range I am
interested in is the last 6 hours (to get the total count, I know that I'll
have to use range facets). How do I sample this time range and get the
counts for each sample? In other words, if I split the last 6 hours into
5-minute samples, I get 72 (6*60/5) different time ranges. I would like to
get the hit counts for each of these 72 ranges in an array, together with the
lower bound of each sample. Can someone point me in the right direction, or
to the classes that would be helpful to look at? Elasticsearch already
exposes this feature through its JavaScript API.

Is it possible to implement the same with Lucene?
Is there a Facets user guide for Lucene 4.10.3 or Lucene 5.0.0?

Thanks,

-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919

Re: Sampled Hit counts using Lucene Facets.

Posted by Gimantha Bandara <gi...@wso2.com>.
Hi Shai,

Yes, bucketing is the word :). IMO it would be better if the bucketing were
moved to a utility class. I'll create a JIRA issue and provide a patch.

Thanks!

On Wed, Mar 11, 2015 at 4:33 PM, Shai Erera <se...@gmail.com> wrote:

> OK yes then sampling isn't the right word. So what you would want to have
> is an API like "count facets in N buckets between a range of [min..max]
> values". That would create the ranges for you and then you would be able to
> use the RangeFacetCounts as usual.
>
> Would you like to open a JIRA issue and post a patch? I guess it can either
> be an additional constructor on LongRangeFacetCounts (and Double), or a
> separate utility class which given min/max values and numBuckets, creates
> the proper Range[]?
>
> Shai

Re: Sampled Hit counts using Lucene Facets.

Posted by Shai Erera <se...@gmail.com>.
OK yes then sampling isn't the right word. So what you would want to have
is an API like "count facets in N buckets between a range of [min..max]
values". That would create the ranges for you and then you would be able to
use the RangeFacetCounts as usual.

Would you like to open a JIRA issue and post a patch? I guess it can either
be an additional constructor on LongRangeFacetCounts (and Double), or a
separate utility class which given min/max values and numBuckets, creates
the proper Range[]?
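
A minimal sketch of what such a utility might look like, in plain Java with
no Lucene dependency. The class name RangeBuckets, the Bucket type and the
split method are all hypothetical names chosen for illustration; nothing like
this is confirmed to exist in Lucene. It assumes half-open buckets [min, max)
of (nearly) equal width.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Lucene class): splits [min, max) into numBuckets ranges. */
final class RangeBuckets {

    /** One half-open bucket [min, max); the type and field names are illustrative only. */
    static final class Bucket {
        final long min; // inclusive lower bound
        final long max; // exclusive upper bound

        Bucket(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Returns numBuckets contiguous buckets that together cover exactly [min, max). */
    static List<Bucket> split(long min, long max, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // subtractExact/multiplyExact throw ArithmeticException instead of silently overflowing.
        long span = Math.subtractExact(max, min);
        if (span < numBuckets) {
            throw new IllegalArgumentException("range is too small for " + numBuckets + " buckets");
        }
        List<Bucket> buckets = new ArrayList<>(numBuckets);
        for (int i = 0; i < numBuckets; i++) {
            // Deriving both bounds from the same formula keeps the buckets gap-free,
            // even when span is not an exact multiple of numBuckets.
            long lo = min + Math.multiplyExact(span, (long) i) / numBuckets;
            long hi = min + Math.multiplyExact(span, (long) (i + 1)) / numBuckets;
            buckets.add(new Bucket(lo, hi));
        }
        return buckets;
    }
}
```

Each Bucket could then presumably be turned into a LongRange (label, min,
min inclusive, max, max exclusive) and the resulting array passed to
LongRangeFacetCounts as usual.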

Shai

On Tue, Mar 10, 2015 at 4:07 PM, Gimantha Bandara <gi...@wso2.com> wrote:

> Hi Shai,
>
> Yes, splitting ranges into smaller ranges is not the same as sampling. I
> used the wrong word there. I think RandomSamplingFacetsCollector is for
> "sampling" a larger dataset, and that class cannot be used to implement
> the example described above. I think I'll have to prepare the ranges
> manually and pass them to LongRangeFacetCounts.

Re: Sampled Hit counts using Lucene Facets.

Posted by Gimantha Bandara <gi...@wso2.com>.
Hi Shai,

Yes, splitting ranges into smaller ranges is not the same as sampling. I
used the wrong word there. I think RandomSamplingFacetsCollector is for
"sampling" a larger dataset, and that class cannot be used to implement
the example described above. I think I'll have to prepare the ranges
manually and pass them to LongRangeFacetCounts.

On Tue, Mar 10, 2015 at 4:54 PM, Shai Erera <se...@gmail.com> wrote:

> I am not sure that splitting the ranges into smaller ranges is the same as
> sampling.
>
> Take a look at RandomSamplingFacetsCollector - it implements sampling by
> sampling the document space, not the facet values space.
>
> So if for instance you use a LongRangeFacetCounts in conjunction with a
> RandomSamplingFacetsCollector, you would get the matching documents space
> sampled, and the counts you would get for each range could be considered
> "sampled" too. This is at least how we implemented facet sampling.
>
> Shai

Re: Sampled Hit counts using Lucene Facets.

Posted by Shai Erera <se...@gmail.com>.
I am not sure that splitting the ranges into smaller ranges is the same as
sampling.

Take a look at RandomSamplingFacetsCollector - it implements sampling by
sampling the document space, not the facet values space.

So if for instance you use a LongRangeFacetCounts in conjunction with a
RandomSamplingFacetsCollector, you would get the matching documents space
sampled, and the counts you would get for each range could be considered
"sampled" too. This is at least how we implemented facet sampling.
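
As a rough illustration of that distinction (and explicitly not how
RandomSamplingFacetsCollector is implemented), the sketch below samples the
document space: each document is kept with some probability, the kept
documents are counted per bucket, and the counts are scaled back up. The
class DocSpaceSampling, its method and its parameters are hypothetical names,
and the bucket index per document stands in for a real range lookup.

```java
import java.util.Random;

/** Illustration only: samples documents, not facet values. Not Lucene code. */
final class DocSpaceSampling {

    /**
     * bucketOfDoc[i] is the bucket index of document i. Each document is kept with
     * probability sampleRatio; kept documents are counted per bucket and the counts
     * are divided by sampleRatio to estimate the counts over all documents.
     */
    static long[] estimateCounts(int[] bucketOfDoc, int numBuckets, double sampleRatio, long seed) {
        if (sampleRatio <= 0.0 || sampleRatio > 1.0) {
            throw new IllegalArgumentException("sampleRatio must be in (0, 1]");
        }
        Random random = new Random(seed); // seeded so that runs are repeatable
        long[] sampled = new long[numBuckets];
        for (int bucket : bucketOfDoc) {
            // nextDouble() is in [0, 1), so a ratio of 1.0 keeps every document.
            if (random.nextDouble() < sampleRatio) {
                sampled[bucket]++;
            }
        }
        long[] estimated = new long[numBuckets];
        for (int i = 0; i < numBuckets; i++) {
            estimated[i] = Math.round(sampled[i] / sampleRatio);
        }
        return estimated;
    }
}
```

The number of buckets is the same with or without sampling; only the set of
documents that contribute to each count changes, which is why the resulting
counts are estimates.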

Shai

On Tue, Mar 10, 2015 at 10:21 AM, Gimantha Bandara <gi...@wso2.com>
wrote:

> What I am planning to do is split the given time range into smaller time
> ranges myself, pass them to a LongRangeFacetCounts object, and get the
> counts for each sub-range. Is this the correct way?

Re: Sampled Hit counts using Lucene Facets.

Posted by Gimantha Bandara <gi...@wso2.com>.
What I am planning to do is split the given time range into smaller time
ranges myself, pass them to a LongRangeFacetCounts object, and get the
counts for each sub-range. Is this the correct way?
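
To make the intended result concrete, here is a small plain-Java sketch of
the output shape: an array of lower bounds plus one count per sub-range. It
counts an in-memory array of timestamps as a stand-in for what the facet
counts would return from the index; the class TimeBucketCounts and its
methods are hypothetical names, and sub-ranges are assumed to be half-open,
[lower, lower + step).

```java
/** Illustration only: fixed-width time buckets over [start, end), counted in memory. */
final class TimeBucketCounts {

    /** Number of buckets of width stepMillis needed to cover [start, end). */
    static int bucketCount(long start, long end, long stepMillis) {
        if (stepMillis <= 0 || end <= start) {
            throw new IllegalArgumentException("need stepMillis > 0 and end > start");
        }
        // Round up so that a trailing partial bucket is still covered.
        return Math.toIntExact((end - start + stepMillis - 1) / stepMillis);
    }

    /** Lower bound (inclusive) of every bucket, in ascending order. */
    static long[] lowerBounds(long start, long end, long stepMillis) {
        long[] bounds = new long[bucketCount(start, end, stepMillis)];
        for (int i = 0; i < bounds.length; i++) {
            bounds[i] = start + i * stepMillis;
        }
        return bounds;
    }

    /** counts[i] is the number of timestamps that fall into bucket i. */
    static int[] counts(long[] timestamps, long start, long end, long stepMillis) {
        int[] counts = new int[bucketCount(start, end, stepMillis)];
        for (long ts : timestamps) {
            if (ts >= start && ts < end) { // ignore hits outside the requested range
                counts[(int) ((ts - start) / stepMillis)]++;
            }
        }
        return counts;
    }
}
```

With start = 0, end = 6 hours and stepMillis = 5 minutes (both in
milliseconds), lowerBounds returns 72 values, matching the 6*60/5
calculation in the original question.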

On Tue, Mar 10, 2015 at 12:01 AM, Gimantha Bandara <gi...@wso2.com>
wrote:

> Any updates on this, please? Do I have to write my own code to sample and
> get the hit counts?

Re: Sampled Hit counts using Lucene Facets.

Posted by Gimantha Bandara <gi...@wso2.com>.
Any updates on this, please? Do I have to write my own code to sample and
get the hit counts?

On Sat, Mar 7, 2015 at 2:14 PM, Gimantha Bandara <gi...@wso2.com> wrote:

> Any help on this please?

Re: Sampled Hit counts using Lucene Facets.

Posted by Gimantha Bandara <gi...@wso2.com>.
Any help on this please?

On Fri, Mar 6, 2015 at 3:13 PM, Gimantha Bandara <gi...@wso2.com> wrote:

> Hi,
>
> I am trying to build some APIs on top of the Lucene facets APIs. First I will
> explain my requirement with an example. Let's say I am keeping track of the
> count of people who enter through a certain door, and the time range I am
> interested in is the last 6 hours (to get the total count, I know that I'll
> have to use range facets). How do I sample this time range and get the
> counts for each sample? In other words, if I split the last 6 hours into
> 5-minute samples, I get 72 (6*60/5) different time ranges. I would like to
> get the hit counts for each of these 72 ranges in an array, together with the
> lower bound of each sample. Can someone point me in the right direction, or
> to the classes that would be helpful to look at? Elasticsearch already
> exposes this feature through its JavaScript API.
>
> Is it possible to implement the same with Lucene?
> Is there a Facets user guide for Lucene 4.10.3 or Lucene 5.0.0?
>
> Thanks,