Posted to solr-user@lucene.apache.org by Tanguy Moal <ta...@gmail.com> on 2010/10/14 17:31:31 UTC

"Virtual field", Statistics

Dear solr-user folks,

I would like to use the stats module to perform very basic statistics
(mean, min and max) which is actually working just fine.

Nevertheless, I found a small limitation that bothers me a little:
how to compute the exact same statistics on the result of a
function query rather than on a field.

Example :
schema :
- string : id
- float : width
- float : height
- float : depth
- string : color
- float : price

What I'd like to do is something like:
select?q=price:[45.5 TO 99.99]&stats=on&stats.facet=color&stats.field={volume=product(product(width,height),depth)}
I would expect to obtain:

<lst name="stats">
 <lst name="stats_fields">
  <lst name="(product(product(width,height),depth))">
   <double name="min">...</double>
   <double name="max">...</double>
   <double name="sum">...</double>
   <long name="count">...</long>
   <long name="missing">...</long>
   <double name="sumOfSquares">...</double>
   <double name="mean">...</double>
   <double name="stddev">...</double>
   <lst name="facets">
    <lst name="color">
     <lst name="white">
      <double name="min">...</double>
      <double name="max">...</double>
      <double name="sum">...</double>
      <long name="count">...</long>
      <long name="missing">...</long>
      <double name="sumOfSquares">...</double>
      <double name="mean">...</double>
      <double name="stddev">...</double>
     </lst>
     <lst name="red">
      <double name="min">...</double>
      <double name="max">...</double>
      <double name="sum">...</double>
      <long name="count">...</long>
      <long name="missing">...</long>
      <double name="sumOfSquares">...</double>
      <double name="mean">...</double>
      <double name="stddev">...</double>
     </lst>
    <!-- Other facets on other colors go here -->
   </lst><!-- end of statistical facets on volumes -->
  </lst><!-- end of stats on volumes -->
 </lst><!-- end of stats_fields node -->
</lst>
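
To make the expected numbers concrete, here is a minimal client-side sketch in Python of the statistics listed above, computed on the volume and broken down by color. It is only an illustration of the desired output, not how Solr computes anything: the helper names (summarize, volume, volume_stats_by_color) are made up for this example, and the sample standard deviation formula is an assumption about what "stddev" should mean here.

```python
import math
from collections import defaultdict


def summarize(values):
    """Summarize a list of numbers; None entries are counted as missing."""
    present = [v for v in values if v is not None]
    count = len(present)
    total = sum(present)
    sum_sq = sum(v * v for v in present)
    stats = {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "sum": total,
        "count": count,
        "missing": len(values) - count,
        "sumOfSquares": sum_sq,
        "mean": total / count if count else None,
        "stddev": 0.0,
    }
    if count > 1:
        # Sample standard deviation (assumed definition for this sketch).
        variance = (count * sum_sq - total * total) / (count * (count - 1))
        stats["stddev"] = math.sqrt(max(variance, 0.0))
    return stats


def volume(doc):
    """The 'virtual field': width * height * depth, or None if one is absent."""
    try:
        return doc["width"] * doc["height"] * doc["depth"]
    except KeyError:
        return None


def volume_stats_by_color(docs):
    """Overall volume stats plus one nested summary per color value."""
    by_color = defaultdict(list)
    for doc in docs:
        by_color[doc["color"]].append(volume(doc))
    result = summarize([volume(doc) for doc in docs])
    result["facets"] = {
        "color": {color: summarize(vols) for color, vols in by_color.items()}
    }
    return result
```

Calling volume_stats_by_color on the documents matched by the price filter gives a dictionary with the same shape as the XML above, which is what I would like the stats component to return directly.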

Of course, the volume can be computed before indexing the data,
but defining virtual fields on the fly from an arbitrary function is
powerful, and I am confident that many others would appreciate it,
especially for BI needs and so on... :-D
Is there an easy way to do this that I have not been able to
find, or is it actually impossible?
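
For reference, the index-time workaround mentioned above could look like the following Python sketch. The stored field name "volume", the helper names, and the base URL passed in are assumptions made for the example; the request parameters themselves (q, rows, stats, stats.field, stats.facet) are the regular stats component parameters applied to an ordinary field.

```python
from urllib.parse import urlencode


def with_volume(doc):
    """Return a copy of doc with a precomputed 'volume' field.

    'volume' is a hypothetical field name; it would have to be declared
    in the schema as a float like the other dimensions.
    """
    enriched = dict(doc)
    dims = [doc.get(key) for key in ("width", "height", "depth")]
    if all(d is not None for d in dims):
        enriched["volume"] = dims[0] * dims[1] * dims[2]
    return enriched


def stats_url(base_url):
    """Build a stats request on the precomputed field, faceted by color."""
    params = [
        ("q", "price:[45.5 TO 99.99]"),
        ("rows", "0"),  # only the statistics are needed, not the documents
        ("stats", "true"),
        ("stats.field", "volume"),
        ("stats.facet", "color"),
    ]
    return base_url + "?" + urlencode(params)
```

This works, but it requires reindexing every time a new derived quantity is needed, which is exactly what a virtual field would avoid.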

Thank you very much in advance for your help.

--
Tanguy

Re: "Virtual field", Statistics

Posted by Erick Erickson <er...@gmail.com>.
The beauty/problem with open source is that issues are picked up when
"somebody" thinks they're important enough and has the time/energy
to work on them. And that person can be you <G>...

What usually happens is that someone submits a patch, various
people comment on it, look it over, ask for changes or provide
other feedback (e.g. "Have you considered XYZ", or "You
do realize that if we implement this patch, the universe
will end, don't you? <G>"). Then, after a bunch of back-and-forths,
one of the committers decides that it's ready to be included
in the trunk and/or the branches.

The chances of the particular change you need being included in
trunk go up dramatically if you provide a patch. And
keep pushing (gently) on the issue.

One tip, though. Before investing a lot of time and energy in
creating a patch, figure out how you expect to change the code
and ask some questions (via commenting on the
JIRA issue) about what you're thinking about doing. You'll often
get some really valuable feedback before investing lots of time...

See: http://wiki.apache.org/solr/HowToContribute for the details
of getting the source, compiling, running unit tests, setting
up your IDE, etc.

Best
Erick


On Mon, Oct 18, 2010 at 6:59 AM, Tanguy Moal <ta...@gmail.com> wrote:

> Hello Lance, thank you for your reply.
>
> I created the following JIRA issue:
> https://issues.apache.org/jira/browse/SOLR-2171, as suggested.
>
> Can you tell me how new issues are handled by the development teams,
> and whether there's a way I could help/contribute ?
>
> --
> Tanguy

Re: "Virtual field", Statistics

Posted by Tanguy Moal <ta...@gmail.com>.
Hello Lance, thank you for your reply.

I created the following JIRA issue:
https://issues.apache.org/jira/browse/SOLR-2171, as suggested.

Can you tell me how new issues are handled by the development teams,
and whether there's a way I could help/contribute ?

--
Tanguy

2010/10/16 Lance Norskog <go...@gmail.com>:
> Please add a JIRA issue requesting this. A bunch of things are not
> supported for functions: returning as a field value, for example.

Re: "Virtual field", Statistics

Posted by Lance Norskog <go...@gmail.com>.
Please add a JIRA issue requesting this. A bunch of things are not
supported for functions: returning as a field value, for example.


-- 
Lance Norskog
goksron@gmail.com