Posted to solr-user@lucene.apache.org by tom135 <t....@itspree.pl> on 2011/08/22 12:52:47 UTC

Count rows with tokens

Hello,

I want to use Solr as a search engine. My indexed data looks like this:
ID | TEXT | CREATION_DATE

About 500,000 new rows are added every day.

My problem:
*INPUT:* a fixed set of tokens (at most 40) and a set of days
*RESULT:* how many rows contain the fixed set of tokens in TEXT and were
created on day1, day2, ..., day20
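To pin down what is being counted, here is a minimal in-memory sketch in Java (not Solr code). It assumes that "contain" can mean either every token or at least one token (the hypothetical `requireAll` switch), that TEXT is tokenized by lower-casing and splitting on non-word characters, and that each row carries a calendar date. The class `TokenDayCount` and its methods are hypothetical.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical reference implementation of the count; not Solr code.
public class TokenDayCount {
    // One indexed row: ID | TEXT | CREATION_DATE (date only, an assumption).
    record Row(long id, String text, LocalDate creationDate) {}

    // Assumed tokenization: lower-case, split on non-word characters.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Counts, per requested day, the rows created that day whose TEXT contains
    // every token (requireAll = true) or at least one token (requireAll = false).
    // Query tokens are expected in lower case to match tokenize().
    static Map<LocalDate, Long> countPerDay(List<Row> rows, Set<String> tokens,
                                            Set<LocalDate> days, boolean requireAll) {
        Map<LocalDate, Long> counts = new TreeMap<>();
        for (LocalDate day : days) counts.put(day, 0L); // days without matches report 0
        for (Row row : rows) {
            if (!counts.containsKey(row.creationDate())) continue; // day not requested
            Set<String> rowTokens = tokenize(row.text());
            boolean match = requireAll
                    ? rowTokens.containsAll(tokens)
                    : tokens.stream().anyMatch(rowTokens::contains);
            if (match) counts.merge(row.creationDate(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        LocalDate d1 = LocalDate.of(2011, 8, 21);
        LocalDate d2 = LocalDate.of(2011, 8, 22);
        List<Row> rows = List.of(
                new Row(1, "solr search engine", d1),
                new Row(2, "solr faceting", d1),
                new Row(3, "search engine", d2));
        Set<String> tokens = Set.of("solr", "search");
        System.out.println(countPerDay(rows, tokens, Set.of(d1, d2), true));  // {2011-08-21=1, 2011-08-22=0}
        System.out.println(countPerDay(rows, tokens, Set.of(d1, d2), false)); // {2011-08-21=2, 2011-08-22=1}
    }
}
```

The two calls show how much the "all tokens" and "any token" readings can differ on the same data.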

I tried to build aggregates like:
*1. Solution*
DATE (days) | TOKEN_1 | TOKEN_2 | ... | TOKEN_40

where, for example, TOKEN_3 is a string like "ID_1,ID_2,...,ID_N" listing
the IDs of the rows that contain TOKEN_3.

Then I can split each TOKEN_* value into a Set<Long>, and the size of the
Set<Long> is the number of distinct rows.
*PROBLEM:* The strings are too long. They must be sent to the client and
split there, which makes the response data too big.
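The client-side step of Solution 1 might look like the following sketch. It assumes numeric row IDs in the comma-separated lists; the class `IdListUnion` and its methods are hypothetical. A union of the per-token sets gives the distinct rows containing at least one token, while `retainAll` gives the rows containing all of them.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical client-side step of Solution 1; assumes numeric IDs.
public class IdListUnion {
    // Splits one TOKEN_* value such as "1,2,3" into a set of row IDs.
    static Set<Long> parseIds(String idList) {
        Set<Long> ids = new HashSet<>();
        for (String part : idList.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) ids.add(Long.parseLong(trimmed));
        }
        return ids;
    }

    // Distinct rows containing at least one of the tokens (set union).
    static int distinctRows(List<String> idLists) {
        Set<Long> union = new HashSet<>();
        for (String idList : idLists) union.addAll(parseIds(idList));
        return union.size();
    }

    // Rows containing every token (set intersection).
    static int rowsWithAll(List<String> idLists) {
        if (idLists.isEmpty()) return 0;
        Set<Long> common = parseIds(idLists.get(0));
        for (String idList : idLists.subList(1, idLists.size())) {
            common.retainAll(parseIds(idList));
        }
        return common.size();
    }

    public static void main(String[] args) {
        // Hypothetical TOKEN_1 and TOKEN_3 values for one day.
        List<String> lists = List.of("1,2,3", "3,4");
        System.out.println(distinctRows(lists)); // 4
        System.out.println(rowsWithAll(lists));  // 1
    }
}
```

Every ID still has to travel to the client, which is the size problem described above: a token that occurs in every row would carry about 500,000 IDs for a single day.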

*2. Solution*
DATE (days) | TOKENS | COUNT

where TOKENS contains a combination of the input tokens. With n tokens there
are 2^n - 1 non-empty combinations:
For 3 tokens I have 7 combinations
For 5 tokens I have 31 combinations
For 10 tokens I have 1023 combinations
For 20 tokens I have 1048575 combinations
etc.
*PROBLEM:* Too many cases (combinations) with 40 tokens: 2^40 - 1 = 1,099,511,627,775.
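The combination counts above are 2^n - 1, the number of non-empty subsets of n tokens. A short Java sketch (hypothetical class `TokenCombinations`) computes that count with `BigInteger` and enumerates the subsets by bitmask for small inputs:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how Solution 2 grows with the number of tokens.
public class TokenCombinations {
    // Number of non-empty subsets of n tokens: 2^n - 1.
    static BigInteger nonEmptySubsets(int n) {
        return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE);
    }

    // Lists every non-empty subset; bit i of mask selects tokens.get(i).
    static List<List<String>> enumerate(List<String> tokens) {
        int n = tokens.size();
        if (n > 20) {
            throw new IllegalArgumentException("too many tokens to enumerate: " + n);
        }
        List<List<String>> result = new ArrayList<>();
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> combo = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) combo.add(tokens.get(i));
            }
            result.add(combo);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 10, 20, 40}) {
            System.out.println(n + " tokens -> " + nonEmptySubsets(n) + " combinations");
        }
        System.out.println(enumerate(List.of("a", "b", "c")));
        // [[a], [b], [a, b], [c], [a, c], [b, c], [a, b, c]]
    }
}
```

For 40 tokens the loop prints 1099511627775, which is why precomputing one COUNT per combination and day does not scale.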

Maybe Solution 1 would work if I could split the strings with some Solr
function (a custom function?), or...?

Thanks for any ideas





--
View this message in context: http://lucene.472066.n3.nabble.com/Count-rows-with-tokens-tp3274643p3274643.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Count rows with tokens

Posted by tom135 <t....@itspree.pl>.
Facet Indexing is a good solution for me :)

Thanks for your help!



--
View this message in context: http://lucene.472066.n3.nabble.com/Count-rows-with-tokens-tp3274643p3338556.html

Re: Count rows with tokens

Posted by lee carroll <le...@googlemail.com>.
Hi, this looks like a faceting problem.

See
http://wiki.apache.org/solr/SolrFacetingOverview

cheers lee c
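A minimal sketch of the faceting idea, assuming a Solr instance at `http://localhost:8983/solr`, a tokenized `TEXT` field and a date field `CREATION_DATE` holding UTC timestamps (field names taken from the original post; everything else, including the class `FacetCountQuery`, is hypothetical). The main query matches the tokens, `rows=0` suppresses the documents, and one `facet.query` per requested day returns that day's count in the `facet_queries` section of the response. Tokens are assumed to need no query-syntax escaping.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.List;

// Hypothetical builder for a Solr faceting request; field names from the post,
// base URL and date handling are assumptions.
public class FacetCountQuery {
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // requireAll = true joins the tokens with AND, false with OR.
    static String buildUrl(String solrBase, List<String> tokens,
                           List<LocalDate> days, boolean requireAll) {
        String op = requireAll ? " AND " : " OR ";
        String q = "TEXT:(" + String.join(op, tokens) + ")";
        StringBuilder url = new StringBuilder(solrBase)
                .append("/select?q=").append(enc(q))
                .append("&rows=0")       // only counts are needed, no documents
                .append("&facet=true");
        for (LocalDate day : days) {
            // One facet.query per requested UTC day, bounds inclusive.
            String range = "CREATION_DATE:[" + day + "T00:00:00Z TO "
                    + day + "T23:59:59.999Z]";
            url.append("&facet.query=").append(enc(range));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        String url = buildUrl("http://localhost:8983/solr",
                List.of("foo", "bar"),
                List.of(LocalDate.of(2011, 8, 21), LocalDate.of(2011, 8, 22)),
                true);
        System.out.println(url);
    }
}
```

For a contiguous run of days, Solr's range faceting (`facet.range=CREATION_DATE` together with `facet.range.start`, `facet.range.end` and `facet.range.gap=+1DAY`) should give similar per-day counts with fewer parameters.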
