Posted to solr-user@lucene.apache.org by Fuad Efendi <fu...@efendi.ca> on 2012/08/13 20:38:09 UTC

Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

SOLR-4.0

I am trying to implement this; a fun idea to share:

1. http://wiki.apache.org/solr/HierarchicalFaceting
Unfortunately, it does not support date ranges. However, there is a
workaround: use the "String" type instead of "*_tdt" and define fields
such as
published_hour
published_day
published_week
…

Of course, you will need to stick with one timezone, but you can add an
index (or indexes) for each timezone. And most importantly, "string" facets
are much faster than "Date Trie" ranges.
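A minimal Python sketch of the string-bucket idea, assuming hypothetical value formats (the post names the fields but not what goes in them): each timestamp is rendered once, in a single fixed timezone, into hour, day and ISO-week strings that could then be indexed as "string" fields and faceted like any other term.

```python
from datetime import datetime, timedelta, timezone, tzinfo

def time_buckets(ts: datetime, tz: tzinfo = timezone.utc) -> dict[str, str]:
    """Render one timestamp as string facet values in a single fixed timezone.

    The value formats below are hypothetical; the post only names the fields.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    local = ts.astimezone(tz)                    # "stick with one timezone"
    iso_year, iso_week, _ = local.isocalendar()  # ISO weeks start on Monday
    return {
        "published_hour": local.strftime("%Y-%m-%dT%H"),
        "published_day": local.strftime("%Y-%m-%d"),
        "published_week": f"{iso_year}-W{iso_week:02d}",
    }

# Example: one document, bucketed for a UTC index and for a fixed UTC-4 index.
doc_ts = datetime(2012, 8, 14, 2, 0, tzinfo=timezone.utc)
utc_values = time_buckets(doc_ts)
utc_minus_4_values = time_buckets(doc_ts, timezone(timedelta(hours=-4)))
```

The same document lands on 2012-08-14 in the UTC index but on 2012-08-13 in the UTC-4 index, which is why each timezone needs its own fields or index. A fixed offset ignores daylight saving time; passing a zoneinfo.ZoneInfo instance instead of a fixed offset would handle DST.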



2. Our index has over 100 million documents (from social networks) and is
growing rapidly (millions a day); cache warm-up takes a few minutes, and
Near-Real-Time does not work with faceting.

However… another workaround: we can have a Daily Core (optimized at
midnight), plus a Current Core (only today's data, optimized), plus a Last
Hour Core (near real time).
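One way to decide which of the three cores should hold a document, sketched in Python under assumptions the post does not spell out: the cores are disjoint, "today" starts at midnight in the index's timezone, and some reindexing job (not shown) moves documents from the Last Hour core to the Current core and, at midnight, into the Daily core. The core names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical core names; the post calls them Daily, Current and Last Hour.
DAILY, CURRENT, LAST_HOUR = "daily_core", "current_core", "last_hour_core"

def target_core(ts: datetime, now: datetime) -> str:
    """Return the core that should hold a document timestamped `ts` at time `now`.

    Both datetimes must be timezone-aware; `now` should be expressed in the
    index's timezone, because "midnight" is computed from it.
    """
    if ts.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if ts >= now - timedelta(hours=1):
        return LAST_HOUR   # small enough for NRT faceting
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if ts >= midnight:
        return CURRENT     # earlier today, optimized
    return DAILY           # before today, optimized at midnight
```

Checking the last hour first means that shortly after midnight, documents from late yesterday still go to the Last Hour core rather than straight to the Daily core.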

The "Last Hour" data is small enough that we can use facets with the Near
Real Time feature.

A service layer will aggregate the search results from the three cores, so
the combined results will be near real time.
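A minimal sketch of the service-layer merge for facet counts, assuming each core returns its counts as a value-to-count mapping and no document is indexed in more than one core: counts for the same value are summed, then re-ranked. One caveat: if each core returns only its own top N values (facet.limit), the merged top N can be wrong, so each core may need to return more values than the final limit.

```python
from collections import Counter
from typing import Iterable, Mapping

def merge_facets(per_core: Iterable[Mapping[str, int]], limit: int = 10) -> list[tuple[str, int]]:
    """Sum facet counts per value across cores; return the top `limit` by count."""
    totals: Counter = Counter()
    for counts in per_core:
        totals.update(counts)  # Counter.update adds counts, it does not overwrite
    # Highest count first; ties broken by value so the output is deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]
```

For example, merging {"2012-08-12": 500, "2012-08-11": 300} from the Daily core, {"2012-08-13": 40} from the Current core and {"2012-08-13": 7} from the Last Hour core yields today's bucket with a combined count of 47.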



Any thoughts? Thanks,




-- 
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
http://www.tokenizer.ca
http://www.linkedin.com/in/lucene




Re: Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

Posted by Fuad Efendi <fu...@efendi.ca>.
NRT does not work because the index is updated hundreds of times per
second, while "cache" warm-up takes a few minutes… and we are in a loop…
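The "loop" can be put in rough numbers with a simplified model, which is an assumption and not a description of Solr internals: if a new searcher is requested every reopen_interval_s seconds and warming its caches takes warmup_s seconds, then warmup_s / reopen_interval_s newer searchers are requested before the current one is warm. Any value of 1 or more means the caches are discarded before they ever pay off. The parameter names and figures are illustrative only.

```python
def reopens_during_warmup(reopen_interval_s: float, warmup_s: float) -> float:
    """How many newer searchers are requested while one cache warm-up runs.

    Simplified model (an assumption): reopens happen at a fixed interval and
    every new searcher needs the full warm-up time before its caches help.
    """
    if reopen_interval_s <= 0 or warmup_s < 0:
        raise ValueError("interval must be positive and warm-up non-negative")
    return warmup_s / reopen_interval_s


def caches_ever_warm(reopen_interval_s: float, warmup_s: float) -> bool:
    """True only if a searcher can finish warming before the next reopen."""
    return reopens_during_warmup(reopen_interval_s, warmup_s) < 1
```

With a reopen every second and a hypothetical three-minute warm-up, 180 newer searchers are requested during a single warm-up; reopening only every 30 minutes leaves the same warm-up comfortably inside each interval.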

> allowing you to query
> your huge index in ms.

Solr also allows querying in ms. What is the difference? No one can sort
1,000,000 terms in descending "counts" order faster than the current Solr
implementation, and FieldCache & UnInvertedCache can't be used with NRT…
the cache is discarded a few times per second!
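For reference, picking the top N values by count does not require fully sorting all 1,000,000 of them; a bounded heap does it in O(n log N). The Python sketch below uses heapq.nlargest on a plain value-to-count mapping. It illustrates the idea behind requesting facet.sort=count with a facet.limit, and is not Solr's actual code.

```python
import heapq
from typing import Mapping

def top_facets(counts: Mapping[str, int], limit: int) -> list[tuple[str, int]]:
    """Return the `limit` (value, count) pairs with the highest counts, descending.

    heapq.nlargest keeps only `limit` candidates at a time, so the cost is
    O(n log limit) rather than the O(n log n) of a full sort.
    """
    return heapq.nlargest(limit, counts.items(), key=lambda kv: kv[1])
```

For example, top_facets({"a": 5, "b": 9, "c": 1, "d": 7}, 2) returns [("b", 9), ("d", 7)].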

- Fuad
http://www.tokenizer.ca




On 12-08-14 8:17 AM, "Nagendra Nagarajayya"
<nn...@transaxtions.com> wrote:




Re: Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

Posted by Nagendra Nagarajayya <nn...@transaxtions.com>.
You should try the realtime NRT available with Apache Solr 4.0 with
RankingAlgorithm 1.4.4; it allows faceting in realtime.

RankingAlgorithm 1.4.4 also provides an age feature that allows you to
retrieve the most recently changed docs in realtime, allowing you to query
your huge index in ms.

You can get more information and download it from here:

http://solr-ra.tgels.org

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

ps. Note: Apache Solr 4.0 with RankingAlgorithm 1.4.4 is an external 
implementation




Re: Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

Posted by Mark Miller <ma...@gmail.com>.
There is a per-segment faceting option, but I think it is just for
single-valued fields right now?
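A rough Python model of what per-segment faceting means, assuming each segment has its own sorted term dictionary and, because the field is single-valued, at most one term ordinal per matching document: counts are accumulated in a small array indexed by segment-local ordinal, then translated to term text and merged. The appeal for NRT is that structures built for unchanged segments can be reused after a reopen, so only new segments need work. Solr 4 exposes per-segment faceting for single-valued string fields as facet.method=fcs; the data layout below is hypothetical and is not Lucene's.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical segment layout: (sorted_terms, doc_ords), where doc_ords holds
# the segment-local ordinal of the single value of each matching document.
Segment = Tuple[Sequence[str], Sequence[int]]

def per_segment_facets(segments: List[Segment]) -> Dict[str, int]:
    """Count facet values segment by segment, then merge by term text."""
    totals: Dict[str, int] = {}
    for terms, doc_ords in segments:
        # Counting by local ordinal needs only an int array sized to this segment.
        local_counts = [0] * len(terms)
        for ordinal in doc_ords:
            local_counts[ordinal] += 1
        # Ordinals differ between segments, so merge on the term itself.
        for ordinal, count in enumerate(local_counts):
            if count:
                term = terms[ordinal]
                totals[term] = totals.get(term, 0) + count
    return totals
```

The single-value assumption is what keeps the inner loop this simple: each matching document contributes exactly one ordinal, so no per-document list of values is needed.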




-- 
- Mark

http://www.lucidimagination.com