Posted to solr-dev@lucene.apache.org by "Ankul Garg (JIRA)" <ji...@apache.org> on 2009/09/14 01:03:57 UTC

[jira] Updated: (SOLR-1316) Create autosuggest component

     [ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankul Garg updated SOLR-1316:
-----------------------------

    Attachment: TernarySearchTree.tar.gz

Hi Jason,
My TST implementation is attached. The archive also contains four benchmark result files: TST1.txt, TST2.txt, etc.
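
For readers without the attachment, here is a minimal sketch of what a ternary search tree with prefix lookup can look like. It is not the attached code: the class name TstSketch and the methods insert and suggest are hypothetical names chosen for illustration, and the sketch leaves out anything needed for benchmarking.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal ternary search tree; NOT the attached implementation.
class TstSketch {
    private static final class Node {
        final char ch;      // character stored at this node
        boolean isEnd;      // true if a key ends at this node
        Node lo, eq, hi;    // smaller char, next char of the key, larger char
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    // Adds a key; null and empty keys are ignored.
    void insert(String key) {
        if (key == null || key.isEmpty()) return;
        root = insert(root, key, 0);
    }

    private Node insert(Node node, String key, int i) {
        char c = key.charAt(i);
        if (node == null) node = new Node(c);
        if (c < node.ch) node.lo = insert(node.lo, key, i);
        else if (c > node.ch) node.hi = insert(node.hi, key, i);
        else if (i < key.length() - 1) node.eq = insert(node.eq, key, i + 1);
        else node.isEnd = true;
        return node;
    }

    // Returns all stored keys that start with the prefix, in sorted order.
    List<String> suggest(String prefix) {
        List<String> out = new ArrayList<String>();
        if (prefix == null || prefix.isEmpty()) return out;
        Node node = root;
        int i = 0;
        // Walk down to the node that matches the last character of the prefix.
        while (node != null) {
            char c = prefix.charAt(i);
            if (c < node.ch) node = node.lo;
            else if (c > node.ch) node = node.hi;
            else if (i < prefix.length() - 1) { node = node.eq; i++; }
            else break;
        }
        if (node == null) return out;
        if (node.isEnd) out.add(prefix);
        collect(node.eq, new StringBuilder(prefix), out);
        return out;
    }

    // In-order traversal of a subtree, appending every complete key to out.
    private void collect(Node node, StringBuilder sb, List<String> out) {
        if (node == null) return;
        collect(node.lo, sb, out);
        sb.append(node.ch);
        if (node.isEnd) out.add(sb.toString());
        collect(node.eq, sb, out);
        sb.setLength(sb.length() - 1);
        collect(node.hi, sb, out);
    }
}
```

With this sketch, inserting "solr", "solar" and "sole" and then calling suggest("sol") returns the three keys in sorted order.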

The four datasets were as follows:
All words are real-life words extracted from a DBpedia dump.
1. The first dataset contains 100,000 tokens consisting of single words, two-word phrases and three-word phrases.
2. The second dataset contains 500,000 tokens consisting of single words, two-word phrases and three-word phrases.
3. The third dataset contains 1,000,000 tokens consisting of single words, two-word phrases and three-word phrases.
4. The fourth dataset contains 5,000,000 tokens consisting of single words, two-word phrases and three-word phrases.

The benchmarking environment was as follows:
Platform: Linux
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
RAM: 16 GiB
Java heap size: default

Is there any other way to balance the tree? Also, what's your progress?
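
As context for the balancing question: this message does not say how the attached tree is balanced. One commonly described option is to sort the keys and insert the median first, then recurse into the lower and upper halves. The sketch below only computes that insertion order; the class name BalancedOrderSketch and the method medianFirst are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; computes a median-first insertion order for a key set.
class BalancedOrderSketch {
    // Returns a copy of keys, ordered so the median of each sorted range comes first.
    static List<String> medianFirst(List<String> keys) {
        List<String> sorted = new ArrayList<String>(keys);
        Collections.sort(sorted);
        List<String> out = new ArrayList<String>(sorted.size());
        fill(sorted, 0, sorted.size() - 1, out);
        return out;
    }

    // Emits the middle element of sorted[lo..hi], then handles both halves.
    private static void fill(List<String> sorted, int lo, int hi, List<String> out) {
        if (lo > hi) return;
        int mid = lo + (hi - lo) / 2;
        out.add(sorted.get(mid));
        fill(sorted, lo, mid - 1, out);
        fill(sorted, mid + 1, hi, out);
    }
}
```

Feeding the returned list to the tree's insert method in order is the intended use. Inserting keys in random order is another option that is often suggested; neither has been measured against the datasets above.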

> Create autosuggest component
> ----------------------------
>
>                 Key: SOLR-1316
>                 URL: https://issues.apache.org/jira/browse/SOLR-1316
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: TernarySearchTree.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher
