Posted to dev@lucene.apache.org by eks dev <ek...@yahoo.co.uk> on 2010/08/04 16:04:46 UTC

"Lucene + My Server" - > "Solr" Kick-start Questions

Sorry for posting here; this didn't get through to solr-user for some reason...
(didn't the Solr and Lucene mailing lists merge?)



 
Hi Solr experts,
We are contemplating a switch from our own infrastructure built around Lucene to
a Solr-based solution.
 
 
I would appreciate some kick-start pointers on where to look, so that I don't
have to read *everything* about Solr (it is quite a lot) :) Apologies in advance
for dumb, RTFM-like questions.
 
We are dealing mainly with structured data: extremely short documents with a
couple of fields, indexed from huge CSV files (200 million documents and up).
 
Q1:
One field in this CSV has the YYYYMMDD format, and I want to index it as a
DateField without wasting CPU cycles on the “YYYYMMDD” -> “ISO 8601” conversion
(we do not want to use DIH).
 
 
- Should I introduce my own type MyDateField that extends/wraps DateField
(TrieDateField) and overrides the toInternal()/toExternal()... methods? The
indexed format of the field would remain the same. Now the question: can Solr
use such a MyDateField for faceting/searching and all the other goodies as if it
were a pure DateField?
 
- Or should I deal with it somehow in an UpdateRequestProcessor?
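Whichever option is chosen, the per-document work is only rearranging eight characters, which should be much cheaper than parsing into a Date object. A minimal sketch, assuming every value is exactly eight ASCII digits and should be read as midnight UTC; `CompactDate` and `toSolrDate` are hypothetical names, not Solr API. It does not subclass DateField or TrieDateField; it only shows the string transformation that a custom field type or an UpdateRequestProcessor would have to perform.

```java
/**
 * Hypothetical helper (not Solr API): turns a compact YYYYMMDD value into the
 * ISO 8601 form Solr's DateField accepts, e.g. 20100804 -> 2010-08-04T00:00:00Z.
 * It only copies characters, so no Date/Calendar parsing or time-zone math happens.
 */
public final class CompactDate {

    private CompactDate() {}

    public static String toSolrDate(String yyyymmdd) {
        // Assumes exactly eight ASCII digits; month/day ranges are NOT validated here.
        if (yyyymmdd == null || yyyymmdd.length() != 8) {
            throw new IllegalArgumentException("expected YYYYMMDD, got: " + yyyymmdd);
        }
        for (int i = 0; i < 8; i++) {
            char c = yyyymmdd.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("non-digit in date: " + yyyymmdd);
            }
        }
        // "YYYY-MM-DDT00:00:00Z" is 20 characters long; the time is fixed at midnight UTC.
        return new StringBuilder(20)
                .append(yyyymmdd, 0, 4).append('-')
                .append(yyyymmdd, 4, 6).append('-')
                .append(yyyymmdd, 6, 8)
                .append("T00:00:00Z")
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("20100804")); // prints 2010-08-04T00:00:00Z
    }
}
```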
 
Q2:
We intend to try a 1-master / N-slaves configuration with Solr, and it looks
like it can update the slaves with index changes “out of the box” (great!). But
my problem is the following: how can I distribute “my configuration” to the
slaves? One example of such “configuration”: during a full update (complete
re-index) we count symbol statistics for all characters appearing in particular
fields, and these statistics seed a battery of static HuffmanDecoders/Encoders
that compress our stored fields. How do I distribute such “configuration”? Can
the same process that distributes changed Lucene segments be “enhanced”,
“copy-pasted” or whatever, so that it looks into some user-specified “folder
root” and replicates the complete sub-tree along with the index? Versioning of
such things can be done in user code.
On the receiving side I just need to be notified that a “change happened in your
files” in order to reload my “configuration bits”. I guess this is somehow
already possible; people want to distribute their apps, not only the Lucene
index?
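A minimal sketch of the statistics pass described above, assuming the “symbol statistics” are plain per-character counts over the selected field values: count each character, then derive standard Huffman code lengths from those counts. The resulting map is the kind of small side artifact that would have to reach the slaves together with the index. `SymbolStats`, `countSymbols` and `codeLengths` are hypothetical names; this is a textbook Huffman construction, not the actual encoder used in our app.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: gathers per-character statistics during a full re-index
 * and turns them into Huffman code lengths that could seed static encoders/decoders.
 */
public final class SymbolStats {

    private SymbolStats() {}

    /** Counts how often each char (UTF-16 code unit) occurs across the field values. */
    public static Map<Character, Long> countSymbols(Iterable<String> fieldValues) {
        Map<Character, Long> counts = new HashMap<>();
        for (String value : fieldValues) {
            for (int i = 0; i < value.length(); i++) {
                counts.merge(value.charAt(i), 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Huffman tree node: a leaf carries a symbol, an internal node has two children. */
    private static final class Node {
        final long weight;
        final Character symbol;
        final Node left;
        final Node right;

        Node(long weight, Character symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    /** Builds a Huffman code length (in bits) for every symbol from its count. */
    public static Map<Character, Integer> codeLengths(Map<Character, Long> counts) {
        Map<Character, Integer> lengths = new HashMap<>();
        if (counts.isEmpty()) {
            return lengths;
        }
        PriorityQueue<Node> queue = new PriorityQueue<>((a, b) -> Long.compare(a.weight, b.weight));
        for (Map.Entry<Character, Long> e : counts.entrySet()) {
            queue.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        if (queue.size() == 1) {
            lengths.put(queue.peek().symbol, 1); // a lone symbol still needs one bit
            return lengths;
        }
        // Repeatedly merge the two lightest subtrees until a single tree remains.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node(a.weight + b.weight, null, a, b));
        }
        assignDepths(queue.poll(), 0, lengths);
        return lengths;
    }

    /** A leaf's depth in the tree is the length of its code. */
    private static void assignDepths(Node node, int depth, Map<Character, Integer> lengths) {
        if (node.symbol != null) {
            lengths.put(node.symbol, depth);
            return;
        }
        assignDepths(node.left, depth + 1, lengths);
        assignDepths(node.right, depth + 1, lengths);
    }

    public static void main(String[] args) {
        Map<Character, Long> counts = countSymbols(List.of("ABRACADABRA"));
        System.out.println(counts + " -> " + codeLengths(counts));
    }
}
```

Only code lengths are kept here on purpose: a canonical Huffman scheme can rebuild identical codes from the lengths alone, which keeps the file that has to be replicated small.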
 
Q3:
Our app uses the Lucene index both as a search index and as a database. In this
app, the user issues a request that is nothing at all like a Lucene search
request; our users do not know how to write queries. End-user code sends only
key-value pairs (field name, value) to Solr, and internally we do the following:
1. Rewrite this “UserRequest” into *many* Lucene queries.
2. For each hit, fetch one stored field containing our original document from
the CSV in compressed form, and decompress it.
3. Classify these Lucene responses into some “Hit Classes” (we add a “Hit type”
field to the response).
4. Cluster such hits (a “classID” field in the response).
 
Where in the Solr request -> response chain should all this work go? A
RequestHandler?
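The four steps above could be kept independent of Solr's request plumbing by expressing them as one pipeline over pluggable functions, so the same code could later be called from a custom RequestHandler or another extension point. A minimal sketch; `UserRequestPipeline`, `Hit` and every function parameter are hypothetical placeholders, not Solr or Lucene API. The `search` function stands in for running one Lucene query and loading the compressed stored field of each hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/**
 * Hypothetical outline of the Q3 request pipeline. None of these types are
 * Solr or Lucene API; each step is a pluggable function.
 */
public final class UserRequestPipeline {

    /** One result row: the decompressed document plus the two added fields. */
    public static final class Hit {
        public final String document;
        public final String hitType; // the "Hit type" field (step 3)
        public final int classId;    // the "classID" field (step 4)

        Hit(String document, String hitType, int classId) {
            this.document = document;
            this.hitType = hitType;
            this.classId = classId;
        }
    }

    private final Function<Map<String, String>, List<String>> rewrite; // step 1
    private final Function<String, List<byte[]>> search;       // one query -> compressed stored field per hit
    private final Function<byte[], String> decompress;         // step 2
    private final BiFunction<String, String, String> classify; // step 3: (query, document) -> hit type
    private final Function<List<String>, int[]> cluster;       // step 4: one class ID per document

    public UserRequestPipeline(Function<Map<String, String>, List<String>> rewrite,
                               Function<String, List<byte[]>> search,
                               Function<byte[], String> decompress,
                               BiFunction<String, String, String> classify,
                               Function<List<String>, int[]> cluster) {
        this.rewrite = rewrite;
        this.search = search;
        this.decompress = decompress;
        this.classify = classify;
        this.cluster = cluster;
    }

    public List<Hit> process(Map<String, String> userRequest) {
        List<String> documents = new ArrayList<>();
        List<String> hitTypes = new ArrayList<>();
        for (String query : rewrite.apply(userRequest)) {  // 1. one user request -> many queries
            for (byte[] stored : search.apply(query)) {
                String doc = decompress.apply(stored);     // 2. decompress the stored field
                documents.add(doc);
                hitTypes.add(classify.apply(query, doc));  // 3. assign a hit class
            }
        }
        int[] classIds = cluster.apply(documents);         // 4. cluster all hits together
        List<Hit> hits = new ArrayList<>(documents.size());
        for (int i = 0; i < documents.size(); i++) {
            hits.add(new Hit(documents.get(i), hitTypes.get(i), classIds[i]));
        }
        return hits;
    }
}
```

In a real deployment the functions passed in would be backed by the actual query rewriting, Lucene searching, Huffman decoding and clustering code; keeping them as parameters means the pipeline can be unit-tested without a running Solr instance.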
 
 
Thanks in advance,
eks


      
