Posted to dev@lucene.apache.org by eks dev <ek...@yahoo.co.uk> on 2010/08/04 16:04:46 UTC
"Lucene + My Server" - > "Solr" Kick-start Questions
sorry for posting here, this didn't get through solr-user for some reason...
(didn't solr and lucene mailing lists merge?)
Hi Solr experts,
We are contemplating a switch from our infrastructure around Lucene to
Solr-based solutions.
I would appreciate some kick-start pointers on where to look, so that I do
not have to read *everything* about Solr (it is quite a lot) :) Apologies in
advance for any dumb, RTFM-like questions.
We are dealing mainly with structured data: extremely short documents with a
couple of fields, indexed from huge CSV files (200 million documents and up).
Q1:
If one field in this CSV has YYYYMMDD format, and I want to index it as a
DateField without wasting CPU cycles on the “YYYYMMDD” -> “ISO 8601”
conversion (we do not want to use DIH), what is the best approach?
- Should I introduce my own type MyDateField that extends/wraps DateField
(TrieDateField) and overrides the toInternal()/toExternal() methods? The
indexed format of the field would remain the same. The question then is: can
Solr use such a MyDateField for faceting, searching and all the other goodies
as if it were a pure DateField?
- Or should I deal with it in an UpdateRequestProcessor somehow?
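Whichever hook ends up hosting it, the conversion itself can be very cheap.
A minimal sketch in plain Java, assuming the target is the canonical UTC form
that Solr date fields accept (YYYY-MM-DDThh:mm:ssZ) and that midnight UTC is
the right time of day; the class name CompactDateToIso is hypothetical and no
Solr API is involved:

```java
// Hypothetical helper, not part of Solr or Lucene.
final class CompactDateToIso {
    private CompactDateToIso() {}

    /** Converts "YYYYMMDD" to "YYYY-MM-DDT00:00:00Z" by copying characters only. */
    static String toIso(String yyyymmdd) {
        if (yyyymmdd == null || yyyymmdd.length() != 8) {
            throw new IllegalArgumentException("expected YYYYMMDD, got: " + yyyymmdd);
        }
        // Template with the separators and the (assumed) midnight UTC time in place.
        char[] out = "0000-00-00T00:00:00Z".toCharArray();
        for (int i = 0, j = 0; i < 8; i++, j++) {
            char c = yyyymmdd.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("non-digit in date: " + yyyymmdd);
            }
            if (i == 4 || i == 6) {
                j++; // step over the '-' already present in the template
            }
            out[j] = c;
        }
        return new String(out);
    }
}
```

The sketch only copies characters into a template and checks that they are
digits. It does not validate the calendar (month 13 would pass), so that check
would have to live elsewhere if the CSV is not trusted.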
Q2:
We intend to try a 1-master / N-slaves configuration with Solr, and it looks
like it can update the slaves with index changes “out of the box” (great!).
My problem is the following: how can I distribute “my configuration” to the
slaves? One example of such “configuration”: during a full update (complete
re-index) we collect symbol statistics for all characters appearing in
particular fields; these statistics seed a battery of static
HuffmanDecoders/Encoders that compress our stored fields. How do I distribute
such “configuration”? Can the process that distributes changed Lucene
segments be “enhanced”, “copy-pasted” or otherwise made to look into some
user-specified “folder root” and replicate the complete sub-tree along with
the index? Versioning of such things can be done in user code.
On the receiving side I just need to be notified that a “change happened in
your files” in order to reload my “configuration bits”. I guess this is
already possible somehow; people want to distribute their apps, not only the
Lucene index?
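To make the kind of “configuration” concrete, here is a minimal sketch in
plain Java of per-character statistics with a content-based version number.
It is an illustration only, not the actual code: the class name
SymbolStatistics, the tab-separated text format and the use of CRC32 as a
version are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical illustration of the statistics that would seed the Huffman coders.
final class SymbolStatistics {
    // TreeMap keeps symbols sorted, so the serialized form is deterministic.
    private final Map<Character, Long> counts = new TreeMap<>();

    /** Counts every character of one field value. */
    void add(CharSequence fieldValue) {
        for (int i = 0; i < fieldValue.length(); i++) {
            counts.merge(fieldValue.charAt(i), 1L, Long::sum);
        }
    }

    long count(char symbol) {
        return counts.getOrDefault(symbol, 0L);
    }

    /** One "charCode<TAB>count" line per symbol (assumed file format). */
    String serialize() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Character, Long> e : counts.entrySet()) {
            sb.append((int) e.getKey().charValue())
              .append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Checksum of the serialized form; equal statistics give equal versions. */
    long version() {
        CRC32 crc = new CRC32();
        crc.update(serialize().getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

A receiving side could compare version() of the file it last loaded with the
one it just received and rebuild its coders only when the two differ.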
Q3:
Our app uses the Lucene index both as a search index and as a database. In
this app, the user issues a request that is nothing at all like a Lucene
search request. Our users do not know how to write queries. End-user code
sends only key-value pairs (field name, value) to Solr, and we internally do
the following:
1. Rewrite this “UserRequest” into *many* Lucene queries.
2. For each hit, fetch one stored field containing our original document from
the CSV in compressed form, and decompress it.
3. Classify these Lucene responses into “hit classes” (we add a “Hit type”
field to the response).
4. Cluster such hits (“classID” field in the response).
Where should I insert all this work into the Solr Request->Response chain?
RequestHandler?
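As an illustration of step 1 only, here is a minimal sketch in plain Java
that turns key-value pairs into several query strings in classic Lucene query
syntax (phrase, fuzzy, prefix). The class name UserRequestRewriter and the
three strategies are assumptions, not the actual rewriting rules, and values
are assumed to be single tokens with no characters that need escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the real rewriting rules are application specific.
final class UserRequestRewriter {
    // {prefix, suffix} wrapped around each value: phrase, fuzzy, prefix match.
    private static final String[][] STRATEGIES = {
        {"\"", "\""}, {"", "~"}, {"", "*"}
    };

    private UserRequestRewriter() {}

    /** Returns one query string per strategy, AND-ing all requested fields. */
    static List<String> rewrite(Map<String, String> userRequest) {
        List<String> queries = new ArrayList<>();
        for (String[] s : STRATEGIES) {
            StringBuilder q = new StringBuilder();
            for (Map.Entry<String, String> e : userRequest.entrySet()) {
                if (q.length() > 0) {
                    q.append(" AND ");
                }
                q.append(e.getKey()).append(':')
                 .append(s[0]).append(e.getValue()).append(s[1]);
            }
            queries.add(q.toString());
        }
        return queries;
    }
}
```

Each resulting string would then be parsed and executed separately; the
position of a query in the list could serve as a crude “Hit type” for the hits
it returns.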
Thanks in advance,
eks