Posted to general@lucene.apache.org by Wanderer2019 <lo...@mail.ru> on 2011/11/24 12:39:16 UTC

Newbie question - use of SOLR

Hi,

I'm totally new to SOLR and Lucene, so for now I would really appreciate
some feedback from experienced people.
I'm thinking of using the SOLR/Lucene engine for archiving documents. The documents
typically share a similar layout, with different values in specified positions.
I would like to keep all the docs fully indexed. When a user performs a search,
I'd like to return a list of documents that match the search criterion.
Question: how can I specify the area of a document to search in?

Let's say I have 1000 fully indexed documents. Each one contains similar data,
with differences in only a few areas (the ones I'd like to search). For
instance, a document may have some text in a header, a body and a footer. Each of
these pieces may contain the searched word, but I'd like to find only those
docs that contain the searched word in the body.

Please give me some suggestions on where to start and which features of
SOLR/Lucene I should look into and learn.

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Newbie-question-use-of-SOLR-tp3533416p3533416.html
Sent from the Lucene - General mailing list archive at Nabble.com.

Re: Newbie question - use of SOLR

Posted by Wanderer2019 <lo...@mail.ru>.
Good day,

Thanks for your response!

Well, the initial idea is fairly simple.
Let me try to elaborate a bit more here.
Let's say I have 3000 PDF documents that all share the same structure but
have different data in specified areas.
For instance:

----------------------------------
-Header: John Smith                   -
-Body: Dear .............................-
-.............................................-
-.............................................-
-.............................................-
-.............................................-
-.............................................-
-.............................................-
-.............................................-
-Footer: from corp to John Smith  -
----------------------------------

Where John Smith is some dynamic text. Each PDF document will have its own
name written in.

The idea:
1. I'd like to keep all of those 3000 documents fully indexed. This is more
or less clear.
2. I'd like to be able to query only those where the value in the header equals John
Smith, ignoring every other part of the PDF, even though the footer also
contains John Smith. How can I do that?




Re: Newbie question - use of SOLR

Posted by Chris Hostetter <ho...@fucit.org>.
The overall problem you seem to be describing is how to parse your
files to extract structured data and then index that data in discrete
fields of Solr documents.

How you should go about that depends largely on the formats of these files
and how well structured they are - for instance: it's a lot easier to
extract documents and fields out of well structured XML files than it is
out of plain text -- but if the plain text files are really uniform and
every one has the exact same layout, then the problem gets easier.
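
As a rough illustration of the uniform plain-text case, here is a minimal Python sketch. It assumes the text has already been extracted from each file and that every document carries literal "Header:", "Body:" and "Footer:" labels, as in the example layout earlier in this thread. The field names (id, header, body, footer) and the helper functions are hypothetical; a real Solr schema would need matching field definitions, and real files may need a different parsing rule.

```python
import json
import re

# Assumed layout: "Header: ... Body: ... Footer: ..." in that order.
# The field names below are hypothetical, not taken from any real schema.
SECTION_PATTERN = re.compile(
    r"Header:\s*(?P<header>.*?)\s*"
    r"Body:\s*(?P<body>.*?)\s*"
    r"Footer:\s*(?P<footer>.*?)\s*\Z",
    re.DOTALL,
)


def extract_fields(doc_id, text):
    """Split uniformly laid-out text into one value per section."""
    match = SECTION_PATTERN.search(text)
    if match is None:
        raise ValueError(f"document {doc_id!r} does not follow the expected layout")
    fields = {"id": doc_id}
    fields.update(match.groupdict())
    return fields


def header_query(name):
    """Build a fielded phrase query that is restricted to the header field."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'header:"{escaped}"'


sample = """Header: John Smith
Body: Dear John, your statement is attached.
Footer: from corp to John Smith"""

doc = extract_fields("doc-1", sample)
payload = json.dumps([doc])  # a JSON list of documents, ready to send to an indexer
print(doc["header"])
print(header_query("John Smith"))
```

Because each section ends up in its own field, a query written as field:"phrase" only looks at that field, so a name that also appears in the footer would not cause a match on a header-only search.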

I would suggest you send more details about the types of files you are
working with and the types of fields you'd like to search on to the
solr-user@lucene mailing list.  That list is more suitable for discussions
about how to "use" solr to achieve goals; this general@lucene list is
primarily for broad discussions about the project as a whole (and for
people with no idea what lucene is to have a place to start with their
questions).


Good Luck!


-Hoss