Posted to solr-user@lucene.apache.org by Peter Spam <ps...@mac.com> on 2010/06/28 22:46:01 UTC

Very basic questions: Indexing text

Hi everyone,

I'm looking for a way to index a bunch of (potentially large) text files.  I would love to see results like Google, so I went through a few tutorials, but I've still got questions:

1) I can get my docs in the index, but when I search, it returns the entire document.  I'd love to have it only return the line (or two) around the search term.

2) There are one or two fields at the beginning of the file that I would like to search on, so these should be indexed differently, right?

3) Is there a nice front-end example anywhere?  Something that would return results kind of like Google?

Thanks for your time - Solr / Lucene seem to be very powerful.


-Pete

Re: Very basic questions: Indexing text

Posted by Ahmet Arslan <io...@yahoo.com>.

> Could you give an example?
> E.g. let's say I have a field 'title' and a field 'fulltext' and my
> search term is 'solr'. What would be the right set of parameters to get
> back the whole title field but only a snippet of 50 words (or three
> sentences or whatever the unit) from the fulltext field?

&q=solr&hl=true&hl.fl=title,fulltext&f.title.hl.fragsize=0

should do it. You can increase the size of the snippet generated from fulltext (measured in characters) by increasing the hl.fragsize value.
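
As a rough illustration of the parameters above, here is a minimal Python sketch that assembles such a request URL. The helper name highlight_query, the base URL, and the fragsize value of 300 are assumptions made for the example; only q, hl, hl.fl, hl.fragsize and the per-field f.<field>.hl.fragsize override come from the reply itself.

```python
from urllib.parse import urlencode


def highlight_query(base_url, term, whole_fields, snippet_fields, fragsize=100):
    """Build a Solr select URL that highlights `term`.

    Fields in `whole_fields` are returned unfragmented (fragsize 0);
    fields in `snippet_fields` use the global hl.fragsize (in characters).
    """
    params = [
        ("q", term),
        ("hl", "true"),
        ("hl.fl", ",".join(whole_fields + snippet_fields)),
        ("hl.fragsize", str(fragsize)),
    ]
    # Per-field override: a fragment size of 0 means "do not fragment".
    for field in whole_fields:
        params.append((f"f.{field}.hl.fragsize", "0"))
    # Keep the comma in hl.fl readable instead of percent-encoding it.
    return base_url + "?" + urlencode(params, safe=",")


# Hypothetical local Solr instance; adjust host, port and path as needed.
url = highlight_query(
    "http://localhost:8983/solr/select",
    "solr",
    whole_fields=["title"],
    snippet_fields=["fulltext"],
    fragsize=300,
)
print(url)
```

The snippets come back in a separate highlighting section of the response, keyed by document ID, rather than inside each document entry.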

Re: Very basic questions: Indexing text

Posted by Michael Lackhoff <mi...@lackhoff.de>.

On 28.06.2010 23:00 Ahmet Arslan wrote:

>> 1) I can get my docs in the index, but when I search, it
>> returns the entire document.  I'd love to have it only
>> return the line (or two) around the search term.
> 
> Solr can generate Google-like snippets as you describe. 
> http://wiki.apache.org/solr/HighlightingParameters

I didn't know this was possible and am also interested in this feature,
but even after reading the given Wiki page I cannot make out which
parameter to use. The only parameter that could be similar is
'hl.maxAlternateFieldLength', where it is possible to give a length to
return, but according to the description that is for the case "no match".
And there is "hl.fragmentsBuilder" but with no explanation (the referenced
page SolrFragmentsBuilder does not yet exist).

Could you give an example?
E.g. let's say I have a field 'title' and a field 'fulltext' and my
search term is 'solr'. What would be the right set of parameters to get
back the whole title field but only a snippet of 50 words (or three
sentences or whatever the unit) from the fulltext field?

Thanks
-Michael

Re: Very basic questions: Indexing text

Posted by Peter Spam <ps...@mac.com>.

Great, thanks for the pointers.


Thanks,
Peter

On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:

>> 1) I can get my docs in the index, but when I search, it
>> returns the entire document.  I'd love to have it only
>> return the line (or two) around the search term.
> 
> Solr can generate Google-like snippets as you describe. 
> http://wiki.apache.org/solr/HighlightingParameters
> 
>> 2) There are one or two fields at the beginning of the file
>> that I would like to search on, so these should be indexed
>> differently, right?
> 
> Probably yes. 
> 
>> 3) Is there a nice front-end example anywhere? 
>> Something that would return results kind of like Google?
> 
> http://wiki.apache.org/solr/PublicServers
> http://search-lucene.com/
> 
> 
>

Re: Very basic questions: Faceted front-end?

Posted by Ma...@rzf.fin-nrw.de.

It is not that complicated to write your own GUI.
We are working on an integration into our intranet server...
 

> -----Original Message-----
> From: Peter Spam [mailto:pspam@mac.com] 
> Sent: Thursday, 1 July 2010 03:21
> To: solr-user@lucene.apache.org
> Subject: Re: Very basic questions: Faceted front-end?
> 
> Ah, I found this:
> 
> 	https://issues.apache.org/jira/browse/SOLR-634
> 
> ... aka "solr-ui".  Is there anything else along these lines?  Thanks!
> 
> 
> -Peter
> 

Re: Very basic questions: Faceted front-end?

Posted by Erik Hatcher <er...@gmail.com>.

On Jul 1, 2010, at 10:33 AM, Mark Allan wrote:

> Very nice indeed!  That definitely needs to be shouted about in the  
> docs.

Why thanks!   And yeah, marketing isn't my strong point, but it is  
indeed a way cool feature of Solr that deserves more attention than I  
can give it.

> Any way to make it work with facet queries or can dismax requests  
> not do that? I tried adding a few &facet.query parameters but it  
> came back with nothing in the facet list.

You'll have to adjust the templates to pull facet queries out into the  
view.  I'll try to do that later today unless you beat me to it and  
provide a patch :)   It'll be pretty trivial to do so.

It needs to support date range faceting too.

	Erik
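
For reference, a request carrying several facet.query parameters can be sketched as below. This is only an illustration: the helper name facet_query_url, the base URL, and the device field values are hypothetical and not taken from Mark's setup.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, term, facet_queries):
    """Build a Solr request URL with faceting on and one facet.query per entry."""
    params = [("q", term), ("facet", "true")]
    # facet.query may be repeated; each one yields its own count.
    params += [("facet.query", fq) for fq in facet_queries]
    return base_url + "?" + urlencode(params)


# Hypothetical field and values, purely for illustration.
url = facet_query_url(
    "http://localhost:8983/solr/select",
    "solr",
    ["device:phone", "device:tablet"],
)
print(url)
```

Solr reports these counts under facet_queries inside the facet_counts section of the response, separately from field facets, so a front-end template that only renders field facets will show nothing for them.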

Re: Very basic questions: Faceted front-end?

Posted by Mark Allan <ma...@ed.ac.uk>.

Very nice indeed!  That definitely needs to be shouted about in the  
docs.

Any way to make it work with facet queries or can dismax requests not  
do that? I tried adding a few &facet.query parameters but it came back  
with nothing in the facet list.

Mark

On 1 Jul 2010, at 12:36 pm, Erik Hatcher wrote:

> Solr trunk now has a built-in UI, and it is also something that  
> works with Solr 1.4 as well (with some effort).   Here's how to get  
> it working with Solr 1.4:
>
>   <http://www.lucidimagination.com/blog/2009/11/04/solritas-solr-1-4s-hidden-gem/>
>
> In Solr trunk, all you have to do is navigate to /solr/browse and  
> you get a "google-like" UI that does highlighting, faceting, spell- 
> checking, etc.
>
> There's a partial screenshot (of the debug feature) attached to this  
> issue:
>
>  <https://issues.apache.org/jira/browse/SOLR-1957>
>
> 	Erik
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Re: Very basic questions: Faceted front-end?

Posted by Erik Hatcher <er...@gmail.com>.

Solr trunk now has a built-in UI, and it is also something that works  
with Solr 1.4 as well (with some effort).   Here's how to get it  
working with Solr 1.4:

    <http://www.lucidimagination.com/blog/2009/11/04/solritas-solr-1-4s-hidden-gem/>

In Solr trunk, all you have to do is navigate to /solr/browse and you  
get a "google-like" UI that does highlighting, faceting, spell- 
checking, etc.

There's a partial screenshot (of the debug feature) attached to this  
issue:

   <https://issues.apache.org/jira/browse/SOLR-1957>

	Erik


On Jun 30, 2010, at 9:21 PM, Peter Spam wrote:

> Ah, I found this:
>
> 	https://issues.apache.org/jira/browse/SOLR-634
>
> ... aka "solr-ui".  Is there anything else along these lines?  Thanks!
>
>
> -Peter

Re: Very basic questions: Faceted front-end?

Posted by Peter Spam <ps...@mac.com>.

Ah, I found this:

	https://issues.apache.org/jira/browse/SOLR-634

... aka "solr-ui".  Is there anything else along these lines?  Thanks!


-Peter

On Jun 30, 2010, at 3:59 PM, Peter Spam wrote:

> Wow, thanks Lance - it's really fast now!
> 
> The last piece of the puzzle is setting up a nice front-end.  Are there any pre-built front-ends available, that mimic Google (for example), with facets?
> 
> 
> -Peter
> 
> On Jun 29, 2010, at 9:04 PM, Lance Norskog wrote:
> 
>> To highlight a field, Solr needs some extra Lucene values. If these
>> are not configured for the field in the schema, Solr has to re-analyze
>> the field to highlight it. If you want faster highlighting, you have
>> to add term vectors to the schema. Here is the grand map of such
>> things:
>> 
>> http://wiki.apache.org/solr/FieldOptionsByUseCase
>> 
>> On Tue, Jun 29, 2010 at 6:29 PM, Erick Erickson <er...@gmail.com> wrote:
>>> What are your actual highlighting requirements? You could try
>>> things like maxAnalyzedChars, requireFieldMatch, etc....
>>> 
>>> http://wiki.apache.org/solr/HighlightingParameters
>>> has a good list, but you've probably already seen that page....
>>> 
>>> Best
>>> Erick
>>> 
>>> On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam <ps...@mac.com> wrote:
>>> 
>>>> To follow up, I've found that my queries are very fast (even with &fq=),
>>>> until I add &hl=true.  What can I do to speed up highlighting?  Should I
>>>> consider injecting a line at a time, rather than the entire file as a field?
>>>> 
>>>> 
>>>> -Pete
>>>> 
>>>> On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:
>>>> 
>>>>> Thanks for everyone's help - I have this working now, but sometimes the
>>>>> queries are incredibly slow!!  For example, <int name="QTime">461360</int>.
>>>>> Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to
>>>>> inject without throwing heap memory errors.  However, my data set is very
>>>>> small!  36 text files, for a total of 113MB.  (It will grow to many TB, but
>>>>> for now, this is a test).  The largest file is 34MB.
>>>>> 
>>>>> Therefore, I'm sure I'm doing something wrong :-)  Here's my config:
>>>>> 
>>>>> -----------------------------------------------------------------------------------------------
>>>>> 
>>>>> For the schema.xml, <types> is all default.  For fields, here are the
>>>>> only lines that aren't commented out:
>>>>> 
>>>>>  <field name="id" type="string" indexed="true" stored="true" required="true" />
>>>>>  <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
>>>>>  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>>>>>  <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
>>>>>  <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
>>>>>  <dynamicField name="*" type="ignored" multiValued="true" />
>>>>> 
>>>>> ... then, for the rest:
>>>>> 
>>>>> <uniqueKey>id</uniqueKey>
>>>>> 
>>>>> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>>>>> <defaultSearchField>body</defaultSearchField>
>>>>> 
>>>>> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
>>>>> <solrQueryParser defaultOperator="AND"/>
>>>>> 
>>>>> -----------------------------------------------------------------------------------------------
>>>>> 
>>>>> Invoking:  java -Xmx3584M -Xms1024M -jar start.jar
>>>>> 
>>>>> -----------------------------------------------------------------------------------------------
>>>>> 
>>>>> Injecting:
>>>>> 
>>>>> #!/bin/sh
>>>>> 
>>>>> J=0
>>>>> for i in `find . -name \*.txt`; do
>>>>>      (( J++ ))
>>>>>      curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F "myfile=@$i";
>>>>> done;
>>>>> 
>>>>> 
>>>>> echo "------------- Committing"
>>>>> curl "http://localhost:8983/solr/update/extract?commit=true"
>>>>> 
>>>>> -----------------------------------------------------------------------------------------------
>>>>> 
>>>>> Searching:
>>>>> 
>>>>> http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
>>>>> 
>>>>> -Pete
>>>>> 
>>>>> On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
>>>>> 
>>>>>> try adding &hl.fl=text
>>>>>> to specify your highlight field. I don't understand why you're only
>>>>>> getting the ID field back though. Do note that the highlighting
>>>>>> is after the docs, related by the ID.
>>>>>> 
>>>>>> Try a (non highlighting) query of just * to verify that you're
>>>>>> pointing at the index you think you are. It's possible that
>>>>>> you've modified a different index with SolrJ than your web
>>>>>> server is pointing at.
>>>>>> 
>>>>>> Also, SOLR has no way of knowing you're modified your index
>>>>>> with SolrJ, so it may not be automatically reopening an
>>>>>> IndexReader so your recent changes may not be visible
>>>>>> until you force the SOLR reader to reopen.
>>>>>> 
>>>>>> HTH
>>>>>> Erick
>>>>>> 
>>>>>> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <ps...@mac.com> wrote:
>>>>>> 
>>>>>>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>>>>>>> 
>>>>>>>>> 1) I can get my docs in the index, but when I search, it
>>>>>>>>> returns the entire document.  I'd love to have it only
>>>>>>>>> return the line (or two) around the search term.
>>>>>>>> 
>>>>>>>> Solr can generate Google-like snippets as you describe.
>>>>>>>> http://wiki.apache.org/solr/HighlightingParameters
>>>>>>> 
>>>>>>> Here's how I commit my documents:
>>>>>>> 
>>>>>>> J=0;
>>>>>>> for i in `find . -name \*.txt`; do
>>>>>>>     (( J++ ))
>>>>>>>     curl "http://localhost:8983/solr/update/extract?literal.id=doc$J"
>>>>>>> -F "myfile=@$i";
>>>>>>> done;
>>>>>>> 
>>>>>>> echo "------------- Committing"
>>>>>>> curl "http://localhost:8983/solr/update/extract?commit=true"
>>>>>>> 
>>>>>>> 
>>>>>>> Then, I try to query using
>>>>>>> 
>>>> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
>>>>>>> but I only get back the document ID rather than the snippet:
>>>>>>> 
>>>>>>> <doc>
>>>>>>> <float name="score">0.05030759</float>
>>>>>>> <arr name="content_type">
>>>>>>> <str>text/plain</str>
>>>>>>> </arr>
>>>>>>> <str name="id">doc16</str>
>>>>>>> </doc>
>>>>>>> 
>>>>>>> I'm using the schema.xml from the "lucid imagination: Indexing text and
>>>>>>> html files" tutorial.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -Pete
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Lance Norskog
>> goksron@gmail.com
>

Re: Very basic questions: Faceted front-end?

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.

Have you had a look at www.twigkit.com? Could be worth the bucks...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 1 July 2010, at 00:59, Peter Spam wrote:

> Wow, thanks Lance - it's really fast now!
> 
> The last piece of the puzzle is setting up a nice front-end.  Are there any pre-built front-ends available, that mimic Google (for example), with facets?
> 
> 
> -Peter
> 

Re: Very basic questions: Faceted front-end?

Posted by Peter Spam <ps...@mac.com>.

Wow, thanks Lance - it's really fast now!

The last piece of the puzzle is setting up a nice front-end.  Are there any pre-built front-ends available that mimic Google (for example), with facets?


-Peter

On Jun 29, 2010, at 9:04 PM, Lance Norskog wrote:

> To highlight a field, Solr needs some extra Lucene values. If these
> are not configured for the field in the schema, Solr has to re-analyze
> the field to highlight it. If you want faster highlighting, you have
> to add term vectors to the schema. Here is the grand map of such
> things:
> 
> http://wiki.apache.org/solr/FieldOptionsByUseCase
> 
> -- 
> Lance Norskog
> goksron@gmail.com

Re: Very basic questions: Indexing text - working, but slow!

Posted by Lance Norskog <go...@gmail.com>.

To highlight a field quickly, Solr needs term vectors (with positions and
offsets) from Lucene. If these are not configured for the field in the
schema, Solr has to re-analyze the stored text at query time to highlight
it. If you want faster highlighting, add term vectors to the field in the
schema and re-index. Here is the grand map of such things:

http://wiki.apache.org/solr/FieldOptionsByUseCase
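
As a rough illustration of the term-vector suggestion, the sketch below prints what the "body" field posted earlier in this thread might look like with term vectors switched on. The attributes termVectors, termPositions and termOffsets are standard schema.xml field options; the helper name body_field_with_term_vectors is made up for this example, and the rest of the field definition is simply copied from the posted schema.

```shell
#!/bin/sh
# Hypothetical helper: prints a candidate <field> definition for schema.xml.
# Only the three term* attributes differ from the "body" field posted in
# this thread; everything else is copied from it unchanged.
body_field_with_term_vectors() {
    cat <<'EOF'
<field name="body" type="text" indexed="true" stored="true" multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
EOF
}

# Show the suggested definition so it can be pasted into schema.xml.
body_field_with_term_vectors
```

Documents indexed before the change carry no term vectors, so the files would have to be injected again after editing the schema.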

On Tue, Jun 29, 2010 at 6:29 PM, Erick Erickson <er...@gmail.com> wrote:
> What are you actual highlighting requirements? you could try
> things like maxAnalyzedChars, requireFieldMatch, etc....
>
> http://wiki.apache.org/solr/HighlightingParameters
> has a good list, but you've probably already seen that page....
>
> Best
> Erick
>



-- 
Lance Norskog
goksron@gmail.com

Re: Very basic questions: Indexing text - working, but slow!

Posted by Erick Erickson <er...@gmail.com>.

What are your actual highlighting requirements? You could try
things like maxAnalyzedChars, requireFieldMatch, etc.

http://wiki.apache.org/solr/HighlightingParameters
has a good list, but you've probably already seen that page....

Best
Erick
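
To make the two parameters concrete, here is a minimal sketch of a query URL that sets them. It assumes the field to highlight is "body" (the field from the posted schema) and that the search term needs no URL-encoding; the helper name highlight_url and the value 100000 are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: builds a select URL that limits highlighting work.
#   $1 = query term (assumed to need no URL-encoding)
#   $2 = value for hl.maxAnalyzedChars
highlight_url() {
    base="http://localhost:8983/solr/select"
    printf '%s?q=%s&fl=id,score&hl=true&hl.fl=body&hl.snippets=5&hl.maxAnalyzedChars=%s&hl.requireFieldMatch=true\n' \
        "$base" "$1" "$2"
}

# Print the URL; it can then be handed to curl or pasted into a browser.
highlight_url testing 100000
```

A smaller hl.maxAnalyzedChars means less text is analyzed per document, at the cost of missing matches that occur later in the field, so the right value depends on where matches tend to appear in the files.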

On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam <ps...@mac.com> wrote:

> To follow up, I've found that my queries are very fast (even with &fq=),
> until I add &hl=true.  What can I do to speed up highlighting?  Should I
> consider injecting a line at a time, rather than the entire file as a field?
>
>
> -Pete
>

Re: Very basic questions: Indexing text - working, but slow!

Posted by Peter Spam <ps...@mac.com>.

To follow up, I've found that my queries are very fast (even with &fq=), until I add &hl=true.  What can I do to speed up highlighting?  Should I consider injecting a line at a time, rather than the entire file as a field?


-Pete
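
One way the "smaller pieces" idea could be tried is sketched below: each file is split into fixed-size chunks, and every chunk gets its own document id. Whether this actually speeds up highlighting is an assumption that would need measuring; the helper name chunk_file, the chunk size and the id scheme are all made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: splits one text file into chunks and prints one
# "id<TAB>chunk-path" pair per chunk.
#   $1 = source file   $2 = id prefix (e.g. doc7)
#   $3 = lines per chunk   $4 = output directory
chunk_file() {
    mkdir -p "$4" || return 1
    # split names the pieces <prefix>aa, <prefix>ab, ...
    split -l "$3" "$1" "$4/$2_" || return 1
    for chunk in "$4/$2_"*; do
        [ -e "$chunk" ] || continue      # empty input produces no chunks
        printf '%s\t%s\n' "${chunk##*/}" "$chunk"
    done
}

# Small demonstration on a throwaway file.
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/sample.txt"
chunk_file "$tmp/sample.txt" doc1 2 "$tmp/chunks"
rm -rf "$tmp"
```

Each printed pair could then be sent with the same curl call as in the injection script, using the chunk id as literal.id. Search hits would then point at a chunk rather than a whole file, so the front-end would need a way to map chunks back to their source file.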

On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:

> Thanks for everyone's help - I have this working now, but sometimes the queries are incredibly slow!!  For example, <int name="QTime">461360</int>.  Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to inject without throwing heap memory errors.  However, my data set is very small!  36 text files, for a total of 113MB.  (It will grow to many TB, but for now, this is a test).  The largest file is 34MB.
> 
> Therefore, I'm sure I'm doing something wrong :-)  Here's my config:
> 
> -----------------------------------------------------------------------------------------------
> 
> For the schema.xml, <types> is all default.  For fields, here are the only lines that aren't commented out:
> 
>   <field name="id" type="string" indexed="true" stored="true" required="true" />
>   <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
>   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>   <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
>   <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
>   <dynamicField name="*" type="ignored" multiValued="true" />
> 
> ... then, for the rest:
> 
> <uniqueKey>id</uniqueKey>
> 
> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
> <defaultSearchField>body</defaultSearchField>
> 
> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
> <solrQueryParser defaultOperator="AND"/>
> 
> 
> -----------------------------------------------------------------------------------------------
> 
> 
> Invoking:  java -Xmx3584M -Xms1024M -jar start.jar
> 
> 
> -----------------------------------------------------------------------------------------------
> 
> 
> Injecting:
> 
> #!/bin/sh
> 
> J=0
> for i in `find . -name \*.txt`; do 
> 	(( J++ ))
> 	curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F "myfile=@$i"; 
> done;
> 
> 
> echo "------------- Committing"
> curl "http://localhost:8983/solr/update/extract?commit=true"
> 
> 
> -----------------------------------------------------------------------------------------------
> 
> 
> Searching:
> 
> http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
> 
> 
> 
> 
> 
> -Pete
> 

Re: Very basic questions: Indexing text - working, but slow!

Posted by Peter Spam <ps...@mac.com>.

Thanks for everyone's help - I have this working now, but sometimes the queries are incredibly slow!!  For example, <int name="QTime">461360</int>.  Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to inject without throwing heap memory errors.  However, my data set is very small!  36 text files, for a total of 113MB.  (It will grow to many TB, but for now, this is a test).  The largest file is 34MB.

Therefore, I'm sure I'm doing something wrong :-)  Here's my config:

-----------------------------------------------------------------------------------------------

For the schema.xml, <types> is all default.  For fields, here are the only lines that aren't commented out:

   <field name="id" type="string" indexed="true" stored="true" required="true" />
   <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
   <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
   <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
   <dynamicField name="*" type="ignored" multiValued="true" />

... then, for the rest:

 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>body</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="AND"/>


-----------------------------------------------------------------------------------------------


Invoking:  java -Xmx3584M -Xms1024M -jar start.jar


-----------------------------------------------------------------------------------------------


Injecting:

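#!/bin/bash
# Post every .txt file under the current directory to the extracting handler,
# mapping the extracted content to the "body" field.

J=0
find . -name '*.txt' | while IFS= read -r i; do
	J=$((J + 1))
	# read -r plus the quoted "$i" keeps file names with spaces intact
	curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F "myfile=@$i"
done


echo "------------- Committing"
curl "http://localhost:8983/solr/update/extract?commit=true"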
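
For the "build" and "device" fields in the schema above, one option might be to pass them the same way as the id, through literal.<field> parameters on the extract URL. This is only a sketch: extract_url is a made-up helper name, the example values are invented, and it assumes the values are already URL-safe. Pulling the values out of the top of each file is left out.

```shell
#!/bin/sh
# Sketch only: builds an /update/extract URL that sets the "build" and
# "device" fields via literal.<field> parameters, alongside literal.id.
# Assumes all values are URL-safe (no spaces, '&', and so on).
SOLR_URL="http://localhost:8983/solr"

extract_url() {
    # $1 = document id, $2 = build, $3 = device (hypothetical example values)
    printf '%s/update/extract?literal.id=%s&literal.build=%s&literal.device=%s&fmap.content=body\n' \
        "$SOLR_URL" "$1" "$2" "$3"
}

# Usage (not run here):
#   curl "$(extract_url doc1 1234 widget)" -F "myfile=@somefile.txt"
```

Because "build" and "device" are plain string fields, a query such as q=build:1234 would then have to match the stored value exactly.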


-----------------------------------------------------------------------------------------------


Searching:

http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
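
One way to narrow down where the time goes might be to run the same query with highlighting switched off and then on, and compare the QTime values. The sketch below only builds the two URLs. select_url is a hypothetical helper name, hl.fl=body is taken from the schema above, and the hl.maxAnalyzedChars value is just an illustrative number, not a recommendation.

```shell
#!/bin/sh
# Sketch only: two variants of the search URL, for comparing QTime with
# and without highlighting. The query term is assumed to be URL-safe.
SOLR_URL="http://localhost:8983/solr"

select_url() {
    # $1 = query term, $2 = "true" to enable highlighting, anything else to disable
    base="$SOLR_URL/select?q=$1&fl=id,score"
    if [ "$2" = "true" ]; then
        # hl.maxAnalyzedChars caps how much of each document the highlighter examines
        printf '%s&hl=true&hl.fl=body&hl.snippets=5&hl.maxAnalyzedChars=51200\n' "$base"
    else
        printf '%s&hl=false\n' "$base"
    fi
}

# Usage (not run here):
#   curl "$(select_url testing false)"
#   curl "$(select_url testing true)"
```

If the non-highlighted query comes back quickly, the highlighter working through the large stored "body" values would be the first thing to suspect.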





-Pete

On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:

> Try adding &hl.fl=text
> to specify your highlight field. I don't understand why you're only
> getting the ID field back though. Do note that the highlighting section
> comes after the docs in the response and is related to them by ID.
> 
> Try a (non highlighting) query of just * to verify that you're
> pointing at the index you think you are. It's possible that
> you've modified a different index with SolrJ than your web
> server is pointing at.
> 
> Also, SOLR has no way of knowing you've modified your index
> with SolrJ, so it may not automatically reopen an
> IndexReader, and your recent changes may not be visible
> until you force the SOLR reader to reopen.
> 
> HTH
> Erick
> 
> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <ps...@mac.com> wrote:
> 
>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>> 
>>>> 1) I can get my docs in the index, but when I search, it
>>>> returns the entire document.  I'd love to have it only
>>>> return the line (or two) around the search term.
>>> 
>>> Solr can generate Google-like snippets as you describe.
>>> http://wiki.apache.org/solr/HighlightingParameters
>> 
>> Here's how I commit my documents:
>> 
>> J=0;
>> for i in `find . -name \*.txt`; do
>>       (( J++ ))
>>       curl "http://localhost:8983/solr/update/extract?literal.id=doc$J"
>> -F "myfile=@$i";
>> done;
>> 
>> echo "------------- Committing"
>> curl "http://localhost:8983/solr/update/extract?commit=true"
>> 
>> 
>> Then, I try to query using
>> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
>> but I only get back the document ID rather than the snippet:
>> 
>> <doc>
>> <float name="score">0.05030759</float>
>> <arr name="content_type">
>> <str>text/plain</str>
>> </arr>
>> <str name="id">doc16</str>
>> </doc>
>> 
>> I'm using the schema.xml from the "lucid imagination: Indexing text and
>> html files" tutorial.
>> 
>> 
>> 
>> -Pete
>>

Re: Very basic questions: Indexing text

Posted by Erick Erickson <er...@gmail.com>.

Try adding &hl.fl=text
to specify your highlight field. I don't understand why you're only
getting the ID field back though. Do note that the highlighting section
comes after the docs in the response and is related to them by ID.

Try a (non highlighting) query of just * to verify that you're
pointing at the index you think you are. It's possible that
you've modified a different index with SolrJ than your web
server is pointing at.

Also, SOLR has no way of knowing you've modified your index
with SolrJ, so it may not automatically reopen an
IndexReader, and your recent changes may not be visible
until you force the SOLR reader to reopen.
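
The two checks above could be sketched roughly as follows. The helper names are made up, the host and port are the ones used elsewhere in this thread, and the highlight field is a parameter because the right name depends on the schema ("text" is only the suggestion above). In Solr query syntax, *:* is the match-all query.

```shell
#!/bin/sh
# Sketch only: builds the two URLs described above without sending them.
SOLR_URL="http://localhost:8983/solr"

match_all_url() {
    # $1 = number of rows to return; q=*:* matches every document in the index
    printf '%s/select?q=*:*&rows=%s&fl=id\n' "$SOLR_URL" "$1"
}

highlight_url() {
    # $1 = query term (assumed URL-safe), $2 = field to highlight
    printf '%s/select?q=%s&hl=true&hl.fl=%s&fl=id,score\n' "$SOLR_URL" "$1" "$2"
}

# Usage (not run here):
#   curl "$(match_all_url 5)"
#   curl "$(highlight_url testing text)"
```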

HTH
Erick

On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <ps...@mac.com> wrote:

> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>
> >> 1) I can get my docs in the index, but when I search, it
> >> returns the entire document.  I'd love to have it only
> >> return the line (or two) around the search term.
> >
> > Solr can generate Google-like snippets as you describe.
> > http://wiki.apache.org/solr/HighlightingParameters
>
> Here's how I commit my documents:
>
> J=0;
> for i in `find . -name \*.txt`; do
>        (( J++ ))
>        curl "http://localhost:8983/solr/update/extract?literal.id=doc$J"
> -F "myfile=@$i";
> done;
>
> echo "------------- Committing"
> curl "http://localhost:8983/solr/update/extract?commit=true"
>
>
> Then, I try to query using
> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
> but I only get back the document ID rather than the snippet:
>
> <doc>
> <float name="score">0.05030759</float>
> <arr name="content_type">
> <str>text/plain</str>
> </arr>
> <str name="id">doc16</str>
> </doc>
>
>  I'm using the schema.xml from the "lucid imagination: Indexing text and
> html files" tutorial.
>
>
>
> -Pete
>

Re: Very basic questions: Indexing text

Posted by Peter Spam <ps...@mac.com>.

On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:

>> 1) I can get my docs in the index, but when I search, it
>> returns the entire document.  I'd love to have it only
>> return the line (or two) around the search term.
> 
> Solr can generate Google-like snippets as you describe. 
> http://wiki.apache.org/solr/HighlightingParameters

Here's how I commit my documents:

J=0;
for i in `find . -name \*.txt`; do
        (( J++ ))
        curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" -F "myfile=@$i";
done;

echo "------------- Committing"
curl "http://localhost:8983/solr/update/extract?commit=true"

Then, I try to query using http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
but I only get back the document ID rather than the snippet:

<doc>
<float name="score">0.05030759</float>
<arr name="content_type">
<str>text/plain</str>
</arr>
<str name="id">doc16</str>
</doc>

I'm using the schema.xml from the "Lucid Imagination: Indexing text and html files" tutorial.

-Pete

Re: Very basic questions: Indexing text

Posted by Ahmet Arslan <io...@yahoo.com>.

> 1) I can get my docs in the index, but when I search, it
> returns the entire document.  I'd love to have it only
> return the line (or two) around the search term.

Solr can generate Google-like snippets as you describe. 
http://wiki.apache.org/solr/HighlightingParameters

> 2) There are one or two fields at the beginning of the file
> that I would like to search on, so these should be indexed
> differently, right?

Probably yes. 
 
> 3) Is there a nice front-end example anywhere? 
> Something that would return results kind of like Google?

http://wiki.apache.org/solr/PublicServers
http://search-lucene.com/