Posted to solr-user@lucene.apache.org by "Evert R." <ev...@gmail.com> on 2015/12/15 12:25:24 UTC

Solr Basic Configuration - Highlight - Beginner

Hi there!

It's my first installation, and I'm not sure this is the right channel...

Here are my steps:

1. Set up a basic install of Solr 5.4.0

2. Create a new core from the command line (bin/solr create -c test)

3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)

4. Query from the browser: it returns the correct results, but it does not
show the part of the text that matches my query, i.e. the highlight.

I have already enabled the 'hl' option, but it still does not work...

Example: I am searching for the word 'peace' in my PDF file (a book). There
are 4 matches for this word; Solr shows me the book name (the PDF file) but
not the part of the text that contains the word 'peace'.


I am probably missing some configuration in schema.xml, which is missing
from my folder /solr/server/solr/test/conf/

Or perhaps in solrconfig.xml...

I have read a lot about highlighting and checked these files. I copied the
standard schema.xml to my core/conf folder, but it still does not return the
highlight.


Attached is a copy of my solrconfig.xml file.


I am very sorry for what is probably a dumb and very basic question... This is
the first time I have used Solr.


Any help will be appreciated.



Best regards,



*Evert Ramos*
*evert.ramos@gmail.com <ev...@gmail.com>*

Re: Solr Basic Configuration - Highlight - Beginner

Posted by "Evert R." <ev...@gmail.com>.

Hi Erick,

I think you are right!

When I use the form 'features:accents' (in my case 'content:nietava'), it
shows no matching documents... but if I drop the field and use only
'q=searchword' (q=nietava), it returns the PDF content, as below (in XML
output):

#partial snip:
<arr name="content">
<str>
Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc Francisco
Cândido Xavier e Waldo Vieira Sexo e Destino 12o livro da Coleção “A Vida
no Mundo Espiritual” Ditado pelo Espírito André Luiz FEDERAÇÃO ESPÍRITA
BRASILEIRA DEPARTAMENTO EDITORIAL Rua Souza Valente, 17 20941-040 - Rio -
RJ - Brasil http://www.febnet.org.br/ Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz 2 Coleção “A Vida no Mundo Espiritual”
01 - Nosso Lar 02 - Os Mensageiros 03 - Missionários da Luz 04 - Obreiros
da Vida Eterna 05 - No Mundo Maior 06 - Libertação 07 - Entre a Terra e o
Céu 08 - Nos Domínios da Mediunidade 09 - Ação e Reação 10 - Evolução em
Dois Mundos 11 - Mecanismos da Mediunidade 12 - Sexo e Destino 13 - E a
Vida Continua... Francisco Cândid

So, using:

1. q=content:nietava&hl=true&hl.fl=content  -> results:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">3</int>
<lst name="params">
<str name="q">content:nietava</str>
<str name="hl">true</str>
<str name="hl.fl">content</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="highlighting"/>
</response>

2. q=nietava&hl=true&hl.fl=content  -> results:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">93</int>
<lst name="params">
<str name="q">nietava</str>
<str name="hl">true</str>
<str name="hl.fl">content</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="id">pdf1</str>
<date name="last_modified">2011-07-28T20:39:26Z</date>
<arr name="title">
<str>
Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc
</str>
</arr>
<arr name="content_type">
<str>application/pdf</str>
</arr>
<str name="author">Wander</str>
<str name="author_s">Wander</str>
<arr name="content">
<str>
Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc Francisco
Cândido Xavier e Waldo Vieira Sexo e Destino 12o livro da Coleção “A Vida
no Mundo Espiritual” Ditado pelo Espírito André Luiz FEDERAÇÃO ESPÍRITA
BRASILEIRA DEPARTAMENTO EDITORIAL Rua Souza Valente, 17 20941-040 - Rio -
RJ - Brasil http://www.febnet.org.br/ Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz 2 Coleção “A Vida no Mundo Espiritual”
01 - Nosso Lar 02 - Os Mensageiros 03 - Missionários da Luz 04 - Obreiros
da Vida Eterna 05 - No Mundo Maior 06 - Libertação 07 - Entre a Terra e o
Céu 08 - Nos Domínios da Mediunidade 09 - Ação e Reação 10 - Evolução em
Dois Mundos 11 - Mecanismos da Mediunidade 12 - Sexo e Destino 13 - E a
Vida Continua... Francisco Cândido Xavier - ... (long text,
including the word 'nietava') ...
                  </str>
</arr>
<long name="_version_">1520731379641352192</long>
</doc>
</result>
<lst name="highlighting">
<lst name="pdf1"/>
</lst>
</response>
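In response 2, the highlighting section is a separate list keyed by document id. The entry for pdf1 is present but contains no snippets.

The minimal Python sketch below (standard library only) reads that section from the XML output:

- In Solr's XML format, each highlighted document appears as an lst named by its id, holding one arr of str snippets per field.
- RESPONSE_XML is an abbreviated copy of response 2 with the long content field removed.
- highlights_by_id is a hypothetical helper, not a Solr API.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of response 2 above (q=nietava&hl=true&hl.fl=content);
# the long "content" field and most header entries are omitted.
RESPONSE_XML = """<response>
<lst name="responseHeader"><int name="status">0</int></lst>
<result name="response" numFound="1" start="0">
<doc><str name="id">pdf1</str></doc>
</result>
<lst name="highlighting">
<lst name="pdf1"/>
</lst>
</response>"""


def highlights_by_id(xml_text):
    """Map each document id in the highlighting section to {field: [snippets]}."""
    root = ET.fromstring(xml_text)
    section = root.find("lst[@name='highlighting']")
    result = {}
    if section is None:  # no highlighting section at all (e.g. hl not enabled)
        return result
    for doc in section.findall("lst"):
        fields = {}
        for arr in doc.findall("arr"):
            fields[arr.get("name")] = [s.text or "" for s in arr.findall("str")]
        result[doc.get("name")] = fields
    return result


root = ET.fromstring(RESPONSE_XML)
print(root.find("result").get("numFound"))  # 1
print(highlights_by_id(RESPONSE_XML))  # {'pdf1': {}}
```

The document matched, but its highlighting entry is an empty map. That is exactly the symptom reported here.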

.... =(

Thanks!


*Evert*
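The two results above are consistent with the techproducts schema lines in the grep output posted in this thread:

- The content field is declared indexed="false" stored="true".
- It is copied into the text field.

A field that is not indexed cannot be matched by a field query such as content:nietava. An unqualified q=nietava presumably goes to the default search field (text, in the techproducts example), which receives a copy of content. This is a likely explanation rather than a confirmed one.

The sketch below only reads those flags from the quoted definitions with Python's standard library. field_flags and copy_target are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Definitions copied from the techproducts grep output quoted in this thread.
CONTENT_FIELD = ('<field name="content" type="text_general" indexed="false" '
                 'stored="true" multiValued="true"/>')
COPY_FIELD = '<copyField source="content" dest="text"/>'


def field_flags(xml_line):
    """Return (name, indexed, stored) for a <field/> definition.

    None means the attribute is absent, i.e. the value comes from the fieldType.
    """
    elem = ET.fromstring(xml_line)

    def flag(attr):
        value = elem.get(attr)
        return None if value is None else value == "true"

    return elem.get("name"), flag("indexed"), flag("stored")


def copy_target(xml_line):
    """Return (source, dest) for a <copyField/> definition."""
    elem = ET.fromstring(xml_line)
    return elem.get("source"), elem.get("dest")


name, indexed, stored = field_flags(CONTENT_FIELD)
print(name, "indexed:", indexed, "stored:", stored)  # content indexed: False stored: True
print("copied into:", copy_target(COPY_FIELD)[1])  # copied into: text
```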

2015-12-16 15:17 GMT-02:00 Erick Erickson <er...@gmail.com>:

> Ok, you're getting confused by all the options, an easy thing to do.
> You're trying to do too many things at once without making sure
> the basics work....
>
> 1> Forget all about the f.content.hl.... stuff. That's there in case
> you want to specify different parameters for different fields in the same
> highlight request. That's an advanced option for later....
>
> 2> start with the basic techproducts example. Then this should show
> you highlights:
> q=features:accents&hl=true&hl.fl=features
>
> That's about as basic as you get. It's searching for "accents" in the
> features field and returning highlights on the features field.
>
> Once that's working, _then_ refine.
>
> Best,
> Erick

Re: Solr Basic Configuration - Highlight - Beginner

Posted by Erick Erickson <er...@gmail.com>.

Ok, you're getting confused by all the options, an easy thing to do.
You're trying to do too many things at once without making sure
the basics work....

1> Forget all about the f.content.hl.... stuff. That's there in case
you want to specify different parameters for different fields in the same
highlight request. That's an advanced option for later....

2> start with the basic techproducts example. Then this should show
you highlights:
q=features:accents&hl=true&hl.fl=features

That's about as basic as you get. It's searching for "accents" in the
features field and returning highlights on the features field.

Once that's working, _then_ refine.

Best,
Erick
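Erick's baseline request can also be built programmatically. This keeps each parameter separate and correctly URL-encoded, rather than folding settings such as f.content.hl.content=4 into the hl.fl value.

Below is a minimal Python sketch. It assumes Solr is running locally on port 8983 with the techproducts example, as elsewhere in this thread. highlight_url is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode

# Assumption: local Solr with the techproducts example core, as used in this thread.
SOLR_SELECT = "http://localhost:8983/solr/techproducts/select"


def highlight_url(query, field, extra=None):
    """Build a /select URL that runs `query` and asks for highlights on `field`.

    `extra` holds any further request parameters, e.g. {"hl.snippets": "3"}.
    """
    params = {"q": query, "hl": "true", "hl.fl": field}
    if extra:
        params.update(extra)
    return SOLR_SELECT + "?" + urlencode(params)


print(highlight_url("features:accents", "features"))
# http://localhost:8983/solr/techproducts/select?q=features%3Aaccents&hl=true&hl.fl=features
```

Once the basic request works, add refinements one at a time. For example, extra={"hl.snippets": "3", "hl.fragsize": "200"} uses the values that, according to the grep output in this thread, techproducts' solrconfig.xml sets for the content field.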

On Wed, Dec 16, 2015 at 8:21 AM, Evert R. <ev...@gmail.com> wrote:
> Hi Andrea,
>
> OK, let's do it:
>
> 1. It does have the 'nietava' term: the query returns the only book (PDF file)
> that has this word, with all its content as in my previous message to Erick,
> so the content field is there.
>
> 2. Using content:nietava, it does not return any results, as below:
>
> { "responseHeader": { "status": 400, "QTime": 12, "params": { "q":
> "contents:nietava", "indent": "true", "fl": "id", "wt": "json", "_":
> "1450282631352" } }, "error": { "msg": "undefined field contents", "code":
> 400 } }
>
> 3. Here is what I found when grepping 'content' from the techproducts conf
> folder:
>
> schema.xml: <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
> schema.xml: <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
> schema.xml: <copyField source="content" dest="text"/>
> schema.xml: <copyField source="content_type" dest="text"/>
> solrconfig.xml: <str name="facet.field">content_type</str>
> solrconfig.xml: <str name="hl.fl">content features title name</str>
> solrconfig.xml: <str name="f.content.hl.snippets">3</str>
> solrconfig.xml: <str name="f.content.hl.fragsize">200</str>
> solrconfig.xml: <str name="f.content.hl.alternateField">content</str>
> solrconfig.xml: <str name="f.content.hl.maxAlternateFieldLength">750</str>
> solrconfig.xml: <str name="stream.contentType">application/json</str>
> solrconfig.xml: <str name="stream.contentType">application/csv</str>
> solrconfig.xml: <str name="content-type">text/plain; charset=UTF-8</str>
>
> and the grep on 'content_type':
>
> schema.xml:   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
> schema.xml:   <copyField source="content_type" dest="text"/>
> solrconfig.xml:       <str name="facet.field">content_type</str>
>
> =)
>
> Thanks for checking.
>
>
>
> *Evert *
>
> 2015-12-16 12:59 GMT-02:00 Andrea Gazzarini <a....@gmail.com>:
>
>> hl=f.content.hl.content (I guess) is definitely wrong. Some questions:
>>
>>    - First, sorry for the obvious question: are you sure the documents contain
>>    the "nietava" term?
>>    - Could you try q=content:nietava?
>>    - Could you paste the definition (field & fieldtype) of the content
>>    field?
>>
>> > Should I have this configuration in the XML file?
>>
>> You could, but it's up to you and it strongly depends on your context. The
>> simple thing is that if you have those parameters in the configuration,
>> you can avoid passing them as part of each request, but in this testing
>> phase it's probably better to keep them in the request.
>>
>> Andrea
>>
>> 2015-12-16 15:28 GMT+01:00 Evert R. <ev...@gmail.com>:
>>
>> > Hi Andrea,
>> >
>> > Thanks for the reply!
>> >
>> > I tried with the hl.fl parameter as well, using the URL below:
>> >
>> > http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+content&wt=json&indent=true&hl=true&hl.fl=f.content.hl.content%3D4&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>> >
>> > with these parameters in the hl field of the Solr UI:
>> >
>> > 1. f.content.hl.snnipets=2
>> > 2. f.content.hl.content=4
>> > 3. content
>> >
>> > with no success...
>> >
>> > Should I have this configuration in the XML file?
>> >
>> > Regards,
>> >
>> > *Evert *
>> >
>> > 2015-12-16 11:23 GMT-02:00 Andrea Gazzarini <a....@gmail.com>:
>> >
>> > > Hi Evert,
>> > > what is the configuration of the default request handler? Did you set
>> > > the hl.fl parameter?
>> > >
>> > > Please check [1] for the parameters that the highlighting component
>> > > expects. Required parameters should be in the query string or declared
>> > > within the request handler that answers your query.
>> > >
>> > > Andrea
>> > >
>> > > [1] https://wiki.apache.org/solr/HighlightingParameters
>> > >
>> > >
>> > >
>> > >
>> > > 2015-12-16 12:51 GMT+01:00 Evert R. <ev...@gmail.com>:
>> > >
>> > > > Hi everyone!
>> > > >
>> > > > I think I should not have posted my server name... I have never had
>> > > > that many access attempts...
>> > > >
>> > > >
>> > > >
>> > > > 2015-12-16 9:03 GMT-02:00 Evert R. <ev...@gmail.com>:
>> > > >
>> > > > > Hello Erick,
>> > > > >
>> > > > > Thanks again for your time.
>> > > > >
>> > > > > Here is as far as I have gone:
>> > > > >
>> > > > > 1. I started a fresh install and did the following:
>> > > > >
>> > > > > [evert@nix]$ bin/solr start -e techproducts
>> > > > > [evert@nix]$ curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
>> > > > >
>> > > > > 2. I am using only the Solr Admin UI to check the query response; here
>> > > > > is an example:
>> > > > >
>> > > > > Query: http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>> > > > >
>> > > > > Result: {
>> > > > >   "responseHeader": {
>> > > > >     "status": 0,
>> > > > >     "QTime": 14,
>> > > > >     "params": {
>> > > > >       "q": "nietava",
>> > > > >       "hl": "true",
>> > > > >       "hl.simple.post": "</em>",
>> > > > >       "indent": "true",
>> > > > >       "fl": "id, author, content",
>> > > > >       "wt": "json",
>> > > > >       "hl.simple.pre": "<em>",
>> > > > >       "_": "1450262674102"
>> > > > >     }
>> > > > >   },
>> > > > >   "response": {
>> > > > >     "numFound": 1,
>> > > > >     "start": 0,
>> > > > >     "docs": [
>> > > > >       {
>> > > > >         "id": "pdf1",
>> > > > >         "author": "Wander",
>> > > > >         "content": [
>> > > > >           "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n \n
>> > > > > Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
>> > > > > Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo Espiritual”
>> > > > > \n \n  \n \n \n \n Ditado pelo Espírito \nAndré Luiz \n \n  \n \n \n \n
>> > > > > \n \n \n FEDERAÇÃO ESPÍRITA BRASILEIRA \nDEPARTAMENTO EDITORIAL \n \n
>> > > > > Rua Souza Valente, 17 \n20941-040 - Rio - RJ - Brasil \n \n
>> > > > > \nhttp://www.febnet.org.br/  \n  \n \n   \n Francisco Cândido Xavier -
>> > > > > Sexo e Destino - pelo Espírito André Luiz \n \n  \n2 \n \n  \n \n \n \n
>> > > > > Coleção \n“A Vida no Mundo Espiritual” \n"
>> > > > >         ]
>> > > > >       }
>> > > > >     ]
>> > > > >   },
>> > > > >   "highlighting": {
>> > > > >     "pdf1": {}
>> > > > >   }
>> > > > > }
>> > > > >
>> > > > > **The content field returns the whole PDF content (book), and notice
>> > > > > that the highlighting section is empty.
>> > > > >
>> > > > > I tried creating a new core with bin/solr create -c test, using the
>> > > > > schema.xml and solrconfig.xml standard found in
>> > > > > /solr/server/solr/configsets/basic_configs/conf
>> > > > >
>> > > > > But even so, it is not working as expected (I think).
>> > > > >
>> > > > >
>> > > > > Would you know how to set up this techproducts example to return
>> > > > > snippets of text?
>> > > > >
>> > > > > The server only allows specific IP addresses on this port; if you
>> > > > > like, I could open it for you to check.
>> > > > >
>> > > > >
>> > > > > Thanks again and best regards!
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > *Evert
>> > > > >
>> > > > >
>> > > > > 2015-12-15 18:14 GMT-02:00 Erick Erickson <erickerickson@gmail.com>:
>> > > > >
>> > > > >> No, that's not what I meant. The highlight component adds a special
>> > > > >> section to the return packet that will contain "snippets" of text
>> > > > >> with highlights. You control how big those snippets are via various
>> > > > >> parameters in the highlight component and they'll have the tags you
>> > > > >> specify for highlighting.
>> > > > >>
>> > > > >> Your app needs to pull the information from the highlight portion of
>> > > > >> the response packet rather than the document list. Just execute your
>> > > > >> queries via cURL or a browser to see the structure of a response to
>> > > > >> see what I mean.
>> > > > >>
>> > > > >> And note that you do _not_ need to return the fields you're
>> > > > >> highlighting in the "fl" list so you do _not_ need to return the
>> > > > >> entire document contents.
>> > > > >>
>> > > > >> What are you using to display the results anyway?
>> > > > >>
>> > > > >> Best,
>> > > > >> Erick
>> > > > >>
>> > > > >> On Tue, Dec 15, 2015 at 10:02 AM, Evert R. <evert.ramos@gmail.com> wrote:
>> > > > >> > Hi Erick,
>> > > > >> >
>> > > > >> > Thank you very much for the reply!!
>> > > > >> >
>> > > > >> > I do get back the full text, author, and a whole lot of stuff that
>> > > > >> > doesn't really matter for my project.
>> > > > >> >
>> > > > >> > So, what you are saying is that Solr gives me back the full content
>> > > > >> > and my application does the rest? That would mean that, for all my
>> > > > >> > books (PDF files), a search for a specific word returns the whole
>> > > > >> > content of each book that matches the query, and my application (PHP
>> > > > >> > in this case) takes care of showing only part of the text (as I
>> > > > >> > understood highlighting to do) and highlighting the keyword I was
>> > > > >> > looking for?
>> > > > >> >
>> > > > >> > If so, Erick, you were a big help in clearing this up... I thought I
>> > > > >> > would do that with Solr in an easy way. =)
>> > > > >> >
>> > > > >> > Thanks for the attachments tip!
>> > > > >> >
>> > > > >> > Best regards,
>> > > > >> >
>> > > > >> > Evert
>> > > > >> >
>> > > > >> > 2015-12-15 14:56 GMT-02:00 Erick Erickson <erickerickson@gmail.com>:
>> > > > >> >
>> > > > >> >> How are you trying to display the results? Highlighting is a bit of
>> > > > >> >> an odd beast. Assuming it's correctly configured, the response
>> > > > >> >> packet will have a separate highlight section; it's the
>> > > > >> >> application's responsibility to present that pleasingly.
>> > > > >> >>
>> > > > >> >> What _do_ you get back in the response?
>> > > > >> >>
>> > > > >> >> BTW, the mail server pretty aggressively strips attachments; yours
>> > > > >> >> didn't come through.
>> > > > >> >>
>> > > > >> >> Best,
>> > > > >> >> Erick
>> > > > >> >>
>> > > > >> >> On Tue, Dec 15, 2015 at 3:25 AM, Evert R. <evert.ramos@gmail.com> wrote:
>> > > > >> >>
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>

Re: Solr Basic Configuration - Highlight - Begginer

Posted by "Evert R." <ev...@gmail.com>.

Hi Andrea,

OK, let's do it:

1. It does have the 'nietava' term: the query brings back the only book (PDF
file) that has this word, with all its content, as in my previous message to
Erick, so the content field is there.

2. Using content:nietava does not return any results; I get the error below:

{
  "responseHeader": {
    "status": 400,
    "QTime": 12,
    "params": {
      "q": "contents:nietava",
      "indent": "true",
      "fl": "id",
      "wt": "json",
      "_": "1450282631352"
    }
  },
  "error": {
    "msg": "undefined field contents",
    "code": 400
  }
}
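The request logged above queries `contents` (with an s), and Solr replies with HTTP 400 and `undefined field contents` rather than with an empty result; the grep output in item 3 shows that the field is called `content`. A client can tell these two situations apart by looking for an `error` object before reading `response`. A minimal sketch in Python, standard library only; `solr_error` is a hypothetical helper, not part of any Solr client library:

```python
import json

# Trimmed copy of the 400 response pasted above (q=contents:nietava).
RAW = (
    '{"responseHeader": {"status": 400, "QTime": 12, '
    '"params": {"q": "contents:nietava", "fl": "id", "wt": "json"}}, '
    '"error": {"msg": "undefined field contents", "code": 400}}'
)


def solr_error(body):
    """Return (code, msg) when a Solr JSON response carries an "error"
    object, or None for a normal response. Hypothetical helper."""
    err = json.loads(body).get("error")
    if err is None:
        return None
    return err.get("code"), err.get("msg")


print(solr_error(RAW))  # (400, 'undefined field contents')
```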

3. Here is what I found when grepping 'content' from the techproducts conf
folder:

schema.xml: <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
schema.xml: <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
schema.xml: <copyField source="content" dest="text"/>
schema.xml: <copyField source="content_type" dest="text"/>
solrconfig.xml: <str name="facet.field">content_type</str>
solrconfig.xml: <str name="hl.fl">content features title name</str>
solrconfig.xml: <str name="f.content.hl.snippets">3</str>
solrconfig.xml: <str name="f.content.hl.fragsize">200</str>
solrconfig.xml: <str name="f.content.hl.alternateField">content</str>
solrconfig.xml: <str name="f.content.hl.maxAlternateFieldLength">750</str>
solrconfig.xml: <str name="stream.contentType">application/json</str>
solrconfig.xml: <str name="stream.contentType">application/csv</str>
solrconfig.xml: <str name="content-type">text/plain; charset=UTF-8</str>

and the grep on 'content_type':

schema.xml:   <field name="content_type" type="string" indexed="true"
stored="true" multiValued="true"/>
schema.xml:   <copyField source="content_type" dest="text"/>
solrconfig.xml:       <str name="facet.field">content_type</str>

=)
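Reading the grep output above: in this techproducts schema `content` is declared `indexed="false" stored="true"` and is copied into `text`, so it is searchable through `text` (presumably the default field of /select, since q=nietava finds the book) and can be highlighted because it is stored. The `hl.fl` and `f.content.hl.*` values, as far as I know, sit in the defaults of the stock /browse handler rather than /select, so a /select request has to send them itself. A sketch in Python that only builds such a URL, reusing the per-field values from the grep; host, port and core name are the ones used in this thread:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Core and host as used in this thread; adjust for your install.
BASE = "http://localhost:8983/solr/techproducts/select"

params = [
    ("q", "nietava"),                  # matched through the default search field
    ("fl", "id"),                      # no need to return the whole book
    ("wt", "json"),
    ("hl", "true"),
    ("hl.fl", "content"),              # field names only; content is stored
    ("f.content.hl.snippets", "3"),    # per-field values copied from the grep
    ("f.content.hl.fragsize", "200"),
    ("hl.simple.pre", "<em>"),
    ("hl.simple.post", "</em>"),
]

url = BASE + "?" + urlencode(params)
print(url)
print(parse_qs(urlparse(url).query)["hl.fl"])  # ['content']
```

Opening the printed URL in a browser or with curl should then return snippets under the `highlighting` section for `pdf1`, assuming the document was indexed as described. If the term only occurs deep inside a long book, the snippet may still be missing, because the highlighter only analyzes the first `hl.maxAnalyzedChars` characters of a field (51200 by default).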

Thanks for checking it out.



*Evert *

2015-12-16 12:59 GMT-02:00 Andrea Gazzarini <a....@gmail.com>:

> hl.fl=f.content.hl.content=4 (I guess that is what you sent) is definitely
> wrong: hl.fl expects a list of field names. Some questions:
>
>    - First, sorry, the obvious question: are you sure the documents contain
>    the "nietava" term?
>    - Could you try to use q=content:nietava?
>    - Could you paste the definition (field & fieldtype) of the content
>    field?
>
> > Should I have this configuration in the XML file?
>
> You could, but it's up to you and it strongly depends on your context.
> Simply put, if you declare those parameters in the configuration you don't
> need to pass them with every request; but in this phase, while you are
> testing, it's probably better to keep them in the request.
>
> Andrea
>

Re: Solr Basic Configuration - Highlight - Begginer

Posted by Andrea Gazzarini <a....@gmail.com>.

hl.fl=f.content.hl.content=4 (I guess that is what you sent) is definitely
wrong: hl.fl expects a list of field names. Some questions:

   - First, sorry, the obvious question: are you sure the documents contain
   the "nietava" term?
   - Could you try to use q=content:nietava?
   - Could you paste the definition (field & fieldtype) of the content
   field?

> Should I have this configuration in the XML file?

You could, but it's up to you and it strongly depends on your context.
Simply put, if you declare those parameters in the configuration you don't
need to pass them with every request; but in this phase, while you are
testing, it's probably better to keep them in the request.

Andrea
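To illustrate the trade-off Andrea describes: a request handler in solrconfig.xml can declare `defaults` (used only when the request does not send that parameter), `appends` (added to whatever the request sends) and `invariants` (which override the request). The Python function below is a simplified, hypothetical model of that merge for experimenting, not Solr's own code; it ignores other mechanisms such as initParams and useParams, and the handler configuration in it is made up:

```python
from typing import Dict, List, Optional

Params = Dict[str, List[str]]


def effective_params(request: Params,
                     defaults: Optional[Params] = None,
                     appends: Optional[Params] = None,
                     invariants: Optional[Params] = None) -> Params:
    """Simplified model of how a Solr request handler merges parameters."""
    merged: Params = {k: list(v) for k, v in (defaults or {}).items()}
    # Parameters sent with the request replace the handler's defaults.
    for key, values in request.items():
        merged[key] = list(values)
    # "appends" values are added to whatever is already there.
    for key, values in (appends or {}).items():
        merged.setdefault(key, []).extend(values)
    # "invariants" cannot be overridden by the request.
    for key, values in (invariants or {}).items():
        merged[key] = list(values)
    return merged


# Hypothetical handler defaults for highlighting the content field.
handler_defaults = {"df": ["text"], "hl.fl": ["content"], "hl.snippets": ["3"]}
print(effective_params({"q": ["nietava"], "hl": ["true"], "hl.snippets": ["1"]},
                       defaults=handler_defaults))
```

With these hypothetical defaults, sending hl.snippets=1 on the request replaces the configured 3, while hl.fl=content still applies because the request does not mention it.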

2015-12-16 15:28 GMT+01:00 Evert R. <ev...@gmail.com>:

> Hi Andrea,
>
> Thanks for the reply!
>
> I tried the hl.fl parameter as well, as below:
>
> http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+content&wt=json&indent=true&hl=true&hl.fl=f.content.hl.content%3D4&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> with these values in the hl options of the Solr Admin UI:
>
> 1. f.content.hl.snnipets=2
> 2. f.content.hl.content=4
> 3. content
>
> with no success...
>
> Should I have this configuration in the XML file?
>
> Regards,
>
> *Evert *
>

Re: Solr Basic Configuration - Highlight - Begginer

Posted by "Evert R." <ev...@gmail.com>.

Hi Andrea,

Thanks for the reply!

I tried the hl.fl parameter as well, as below:

http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+content&wt=json&indent=true&hl=true&hl.fl=f.content.hl.content%3D4&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

with these values in the hl options of the Solr Admin UI:

1. f.content.hl.snnipets=2
2. f.content.hl.content=4
3. content

with no success...

Should I have this configuration in the XML file?

Regards,

*Evert *
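Decoding the query string of the URL above shows what Solr actually received: `%3D` is an encoded `=`, so hl.fl carried the single value `f.content.hl.content=4`, which is not a field name, and no `f.content.*` parameter was sent at all. Per-field overrides are separate parameters such as `f.content.hl.snippets=2`; a parameter spelled `hl.snnipets`, if it was typed as in item 1, would simply be ignored. A quick check with Python's standard library:

```python
from urllib.parse import urlparse, parse_qs

# URL from the message above, joined onto one line.
url = ("http://localhost:8983/solr/techproducts/select?q=nietava"
       "&fl=id%2C+content&wt=json&indent=true&hl=true"
       "&hl.fl=f.content.hl.content%3D4"
       "&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E")

# parse_qs percent-decodes values and turns '+' into a space.
params = parse_qs(urlparse(url).query)

print(params["hl.fl"])  # ['f.content.hl.content=4']
print(params["fl"])     # ['id, content']
print(sorted(k for k in params if k.startswith("f.")))  # []
```

The corrected form passes field names in hl.fl and per-field settings as their own parameters, e.g. `hl.fl=content&f.content.hl.snippets=2`.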

2015-12-16 11:23 GMT-02:00 Andrea Gazzarini <a....@gmail.com>:

> Hi Evert,
> what is the configuration of the default request handler? Did you set the
> hl.fl parameter?
>
> Please check [1] for the parameters that the highlighting component
> expects. Required parameters must be either in the query string or declared
> within the request handler that answers your query.
>
> Andrea
>
> [1] https://wiki.apache.org/solr/HighlightingParameters

Re: Solr Basic Configuration - Highlight - Begginer

Posted by Andrea Gazzarini <a....@gmail.com>.

Hi Evert,
what is the configuration of the default request handler? Did you set the
hl.fl parameter?

Please check [1] for the parameters that the highlighting component
expects. Required parameters should be in the query string or declared
within the request handler that answers your query.
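A minimal Python sketch of passing the highlighting parameters in the query string, using only the standard library. The field name `content` is an assumption: the responses in this thread return `content`, so it appears to be stored, and highlighting needs a stored field. `hl.snippets` and `hl.fragsize` are highlighting parameters; the values chosen here are arbitrary. The same keys could instead go into the handler's `defaults` list.

```python
from urllib.parse import urlencode


def highlight_query(term: str, hl_field: str, rows: int = 10) -> str:
    """Return a /select query string asking Solr for snippets from hl_field."""
    params = {
        "q": term,
        "fl": "id,author",        # the full book text does not need to come back
        "hl": "true",             # turn the highlight component on
        "hl.fl": hl_field,        # field(s) to build snippets from (must be stored)
        "hl.snippets": 3,         # max snippets per field per document
        "hl.fragsize": 100,       # approximate snippet length in characters
        "hl.simple.pre": "<em>",
        "hl.simple.post": "</em>",
        "rows": rows,
        "wt": "json",
    }
    return urlencode(params)


# Hypothetical usage against the techproducts core from this thread.
url = ("http://localhost:8983/solr/techproducts/select?"
       + highlight_query("nietava", "content"))
print(url)
```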

Andrea

[1] https://wiki.apache.org/solr/HighlightingParameters




2015-12-16 12:51 GMT+01:00 Evert R. <ev...@gmail.com>:


Re: Solr Basic Configuration - Highlight - Begginer

Posted by "Evert R." <ev...@gmail.com>.

Hi everyone!

I think I should not have posted my server name... never had that many
access attempts...



2015-12-16 9:03 GMT-02:00 Evert R. <ev...@gmail.com>:


Re: Solr Basic Configuration - Highlight - Begginer

Posted by "Evert R." <ev...@gmail.com>.

Hello Erick,

Thanks again for your time.

Here is as far as I have gone:

1. I started a fresh install and did the following:

[evert@nix]$ bin/solr start -e techproducts
[evert@nix]$ curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' \
    -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"

2. I am using only the Solr Admin UI to check the query response. Here is an
example:

Query:
http://nix.budhi.com.br:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

Result: {
  "responseHeader": {
    "status": 0,
    "QTime": 14,
    "params": {
      "q": "nietava",
      "hl": "true",
      "hl.simple.post": "</em>",
      "indent": "true",
      "fl": "id, author, content",
      "wt": "json",
      "hl.simple.pre": "<em>",
      "_": "1450262674102"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "pdf1",
        "author": "Wander",
        "content": [
          "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n    \n
Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo Espiritual” \n
\n  \n \n \n \n Ditado pelo Espírito \nAndré Luiz \n \n  \n \n \n \n \n \n
\n FEDERAÇÃO ESPÍRITA BRASILEIRA \nDEPARTAMENTO EDITORIAL \n \n Rua Souza
Valente, 17 \n20941-040 - Rio - RJ - Brasil \n \n  \nhttp://
www.febnet.org.br/  \n  \n \n   \n Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz \n \n  \n2 \n \n  \n \n \n \n Coleção
\n“A Vida no Mundo Espiritual” \n"
        ]
      }
    ]
  },
  "highlighting": {
    "pdf1": {}
  }
}

**The content field brings back the whole PDF content (the book), and notice
that the highlighting section is empty.
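Decoding the query string of the request above with Python's standard library makes the parameters that were actually sent easy to check: `hl=true` is there, but there is no `hl.fl`. As an assumption about Solr 5.x behaviour, when `hl.fl` is absent the highlighter falls back to the default search field (`df`), and if that field is not stored there is nothing to build snippets from.

```python
from urllib.parse import parse_qs

# Query string copied from the request above (host and path omitted).
query = ("q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true"
         "&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E")

params = parse_qs(query)  # {'q': ['nietava'], 'fl': ['id, author, content'], ...}
for name in sorted(params):
    print(name, "=", params[name][0])

print("hl.fl sent:", "hl.fl" in params)  # False: no field was named for snippets
```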

I tried creating a new core with bin/solr create -c test, using the
schema.xml and solrconfig.xml standard found in
/solr/server/solr/configsets/basic_configs/conf

But even so... it is still not working as expected (I think).


Do you know how to set up this techproducts example so that it brings back
the snippets of text?

The server only allows specific IP addresses on this port; if you like, I
could open it for you to check.


Thanks again and best regards!




*Evert Ramos*
*evert.ramos@gmail.com <ev...@gmail.com>*


2015-12-15 18:14 GMT-02:00 Erick Erickson <er...@gmail.com>:


Re: Solr Basic Configuration - Highlight - Begginer

Posted by Erick Erickson <er...@gmail.com>.

No, that's not what I meant. The highlight component adds a special
section to the return packet that will contain "snippets" of text with
highlights. You control how big those snippets are via various
parameters in the highlight component and they'll have the tags you
specify for highlighting.

Your app needs to pull the information from the highlight portion of
the response packet rather than the document list. Just execute your
queries via cURL or a browser and look at the structure of the response
to see what I mean.

And note that you do _not_ need to return the fields you're
highlighting in the "fl" list so you do _not_ need to return the
entire document contents.
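A minimal Python sketch of reading the snippets from the highlight section instead of the document list. It assumes the JSON response layout shown in this thread, where `highlighting` is keyed by the document's unique key (`id`); the populated sample response is hypothetical.

```python
import json


def snippets_by_doc(response: dict, field: str) -> dict:
    """Map each returned document id to its snippets for `field`.

    Documents whose highlighting entry is empty (e.g. "pdf1": {})
    map to an empty list.
    """
    highlighting = response.get("highlighting", {})
    return {
        doc["id"]: highlighting.get(doc["id"], {}).get(field, [])
        for doc in response["response"]["docs"]
    }


# Hypothetical response: only id/author requested in fl, snippets under "highlighting".
raw = """{
  "response": {"numFound": 1, "start": 0,
               "docs": [{"id": "pdf1", "author": "Wander"}]},
  "highlighting": {"pdf1": {"content": ["... <em>nietava</em> ..."]}}
}"""
print(snippets_by_doc(json.loads(raw), "content"))
# prints {'pdf1': ['... <em>nietava</em> ...']}
```

Because the snippets are joined to documents by id, `fl` can stay small and the book text never has to travel back to the application.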

What are you using to display the results anyway?

Best,
Erick

On Tue, Dec 15, 2015 at 10:02 AM, Evert R. <ev...@gmail.com> wrote:

Re: Solr Basic Configuration - Highlight - Begginer

Posted by "Evert R." <ev...@gmail.com>.

Hi Erick,

Thank you very much for the reply!!

I do get back the full text, author, and a whole lot of stuff which doesn't
really matter for my project.

So, what you are saying is that Solr gives me back the full content and
my application takes care of the rest? That would mean that for all my books
(PDF files), searching for a specific word brings me the whole book content
matching the query, and my application (PHP, in this case) has to show only
part of the text (as in highlighting, as I understood it) and highlight the
keyword I was looking for?

If so, Erick, you gave me a big help clearing this up... I thought I could do
that with Solr in an easy way. =)

Thanks for the attachments tip!

Best regards,

Evert

2015-12-15 14:56 GMT-02:00 Erick Erickson <er...@gmail.com>:


Re: Solr Basic Configuration - Highlight - Begginer

Posted by Erick Erickson <er...@gmail.com>.

How are you trying to display the results? Highlighting is a bit of an
odd beast. Assuming it's correctly configured, the response packet
will have a separate highlight section, it's the application's
responsibility to present that pleasingly.
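Presenting the snippets is left to the application. One concern, sketched here in Python under the assumption that snippets arrive as raw text wrapped in the `<em>`/`</em>` tags used in this thread, is escaping the book text before putting it into a web page while keeping the highlight tags. Solr also has an `hl.encoder` parameter that can HTML-encode snippet text on the server side; check the parameter reference for your version.

```python
import html


def render_snippet(snippet: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Escape a raw snippet for HTML output, then put the highlight tags back.

    Caveat: a literal "<em>" inside the book text would also be restored.
    """
    escaped = html.escape(snippet)
    return escaped.replace(html.escape(pre), pre).replace(html.escape(post), post)


def render_snippets(snippets: list, separator: str = " ... ") -> str:
    """Join several snippets for one document into a single display line."""
    return separator.join(render_snippet(s) for s in snippets)


print(render_snippets(["if a < b then <em>peace</em>", "more <em>peace</em> here"]))
```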

What _do_ you get back in the response?

BTW, the mail server pretty aggressively strips attachments; yours
didn't come through.

Best,
Erick

On Tue, Dec 15, 2015 at 3:25 AM, Evert R. <ev...@gmail.com> wrote:

Re: Solr Basic Configuration - Highlight - Begginer

Posted by "Evert R." <ev...@gmail.com>.

Hello Erick,

Sorry for my mistakes. Here is everything I have so far:

1. It brings back the result perfectly, but the highlighting section is empty, as below:
{

  "responseHeader":{
    "status":0,
    "QTime":15,
    "params":{
      "q":"text:nietava",
      "debug":"query",
      "hl":"true",
      "hl.simple.post":"</em>",
      "indent":"true",
      "fq":"id:pdf1",
      "hl.fl":"text",
      "wt":"json",
      "hl.simple.pre":"<em>"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"pdf1",
        "last_modified":"2011-07-28T20:39:26Z",
        "title":["Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc"],
        "content_type":["application/pdf"],
        "author":"Wander",
        "author_s":"Wander",
        "content":["André Luiz - Sexo e Destino _Chico e Waldo_.doc ***the whole content*** nietava"],

        "_version_":1520765393269948416}]
  },
  *"highlighting":{
    "pdf1":{***I THINK THE SNIPPETS OF TEXT SHOULD BE IN HERE, RIGHT?***}},*
  "debug":{
    "rawquerystring":"text:nietava",
    "querystring":"text:nietava",
    "parsedquery":"text:nietava",
    "parsedquery_toString":"text:nietava",
    "QParser":"LuceneQParser",
    "filter_queries":["id:pdf1"],

    "parsed_filter_queries":["id:pdf1"]}}


2. Here are my settings:

In schema.xml:

<field name="text" type="text_general" indexed="true" stored="true"
multiValued="true"/>

<fieldType name="text_general" class="solr.TextField"
positionIncrementGap="100">

      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
</fieldType>

In solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <bool name="preferLocalShards">false</bool>
  </lst>
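For comparison, here is a minimal sketch of a /select handler that turns highlighting on by default with the standard (non-vector) highlighter. It assumes the default search field is "text" and that the field named in hl.fl has stored="true"; the pre/post markup is XML-escaped because it lives inside solrconfig.xml. Treat it as an illustration, not as the techproducts default.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- field searched when q has no field prefix, e.g. q=nietava (assumed: "text") -->
    <str name="df">text</str>
    <!-- highlighting on by default; hl.fl must name a stored field -->
    <str name="hl">on</str>
    <str name="hl.fl">text</str>
    <!-- markup is escaped because this value sits inside an XML file -->
    <str name="hl.simple.pre">&lt;em&gt;</str>
    <str name="hl.simple.post">&lt;/em&gt;</str>
  </lst>
</requestHandler>
```

With defaults like these, a plain q=nietava should search "text" and, provided that field is stored, return snippets under "highlighting".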

I have tried:

schema.xml:   <field name="text" type="text_general" indexed="true"
stored="true" multiValued="true"/>

schema.xml:   <field name="text" type="text_general" indexed="true"
stored="true" multiValued="true"  termVectors="true"
termOffsets="true" termPositions="true"/>

schema.xml:
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
<filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
preserveOriginal="1" generateNumberParts="0" generateWordParts="0" />
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.ApostropheFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
preserveOriginal="1" generateWordParts="0" />
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ApostropheFilterFactory"/>
</analyzer>

solrconfig.xml:

                        <str name="df">text</str>
                        <str name="hl">on</str>
                        <str name="hl.fl">text</str>
                        <str name="hl.useFastVectorHighlighter">true</str>
                        <str name="hl.snippets">100</str>
                        <str name="hl.tag.pre">&lt;b&gt;</str>
                        <str name="hl.tag.post">&lt;/b&gt;</str>
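A sketch of what FastVectorHighlighter settings like these generally require, assuming the highlighted field is "text": the field must be stored and indexed with term vectors, positions and offsets, and literal markup such as <b> must be escaped or wrapped in CDATA inside solrconfig.xml. The schema attributes only take effect for documents indexed after the change, so a full reindex is assumed.

```xml
<!-- schema.xml: the highlighted field must be stored; the FastVectorHighlighter
     additionally needs term vectors with positions and offsets -->
<field name="text" type="text_general" indexed="true" stored="true"
       multiValued="true" termVectors="true" termPositions="true"
       termOffsets="true"/>

<!-- solrconfig.xml, inside <lst name="defaults"> of the /select handler -->
<str name="hl">on</str>
<str name="hl.fl">text</str>
<str name="hl.useFastVectorHighlighter">true</str>
<!-- CDATA keeps the tags from being parsed as XML elements -->
<str name="hl.tag.pre"><![CDATA[<b>]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
```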

The debug output is in the response shown above.


I am still using the standard techproducts example.


I hope this is complete enough.


Thanks again!



*Evert*

2015-12-17 2:01 GMT-02:00 Erick Erickson <er...@gmail.com>:

> bq: but when highlight, using the text field...nothing comes up...
>
>
> http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> It's unclear what this means. No results showed up (i.e. numFound==0)
> or no highlighting showed up? Assuming that
> 1> the "text" field has stored=true and
> 2> you find documents when searching on the "text" field
> the above should show something in the highlights section.
>
> Please take the time to provide complete details. Guessing what you're
> doing is wasting time, mine and yours. Once more:
> 1> what is the schema definition for the "text" field. Include the
> fieldType definition
> 2> What is the result of adding &debug=query to the request when you
> don't get highlights?
>
> You might review: http://wiki.apache.org/solr/UsingMailingLists
> because it's becoming quite frustrating that you give us little bits
> of information that leave us guessing what you're _really_ doing.
> Highlighting is working for lots of people in lots of sites, it's not
> likely that this functionality is completely broken so the answer will
> be in the docs.
>
> Best,
> Erick
>
>

Re: Solr Basic Configuration - Highlight - Begginer

Posted by Erick Erickson <er...@gmail.com>.

bq: but when highlight, using the text field...nothing comes up...

http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

It's unclear what this means. No results showed up (i.e. numFound==0)
or no highlighting showed up? Assuming that
1> the "text" field has stored=true and
2> you find documents when searching on the "text" field
the above should show something in the highlights section.

Please take the time to provide complete details. Guessing what you're
doing is wasting time, mine and yours. Once more:
1> what is the schema definition for the "text" field. Include the
fieldType definition
2> What is the result of adding &debug=query to the request when you
don't get highlights?

You might review: http://wiki.apache.org/solr/UsingMailingLists
because it's becoming quite frustrating that you give us little bits
of information that leave us guessing what you're _really_ doing.
Highlighting is working for lots of people in lots of sites, it's not
likely that this functionality is completely broken so the answer will
be in the docs.

Best,
Erick

On Wed, Dec 16, 2015 at 5:54 PM, Evert R. <ev...@gmail.com> wrote:
> Hi Erick and Teague,
>
>
> I found that when using the field 'text' it shows the pdf file result
> id:pdf1 in this case, like:
>
> http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava
>
> but when highlight, using the text field...nothing comes up...
>
> http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> or even with the option
>
> f.text.hl.snippets=2 under the hl.fl field.
>
>
> I tried as well with the standard configuration, did it all over, reindexed
> a couple times... and still did not work.
>
> Also,
>
> Using the Analysis, it brings below information:
>
> ST
> text     raw_bytes               start  end  positionLength  type        position
> nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> SF
> text     raw_bytes               start  end  positionLength  type        position
> nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> LCF
> text     raw_bytes               start  end  positionLength  type        position
> nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
>
>
> Alphanumeric I think... so it's a 'string', right? Would that be a problem?
> Should there be some other indication?
>
>
> Thanks again!
>
>
> *Evert*
>
> 2015-12-16 21:09 GMT-02:00 Erick Erickson <er...@gmail.com>:
>
>> I think you're still missing the critical bit. Highlighting is
>> completely separate from searching. In other words, you can search on
>> one field and highlight another. What field is searched is governed by
>> the "qf" parameter when using edismax and by the "df" parameter
>> configured in your request handler in solrconfig.xml. These defaults
>> are overridden when you do a "fielded search" like
>>
>> q=content:nietava
>>
>> So this: q=content:nietava&hl=true&hl.fl=content
>> is searching the "content" field. The word you're looking for isn't in
>> the content field so naturally no docs are returned. And no
>> highlighting either.
>>
>> This: q=nietava&hl=true&hl.fl=content
>>
>> is searching somewhere else, thus getting the hit. We already know
>> that "nietava" is not in the content field because the first search
>> failed. You need to find out what field is being matched (probably
>> something like "text") and then try highlighting on _that_ field. Try
>> adding "debug=query" to the URL and look at the "parsedquery" section
>> of the return and you'll see what field(s) is/are actually being
>> searched against.
>>
>> NOTE: The field you highlight on _must_ have stored="true" in schema.xml.
>>
>> As to why "nietava" isn't being found in the content field, probably
>> you have some kind of analysis chain configured for that field that
>> isn't searching as you expect. See the admin/analysis page for some
>> insight into why that would be. The most frequent reason is that the
>> field is a "string" type which is not broken up into words. Another
>> possibility is that your analysis chain is leaving in the quotes or
>> something similar. As James says, looking at admin/analysis is a good
>> way to figure this out.
>>
>> I still strongly recommend you go from the stock techproducts example
>> and get familiar with how Solr (and highlighting) work before jumping
>> in and changing things. There are a number of ways things can be
>> mis-configured and trying to change several things at once is a fine
>> way to go mad. The admin UI>>schema browser is another way you can see
>> what kind of terms are _actually_ in your index in a particular field.
>>
>> Best,
>> Erick
>>
>>
>>
>>
>> On Wed, Dec 16, 2015 at 12:26 PM, Teague James <te...@insystechinc.com>
>> wrote:
>> > Sorry to hear that didn't work! Let me ask a couple of questions...
>> >
>> > Have you tried the analyzer inside the Admin Interface? It has helped
>> me sort out a number of highlighting issues in the past. To access it, go
>> to your Admin interface, select your core, then select Analysis from the
>> list of options on the left. In the analyzer, enter the term you expect to
>> get a hit on in both the left (index) and right (query) input fields. Select
>> the field that it is destined for (in your case that would be 'content'),
>> then hit analyze. It helps if you have a big screen!
>> >
>> > This will show you the impact of the various filter factories that you
>> have engaged and their effect on whether or not a 'hit' is being generated.
>> Hits are identified by a very faint highlight. (PSST... Developers... It
>> would be really cool if the highlight color were more visible or
>> customizable... Thanks y'all) If it looks like you're getting hits, but not
>> getting highlighting, then open up a new tab with the Admin's query
>> interface. Same place on the left as the analyzer. Replace the "*:*" with
>> your search term (assuming you already indexed your document) and if
>> necessary you can put something in the FQ like "id:123456" to target a
>> specific record.
>> >
>> > Did you get a hit? If no, then it's not highlighting that's the issue.
>> If yes, then try dumping this in your address bar (using your URL/IP,
>> search term, and core name of course. The fq= is an example) :
>> > http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"
>> >
>> > That will dump Solr's output to your browser where you can see exactly
>> what is getting hit.
>> >
>> > Hope that helps! Let me know how it goes. Good luck.
>> >
>> > -Teague
>> >
>> > -----Original Message-----
>> > From: Evert R. [mailto:evert.ramos@gmail.com]
>> > Sent: Wednesday, December 16, 2015 1:46 PM
>> > To: solr-user <so...@lucene.apache.org>
>> > Subject: Re: Solr Basic Configuration - Highlight - Begginer
>> >
>> > Hi Teague!
>> >
>> > I configured solrconfig.xml and schema.xml exactly the way you did,
>> only substituting 'documentText' with 'content', as used by the
>> techproducts sample, and reindexed through:
>> >
>> >  curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' \
>> >    -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
>> >
>> > with the same result.... no highlighting in the response, as below:
>> >
>> > "highlighting": { "pdf1": {} }
>> >
>> > =(
>> >
>> > Really... do not know what to do...
>> >
>> > Thanks for your time, if you have any more suggestion where I could be
>> missing something... please let me know.
>> >
>> >
>> > Best regards,
>> >
>> > *Evert*
>> >
>> > 2015-12-16 15:30 GMT-02:00 Teague James <te...@insystechinc.com>:
>> >
>> >> Hi Evert,
>> >>
>> >> I recently needed help with phrase highlighting and was pointed to the
>> >> FastVectorHighlighter which worked out great. I just made a change to
>> >> the configuration to add generateWordParts="0" and
>> >> generateNumberParts="0" so that searches for things like "1a" would
>> >> get highlighted correctly. You may or may not need that feature. You
>> >> can always remove them or change the value to "1" to switch them on
>> explicitly. Anyway, hope this helps!
>> >>
>> >> solrconfig.xml (partial snip)
>> >> <requestHandler name="/select" class="solr.SearchHandler">
>> >>                 <lst name="defaults">
>> >>                         <str name="wt">xml</str>
>> >>                         <str name="echoParams">explicit</str>
>> >>                         <int name="rows">10</int>
>> >>                         <str name="df">documentText</str>
>> >>                         <str name="hl">on</str>
>> >>                         <str name="hl.fl">text</str>
>> >>                         <str name="hl.useFastVectorHighlighter">true</str>
>> >>                         <str name="hl.snippets">100</str>
>> >>                         <str name="hl.tag.pre">&lt;b&gt;</str>
>> >>                         <str name="hl.tag.post">&lt;/b&gt;</str>
>> >>                 </lst>
>> >> </requestHandler>
>> >>
>> >> schema.xml (partial snip)
>> >>    <field name="id" type="string" indexed="true" stored="true"
>> >> required="true" multiValued="false" />
>> >>    <field name="documentText" type="text_general" indexed="true"
>> >> multiValued="true" termVectors="true" termOffsets="true"
>> >> termPositions="true" />
>> >>
>> >> <fieldType name="text_general" class="solr.TextField"
>> >> positionIncrementGap="100">
>> >>         <analyzer type="index">
>> >>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >> words="stopwords.txt" />
>> >>                 <filter class="solr.WordDelimiterFilterFactory"
>> >> catenateAll="1" preserveOriginal="1" generateNumberParts="0"
>> >> generateWordParts="0" />
>> >>                 <filter class="solr.SynonymFilterFactory"
>> >> synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
>> >>                 <filter class="solr.LowerCaseFilterFactory"/>
>> >>                 <filter class="solr.PorterStemFilterFactory"/>
>> >>                 <filter class="solr.ApostropheFilterFactory"/>
>> >>         </analyzer>
>> >>         <analyzer type="query">
>> >>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>                 <filter class="solr.WordDelimiterFilterFactory"
>> >> catenateAll="1" preserveOriginal="1" generateWordParts="0" />
>> >>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >> words="stopwords.txt" />
>> >>                 <filter class="solr.LowerCaseFilterFactory"/>
>> >>                 <filter class="solr.ApostropheFilterFactory"/>
>> >>         </analyzer>
>> >> </fieldType>
>> >>
>> >> -Teague
>> >>
>> >> From: Evert R. [mailto:evert.ramos@gmail.com]
>> >> Sent: Tuesday, December 15, 2015 6:25 AM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Solr Basic Configuration - Highlight - Begginer
>> >>
>> >> Hi there!
>> >>
>> >> It's my first installation, not sure if this is the right channel...
>> >>
>> >> Here are my steps:
>> >>
>> >> 1. Set up a basic install of solr 5.4.0
>> >>
>> >> 2. Create a new core through command line (bin/solr create -c test)
>> >>
>> >> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
>> >>
>> >> 4. Query from the browser: it returns the correct result, but it
>> >> does not show the part of the text I am querying (the highlight).
>> >>
>> >>   I have already flagged the 'hl' option. But still it does not work...
>> >>
>> >> Example: I am looking for the word 'peace' in my pdf file (book). I
>> >> have 4 matches for this word; it shows me the book name (pdf file) but
>> >> does not show which part of the text has the word peace in it.
>> >>
>> >>
>> >> I am probably missing some configuration in schema.xml, which is
>> >> missing from my folder.... /solr/server/solr/test/conf/
>> >>
>> >> Or even the solrconfig.xml...
>> >>
>> >> I have read a bunch of things about highlighting, checked these files,
>> >> copied the standard schema.xml to my core/conf folder, but still it
>> >> does not bring the highlight.
>> >>
>> >>
>> >> Attached a copy of my solrconfig.xml file.
>> >>
>> >>
>> >> I am very sorry for this probably dumb and too basic question...
>> >> This is the first time I have seen Solr live.
>> >>
>> >>
>> >> Any help will be appreciated.
>> >>
>> >>
>> >>
>> >> Best regards,
>> >>
>> >>
>> >> Evert Ramos
>> >>
>> >> mailto:evert.ramos@gmail.com
>> >>
>> >>
>> >>
>> >
>>

Re: Solr Basic Configuration - Highlight - Begginer

Posted by Erick Erickson <er...@gmail.com>.

I just tried it (admittedly using a simple input, obviously not a PDF
file) and it works exactly as I'd expect.

So a couple of things:
1> what happens if you highlight the content field? The text field
should be fine.
2> Did you completely blow away your index whenever you changed the
schema file? As in "rm -rf data" where the "data" directory is the
parent of "index"?
3> I'd consider backing off a bit and start with the standard
"techproducts" example and get highlighting to work _there_ first. My
guess is that there's something you're doing that I don't know to ask
about specifically with the PDF conversions.
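A sketch of point 1>, assuming the stock techproducts schema, where (as far as I recall, so check your own copy) "content" is stored but not indexed and "text" is the indexed catch-all filled via copyField. Under that assumption q=text:nietava matches, an hl.fl=text request comes back empty unless "text" is made stored and the documents are reindexed, and "content" is the field to highlight.

```xml
<!-- schema.xml: assumed techproducts definitions, verify against your copy -->
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="content" dest="text"/>

<!-- solrconfig.xml, /select defaults: search "text", highlight "content" -->
<str name="df">text</str>
<str name="hl">on</str>
<str name="hl.fl">content</str>
```

As a single request this would be q=text:nietava&hl=true&hl.fl=content. It relies on hl.requireFieldMatch being false (its default), so terms matched in "text" are still highlighted in "content".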

Erick@Baffled.com

On Thu, Dec 17, 2015 at 3:00 AM, Evert R. <ev...@gmail.com> wrote:
> Hello Teague,
>
> Thanks for your reply and tip! I think Solr will give me a better result
> than just using Tika to read my files and send them to a full-text index in
> MySQL, which has exactly the drawback of not highlighting text snippets...
>
> So I will keep on trying to fit Solr to my needs; I'm sure it works and I
> am just missing something.
>
> Thanks again and I will keep on track.
>
> When I find the solution I will post all files and configs here for future
> reference.
>
> Best regards,
>
> *Evert*
>
> 2015-12-17 6:11 GMT-02:00 Teague James <te...@insystechinc.com>:
>
>> Erick's comments notwithstanding, there are some gaps in my understanding
>> of your precise situation. Here are a few things that weren't necessarily
>> obvious to me when I took my first try with Solr.
>>
>> Highlighting is the end result of a good hit. It is essentially formatting
>> applied to your hit. It is possible to get a hit without a highlight if
>> certain conditions exist.
>>
>> First, start by making sure you are indexing your target (a PDF file?)
>> correctly. Assuming you are indexing PDFs, are you extracting metadata
>> only, or are you parsing the document with Tika? If you want hits on the
>> contents of your PDF, then you have to parse it at index time and store
>> that. That was why I suggested just running some queries through the
>> interface and the URL to see what Solr actually captured from your indexed
>> PDF before worrying about how it looks on the screen.
>>
>> Next, you should look carefully at the Analyzer's output. Notice the
>> abbreviations to the left of the columns? Hover over those to see what
>> filter factory it is. When words are split into multiple columns at one of
>> those points, it indicates that the filter factory broke apart the word
>> while analyzing it. Do a search for the filter factories that you
>> find and read up on them. In my case "1a" was being split into 4 by a word
>> delimiter filter factory - "1a", "1", "a", "1a" which caused highlighting
>> to fail in my case while still getting a hit. It also caused erroneous hits
>> elsewhere. Adding some switches to the schema is all it took to correct
>> that for me. However, every case is different based on your needs. That is
>> why it is important to go through the analyzer and see if Solr's indexing
>> and querying are doing what you expect.
>>
>> If that looks good and you've got solid hits all the way down, then it is
>> time to start looking at your highlighter implementation in the index and
>> query analyzers that you are using. My original issue of not being able to
>> highlight phrases with one set of tags required me to switch to the
>> fast vector highlighter - which had its own requirements for certain
>> parameters to be set. Here again - going to the Solr docs and reading up on
>> the various highlighters will be helpful in most cases.
>>
>> Solr has a very steep learning curve. I've been using it for several years
>> and I still consider myself a noob. It can be a deep dive, but don't be
>> discouraged. Keep at it. Cheers!
>>
>> -Teague
>>
>> On Wed, Dec 16, 2015 at 8:54 PM, Evert R. <ev...@gmail.com> wrote:
>>
>> > Hi Erick and Teague,
>> >
>> >
>> > I found that when using the field 'text' it shows the pdf file result
>> > id:pdf1 in this case, like:
>> >
>> > http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava
>> >
>> > but when I highlight using the text field, nothing comes up:
>> >
>> > http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>> >
>> > or even with the option
>> >
>> > f.text.hl.snippets=2 under the hl.fl field.
>> >
>> >
>> > I also tried the standard configuration, did it all over, reindexed
>> > a couple of times... and it still did not work.
>> >
>> > Also,
>> >
>> > The Analysis screen shows the following:
>> >
>> > Stage  text     raw_bytes               start  end  positionLength  type        position
>> > ST     nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
>> > SF     nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
>> > LCF    nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
>> >
>> >
>> > It's alphanumeric, I think... so it's 'string', right? Would that be a
>> > problem? Should there be some other indication?
>> >
>> >
>> > Thanks again!
>> >
>> >
>> > *Evert*
>> >
>> > 2015-12-16 21:09 GMT-02:00 Erick Erickson <er...@gmail.com>:
>> >
>> > > I think you're still missing the critical bit. Highlighting is
>> > > completely separate from searching. In other words, you can search on
>> > > one field and highlight another. What field is searched is governed by
>> > > the "qf" parameter when using edismax and by the "df" parameter
>> > > configured in your request handler in solrconfig.xml. These defaults
>> > > are overridden when you do a "fielded search" like
>> > >
>> > > q=content:nietava
>> > >
>> > > So this: q=content:nietava&hl=true&hl.fl=content
>> > > is searching the "content" field. The word you're looking for isn't in
>> > > the content field so naturally no docs are returned. And no
>> > > highlighting either.
>> > >
>> > > This: q=nietava&hl=true&hl.fl=content
>> > >
>> > > is searching somewhere else, thus getting the hit. We already know
>> > > that "nietava" is not in the content field because the first search
>> > > failed. You need to find out what field is being matched (probably
>> > > something like "text") and then try highlighting on _that_ field. Try
>> > > adding "debug=query" to the URL and look at the "parsed_query" section
>> > > of the return and you'll see what field(s) is/are actually being
>> > > searched against.
>> > >
>> > > NOTE: The field you highlight on _must_ have stored="true" in schema.xml.
>> > >
>> > > As to why "nietava" isn't being found in the content field, probably
>> > > you have some kind of analysis chain configured for that field that
>> > > isn't searching as you expect. See the admin/analysis page for some
>> > > insight into why that would be. The most frequent reason is that the
>> > > field is a "string" type which is not broken up into words. Another
>> > > possibility is that your analysis chain is leaving in the quotes or
>> > > something similar. As James says, looking at admin/analysis is a good
>> > > way to figure this out.
>> > >
>> > > I still strongly recommend you go from the stock techproducts example
>> > > and get familiar with how Solr (and highlighting) work before jumping
>> > > in and changing things. There are a number of ways things can be
>> > > mis-configured and trying to change several things at once is a fine
>> > > way to go mad. The admin UI>>schema browser is another way you can see
>> > > what kind of terms are _actually_ in your index in a particular field.
>> > >
>> > > Best,
>> > > Erick
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Dec 16, 2015 at 12:26 PM, Teague James <teaguej@insystechinc.com> wrote:
>> > > > Sorry to hear that didn't work! Let me ask a couple of questions...
>> > > >
>> > > > Have you tried the analyzer inside of the Admin Interface? It has helped
>> > > > me sort out a number of highlighting issues in the past. To access it, go
>> > > > to your Admin interface, select your core, then select Analysis from the
>> > > > list of options on the left. In the analyzer, enter the term you are
>> > > > indexing (in other words, the term in the document that you expect to
>> > > > get a hit on) in both the left and right input fields. Select the field
>> > > > that it is destined for (in your case that would be 'content'), then hit
>> > > > Analyze. It helps if you have a big screen!
>> > > >
>> > > > This will show you the impact of the various filter factories that you
>> > > > have engaged and their effect on whether or not a 'hit' is being
>> > > > generated. Hits are identified by a very faint highlight. (PSST...
>> > > > Developers... It would be really cool if the highlight color were more
>> > > > visible or customizable... Thanks y'all) If it looks like you're getting
>> > > > hits but not getting highlighting, then open up a new tab with the
>> > > > Admin's query interface, in the same place on the left as the analyzer.
>> > > > Replace the "*:*" with your search term (assuming you already indexed
>> > > > your document) and, if necessary, put something in the fq like
>> > > > "id:123456" to target a specific record.
>> > > >
>> > > > Did you get a hit? If no, then it's not highlighting that's the issue.
>> > > > If yes, then try dumping this in your address bar (using your URL/IP,
>> > > > search term, and core name of course; the fq= is an example):
>> > > > http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"
>> > > >
>> > > > That will dump Solr's output to your browser, where you can see exactly
>> > > > what is getting hit.
>> > > >
>> > > > Hope that helps! Let me know how it goes. Good luck.
>> > > >
>> > > > -Teague
>> > > >
>> > > > -----Original Message-----
>> > > > From: Evert R. [mailto:evert.ramos@gmail.com]
>> > > > Sent: Wednesday, December 16, 2015 1:46 PM
>> > > > To: solr-user <so...@lucene.apache.org>
>> > > > Subject: Re: Solr Basic Configuration - Highlight - Begginer
>> > > >
>> > > > Hi Teague!
>> > > >
>> > > > I configured solrconfig.xml and schema.xml exactly the way you did,
>> > > > only substituting 'content' (the field used by the techproducts sample)
>> > > > for 'documentText', and reindexed with:
>> > > >
>> > > > curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
>> > > >
>> > > > with the same result... no highlighting in the response, as below:
>> > > >
>> > > > "highlighting": { "pdf1": {} }
>> > > >
>> > > > =(
>> > > >
>> > > > Really... I do not know what to do...
>> > > >
>> > > > Thanks for your time. If you have any more suggestions about where I
>> > > > could be missing something, please let me know.
>> > > >
>> > > >
>> > > > Best regards,
>> > > >
>> > > > *Evert*
>> > > >
>> > > > 2015-12-16 15:30 GMT-02:00 Teague James <te...@insystechinc.com>:
>> > > >
>> > > >> Hi Evert,
>> > > >>
>> > > >> I recently needed help with phrase highlighting and was pointed to the
>> > > >> FastVectorHighlighter, which worked out great. I just made a change to
>> > > >> the configuration to add generateWordParts="0" and
>> > > >> generateNumberParts="0" so that searches for things like "1a" would
>> > > >> get highlighted correctly. You may or may not need that feature. You
>> > > >> can always remove them or change the value to "1" to switch them on
>> > > >> explicitly. Anyway, hope this helps!
>> > > >>
>> > > >> solrconfig.xml (partial snip)
>> > > >> <requestHandler name="/select" class="solr.SearchHandler">
>> > > >>                 <lst name="defaults">
>> > > >>                         <str name="wt">xml</str>
>> > > >>                         <str name="echoParams">explicit</str>
>> > > >>                         <int name="rows">10</int>
>> > > >>                         <str name="df">documentText</str>
>> > > >>                         <str name="hl">on</str>
>> > > >>                         <str name="hl.fl">text</str>
>> > > >>                         <str name="hl.useFastVectorHighlighter">true</str>
>> > > >>                         <str name="hl.snippets">100</str>
>> > > >>                         <str name="hl.tag.pre">&lt;b&gt;</str>
>> > > >>                         <str name="hl.tag.post">&lt;/b&gt;</str>
>> > > >>                 </lst>
>> > > >> </requestHandler>
>> > > >>
>> > > >> schema.xml (partial snip)
>> > > >>    <field name="id" type="string" indexed="true" stored="true"
>> > > >>           required="true" multiValued="false" />
>> > > >>    <field name="documentText" type="text_general" indexed="true"
>> > > >>           multiValued="true" termVectors="true" termOffsets="true"
>> > > >>           termPositions="true" />
>> > > >>
>> > > >> <fieldType name="text_general" class="solr.TextField"
>> > > >>            positionIncrementGap="100">
>> > > >>         <analyzer type="index">
>> > > >>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> > > >>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
>> > > >>                         words="stopwords.txt" />
>> > > >>                 <filter class="solr.WordDelimiterFilterFactory"
>> > > >>                         catenateAll="1" preserveOriginal="1" generateNumberParts="0"
>> > > >>                         generateWordParts="0" />
>> > > >>                 <filter class="solr.SynonymFilterFactory"
>> > > >>                         synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
>> > > >>                 <filter class="solr.LowerCaseFilterFactory"/>
>> > > >>                 <filter class="solr.PorterStemFilterFactory"/>
>> > > >>                 <filter class="solr.ApostropheFilterFactory"/>
>> > > >>         </analyzer>
>> > > >>         <analyzer type="query">
>> > > >>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> > > >>                 <filter class="solr.WordDelimiterFilterFactory"
>> > > >>                         catenateAll="1" preserveOriginal="1" generateWordParts="0" />
>> > > >>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
>> > > >>                         words="stopwords.txt" />
>> > > >>                 <filter class="solr.LowerCaseFilterFactory"/>
>> > > >>                 <filter class="solr.ApostropheFilterFactory"/>
>> > > >>         </analyzer>
>> > > >> </fieldType>
>> > > >>
>> > > >> -Teague
>> > > >>
>> > > >> From: Evert R. [mailto:evert.ramos@gmail.com]
>> > > >> Sent: Tuesday, December 15, 2015 6:25 AM
>> > > >> To: solr-user@lucene.apache.org
>> > > >> Subject: Solr Basic Configuration - Highlight - Begginer
>> > > >>
>> > > >> Hi there!
>> > > >>
>> > > >> It's my first installation; I'm not sure if this is the right channel...
>> > > >>
>> > > >> Here are my steps:
>> > > >>
>> > > >> 1. Set up a basic install of Solr 5.4.0
>> > > >>
>> > > >> 2. Create a new core through the command line (bin/solr create -c test)
>> > > >>
>> > > >> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
>> > > >>
>> > > >> 4. Query from the browser: it returns the correct results, but it
>> > > >> does not show the part of the text I am querying for (the highlight).
>> > > >>
>> > > >>   I have already enabled the 'hl' option, but it still does not work...
>> > > >>
>> > > >> Example: I am looking for the word 'peace' in my PDF file (a book). I
>> > > >> have 4 matches for this word; it shows me the book name (PDF file) but
>> > > >> does not show which part of the text contains the word 'peace'.
>> > > >>
>> > > >>
>> > > >> I am probably missing some configuration in schema.xml, which is
>> > > >> missing from my folder.... /solr/server/solr/test/conf/
>> > > >>
>> > > >> Or even the solrconfig.xml...
>> > > >>
>> > > >> I have read a lot about highlighting, checked these files, and
>> > > >> copied the standard schema.xml to my core/conf folder, but it still
>> > > >> does not show the highlight.
>> > > >>
>> > > >>
>> > > >> Attached is a copy of my solrconfig.xml file.
>> > > >>
>> > > >>
>> > > >> I am very sorry for this probably dumb and very basic question... This
>> > > >> is the first time I have seen Solr in action.
>> > > >>
>> > > >>
>> > > >> Any help will be appreciated.
>> > > >>
>> > > >>
>> > > >>
>> > > >> Best regards,
>> > > >>
>> > > >>
>> > > >> Evert Ramos
>> > > >>
>> > > >> mailto:evert.ramos@gmail.com
>> > > >>
>> > > >>
>> > > >>
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Kind regards,
>>
>> -Teague James
>> *Senior Web Applications Developer*
>> Insystech Inc.
>> teaguej@insystechinc.com
>> (703) 508-0008 (Cell)
>>

Re: Solr Basic Configuration - Highlight - Begginer

Posted by "Evert R." <ev...@gmail.com>.

Hello Teague,

Thanks for your reply and tip! I think Solr will give me a better result
than just using Tika to read my files and send them to a full-text index in
MySQL, which is precisely where highlighting of text snippets is missing...

So I will keep trying to make Solr fit my needs; I'm sure it works... I
am just missing something.

Thanks again, and I will keep at it.

When I find the solution I will post all the files and configs here for
future reference.

Best regards,

*Evert*


Re: Solr Basic Configuration - Highlight - Begginer

Posted by Teague James <te...@insystechinc.com>.

Erick's comments notwithstanding, there are some gaps in my understanding
of your precise situation. Here are a few things that weren't necessarily
obvious to me when I took my first try with Solr.

Highlighting is the end result of a good hit. It is essentially formatting
applied to your hit. It is possible to get a hit without a highlight if
certain conditions exist.

First, start by making sure you are indexing your target (a PDF file?)
correctly. Assuming you are indexing PDFs, are you extracting metadata
only or are you parsing the document with Tika? If you want hits on the
contents of your PDF, then you have to parse it at index time and store
that. That was why I suggested just running some queries through the
interface and the URL to see what Solr actually captured from your indexed
PDF before worrying about how it looks on the screen.
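
A concrete way to do the "parse it with Tika and store it" step is on the extracting handler itself. The sketch below assumes the stock `/update/extract` handler (`solr.extraction.ExtractingRequestHandler`, as shipped in the techproducts example with its contrib jars already loaded) and reuses the `documentText` field name from the schema snippet quoted earlier in this thread; that field would need `stored="true"` for its text to be returned or highlighted. `fmap.content` renames the field that Tika's extracted body text is written to (`content` by default).

```xml
<!-- solrconfig.xml (sketch): route Tika's extracted body into documentText.
     Assumes the Solr Cell contrib jars are on the classpath via <lib>. -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Lower-case the metadata field names Tika produces -->
    <str name="lowernames">true</str>
    <!-- Write the extracted body text to documentText instead of content -->
    <str name="fmap.content">documentText</str>
    <!-- Prefix unknown metadata fields; assumes an ignored_* dynamicField
         exists in schema.xml, as in the techproducts example -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

After reindexing, a plain request such as `select?q=id:pdf1&fl=id,documentText` should show whether the body text was actually captured, before any highlighting is attempted.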

Next, you should look carefully at the Analyzer's output. Notice the
abbreviations to the left of the columns? Hover over those to see what
filter factory it is. When words are split into multiple columns at one of
those points, it indicates that the filter factory broke apart the word
while analyzing it. Do a search for the filter factories that you
find and read up on them. In my case "1a" was being split into 4 by a word
delimiter filter factory - "1a", "1", "a", "1a" which caused highlighting
to fail in my case while still getting a hit. It also caused erroneous hits
elsewhere. Adding some switches to the schema is all it took to correct
that for me. However, every case is different based on your needs. That is
why it is important to go through the analyzer and see if Solr's indexing
and querying are doing what you expect.
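
The Analysis screen is a front end for a request handler, so the same stage-by-stage token output can also be fetched directly and compared side by side. A minimal sketch, assuming `solr.FieldAnalysisRequestHandler` is not already registered (some versions provide `/analysis/field` implicitly, in which case this block is unnecessary); the core name, field name, and term in the example URL are placeholders taken from this thread.

```xml
<!-- solrconfig.xml (sketch): the handler behind the admin Analysis screen -->
<requestHandler name="/analysis/field"
                startup="lazy"
                class="solr.FieldAnalysisRequestHandler" />

<!-- Example request (placeholders: core "techproducts", field "text"):
     http://localhost:8983/solr/techproducts/analysis/field?analysis.fieldname=text&analysis.fieldvalue=nietava&analysis.query=nietava&wt=json
     The response lists the tokens emitted at each tokenizer/filter stage for
     both the index-time value and the query, like the ST/SF/LCF rows on the
     Analysis screen. -->
```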

If that looks good and you've got solid hits all the way down, then it is
time to start looking at your highlighter implementation in the index and
query analyzers that you are using. My original issue of not being able to
highlight phrases with one set of tags required me to switch to the
fast vector highlighter - which had its own requirements for certain
parameters to be set. Here again - going to the Solr docs and reading up on
the various highlighters will be helpful in most cases.
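
For the FastVectorHighlighter specifically, the usual requirements are that the highlighted field is stored and indexed with term vectors, positions, and offsets, and that `hl.fl` names that same field. A minimal sketch, reusing the `documentText` name from the schema snippet quoted earlier in this thread (with `stored="true"` added, which that snippet does not set); changing these attributes needs a full reindex.

```xml
<!-- schema.xml (sketch): a field the FastVectorHighlighter can use -->
<field name="documentText" type="text_general"
       indexed="true" stored="true" multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true" />

<!-- solrconfig.xml (sketch): defaults inside the /select requestHandler -->
<lst name="defaults">
  <str name="df">documentText</str>
  <str name="hl">true</str>
  <!-- Highlight the same stored, term-vector-enabled field -->
  <str name="hl.fl">documentText</str>
  <str name="hl.useFastVectorHighlighter">true</str>
  <!-- Tags must be escaped inside an XML config file -->
  <str name="hl.tag.pre">&lt;b&gt;</str>
  <str name="hl.tag.post">&lt;/b&gt;</str>
</lst>
```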

Solr has a very steep learning curve. I've been using it for several years
and I still consider myself a noob. It can be a deep dive, but don't be
discouraged. Keep at it. Cheers!

-Teague
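
Erick's point that the searched field and the highlighted field can differ seems to fit the stock techproducts schema closely. As far as I recall, its `content` field (where Solr Cell puts the extracted text) is stored but not indexed and is copied into the catch-all `text` field, which is indexed but not stored; that would explain why `q=content:nietava` finds nothing while `hl.fl=text` returns empty highlights. A minimal sketch of `/select` defaults that search `text` and highlight `content`, assuming those stock field definitions; check them in your own schema.xml first.

```xml
<!-- Assumed techproducts definitions (verify in your schema.xml):
     <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
     <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
     <copyField source="content" dest="text"/>
-->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- Unfielded queries such as q=nietava search the indexed "text" field -->
    <str name="df">text</str>
    <str name="hl">true</str>
    <!-- Highlight the stored "content" field; "text" has no stored value -->
    <str name="hl.fl">content</str>
    <!-- Allow terms matched in "text" to be highlighted in "content" -->
    <str name="hl.requireFieldMatch">false</str>
  </lst>
</requestHandler>
```

The same thing can be tried ad hoc, without editing solrconfig.xml, with a request like `select?q=nietava&fq=id:pdf1&hl=true&hl.fl=content&debug=query`, where `debug=query` shows which field the query was parsed against. For a book-length PDF it may also be worth checking `hl.maxAnalyzedChars`: the standard highlighter only examines the first 51200 characters of a field by default, so a match further in can produce a hit with no snippet.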

On Wed, Dec 16, 2015 at 8:54 PM, Evert R. <ev...@gmail.com> wrote:

> Hi Erick and Teague,
>
>
> I found that when using the field 'text' it shows the pdf file result
> id:pdf1 in this case, like:
>
> http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava
>
> but when I highlight using the text field, nothing comes up:
>
> http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> or even with the option
>
> f.text.hl.snippets=2 under the hl.fl field.
>
>
> I also tried the standard configuration, did it all over, reindexed
> a couple of times... and it still did not work.
>
> Also,
>
> The Analysis screen shows the following:
>
> Stage  text     raw_bytes               start  end  positionLength  type        position
> ST     nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> SF     nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
> LCF    nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
>
>
> It's alphanumeric, I think... so it's 'string', right? Would that be a problem?
> Should there be some other indication?
>
>
> Thanks again!
>
>
> *Evert*
>
> 2015-12-16 21:09 GMT-02:00 Erick Erickson <er...@gmail.com>:
>
> > I think you're still missing the critical bit. Highlighting is
> > completely separate from searching. In other words, you can search on
> > one field and highlight another. What field is searched is governed by
> > the "qf" parameter when using edismax and by the the "df" parameter
> > configured in your request handler in solrconfig.xml. These defaults
> > are overridden when you do a "fielded search" like
> >
> > q=content:nietava
> >
> > So this: q=content:nietava&hl=true&hl.fl=content
> > is searching the "content" field. The word you're looking for isn't in
> > the content field so naturally no docs are returned. And no
> > highlighting either.
> >
> > This: q=nietava&hl=true&hl.fl=content
> >
> > is searching somewhere else, thus getting the hit. We already know
> > that "nietava" is not in the content field because the first search
> > failed. You need to find out what field is being matched (probably
> > something like "text") and then try highlighting on _that_ field. Try
> > adding "debug=query" to the URL and look at the "parsed_query" section
> > of the return and you'll see what field(s) is/are actually being
> > searched against.
> >
> > NOTE: The field you highlight on _must_ have stored="true" in schema.xml.
> >
> > As to why "nietava" isn't being found in the content field, probably
> > you have some kind of analysis chain configured for that field that
> > isn't searching as you expect. See the admin/analysis page for some
> > insight into why that would be. The most frequent reason is that the
> > field is a "string" type which is not broken up into words. Another
> > possibility is that your analysis chain is leaving in the quotes or
> > something similar. As James says, looking at admin/analysis is a good
> > way to figure this out.
> >
> > I still strongly recommend you go from the stock techproducts example
> > and get familiar with how Solr (and highlighting) work before jumping
> > in and changing things. There are a number of ways things can be
> > mis-configured and trying to change several things at once is a fine
> > way to go mad. The admin UI>>schema browser is another way you can see
> > what kind of terms are _actually_ in your index in a particular field.
> >
> > Best,
> > Erick
> >
> >
> >
> >
> > On Wed, Dec 16, 2015 at 12:26 PM, Teague James <teaguej@insystechinc.com
> >
> > wrote:
> > > Sorry to hear that didn't work! Let me ask a couple of questions...
> > >
> > > Have you tried the analyzer inside of the Admin Interface? It has
> helped
> > me sort out a number of highlighting issues in the past. To access it, go
> > to your Admin interface, select your core, then select Analysis from the
> > list of options on the left. In the analyzer, enter the term you are
> > indexing in the top left (in other words the term in the document you are
> > indexing that you expect to get a hit on) and right input fields. Select
> > the field that it is destined for (in your case that would be 'content'),
> > then hit analyze. Helps if you have a big screen!
> > >
> > > This will show you the impact of the various filter factories that you
> > have engaged and their effect on whether or not a 'hit' is being
> generated.
> > Hits are idietified by a very feint highlight. (PSST... Developers... It
> > would be really cool if the highlight color were more visible or
> > customizable... Thanks y'all) If it looks like you're getting hits, but
> not
> > getting highlighting, then open up a new tab with the Admin's query
> > interface. Same place on the left as the analyzer. Replace the "*:*" with
> > your search term (assuming you already indexed your document) and if
> > necessary you can put something in the FQ like "id:123456" to target a
> > specific record.
> > >
> > > Did you get a hit? If no, then it's not highlighting that's the issue.
> > If yes, then try dumping this in your address bar (using your URL/IP,
> > search term, and core name of course. The fq= is an example) :
> > > http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"
> > >
> > > That will dump Solr's output to your browser where you can see exactly
> > what is getting hit.
> > >
> > > Hope that helps! Let me know how it goes. Good luck.
> > >
> > > -Teague
> > >
> > > -----Original Message-----
> > > From: Evert R. [mailto:evert.ramos@gmail.com]
> > > Sent: Wednesday, December 16, 2015 1:46 PM
> > > To: solr-user <so...@lucene.apache.org>
> > > Subject: Re: Solr Basic Configuration - Highlight - Begginer
> > >
> > > Hi Teague!
> > >
> > > I configured the solrconf.xml and schema.xml exactly the way you did,
> > only substituting the word 'documentText' per 'content' used by the
> > techproducts sample, I reindex through :
> > >
> > >  curl '
> > >
> >
> http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true
> > '
> > > -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
> > >
> > > with the same result.... no highlight in the respond as below:
> > >
> > > "highlighting": { "pdf1": {} }
> > >
> > > =(
> > >
> > > Really... do not know what to do...
> > >
> > > Thanks for your time, if you have any more suggestion where I could be
> > missing something... please let me know.
> > >
> > >
> > > Best regards,
> > >
> > > *Evert*
> > >
> > > 2015-12-16 15:30 GMT-02:00 Teague James <te...@insystechinc.com>:
> > >
> > >> Hi Evert,
> > >>
> > >> I recently needed help with phrase highlighting and was pointed to the
> > >> FastVectorHighlighter which worked out great. I just made a change to
> > >> the configuration to add generateWordParts="0" and
> > >> generateNumberParts="0" so that searches for things like "1a" would
> > >> get highlighted correctly. You may or may not need that feature. You
> > >> can always remove them or change the value to "1" to switch them on
> > explicitly. Anyway, hope this helps!
> > >>
> > >> solrconfig.xml (partial snip)
> > >> <requestHandler name="/select" class="solr.SearchHandler">
> > >>                 <lst name="defaults">
> > >>                         <str name="wt">xml</str>
> > >>                         <str name="echoParams">explicit</str>
> > >>                         <int name="rows">10</int>
> > >>                         <str name="df">documentText</str>
> > >>                         <str name="hl">on</str>
> > >>                         <str name="hl.fl">text</str>
> > >>                         <str
> > name="hl.useFastVectorHighlighter">true</str>
> > >>                         <str name="hl.snippets">100</str>
> > >>                         <str name="hl.tag.pre"><b></str>
> > >>                         <str name="hl.tag.post"></b></str>
> > >>                 </lst>
> > >> </requestHandler>
> > >>
> > >> schema.xml (partial snip)
> > >>    <field name="id" type="string" indexed="true" stored="true"
> > >> required="true" multiValued="false" />
> > >>    <field name="documentText" type="text_general" indexed="true"
> > >> multivalued="true" termVectors="true" termOffsets="true"
> > >> termPositions="true" />
> > >>
> > >> <fieldType name="text_general" class="solr.TextField"
> > >> positionIncrementGap="100">
> > >>         <analyzer type="index">
> > >>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >>                 <filter class="solr.StopFilterFactory"
> ignoreCase="true"
> > >> words="stopwords.txt" />
> > >>                 <filter class="solr.WordDelimiterFilterFactory"
> > >> catenateAll="1" preserveOriginal="1" generateNumberParts="0"
> > >> generateWordParts="0" />
> > >>                 <filter class="solr.SynonymFilterFactory"
> > >> synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
> > >>                 <filter class="solr.LowerCaseFilterFactory"/>
> > >>                 <filter class="solr.PorterStemFilterFactory"/>
> > >>                 <filter class="solr.ApostropheFilterFactory"/>
> > >>         </analyzer>
> > >>         <analyzer type="query">
> > >>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >>                 <filter class="solr.WordDelimiterFilterFactory"
> > >> catenateAll="1" preserveOriginal="1" generateWordParts="0" />
> > >>                 <filter class="solr.StopFilterFactory"
> ignoreCase="true"
> > >> words="stopwords.txt" />
> > >>                 <filter class="solr.LowerCaseFilterFactory"/>
> > >>                 <filter class="solr.ApostropheFilterFactory"/>
> > >>         </analyzer>
> > >> </fieldType>
> > >>
> > >> -Teague
> > >>
> > >> From: Evert R. [mailto:evert.ramos@gmail.com]
> > >> Sent: Tuesday, December 15, 2015 6:25 AM
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Solr Basic Configuration - Highlight - Begginer
> > >>
> > >> Hi there!
> > >>
> > >> It´s my first installation, not sure if here is the right channel...
> > >>
> > >> Here is my steps:
> > >>
> > >> 1. Set up a basic install of solr 5.4.0
> > >>
> > >> 2. Create a new core through command line (bin/solr create -c test)
> > >>
> > >> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
> > >>
> > >> 4. Query over the browser and it brings the correct search, but it
> > >> does not show the part of the text I am querying, the highlight.
> > >>
> > >>   I have already flagled the 'hl' option. But still it does not
> word...
> > >>
> > >> Exemple: I am looking for the word 'peace' in my pdf file (book) I
> > >> have 4 matches for this word, it shows me the book name (pdf file) but
> > >> does not bring which part of the text it has the word peace on it.
> > >>
> > >>
> > >> I am problably missing some configuration in schema.xml, which is
> > >> missing from my folder.... /solr/server/solr/test/conf/
> > >>
> > >> Or even the solrconfig.xml...
> > >>
> > >> I have read a bunch of things about highlight check these files,
> > >> copied the standard schema.xml to my core/conf folder, but still it
> > >> does not bring the highlight.
> > >>
> > >>
> > >> Attached a copy of my solrconfig.xml file.
> > >>
> > >>
> > >> I am very sorry for this, probably, dumb and too basic question...
> > >> First time I see solr in live.
> > >>
> > >>
> > >> Any help will be appreciated.
> > >>
> > >>
> > >>
> > >> Best regards,
> > >>
> > >>
> > >> Evert Ramos
> > >>
> > >> mailto:evert.ramos@gmail.com
> > >>
> > >>
> > >>
> > >
> >
>



-- 
Kind regards,

-Teague James
*Senior Web Applications Developer*
Insystech Inc.
teaguej@insystechinc.com
(703) 508-0008 (Cell)

Re: Solr Basic Configuration - Highlight - Begginer

Posted by "Evert R." <ev...@gmail.com>.

Hi Erick and Teague,


I found that searching the field 'text' does return the PDF file (id:pdf1
in this case), like:

http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava

but when I highlight using the text field, nothing comes up:

http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

or even with the option

f.text.hl.snippets=2 added for the hl.fl field.


I also tried the standard configuration, started over, and reindexed
a couple of times, but it still did not work.
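For reference, the highlighting request in the URL above can be assembled in Python, which makes the separation between the searched field (`q`) and the highlighted field (`hl.fl`), and the per-field override syntax `f.<field>.hl.snippets`, easier to see. This is a sketch only: it builds the URL without contacting Solr, and the host, core and field names are the ones used in this thread, so treat them as assumptions for any other install.

```python
from urllib.parse import urlencode

def highlight_url(base, core, q, hl_field, fq=None, snippets=None):
    """Build a /select URL that searches with q and highlights hl_field."""
    params = [
        ("q", q),
        ("wt", "json"),
        ("indent", "true"),
        ("hl", "true"),
        ("hl.fl", hl_field),        # field(s) to highlight; independent of q
        ("hl.simple.pre", "<em>"),
        ("hl.simple.post", "</em>"),
    ]
    if fq is not None:
        params.append(("fq", fq))   # filter, e.g. restrict to one document
    if snippets is not None:
        # Per-field override: f.<field>.<param> applies only to that field.
        params.append((f"f.{hl_field}.hl.snippets", str(snippets)))
    return f"{base}/solr/{core}/select?{urlencode(params)}"

print(highlight_url("http://localhost:8983", "techproducts",
                    "text:nietava", "text", fq="id:pdf1", snippets=2))
```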

Also, the Analysis screen shows the following:

stage  text     raw_bytes               start  end  positionLength  type        position
ST     nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
SF     nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1
LCF    nietava  [6e 69 65 74 61 76 61]  0      7    1               <ALPHANUM>  1


The type is <ALPHANUM>, I think... so it's 'string', right? Would that be a
problem? Should there be some other indication?


Thanks again!


*Evert*


Re: Solr Basic Configuration - Highlight - Begginer

Posted by Erick Erickson <er...@gmail.com>.

I think you're still missing the critical bit. Highlighting is
completely separate from searching. In other words, you can search on
one field and highlight another. What field is searched is governed by
the "qf" parameter when using edismax and by the "df" parameter
configured in your request handler in solrconfig.xml. These defaults
are overridden when you do a "fielded search" like

q=content:nietava

So this: q=content:nietava&hl=true&hl.fl=content
is searching the "content" field. The word you're looking for isn't in
the content field so naturally no docs are returned. And no
highlighting either.

This: q=nietava&hl=true&hl.fl=content

is searching somewhere else, thus getting the hit. We already know
that "nietava" is not in the content field because the first search
failed. You need to find out what field is being matched (probably
something like "text") and then try highlighting on _that_ field. Try
adding "debug=query" to the URL and look at the "parsedquery" entry in the
debug section of the response and you'll see what field(s) is/are actually being
searched against.
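To make the `debug=query` check concrete, here is a rough Python sketch that pulls field names out of the parsed query string. It assumes a JSON response (`wt=json`) whose `debug` section contains `parsedquery`; the regular expression is a crude heuristic written for this example, not a Solr query parser, and the sample strings are hand-written rather than captured from a real server.

```python
import re

def searched_fields(response):
    """Return field names that appear as 'field:' in debug.parsedquery.

    Heuristic only: any identifier followed by ':' counts as a field,
    so values containing colons (e.g. URLs) can produce false positives.
    """
    parsed = response["debug"]["parsedquery"]
    names = re.findall(r"([A-Za-z_][\w.]*):", parsed)
    return list(dict.fromkeys(names))  # de-duplicate, keep first-seen order

# Hand-written samples shaped like Solr debug output.
fielded = {"debug": {"parsedquery": "content:nietava"}}
edismax_like = {"debug": {"parsedquery":
                          "+DisjunctionMaxQuery((content:nietava | title:nietava))"}}
print(searched_fields(fielded))       # ['content']
print(searched_fields(edismax_like))  # ['content', 'title']
```

Whatever field this reports for the query that *does* return the document is the field worth putting in `hl.fl`.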

NOTE: The field you highlight on _must_ have stored="true" in schema.xml.

As to why "nietava" isn't being found in the content field, probably
you have some kind of analysis chain configured for that field that
isn't searching as you expect. See the admin/analysis page for some
insight into why that would be. The most frequent reason is that the
field is a "string" type which is not broken up into words. Another
possibility is that your analysis chain is leaving in the quotes or
something similar. As James says, looking at admin/analysis is a good
way to figure this out.
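The two failure modes described above (a "string" field that is never broken into words, and an analysis chain that leaves quotes attached) can be illustrated with a toy model in Python. These functions are deliberately simplified stand-ins written for illustration, not Solr's actual StrField or tokenizer implementations, and the document value is made up.

```python
PUNCT = '.,;:!?"()'

def string_field_terms(value):
    # Like a "string" field: the whole value is indexed as one exact term.
    return {value}

def whitespace_only_terms(value):
    # Splits on whitespace but keeps punctuation, so quotes stay attached.
    return set(value.split())

def text_field_terms(value):
    # Rough stand-in for a tokenized text field: split on whitespace,
    # strip surrounding punctuation, lowercase, drop empty tokens.
    return {tok.strip(PUNCT).lower() for tok in value.split()} - {""}

doc = 'Chapter 3: "Nietava" speaks'  # made-up sample value
for analyze in (string_field_terms, whitespace_only_terms, text_field_terms):
    print(analyze.__name__, "nietava" in analyze(doc))
# string_field_terms False
# whitespace_only_terms False
# text_field_terms True
```

Only the last model produces a term that a query for `nietava` can match, which is the behavior you want to confirm on the admin/analysis page for the real field.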

I still strongly recommend you go from the stock techproducts example
and get familiar with how Solr (and highlighting) work before jumping
in and changing things. There are a number of ways things can be
misconfigured, and trying to change several things at once is a fine
way to go mad. The admin UI > Schema Browser is another way you can see
what kind of terms are _actually_ in your index in a particular field.

Best,
Erick




On Wed, Dec 16, 2015 at 12:26 PM, Teague James <te...@insystechinc.com> wrote:
> 2015-12-16 15:30 GMT-02:00 Teague James <te...@insystechinc.com>:
>
>> Hi Evert,
>>
>> I recently needed help with phrase highlighting and was pointed to the
>> FastVectorHighlighter, which worked out great. I just made a change to
>> the configuration to add generateWordParts="0" and
>> generateNumberParts="0" so that searches for things like "1a" would
>> get highlighted correctly. You may or may not need that feature. You
>> can always remove them or change the value to "1" to switch them on explicitly. Anyway, hope this helps!
>>
>> solrconfig.xml (partial snip)
>> <requestHandler name="/select" class="solr.SearchHandler">
>>                 <lst name="defaults">
>>                         <str name="wt">xml</str>
>>                         <str name="echoParams">explicit</str>
>>                         <int name="rows">10</int>
>>                         <str name="df">documentText</str>
>>                         <str name="hl">on</str>
>>                         <str name="hl.fl">text</str>
>>                         <str name="hl.useFastVectorHighlighter">true</str>
>>                         <str name="hl.snippets">100</str>
>>                         <str name="hl.tag.pre">&lt;b&gt;</str>
>>                         <str name="hl.tag.post">&lt;/b&gt;</str>
>>                 </lst>
>> </requestHandler>
>>
>> schema.xml (partial snip)
>>    <field name="id" type="string" indexed="true" stored="true"
>> required="true" multiValued="false" />
>>    <field name="documentText" type="text_general" indexed="true"
>> multiValued="true" termVectors="true" termOffsets="true"
>> termPositions="true" />
>>
>> <fieldType name="text_general" class="solr.TextField"
>> positionIncrementGap="100">
>>         <analyzer type="index">
>>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt" />
>>                 <filter class="solr.WordDelimiterFilterFactory"
>> catenateAll="1" preserveOriginal="1" generateNumberParts="0"
>> generateWordParts="0" />
>>                 <filter class="solr.SynonymFilterFactory"
>> synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
>>                 <filter class="solr.LowerCaseFilterFactory"/>
>>                 <filter class="solr.PorterStemFilterFactory"/>
>>                 <filter class="solr.ApostropheFilterFactory"/>
>>         </analyzer>
>>         <analyzer type="query">
>>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>                 <filter class="solr.WordDelimiterFilterFactory"
>> catenateAll="1" preserveOriginal="1" generateWordParts="0" />
>>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt" />
>>                 <filter class="solr.LowerCaseFilterFactory"/>
>>                 <filter class="solr.ApostropheFilterFactory"/>
>>         </analyzer>
>> </fieldType>
>>
>> -Teague

RE: Solr Basic Configuration - Highlight - Begginer

Posted by Teague James <te...@insystechinc.com>.

Sorry to hear that didn't work! Let me ask a couple of questions...

Have you tried the analyzer inside of the Admin Interface? It has helped me sort out a number of highlighting issues in the past. To access it, go to your Admin interface, select your core, then select Analysis from the list of options on the left. In the analyzer, enter the term in both the left (index) and right (query) input fields; on the index side, that is the term in the document you are indexing that you expect to get a hit on. Select the field that it is destined for (in your case that would be 'content'), then hit analyze. Helps if you have a big screen!

This will show you the effect of each filter factory in the field's analysis chain and whether or not a 'hit' is being generated. Hits are identified by a very faint highlight. (PSST... Developers... It would be really cool if the highlight color were more visible or customizable... Thanks y'all) If it looks like you're getting hits but not highlighting, open the Admin UI's Query screen in a new tab; it is in the same list on the left as Analysis. Replace the "*:*" with your search term (assuming you have already indexed your document) and, if necessary, put something like "id:123456" in the fq field to target a specific record.

Did you get a hit? If not, then highlighting is not the issue. If so, try pasting this into your browser's address bar (substituting your own URL/IP, core name, and search term, of course; the fq= is just an example):
http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"

That will dump Solr's output to your browser where you can see exactly what is getting hit.
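To make that output easier to read, here is a sketch of what to look for, assuming you also append `&hl=true&hl.fl=content&wt=xml` to the URL. Those parameters, the field name `content`, the id `pdf1`, and the snippet text are assumptions for illustration, not real output. Highlighted snippets do not appear inside the returned documents; they come back in a separate `highlighting` list keyed by each document's uniqueKey, with one array per `hl.fl` field that produced a snippet:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wt=xml response shape (responseHeader omitted);
     the id, field name, and text are made up for illustration. -->
<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">pdf1</str>
    </doc>
  </result>
  <!-- Snippets live here, not in the doc element above. -->
  <lst name="highlighting">
    <!-- One entry per matching document, keyed by its uniqueKey. -->
    <lst name="pdf1">
      <!-- One array per hl.fl field that produced at least one snippet. -->
      <arr name="content">
        <str>... a hypothetical sentence about &lt;em&gt;peace&lt;/em&gt; ...</str>
      </arr>
    </lst>
  </lst>
</response>
```

The `<em>` tags are Solr's default highlight markup; they appear escaped because they sit inside an XML string value.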

Hope that helps! Let me know how it goes. Good luck.

-Teague

-----Original Message-----
From: Evert R. [mailto:evert.ramos@gmail.com] 
Sent: Wednesday, December 16, 2015 1:46 PM
To: solr-user <so...@lucene.apache.org>
Subject: Re: Solr Basic Configuration - Highlight - Begginer

Hi Teague!

I configured solrconfig.xml and schema.xml exactly the way you did, only substituting 'content' (the field used by the techproducts example) for 'documentText', and reindexed with:

curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' \
  -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"

with the same result: no highlighting in the response, as shown below:

"highlighting": { "pdf1": {} }

=(

I really do not know what to do...

Thanks for your time. If you have any more suggestions about where I could be missing something, please let me know.


Best regards,

*Evert*

2015-12-16 15:30 GMT-02:00 Teague James <te...@insystechinc.com>:

> Hi Evert,
>
> I recently needed help with phrase highlighting and was pointed to the 
> FastVectorHighlighter, which worked out great. I just made a change to
> the configuration to add generateWordParts="0" and 
> generateNumberParts="0" so that searches for things like "1a" would 
> get highlighted correctly. You may or may not need that feature. You 
> can always remove them or change the value to "1" to switch them on explicitly. Anyway, hope this helps!
>
> solrconfig.xml (partial snip)
> <requestHandler name="/select" class="solr.SearchHandler">
>                 <lst name="defaults">
>                         <str name="wt">xml</str>
>                         <str name="echoParams">explicit</str>
>                         <int name="rows">10</int>
>                         <str name="df">documentText</str>
>                         <str name="hl">on</str>
>                         <str name="hl.fl">documentText</str>
>                         <str name="hl.useFastVectorHighlighter">true</str>
>                         <str name="hl.snippets">100</str>
>                         <str name="hl.tag.pre">&lt;b&gt;</str>
>                         <str name="hl.tag.post">&lt;/b&gt;</str>
>                 </lst>
> </requestHandler>
>
> schema.xml (partial snip)
>    <field name="id" type="string" indexed="true" stored="true"
> required="true" multiValued="false" />
>    <field name="documentText" type="text_general" indexed="true"
> multiValued="true" termVectors="true" termOffsets="true"
> termPositions="true" />
>
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100">
>         <analyzer type="index">
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>                 <filter class="solr.WordDelimiterFilterFactory"
> catenateAll="1" preserveOriginal="1" generateNumberParts="0"
> generateWordParts="0" />
>                 <filter class="solr.SynonymFilterFactory"
> synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.PorterStemFilterFactory"/>
>                 <filter class="solr.ApostropheFilterFactory"/>
>         </analyzer>
>         <analyzer type="query">
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.WordDelimiterFilterFactory"
> catenateAll="1" preserveOriginal="1" generateWordParts="0" />
>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.ApostropheFilterFactory"/>
>         </analyzer>
> </fieldType>
>
> -Teague

Re: Solr Basic Configuration - Highlight - Begginer

Posted by "Evert R." <ev...@gmail.com>.

Hi Teague!

I configured solrconfig.xml and schema.xml exactly the way you did, only
substituting 'content' (the field used by the techproducts example) for
'documentText', and reindexed with:

curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true' \
  -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
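One thing worth checking at this step: when a file goes through `/update/extract`, Solr Cell puts the body text that Tika extracts into a field named `content`, unless an `fmap.content` parameter (in the request or in the handler's defaults) maps it to another field. If it is mapped to a catch-all field that is indexed but not stored, the highlighter has no stored text to build snippets from. Below is a minimal sketch of handler defaults that keep the text in `content`. It assumes the extraction contrib is loaded as in the techproducts example and that the schema defines `content` and an `ignored_*` dynamic field; compare it with the defaults in your own solrconfig.xml rather than copying it as-is:

```xml
<!-- Sketch only: route Tika's extracted body text into a stored "content" field. -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Lowercase the field names Tika produces from document metadata. -->
    <str name="lowernames">true</str>
    <!-- Explicitly keep the extracted body in "content" instead of a
         catch-all field that may not be stored. -->
    <str name="fmap.content">content</str>
    <!-- Prefix unknown metadata fields so they match an ignored_* dynamic field. -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

Snippets are built from what was stored (and, for the FastVectorHighlighter, from the term vectors written) at index time, so after a change like this the document has to be reindexed, for example by re-running the curl command above, before the results can change.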

with the same result: no highlighting in the response, as shown below:

"highlighting": { "pdf1": {} }

=(
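An entry like `"pdf1": {}` usually means the document matched the query but none of the fields named in `hl.fl` produced a snippet, for example because that field is not stored. As a hedged sketch, here is a `content` declaration that both Solr's standard highlighter (which needs `stored="true"`) and the FastVectorHighlighter (which also needs term vectors with positions and offsets) can work with. It assumes a `text_general` field type exists in the schema; note that attribute names such as `multiValued` are case-sensitive:

```xml
<!-- Sketch: a "content" field usable by either highlighter. -->
<field name="content" type="text_general"
       indexed="true"
       stored="true"
       multiValued="true"
       termVectors="true"
       termPositions="true"
       termOffsets="true"/>
```

With a field like this, `hl.fl` would name `content` (rather than `text` or `documentText`), and setting `df` to `content` would let an unqualified query such as `q=peace` search it.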

I really do not know what to do...

Thanks for your time. If you have any more suggestions about where I could be
missing something, please let me know.


Best regards,

*Evert*

2015-12-16 15:30 GMT-02:00 Teague James <te...@insystechinc.com>:

> Hi Evert,
>
> I recently needed help with phrase highlighting and was pointed to the
> FastVectorHighlighter, which worked out great. I just made a change to the
> configuration to add generateWordParts="0" and generateNumberParts="0" so
> that searches for things like "1a" would get highlighted correctly. You may
> or may not need that feature. You can always remove them or change the
> value to "1" to switch them on explicitly. Anyway, hope this helps!
>
> solrconfig.xml (partial snip)
> <requestHandler name="/select" class="solr.SearchHandler">
>                 <lst name="defaults">
>                         <str name="wt">xml</str>
>                         <str name="echoParams">explicit</str>
>                         <int name="rows">10</int>
>                         <str name="df">documentText</str>
>                         <str name="hl">on</str>
>                         <str name="hl.fl">documentText</str>
>                         <str name="hl.useFastVectorHighlighter">true</str>
>                         <str name="hl.snippets">100</str>
>                         <str name="hl.tag.pre">&lt;b&gt;</str>
>                         <str name="hl.tag.post">&lt;/b&gt;</str>
>                 </lst>
> </requestHandler>
>
> schema.xml (partial snip)
>    <field name="id" type="string" indexed="true" stored="true"
> required="true" multiValued="false" />
>    <field name="documentText" type="text_general" indexed="true"
> multiValued="true" termVectors="true" termOffsets="true"
> termPositions="true" />
>
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100">
>         <analyzer type="index">
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>                 <filter class="solr.WordDelimiterFilterFactory"
> catenateAll="1" preserveOriginal="1" generateNumberParts="0"
> generateWordParts="0" />
>                 <filter class="solr.SynonymFilterFactory"
> synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.PorterStemFilterFactory"/>
>                 <filter class="solr.ApostropheFilterFactory"/>
>         </analyzer>
>         <analyzer type="query">
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.WordDelimiterFilterFactory"
> catenateAll="1" preserveOriginal="1" generateWordParts="0" />
>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.ApostropheFilterFactory"/>
>         </analyzer>
> </fieldType>
>
> -Teague

RE: Solr Basic Configuration - Highlight - Begginer

Posted by Teague James <te...@insystechinc.com>.

Hi Evert,

I recently needed help with phrase highlighting and was pointed to the FastVectorHighlighter, which worked out great. I also changed the configuration to add generateWordParts="0" and generateNumberParts="0" so that searches for things like "1a" would get highlighted correctly. You may or may not need that feature. You can always remove them or change the value to "1" to switch them on explicitly. Anyway, hope this helps!

solrconfig.xml (partial snip)
<requestHandler name="/select" class="solr.SearchHandler">
		<lst name="defaults">
			<str name="wt">xml</str>
			<str name="echoParams">explicit</str>
			<int name="rows">10</int>
			<str name="df">documentText</str>
			<str name="hl">on</str>
			<str name="hl.fl">documentText</str>
			<str name="hl.useFastVectorHighlighter">true</str>
			<str name="hl.snippets">100</str>
			<str name="hl.tag.pre">&lt;b&gt;</str>
			<str name="hl.tag.post">&lt;/b&gt;</str>
		</lst>
</requestHandler>

schema.xml (partial snip)
   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="documentText" type="text_general" indexed="true" multiValued="true" termVectors="true" termOffsets="true" termPositions="true" />

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
		<filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" generateNumberParts="0" generateWordParts="0" />
		<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
		<filter class="solr.LowerCaseFilterFactory"/>
		<filter class="solr.PorterStemFilterFactory"/>
		<filter class="solr.ApostropheFilterFactory"/>
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" generateWordParts="0" />
		<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
		<filter class="solr.LowerCaseFilterFactory"/>
		<filter class="solr.ApostropheFilterFactory"/>
	</analyzer>
</fieldType>
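For comparison, here is a hedged sketch of the same `/select` defaults using Solr's standard highlighter instead of the FastVectorHighlighter. It only requires the `hl.fl` field to be stored, not term vectors, so it can be a simpler first check. The field name `content` and the numeric values are assumptions. My understanding is that the standard highlighter in Solr 5 reads its markup from `hl.simple.pre`/`hl.simple.post` rather than `hl.tag.pre`/`hl.tag.post`, so those are used here:

```xml
<!-- Sketch: /select defaults using the standard highlighter on "content". -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">content</str>
    <str name="hl">on</str>
    <!-- Must be a stored field. -->
    <str name="hl.fl">content</str>
    <str name="hl.useFastVectorHighlighter">false</str>
    <!-- Up to 3 snippets per field, each roughly 100 characters long. -->
    <int name="hl.snippets">3</int>
    <int name="hl.fragsize">100</int>
    <!-- Only the first hl.maxAnalyzedChars characters of a field are scanned
         (51200 by default), which may miss matches deep inside a whole book. -->
    <int name="hl.maxAnalyzedChars">1000000</int>
    <!-- Markup around each match, escaped because this file is XML. -->
    <str name="hl.simple.pre">&lt;b&gt;</str>
    <str name="hl.simple.post">&lt;/b&gt;</str>
  </lst>
</requestHandler>
```

Raising `hl.maxAnalyzedChars` trades speed for coverage on large documents such as PDF books, so the value above is only a starting point.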

-Teague
