Posted to solr-user@lucene.apache.org by satya swaroop <sa...@gmail.com> on 2010/12/14 09:29:14 UTC

Google like search

Hi All,
Can we get results like Google's, with some data about the
search? I was able to get the first 300 characters of a file, but that
is not helpful for me. Can I get the text around the first matching
keyword in that file?

Regards,
Satya

Re: Google like search

Posted by satya swaroop <sa...@gmail.com>.

Hi All,

Thanks for your suggestions. I got the result I expected.

Cheers,
Satya

Re: Google like search

Posted by Bhavnik Gajjar <bh...@gatewaynintec.com>.

Hi Satya,

Coming back to your original question, one way to make Solr emit
Google-like snippets is a query such as:

http://localhost:8080/solr/DefaultInstance/select/?q=java&version=2.2&start=0&rows=10&indent=on&hl=true&hl.snippets=5&hl.fl=Field_Text&fl=Field_Text

Note that the key feature used here is Solr's highlighting.
Executing the query above returns two main blocks of results.
The first contains the normal results, and the second contains
highlighted snippets, based on the parameters provided in the
query. Pick up the latter part (the snippets) and show it in the
result page UI.
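
A minimal sketch of the two-part response described above, in Python. It assumes the query is sent with wt=json (the URL above uses the default XML output) and that the JSON response carries a top-level "highlighting" object keyed by document id. The helper names and the sample response are made up for illustration.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, field, snippets=5, rows=10):
    """Build a select URL with highlighting turned on for one field."""
    params = {
        "q": query,
        "start": 0,
        "rows": rows,
        "hl": "true",
        "hl.snippets": snippets,
        "hl.fl": field,
        "fl": field,
        "wt": "json",  # assumption: JSON output instead of the default XML
    }
    return base_url + "?" + urlencode(params)


def extract_snippets(response, field):
    """Map each document id to its highlighted snippets for `field`."""
    highlighting = response.get("highlighting", {})
    return {doc_id: fields.get(field, []) for doc_id, fields in highlighting.items()}


# Hypothetical response, shaped like Solr's JSON output when hl=true.
sample_response = {
    "response": {"numFound": 1, "start": 0, "docs": [{"Field_Text": "..."}]},
    "highlighting": {
        "1222222": {"Field_Text": ["<em>Java</em> is easy to learn"]},
    },
}

url = build_highlight_url(
    "http://localhost:8080/solr/DefaultInstance/select/", "java", "Field_Text"
)
snippets = extract_snippets(sample_response, "Field_Text")
```

The result page would then render the values in `snippets` next to each hit, instead of the stored field from the first block.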

Cheers,

Bhavnik Gajjar


Re: Google like search

Posted by Tanguy Moal <ta...@gmail.com>.

To do so, you have several possibilities; I don't know if there is a best one.

It depends mostly on the format of the input file(s), your
familiarity with a given programming language, the libraries you might
need, and the time you're ready to spend on this task.

Consider having a look at SolrJ (http://wiki.apache.org/solr/Solrj)
or at the DataImportHandler
(http://wiki.apache.org/solr/DataImportHandler).

Cheers,

--
Tanguy

Re: Google like search

Posted by satya swaroop <sa...@gmail.com>.

Hi Tanguy,
Thanks for your reply, and sorry to ask this type of question.
How can we index each chapter of a file as a separate document? As far as I know,
we just give Solr the path of the file to index it. Can you point me to any
sources on this, such as blogs or wiki pages?

Regards,
satya

Re: Google like search

Posted by Tanguy Moal <ta...@gmail.com>.

Satya,

In fact the highlighter will select the relevant part of the whole
text and return it with the matched terms highlighted.

If you do so for a whole book, you will face the issue spotted by Dave
(too long text).

To address that issue, you can split your book into
chapters and index each chapter as its own document.

You would then want to add a field that uniquely identifies
each book (using the ISBN, for example) and turn on grouping (or
collapsing) on that field (see this very good blog post:
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/
).
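
As a rough sketch of the chapter-per-document idea, the Python below turns one book into a list of documents that share an ISBN. The "Chapter N" heading convention, the field names (id, isbn, chapter, text), and the ISBN value are all assumptions for illustration; real input formats will need their own splitting logic.

```python
import re


def split_into_chapter_docs(isbn, book_text):
    """Return one dict per chapter, each tagged with the book's ISBN."""
    # Hypothetical convention: each chapter starts on a line like "Chapter 3".
    parts = re.split(r"(?m)^Chapter[ \t]+\d+[ \t]*$", book_text)
    # parts[0] is whatever precedes the first chapter heading (front matter).
    chapters = [part.strip() for part in parts[1:]]
    return [
        {"id": f"{isbn}-ch{n}", "isbn": isbn, "chapter": n, "text": text}
        for n, text in enumerate(chapters, start=1)
    ]


book = "Preface\nChapter 1\nJava is easy to learn.\nChapter 2\nDebugging basics.\n"
docs = split_into_chapter_docs("978-0-00-000000-2", book)  # made-up ISBN
```

Each dict in `docs` would be sent to Solr as a separate document, with the shared `isbn` field available for grouping.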

Moreover, you might be interested in the following JIRA issue:
https://issues.apache.org/jira/browse/SOLR-2272 . Using this patch,
you could for example ensure that if a given chapter document is
selected by the query, then one or several other documents (maybe a
parent "book document", or all the other chapters) get selected along
the way (by doing a self-join on the ISBN). Here again,
grouping afterward would return one group of documents representing each
book.
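
To illustrate what grouping on the ISBN field gives you, here is a client-side sketch in Python. Solr does this on the server; the code only mimics the effect on a list of chapter hits that is assumed to be sorted by relevance already. The function name, the output keys, and the sample hits are hypothetical.

```python
def collapse_by_field(hits, field):
    """Group ranked hits by `field`, keeping the best hit per group first."""
    groups = {}
    for hit in hits:  # hits are assumed to be sorted by relevance already
        groups.setdefault(hit[field], []).append(hit)
    # Dicts preserve insertion order, so groups come out in rank order.
    return [
        {"group_value": value, "top_hit": members[0], "count": len(members)}
        for value, members in groups.items()
    ]


hits = [
    {"id": "A-ch3", "isbn": "A"},
    {"id": "B-ch1", "isbn": "B"},
    {"id": "A-ch7", "isbn": "A"},
]
books = collapse_by_field(hits, "isbn")
```

The result has one entry per book rather than one per chapter, which is what the result page would list.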

Good luck!

--
Tanguy

RE: Google like search

Posted by Dave Searle <da...@magicalia.com>.

Highlighting is exactly what you need, although if you highlight the whole book, this could slow down your queries. Index/store the first 5000-10000 characters and see how you get on.
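
A small Python sketch of capping the text before it is indexed, assuming the cap is applied on the client side. The function name and the default of 10000 characters are illustrative; the same limit can instead be set in schema.xml, as was done in this thread for 300 characters.

```python
def truncate_for_highlighting(text, max_chars=10000):
    """Cut `text` to at most max_chars, backing up to the last space."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_space = cut.rfind(" ")
    # Fall back to a hard cut when the prefix contains no usable space.
    return cut[:last_space] if last_space > 0 else cut


short = truncate_for_highlighting("Java is easy to learn", max_chars=10)
```

The trade-off is that a match beyond the cutoff can no longer produce a snippet, so the limit is worth tuning against real documents.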

Re: Google like search

Posted by satya swaroop <sa...@gmail.com>.

Hi Tanguy,
I am not asking for highlighting. I think it can be
explained with an example, which I illustrate here.

When I post a query like this:

http://localhost:8080/solr/select?q=Java&version=2.2&start=0&rows=10&indent=on

I get the following result:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="filename">Java%20debugging.pdf</str>
      <str name="id">1222222</str>
      <arr name="text1">
        <str>
Table of Contents
If you're viewing this document online, you can click any of the topics
below to link directly to that section.
1. Tutorial tips 2
2. Introducing debugging  4
3. Overview of the basics 6
4. Lessons in client-side debugging 11
5. Lessons in server-side debugging 15
6. Multithread debugging 18
7. Jikes overview 20
        </str>
      </arr>
    </doc>
  </result>
</response>

Here the str field contains the first 300 characters of the file, as I set up a
field that copies only 300 characters in schema.xml.
But I don't want content like this. Is there any way to get output like the
following?

<str> Java is one of the best language,java is easy to learn...</str>

where this content comes from the point in the file (the start of a chapter)
where the word "java" first occurs.
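
The behaviour asked for here can be sketched in a few lines of Python. This is only an illustration of the idea, not how Solr's highlighter is implemented; the function name and the sample text are made up.

```python
def first_match_snippet(text, term, width=60):
    """Return up to `width` characters starting at the first match of `term`."""
    # Case-insensitive search, so "java" also finds "Java".
    pos = text.lower().find(term.lower())
    if pos == -1:
        return None  # the term does not occur in the text
    return text[pos:pos + width]


text = "Table of Contents. Java is one of the best languages."
snippet = first_match_snippet(text, "java", width=23)
```

Here `snippet` starts at the first occurrence of the term rather than at the beginning of the file.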


Regards,
Satya

Re: Google like search

Posted by Tanguy Moal <ta...@gmail.com>.

Hi Satya,

I think what you're looking for is called "highlighting", in the sense
of highlighting the query terms in their matching context.

You could start by googling "solr highlight"; the first results
should make sense.

Solr's wiki is usually a good entry point:
http://wiki.apache.org/solr/HighlightingParameters .

Maybe I misunderstood your question, but I hope that helps.

Regards,

Tanguy

