Posted to java-user@lucene.apache.org by chetan minajagi <ch...@yahoo.co.in> on 2005/01/20 07:34:42 UTC

help in indexing

Hi,

This might seem elementary to most of you.
I am trying to build a search tool for internal use with Lucene.
I have used the following parsers:
 .pdf   -->  PDFBox
 .html  -->  Lucene's demo class (HTMLDocument)
 .xls   -->  POI
 
Indexing seems to work without throwing any errors.
But when I try to search, I always get zero hits.
I have tried searching for the exact string that I see when printing the document (System.out.print(Document)), but in vain.
Can somebody let me know where and what could be wrong?
Regards,
Chetan

		

RE: help in indexing

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello Chetan,

The code that comes with the Lucene book contains a little framework
for indexing rich-text documents.  It sounds like you may be able to
use it as-is and extend it with a parser for Excel files, which we
didn't include in the code (should we include it in the next edition?).
While PDFBox comes with that handy Lucene-specific class that you are
using, it may be better for you to be in control of how exactly you
construct your Lucene documents.
See http://www.lucenebook.com/search?query=framework
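
To illustrate why the "summary" field in this thread returns no hits, here
is a minimal sketch of the difference between a stored field and an indexed
one. It is a toy model, not Lucene code: ToyField, ToyIndex, and reindexed
are hypothetical names made up for this example, and the field values are
invented. The assumption behind it is that a field's stored/indexed flags
are chosen when the field is created, so the usual way to "set" a field to
indexed is to build a new field (or a whole new document) with the flags
you want and index that instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: ToyField and ToyIndex are hypothetical, not Lucene classes.
class StoredVsIndexedSketch {

    /** A field that can be stored (returned with hits), indexed (searchable), or both. */
    static final class ToyField {
        final String name;
        final String value;
        final boolean stored;
        final boolean indexed;

        ToyField(String name, String value, boolean stored, boolean indexed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    /** Minimal inverted index: "field:term" -> ids of the documents containing it. */
    static final class ToyIndex {
        private final Map<String, Set<Integer>> postings = new HashMap<>();
        private int nextId = 0;

        int add(List<ToyField> doc) {
            int id = nextId++;
            for (ToyField f : doc) {
                if (!f.indexed) {
                    continue; // stored-only fields never reach the postings
                }
                for (String term : f.value.toLowerCase().split("\\W+")) {
                    if (!term.isEmpty()) {
                        postings.computeIfAbsent(f.name + ":" + term, k -> new TreeSet<>()).add(id);
                    }
                }
            }
            return id;
        }

        Set<Integer> search(String field, String term) {
            Set<Integer> hits = postings.get(field + ":" + term.toLowerCase());
            return hits == null ? Collections.<Integer>emptySet() : hits;
        }
    }

    /** Rebuilds a document, switching on indexing for the named fields. */
    static List<ToyField> reindexed(List<ToyField> doc, Set<String> names) {
        List<ToyField> out = new ArrayList<>();
        for (ToyField f : doc) {
            out.add(names.contains(f.name) ? new ToyField(f.name, f.value, f.stored, true) : f);
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample document: "summary" is stored but not indexed.
        List<ToyField> pdfDoc = Arrays.asList(
            new ToyField("contents", "quarterly sales report", false, true),
            new ToyField("summary", "Budget overview for 2005", true, false));

        ToyIndex before = new ToyIndex();
        before.add(pdfDoc);
        System.out.println(before.search("summary", "budget").size()); // prints 0

        ToyIndex after = new ToyIndex();
        after.add(reindexed(pdfDoc, Collections.singleton("summary")));
        System.out.println(after.search("summary", "budget").size()); // prints 1
    }
}
```

The same idea would apply to a real index: take the document produced by
the PDF helper, copy the fields you care about into fields created as
indexed, and add that rebuilt document to the index. Check the Field API of
your Lucene version for the exact constructors or factory methods to use.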

Otis

--- chetan minajagi <ch...@yahoo.co.in> wrote:

> Hi Karthik/Cocula,
> 
> Luke didn't work, but Limo helped. I seem to get results when I use
> Limo for my text/xls files.
> Now the problem is with PDF search.
> The problem that I see is that the "summary" field, as seen through Limo,
> is not indexed, and hence there are no hits.
> I'm using the default document returned by
>  LucenePDFDocument.getDocument(myPdfFile);
> So how do I ensure that the fields in it that are not
> indexed are set to indexed?
> As far as I can see, I can only probe whether a field is indexed
> by using
> Field.isIndexed(), but is there a method by which I can set it to
> indexed?
> Can someone provide any help or pointers in this regard?
>  
> Thanks & Regards,
> Chetan
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: help in indexing

Posted by chetan minajagi <ch...@yahoo.co.in>.
Hi Karthik/Cocula,

Luke didn't work, but Limo helped. I seem to get results when I use Limo for my text/xls files.
Now the problem is with PDF search.
The problem that I see is that the "summary" field, as seen through Limo, is not indexed, and hence there are no hits.
I'm using the default document returned by
 LucenePDFDocument.getDocument(myPdfFile);
So how do I ensure that the fields in it that are not indexed are set to indexed?
As far as I can see, I can only probe whether a field is indexed by using
Field.isIndexed(), but is there a method by which I can set it to indexed?
Can someone provide any help or pointers in this regard?
 
Thanks & Regards,
Chetan


RE: help in indexing

Posted by Karthik N S <ka...@controlnet.co.in>.
Hi,

You probably need to use the Luke software to peek inside your index. Use it, then
come back for more help.


Karthik

