Posted to java-user@lucene.apache.org by hermit <he...@freestart.hu> on 2003/02/03 08:39:04 UTC

'-' character not interpreted correctly in field names

Hello!

I have a big problem. I have successfully indexed 600 MB of XML
data, but a search returns no results if the field name contains any
'-' characters.
For example: compound@cgx-code:[2 - 5] should match at least two results
based on my XML data, but it returns nothing.

Can you advise me on a simple solution? Or is it a bug?

    The Hermit


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: Indexing Growth

Posted by Rob Outar <ro...@ideorlando.org>.
Just about everything calls getValue:

    public synchronized String getValue(String key, File file)
            throws ParseException, IOException {

        Document doc = getDocument(file);
        return doc.get(key.toLowerCase());
    }

which calls getDocument:

    private synchronized Document getDocument(File file)
            throws MalformedURLException, IOException {

        checkForIndexChange();
        Term t       = new Term(PATH, file.toURI().toString().toLowerCase());
        TermQuery tQ = new TermQuery(t);

        Hits hits    = this.searcher.search(tQ);

        if (hits.length() == 1) {
            return hits.doc(0);
        }
        // this should never happen: a URL cannot return 2 hits,
        // as that would mean the same file has been indexed twice
        else {
            return null;
        }
    }


Thanks,

Rob






Re: Indexing Growth

Posted by Michael Barry <mb...@cos.com>.
Sounds like you either have an indexer that's run amok (maybe
a background process that's continually re-indexing your sandbox -
or expanding outside your sandbox) or your Query code is doing more
than querying. It's not behaviour I've seen. Without a snippet of
Query code, it's going to be hard to help.





RE: Indexing Growth

Posted by Alex Murzaku <li...@lissus.com>.
Hi Rob,

After you build the initial index (10.4M) you can see that the directory
contains some files - those are the index files. I am sure that the size
and number of those initial files don't change while you query Lucene.
When you say the index grows, I assume you are referring to other files
that are added to the index directory, increasing the size of the
directory rather than of the initial index files. One experiment you could
do is to inspect the index directory after each query. This should give
you an idea of what is going on. Also, it might help to send some code
snippets to the list to get more informed help...

Alex
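
The directory inspection suggested above could be sketched roughly as
follows. This is only an illustration using the plain Java file API, not
anything from Lucene itself; the class name IndexDirSnapshot, the methods
snapshot and diff, and the default "index" path are hypothetical. It
records the name and size of each file before and after a query and
reports what was added, resized or removed.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

public class IndexDirSnapshot {

    // Records the name and size in bytes of every regular file in the directory.
    static Map<String, Long> snapshot(File indexDir) {
        Map<String, Long> sizes = new TreeMap<>();
        File[] files = indexDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) {
                    sizes.put(f.getName(), f.length());
                }
            }
        }
        return sizes;
    }

    // Describes files that appeared, changed size or vanished between two snapshots.
    static Map<String, String> diff(Map<String, Long> before, Map<String, Long> after) {
        Map<String, String> changes = new TreeMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long old = before.get(e.getKey());
            if (old == null) {
                changes.put(e.getKey(), "added (" + e.getValue() + " bytes)");
            } else if (!old.equals(e.getValue())) {
                changes.put(e.getKey(), "resized " + old + " -> " + e.getValue() + " bytes");
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                changes.put(name, "removed");
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Hypothetical default location; pass the real index directory as the first argument.
        File indexDir = new File(args.length > 0 ? args[0] : "index");
        Map<String, Long> before = snapshot(indexDir);
        // ... run a single query against the index here ...
        Map<String, Long> after = snapshot(indexDir);
        diff(before, after).forEach((name, change) -> System.out.println(name + ": " + change));
    }
}
```

If the diff is empty after a query, the growth is probably coming from
somewhere other than the search itself.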





RE: Indexing Growth

Posted by Rob Outar <ro...@ideorlando.org>.
Additional info on the problem: the index directory contains several 1 KB
files and several files that have different names but the same file size.
It looks like the files that make up the index are being duplicated, causing
the index to become huge.

Thanks,

Rob
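
One way to check the "different names, same size" observation is to group
the files in the index directory by size. The following is a rough sketch
using only the standard Java file API; the class name SameSizeFiles, the
method groupBySize and the default "index" path are hypothetical. Note
that equal size only hints at duplication; it does not prove that the
contents are identical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SameSizeFiles {

    // Groups file names by size and keeps only sizes shared by two or more files.
    static Map<Long, List<String>> groupBySize(Map<String, Long> sizesByName) {
        Map<Long, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> e : sizesByName.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        groups.values().removeIf(names -> names.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical default location; pass the real index directory as the first argument.
        File indexDir = new File(args.length > 0 ? args[0] : "index");
        Map<String, Long> sizes = new TreeMap<>();
        File[] files = indexDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile()) {
                    sizes.put(f.getName(), f.length());
                }
            }
        }
        groupBySize(sizes).forEach((size, names) -> System.out.println(size + " bytes: " + names));
    }
}
```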





RE: Indexing Growth

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Rob,

Here are some ideas.

1. Which version of Lucene are you using?  HEAD from CVS?  If so,
FSDirectory was patched recently (renameSomething method), make sure
there is no bug in it.

2. IndexReader/IndexWriter allow you to set the stream for logging 
(can't remember the method name now).  Those classes will then print
out what they are doing to that stream.

3. Write a simple class that opens your index before any queries are
made and prints out the number of documents, the number of terms, etc.
Then run your queries, and run the application again and see if the
numbers are different.

4. Do as in 3, but run 1 single query.  Observe difference between pre
and post-query numbers.  Run another query.  Observe the difference in
numbers again.  Is there a pattern?

5. To make sure it's not some background process adding documents, run
your application (from 3.) twice, but without querying the index, just
waiting some time between runs.

Otis







RE: Indexing Growth

Posted by Rob Outar <ro...@ideorlando.org>.
Hi all,

	This is too odd and I do not even know where to start.  We built a Windows
Explorer type tool that indexes all files in a "sandboxed" file system.
Each Lucene document contains stuff like path, parent directory, last
modified date, file_lock etc.  When we display the files in a given
directory through the tool, we query the index about 5 times for each file in
the repository; this is done so we can display all attributes in the index
about that file.  So for example, if there are 5 files in the directory and
each file has 6 attributes, that means about 30 term queries are executed.
When built, the initial index is about 10.4 MB; after accessing about 3 or 4
directories the index size increased to over 100 MB, and we did not add
anything!!  All we are doing is querying!!  Yesterday, after querying became
ungodly slow, we looked at the index size: it had grown from 10 MB to 1.5 GB
(granted, we tested the tool all morning).  But I have no idea why the index
is growing like this.  ANY help would be greatly appreciated.


Thanks,

Rob






RE: Indexing Growth

Posted by Rob Outar <ro...@ideorlando.org>.
I reuse the same searcher, analyzer and Query object, so I don't think that
should cause the problem.

Thanks,

Rob






RE: Indexing Growth

Posted by Alex Murzaku <li...@lissus.com>.
I don't know if I remember this correctly: I think a file is created for
every query (term), but the file should disappear after the query
is completed.





RE: Indexing Growth

Posted by Rob Outar <ro...@ideorlando.org>.
Dang I must be doing something crazy cause all my client app does is search
and the index size increases.  I do not add anything.

Thanks,

Rob






Re: Indexing Growth

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Only when you add new documents to it.

Otis



__________________________________________________
Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more
http://platinum.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Indexing Growth

Posted by Rob Outar <ro...@ideorlando.org>.
Hi all,

	Will the index grow based on queries alone?  I build my index, then run
several queries against it, and afterwards I check the size of the index. In
some cases it has grown quite a bit, although I did not add anything.

Anyhow, please let me know the cases in which the index will grow.

Thanks,

Rob


Re: '-' character not interpreted correctly in field names

Posted by Tatu Saloranta <ta...@hypermall.net>.
On Monday 03 February 2003 07:19, Terry Steichen wrote:
> I believe that the tokenizer treats a dash as a token separator.  Hence,
> the only way, as I recall, to eliminate this behavior is to modify
> QueryParser.jj so it doesn't do this.  However, doing this can cause some
> other problems, like hyphenated words at a line break and the like.

Might it be enough to just replace the analyzer passed in to QueryParser?
That would work if QueryParser only handles modifiers outside terms and
passes the terms themselves to the analyzer.
I think this is the case (QueryParser does call the analyzer in a couple of
places, and one word may actually expand to a phrase or vice versa).

Still, it seems like using a hyphen as a separator shouldn't necessarily cause
big problems when the indexer does the same; queries against "2 - 5" would be
phrase queries for "2 5", which is still reasonably specific (and should
match the content).

On the other hand, the simple analyzer and the standard analyzer have quite
different tokenization rules, so it's important to make sure the same analyzer
is used for both indexing and searching (a mismatch can easily prevent matches).
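
To illustrate the mismatch, here is a toy sketch in plain Java. It does not
use Lucene; the two split methods are simplified stand-ins invented for this
example (they are not how the real analyzers tokenize), one splitting on every
non-alphanumeric character and one splitting on whitespace only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AnalyzerMismatchSketch {

    // Toy stand-in for an analyzer that lowercases and treats every
    // non-alphanumeric character (including '-') as a token separator.
    public static List<String> splitOnPunctuation(String text) {
        return tokens(text, "[^\\p{L}\\p{N}]+");
    }

    // Toy stand-in for an analyzer that lowercases and splits on whitespace only.
    public static List<String> splitOnWhitespace(String text) {
        return tokens(text, "\\s+");
    }

    private static List<String> tokens(String text, String separatorRegex) {
        List<String> result = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split(separatorRegex)) {
            if (!part.isEmpty()) {   // split() can yield a leading empty string
                result.add(part);
            }
        }
        return result;
    }

    // Crude model of a match: every query token must occur among the indexed
    // tokens. Real phrase queries also check positions; that is ignored here.
    public static boolean wouldMatch(List<String> indexed, List<String> query) {
        return indexed.containsAll(query);
    }

    public static void main(String[] args) {
        String value = "cgx-code";
        List<String> indexed = splitOnPunctuation(value);
        System.out.println("indexed tokens: " + indexed);
        System.out.println("same analyzer at query time matches: "
                + wouldMatch(indexed, splitOnPunctuation(value)));
        System.out.println("different analyzer at query time matches: "
                + wouldMatch(indexed, splitOnWhitespace(value)));
    }
}
```

With the same splitter on both sides the hyphenated value still lines up; with
different splitters the query token "cgx-code" never appears among the indexed
tokens, so nothing can match.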

-+ Tatu +-


Re: '-' character not interpreted correctly in field names

Posted by Terry Steichen <te...@net-frame.com>.
I believe that the tokenizer treats a dash as a token separator.  Hence, the
only way, as I recall, to eliminate this behavior is to modify
QueryParser.jj so it doesn't do this.  However, doing this can cause some
other problems, like hyphenated words at a line break and the like.

(Of course, if you do make such a change, you'll have to go back and reindex
afterwards.)

I've run into this problem myself and I've 'punted': on certain fields,
when I index, I replace the dash with an underscore.  This isn't a really good
solution, and it requires me to remember which fields need the same
substitution at search time.  But for the moment it works.  I'll
probably go back and make some kind of change later, when I have more time.
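
A minimal sketch of that substitution in plain Java, with the list of affected
fields kept in one place so the indexing and search code cannot drift apart.
The class name, the normalize method and the field names "partNumber" and
"title" are all made up for illustration; none of this is Lucene API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DashSubstitutionSketch {

    // Hypothetical set of fields whose values get the dash-to-underscore swap.
    private static final Set<String> DASHED_FIELDS =
            new HashSet<>(Arrays.asList("partNumber"));

    // Call this on the value before indexing it AND on the user's search
    // term before building the query, so both sides see the same text.
    public static String normalize(String field, String value) {
        return DASHED_FIELDS.contains(field) ? value.replace('-', '_') : value;
    }

    public static void main(String[] args) {
        System.out.println(normalize("partNumber", "AB-12"));   // listed field
        System.out.println(normalize("title", "well-known"));   // unlisted field
    }
}
```

The swap is lossy: a value that already contains an underscore becomes
indistinguishable from one that had a dash in the same position.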

HTH,

Terry

----- Original Message -----
From: "hermit" <he...@freestart.hu>
To: <lu...@jakarta.apache.org>
Sent: Monday, February 03, 2003 2:39 AM
Subject: '-' character not interpreted correctly in field names


> Hello!
>
> I have a problem, a big one. I have successfully indexed 600 MB of XML
> data, but the search can't give any results if the field contains any
> '-' characters.
> For example: compound@cgx-code:[2 - 5] must match at least two results
> based on my XML data but it gives nothing.
>
> Can you advise me on a simple solution? Or is it a bug?
>
>     The Hermit