Posted to java-user@lucene.apache.org by "ZYWALEWSKI, DANIEL (DANIEL)" <da...@alcatel-lucent.com> on 2011/04/06 11:08:10 UTC

Indexation takes a lot of time :(

Hello Champions !!

I have a problem with indexing, or rather with how long it takes. The elements to index are represented by my own class, DocumentToIndex, which consists of fields (each field is a fieldName and a fieldValue). All DocumentToIndex objects are kept in an ArrayList. When I start indexing, I first open an IndexWriter. Then, for each field of a DocumentToIndex, I take its name and value, create a Lucene Field and add it to a Lucene Document. Once the Document is complete, I add it to the index. After processing all documents I close the IndexWriter. In code:


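import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Open one writer for the whole batch (create=false: append to an existing index)
indexWriter = new IndexWriter(indexDirectory, indexAnalyzer, false, IndexWriter.MaxFieldLength.UNLIMITED);
try {
  for (DocumentToIndex documentToIndex : objectsToIndex) {
    // One Lucene Document per DocumentToIndex
    Document indexedDocument = new Document();
    for (int i = 0; i < documentToIndex.getDocumentSize(); i++) {
      // Each name/value pair becomes a stored, analyzed Lucene Field
      indexedDocument.add(new Field(documentToIndex.getDocumentField(i).getName(),
                                    documentToIndex.getDocumentField(i).getValue(), Field.Store.YES,
                                    Field.Index.ANALYZED));
    }
    indexWriter.addDocument(indexedDocument);
  }
} finally {
  // Close once, after all documents have been added
  indexWriter.close();
}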
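The post does not show the DocumentToIndex class itself. A minimal hypothetical version, consistent only with the calls made in the loop above (getDocumentSize(), getDocumentField(i), getName(), getValue()), might look like this; the DocumentField class and the addField helper are assumptions, not the author's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: one name/value pair, matching getName()/getValue() in the loop.
class DocumentField {
  private final String name;
  private final String value;

  DocumentField(String name, String value) {
    this.name = name;
    this.value = value;
  }

  String getName() { return name; }

  String getValue() { return value; }
}

// Hypothetical: a document is an ordered list of name/value fields.
class DocumentToIndex {
  private final List<DocumentField> fields = new ArrayList<>();

  // Hypothetical convenience method for building a document.
  void addField(String name, String value) {
    fields.add(new DocumentField(name, value));
  }

  // Number of fields, used as the loop bound when building the Lucene Document.
  int getDocumentSize() { return fields.size(); }

  DocumentField getDocumentField(int i) { return fields.get(i); }
}
```

With this shape, each DocumentToIndex maps one-to-one onto a Lucene Document, and each DocumentField onto one Lucene Field.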

My problem is that indexing takes a long time: for example, indexing 28310 DocumentToIndex objects takes about 15 minutes. Am I missing something, or is this normal? Maybe this code is not well optimized? I'll be really grateful for any hints and tips.

Thanks in advance,
D


Re: Indexation takes a lot of time :(

Posted by Ian Lea <ia...@gmail.com>.
15 minutes for 28k docs does sound very slow.

In my experience it's usually the reading of the raw data from
database or network or wherever that turns out to be the problem.  You
could easily check that by commenting out the lucene calls in your
code.
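A minimal sketch of that check, assuming the data loading and the indexing can be separated as in the loop from the original post. DocumentSink and LoadVsIndexCheck are hypothetical names, not Lucene API: the sink stands in for the Lucene calls, so the same loop can run once with a no-op sink (the "commented out" variant) and once with the real indexing code.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for "the Lucene calls": receives one document at a time.
interface DocumentSink<T> {
  void accept(T document) throws Exception;
}

// Hypothetical helper: run the same loop with and without the Lucene work.
final class LoadVsIndexCheck {
  private LoadVsIndexCheck() {}

  // Loads the documents via the supplier, passes each one to the sink,
  // and returns the total wall-clock time in nanoseconds.
  static <T> long timeRun(Supplier<List<T>> loader, DocumentSink<T> sink) throws Exception {
    long start = System.nanoTime();
    List<T> documents = loader.get();   // e.g. reading from a database or the network
    for (T document : documents) {
      sink.accept(document);            // e.g. build a Lucene Document and call addDocument()
    }
    return System.nanoTime() - start;
  }

  // The "Lucene calls commented out" variant: the loop still runs but does nothing.
  static <T> DocumentSink<T> noOpSink() {
    return document -> { };
  }
}
```

If timeRun(loader, LoadVsIndexCheck.noOpSink()) takes nearly as long as the run with the real sink, the time is going into reading the data, not into Lucene.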

See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed.


--
Ian.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Indexation takes a lot of time :(

Posted by Simon Willnauer <si...@googlemail.com>.
I think your code is fine; you really need to look into what you are
measuring. In an ideal situation you can index up to 60k documents
per second with Lucene, given that your hardware is fast and you can get the
data out of your database quickly enough. Look here:

http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/

or here

http://blog.mikemccandless.com/2010/09/lucenes-indexing-is-fast.html

for some insights!
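To put the numbers from this thread side by side: 28310 documents in about 15 minutes versus the best case of up to 60k documents per second mentioned above. A small sketch of that arithmetic; the Throughput class and its method names are hypothetical.

```java
// Hypothetical helper for comparing a measured run against a reference rate.
final class Throughput {
  private Throughput() {}

  // Documents per second for a measured run.
  static double docsPerSecond(long documents, double seconds) {
    if (seconds <= 0) {
      throw new IllegalArgumentException("seconds must be positive");
    }
    return documents / seconds;
  }

  // How long a batch would take, in seconds, at a given rate.
  static double secondsAtRate(long documents, double docsPerSecond) {
    if (docsPerSecond <= 0) {
      throw new IllegalArgumentException("rate must be positive");
    }
    return documents / docsPerSecond;
  }
}
```

With the numbers from this thread, docsPerSecond(28310, 15 * 60) is about 31.5 documents per second, while secondsAtRate(28310, 60000) is about 0.47 seconds, a gap of roughly 1,900 times. The 60k figure assumes fast hardware and a fast data source, so it is a best case rather than a target.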

Simon



Re: Indexation takes a lot of time :(

Posted by findbestopensource <fi...@gmail.com>.
Hello daniel,

The code seems to be fine. I think you are measuring the time for the entire
program, which may include reading the data from an external source and
building the ArrayList. Measure only the time spent indexing.
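One way to follow that advice is to time each phase on its own. The sketch below uses a hypothetical PhaseTimer class (not part of Lucene or the JDK) built on System.nanoTime(): wrap the code that builds the ArrayList in one phase and the IndexWriter loop, including close(), in another.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: accumulates wall-clock time per named phase.
final class PhaseTimer {
  // A unit of work that may throw, e.g. IOException from IndexWriter.addDocument().
  interface Phase {
    void run() throws Exception;
  }

  // Insertion order is kept so phases are reported in the order they ran.
  private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

  // Runs one phase and adds its elapsed time under the given name,
  // even if the work throws.
  void time(String phase, Phase work) throws Exception {
    long start = System.nanoTime();
    try {
      work.run();
    } finally {
      elapsedNanos.merge(phase, System.nanoTime() - start, Long::sum);
    }
  }

  // Elapsed milliseconds for a phase, or 0 if it never ran.
  long millis(String phase) {
    return TimeUnit.NANOSECONDS.toMillis(elapsedNanos.getOrDefault(phase, 0L));
  }

  // Phase names in the order they were first timed.
  Iterable<String> phaseNames() {
    return elapsedNanos.keySet();
  }
}
```

Only the "index" figure says how fast Lucene itself is; the "load" figure covers the external data source and the ArrayList construction.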

Regards
Aditya
www.findbestopensource.com


