Posted to java-user@lucene.apache.org by Wei Wang <we...@gmail.com> on 2013/04/13 07:44:28 UTC

DiskDocValuesFormat

I am trying to use DiskDocValuesFormat for a particular
BinaryDocValuesField. There seem to be no good examples showing how to do
this. The only hint I got from various docs and forums is to set some codec
in IndexWriter. Could someone post a few lines of code showing how to set
DiskDocValuesFormat?

Thanks.
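
A minimal sketch of the wiring that the replies in this thread arrive at,
assuming Lucene 4.2.x with lucene-core, lucene-codecs and
lucene-analyzers-common on the classpath. The field name "payload", the index
path and the class name are hypothetical and do not come from the original
posts; treat this as an illustration rather than a reference implementation.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.diskdv.DiskDocValuesFormat;
import org.apache.lucene.codecs.lucene42.Lucene42Codec;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class DiskDocValuesExample {
  public static void main(String[] args) throws IOException {
    final DocValuesFormat diskDVFormat = new DiskDocValuesFormat();

    // Route one (hypothetical) field to the Disk format; every other field
    // keeps the Lucene42 default.
    final Codec codec = new Lucene42Codec() {
      @Override
      public DocValuesFormat getDocValuesFormatForField(String field) {
        if ("payload".equals(field)) {
          return diskDVFormat;
        }
        return super.getDocValuesFormatForField(field);
      }
    };

    IndexWriterConfig config =
        new IndexWriterConfig(Version.LUCENE_42, new KeywordAnalyzer());
    config.setCodec(codec); // the codec is set on the config, not on the writer

    Directory directory = FSDirectory.open(new File("/tmp/dv-index")); // hypothetical path
    IndexWriter writer = new IndexWriter(directory, config);
    try {
      Document doc = new Document();
      doc.add(new BinaryDocValuesField("payload", new BytesRef(new byte[] {1, 2, 3})));
      writer.addDocument(doc);
    } finally {
      writer.close();
      directory.close();
    }
  }
}
```

The codec passed to IndexWriterConfig only governs newly written segments.
Reading or merging existing segments resolves the format by name through SPI,
which is why the packaging of META-INF/services matters later in the thread.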

Re: DiskDocValuesFormat

Posted by Wei Wang <we...@gmail.com>.

Strange. That's all I got from the log, besides the first line, which I wrote
myself to mark the start of merging with a timestamp.

On Sun, Apr 14, 2013 at 4:58 PM, Robert Muir <rc...@gmail.com> wrote:

> Your stack trace is incomplete: it doesn't even show where the OOM
> occurred.

Re: DiskDocValuesFormat

Posted by Robert Muir <rc...@gmail.com>.

Your stack trace is incomplete: it doesn't even show where the OOM occurred.
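
One way to capture more than the top-level frames is to log every nested
cause together with its own stack trace. The helper below is a minimal sketch
using only the JDK; the class and method names are hypothetical, and it is
not taken from Lucene or from the poster's code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class CauseChain {
    private static final int MAX_DEPTH = 32; // guard against cyclic cause chains

    private CauseChain() {}

    /** Renders the throwable and each nested cause with its full stack trace. */
    static String describe(Throwable t) {
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < MAX_DEPTH; cur = cur.getCause()) {
            pw.println("[cause " + depth + "] " + cur);
            for (StackTraceElement frame : cur.getStackTrace()) {
                pw.println("    at " + frame);
            }
            depth++;
        }
        pw.flush();
        return out.toString();
    }
}
```

Calling CauseChain.describe(e) in the catch block around forceMerge, or from a
handler installed with Thread.setDefaultUncaughtExceptionHandler for the merge
threads, would log the whole chain. On HotSpot JVMs, the
-XX:+HeapDumpOnOutOfMemoryError option can additionally record what filled the
heap.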


Re: DiskDocValuesFormat

Posted by Wei Wang <we...@gmail.com>.

Unfortunately, I ran into another problem. My index has 9 segments (9 dvdd
files) with a total size of about 22GB. The merging step eventually failed,
and I saw this error message:

Exception in thread "main" java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete forceMerge
    at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1664)
    at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1610)
    at com.ea.eadp.data.aem.audience.indexer.tools.IndexingTool.mergeIndex(IndexingTool.java:196)
    at com.ea.eadp.data.aem.audience.indexer.tools.AudienceIndexer.main(AudienceIndexer.java:46)
Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)

I configured the JVM with "-Xmx4096m", and that still does not seem to be
enough memory. I thought DiskDocValuesFormat keeps most of the data on disk,
so there should not be that much memory consumption, but that does not seem
to be the case.

On Sun, Apr 14, 2013 at 4:13 PM, Wei Wang <we...@gmail.com> wrote:

> That makes sense.
>
> BTW, I checked the jar file. Exactly as you pointed out, the services
> files only contain entries from lucene-core, without the codecs from
> lucene-codecs. After adding the Maven plugin, it is now running.
>
> Thanks!
>
>
> On Sun, Apr 14, 2013 at 3:26 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
>> Hi,
>>
>> > Thanks for the hint. I will double check the jar file.
>> >
>> > I am just a bit puzzled: if the indexing step recognizes the 'Disk'
>> > codec and creates the index properly, the merge step that immediately
>> > follows indexing should also recognize the 'Disk' codec.
>>
>> This is easy to explain: by creating the custom Lucene42 codec as a
>> class, you only define the disk format for the initial write (when *new*
>> segments are written with new documents). While merging (or force-merging),
>> Lucene uses the metadata that’s already on disk for the segments to merge.
>> The metadata on disk contains the names of all codec components used. That
>> metadata is also used when opening IndexReaders. Lucene then uses SPI and
>> the META-INF/services files to look up the class that is responsible for,
>> e.g., the "Disk" docvalues format. Without the META-INF data, Lucene cannot
>> look up the segment codecs.
>>
>> Uwe
>>
>> > On Sun, Apr 14, 2013 at 3:03 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> >
>> > > Are you sure that you use the ServicesResourceTransformer in your
>> > > shade config?
>> > >
>> > > http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
>> > >
>> > > The problem is: lucene-core.jar and lucene-codecs.jar both contain
>> > > codec components and their classes are listed in META-INF/services. If
>> > > those files are not correctly merged through this resource
>> > > transformer, the resulting JAR file will miss some codecs.
>> > >
>> > > You can check correctness by opening the final JAR file with a ZIP
>> > > program and check that all files in META-INF/services contain all
>> > > entries merged from all Lucene JARs.
>> > >
>> > > Uwe
>> > >
>> > > -----
>> > > Uwe Schindler
>> > > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > > http://www.thetaphi.de
>> > > eMail: uwe@thetaphi.de
>> > >
>> > >
>> > > > -----Original Message-----
>> > > > From: Wei Wang [mailto:welshwang@gmail.com]
>> > > > Sent: Sunday, April 14, 2013 11:49 PM
>> > > > To: java-user@lucene.apache.org
>> > > > Subject: Re: DiskDocValuesFormat
>> > > >
>> > > > Yes, I used Maven Shade plugin, but still have this problem. Here is
>> > > > the Maven output during packaging:
>> > > >
>> > > > [INFO] --- maven-shade-plugin:2.0:shade (default) @ audience-profile-indexer ---
>> > > > [INFO] Including commons-collections:commons-collections:jar:3.2.1 in the shaded jar.
>> > > > [INFO] Including org.mockito:mockito-core:jar:1.9.5 in the shaded jar.
>> > > > [INFO] Including org.hamcrest:hamcrest-core:jar:1.1 in the shaded jar.
>> > > > [INFO] Including org.objenesis:objenesis:jar:1.0 in the shaded jar.
>> > > > [INFO] Including junit:junit:jar:4.11 in the shaded jar.
>> > > > [INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-core:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-queries:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-queryparser:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-sandbox:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including jakarta-regexp:jakarta-regexp:jar:1.4 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-analyzers-common:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-codecs:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including commons-lang:commons-lang:jar:2.6 in the shaded jar.
>> > > > [INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded jar.
>> > > > [INFO] Including commons-io:commons-io:jar:2.4 in the shaded jar.
>> > > > [INFO] Replacing original artifact with shaded artifact.
>> > > >
>> > > > On Sun, Apr 14, 2013 at 2:40 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> > > >
>> > > > > If you create a single JAR file out of multiple Lucene JAR files,
>> > > > > use a tool like the Maven Shade plugin; otherwise, the required
>> > > > > metadata files (META-INF/services) in the JAR files are not
>> > > > > correctly merged together.
>> > > > >
>> > > > > > -----Original Message-----
>> > > > > > From: Wei Wang [mailto:welshwang@gmail.com]
>> > > > > > Sent: Sunday, April 14, 2013 11:30 PM
>> > > > > > To: java-user@lucene.apache.org
>> > > > > > Subject: Re: DiskDocValuesFormat
>> > > > > >
>> > > > > > Hi Adrien,
>> > > > > >
>> > > > > > The Lucene42Codec works well to generate the index with
>> > > > > > DiskDocValuesFormat. But when I tried to merge the index segments
>> > > > > > by calling:
>> > > > > >
>> > > > > > IndexWriter iw = new IndexWriter(directory, iw_config);
>> > > > > > ...
>> > > > > > iw.forceMerge(1);
>> > > > > >
>> > > > > > I got the following error message:
>> > > > > >
>> > > > > > Caused by: java.lang.IllegalArgumentException: A SPI class of type
>> > > > > > org.apache.lucene.codecs.DocValuesFormat with name 'Disk' does not exist.
>> > > > > > You need to add the corresponding JAR file supporting this SPI to
>> > > > > > your classpath.The current classpath supports the following names:
>> > > > > > [Lucene42]
>> > > > > >
>> > > > > > Any hint on this classpath problem? I have created a single jar file
>> > > > > > that has all necessary dependencies, such as lucene-codecs-4.2.0.jar.
>> > > > > > And I assume the indexing step works well, so Lucene already knows
>> > > > > > the format with name 'Disk'.
>> > > > > >
>> > > > > > Thanks.
>> > > > > >
>> > > > > > On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand <jpountz@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hi Wei,
>> > > > > > >
>> > > > > > > On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang <we...@gmail.com> wrote:
>> > > > > > > > I am trying to use DiskDocValuesFormat for a particular
>> > > > > > > > BinaryDocValuesField. There seem to be no good examples showing
>> > > > > > > > how to do this. The only hint I got from various docs and forums
>> > > > > > > > is to set some codec in IndexWriter. Could someone post a few
>> > > > > > > > lines of code showing how to set DiskDocValuesFormat?
>> > > > > > >
>> > > > > > > Lucene42Codec can be extended to specify the doc values format to
>> > > > > > > use on a per-field basis. For example:
>> > > > > > >
>> > > > > > > final Codec codec = new Lucene42Codec() {
>> > > > > > >   final Lucene42DocValuesFormat memoryDVFormat = new Lucene42DocValuesFormat();
>> > > > > > >   final DiskDocValuesFormat diskDVFormat = new DiskDocValuesFormat();
>> > > > > > >   @Override
>> > > > > > >   public DocValuesFormat getDocValuesFormatForField(String field) {
>> > > > > > >     if ("dv_mem".equals(field)) {
>> > > > > > >       // use Lucene42 for "dv_mem"
>> > > > > > >       return memoryDVFormat;
>> > > > > > >     } else {
>> > > > > > >       // use Disk otherwise
>> > > > > > >       return diskDVFormat;
>> > > > > > >     }
>> > > > > > >   }
>> > > > > > > };
>> > > > > > >
>> > > > > > > Then just pass this Codec instance to your IndexWriterConfig.
>> > > > > > >
>> > > > > > > --
>> > > > > > > Adrien
>> > > > > > >
>> > > > > > >
>> ------------------------------------------------------------------
>> > > > > > > --- To unsubscribe, e-mail:
>> > > > > > > java-user-unsubscribe@lucene.apache.org
>> > > > > > > For additional commands, e-mail: java-user-
>> > help@lucene.apache.org
>> > > > > > >
>> > > > > > >
>> > > > >
>> > > > >
>> > > > >
>> ---------------------------------------------------------------------
>> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> > > > >
>> > > > >
>> > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >
>> > >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

Re: DiskDocValuesFormat

Posted by Wei Wang <we...@gmail.com>.

That makes sense.

BTW, I checked the jar file. Exactly as you pointed out, the services files
contain only the entries from lucene-core, without the codecs from
lucene-codecs. After adding the Maven plugin, it is now running.

Thanks!

On Sun, Apr 14, 2013 at 3:26 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi,
>
> > Thanks for the hint. I will double check the jar file.
> >
> > I am just a bit puzzled that if the indexing step recognizes 'Disk'
> codec and
> > creates index properly, the merge step that immediately follows indexing
> > seems should also recognize the 'Disk' codec.
>
> This is easy to explain: By creating the custom Lucene42 Codec as a Class,
> you just define the disk format on the initial write (when *new* segments
> are written with new documents). While merging (or force-merging), Lucene
> uses the metadata that’s already on disk for the segments to merge. The
> metadata on disk contains the names of all codec components used. Those
> metadata is also used when opening IndexReaders. It will then use SPI and
> META-INF/services files to look up the class that is responsible for e.g.
> the "Disk" docvalues format. Without the META-INF data, Lucene cannot
> lookup the segment codecs.
>
> Uwe
>

RE: DiskDocValuesFormat

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi,

> Thanks for the hint. I will double check the jar file.
> 
> I am just a bit puzzled that if the indexing step recognizes 'Disk' codec and
> creates index properly, the merge step that immediately follows indexing
> seems should also recognize the 'Disk' codec.

This is easy to explain: By creating the custom Lucene42 codec as a class, you only define the disk format for the initial write (when *new* segments are written with new documents). While merging (or force-merging), Lucene uses the metadata that is already on disk for the segments to merge. The metadata on disk contains the names of all codec components used. This metadata is also used when opening IndexReaders. Lucene then uses SPI and the META-INF/services files to look up the class that is responsible for, e.g., the "Disk" docvalues format. Without the META-INF data, Lucene cannot look up the segment codecs.
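
To make the name-based lookup concrete, here is a minimal sketch in plain Java. It is a simplified model and not Lucene's actual code: the class SpiLookupSketch, its methods, and the NAME_OF_CLASS table are hypothetical, and the two provider class names are sample data that should be checked against the actual JARs. It only shows why a services file without the lucene-codecs entry leaves the name 'Disk' unresolvable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SpiLookupSketch {
    // Hypothetical table: provider class -> short name it registers under.
    // A real SPI loader would instantiate each listed class and ask it for its name.
    static final Map<String, String> NAME_OF_CLASS = new LinkedHashMap<>();
    static {
        NAME_OF_CLASS.put("org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat", "Lucene42");
        NAME_OF_CLASS.put("org.apache.lucene.codecs.diskdv.DiskDocValuesFormat", "Disk");
    }

    // Build the name -> class registry from the lines of one services file.
    static Map<String, String> buildRegistry(List<String> servicesFileLines) {
        Map<String, String> registry = new LinkedHashMap<>();
        for (String line : servicesFileLines) {
            String cls = line.trim();
            String name = NAME_OF_CLASS.get(cls);
            if (name != null) {
                registry.put(name, cls);
            }
        }
        return registry;
    }

    // Resolve a name read from segment metadata; fail if no provider is registered.
    static String lookup(Map<String, String> registry, String name) {
        String cls = registry.get(name);
        if (cls == null) {
            throw new IllegalArgumentException(
                "No provider with name '" + name + "'. Available names: " + registry.keySet());
        }
        return cls;
    }

    public static void main(String[] args) {
        // Services file as it looks when only the lucene-core entries survive.
        List<String> coreOnly = Arrays.asList(
            "org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat");
        // Services file after the entries of both JARs were merged.
        List<String> merged = Arrays.asList(
            "org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat",
            "org.apache.lucene.codecs.diskdv.DiskDocValuesFormat");

        System.out.println(lookup(buildRegistry(merged), "Disk"));
        try {
            lookup(buildRegistry(coreOnly), "Disk");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, writing new segments never goes through lookup (the codec instance is handed over directly), which is why indexing can succeed while the later merge fails.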

Uwe


Re: DiskDocValuesFormat

Posted by Wei Wang <we...@gmail.com>.

Thanks for the hint. I will double check the jar file.

I am just a bit puzzled: if the indexing step recognizes the 'Disk' codec
and creates the index properly, it seems the merge step that immediately
follows indexing should also recognize the 'Disk' codec.

On Sun, Apr 14, 2013 at 3:03 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Are you sure that you use the ServicesResourceTransformer in your shade
> config?
>
>
> http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
>
> The problem is: lucene-core.jar and lucene-codecs.jar both contain codec
> components and their classes are listed in META-INF/services. If those
> files are not correctly merged through this resource transformer, the
> resulting JAR file will miss some codecs.
>
> You can check correctness by opening the final JAR file with a ZIP program
> and check that all files in META-INF/services contain all entries merged
> from all Lucene JARs.
>
> Uwe
>

RE: DiskDocValuesFormat

Posted by Uwe Schindler <uw...@thetaphi.de>.

Are you sure that you use the ServicesResourceTransformer in your shade config?

http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer

The problem is: lucene-core.jar and lucene-codecs.jar both contain codec components, and their classes are listed in META-INF/services. If those files are not correctly merged through this resource transformer, the resulting JAR file will be missing some codecs.
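
For reference, a minimal sketch of a shade configuration in pom.xml with this transformer enabled, following the Maven Shade plugin documentation linked above. The plugin version 2.0 is taken from the build output in this thread; the execution binding to the package phase is an assumption, and any existing shade settings in the project would stay alongside it:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Concatenates META-INF/services files with the same name
               instead of letting one JAR's copy replace the others. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```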

You can check correctness by opening the final JAR file with a ZIP program and checking that all files in META-INF/services contain all entries merged from all Lucene JARs.
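
This check can also be scripted instead of done by hand. A minimal sketch in plain Java, using only the standard library; the class name ServicesFileCheck and the method readServices are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class ServicesFileCheck {
    private static final String PREFIX = "META-INF/services/";

    // Returns, for each file under META-INF/services, the provider classes it lists.
    static Map<String, List<String>> readServices(File jar) throws IOException {
        Map<String, List<String>> result = new TreeMap<>();
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().startsWith(PREFIX)) {
                    continue;
                }
                List<String> providers = new ArrayList<>();
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        jarFile.getInputStream(entry), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        int hash = line.indexOf('#'); // '#' starts a comment
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            providers.add(line);
                        }
                    }
                }
                result.put(entry.getName().substring(PREFIX.length()), providers);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ServicesFileCheck <path-to-jar>");
            return;
        }
        for (Map.Entry<String, List<String>> e : readServices(new File(args[0])).entrySet()) {
            System.out.println(e.getKey());
            for (String provider : e.getValue()) {
                System.out.println("  " + provider);
            }
        }
    }
}
```

Run with the shaded JAR as the only argument. If the services files were merged, the entry for org.apache.lucene.codecs.DocValuesFormat should list the classes from both lucene-core and lucene-codecs.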

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Wei Wang [mailto:welshwang@gmail.com]
> Sent: Sunday, April 14, 2013 11:49 PM
> To: java-user@lucene.apache.org
> Subject: Re: DiskDocValuesFormat
> 
> Yes, I used Maven Shade plugin, but still have this problem. Here is the
> Maven output during packaging:
> 
> [INFO] --- maven-shade-plugin:2.0:shade (default) @ audience-profile-indexer ---
> [INFO] Including commons-collections:commons-collections:jar:3.2.1 in the shaded jar.
> [INFO] Including org.mockito:mockito-core:jar:1.9.5 in the shaded jar.
> [INFO] Including org.hamcrest:hamcrest-core:jar:1.1 in the shaded jar.
> [INFO] Including org.objenesis:objenesis:jar:1.0 in the shaded jar.
> [INFO] Including junit:junit:jar:4.11 in the shaded jar.
> [INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
> [INFO] Including org.apache.lucene:lucene-core:jar:4.2.1 in the shaded jar.
> [INFO] Including org.apache.lucene:lucene-queries:jar:4.2.1 in the shaded
> jar.
> [INFO] Including org.apache.lucene:lucene-queryparser:jar:4.2.1 in the
> shaded jar.
> [INFO] Including org.apache.lucene:lucene-sandbox:jar:4.2.1 in the shaded
> jar.
> [INFO] Including jakarta-regexp:jakarta-regexp:jar:1.4 in the shaded jar.
> [INFO] Including org.apache.lucene:lucene-analyzers-common:jar:4.2.1 in
> the shaded jar.
> [INFO] Including org.apache.lucene:lucene-codecs:jar:4.2.1 in the shaded jar.
> [INFO] Including commons-lang:commons-lang:jar:2.6 in the shaded jar.
> [INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded
> jar.
> [INFO] Including commons-io:commons-io:jar:2.4 in the shaded jar.
> [INFO] Replacing original artifact with shaded artifact.
> 

Re: DiskDocValuesFormat

Posted by Wei Wang <we...@gmail.com>.

Yes, I used the Maven Shade plugin, but I still have this problem. Here is
the Maven output during packaging:

[INFO] --- maven-shade-plugin:2.0:shade (default) @ audience-profile-indexer ---
[INFO] Including commons-collections:commons-collections:jar:3.2.1 in the shaded jar.
[INFO] Including org.mockito:mockito-core:jar:1.9.5 in the shaded jar.
[INFO] Including org.hamcrest:hamcrest-core:jar:1.1 in the shaded jar.
[INFO] Including org.objenesis:objenesis:jar:1.0 in the shaded jar.
[INFO] Including junit:junit:jar:4.11 in the shaded jar.
[INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
[INFO] Including org.apache.lucene:lucene-core:jar:4.2.1 in the shaded jar.
[INFO] Including org.apache.lucene:lucene-queries:jar:4.2.1 in the shaded jar.
[INFO] Including org.apache.lucene:lucene-queryparser:jar:4.2.1 in the shaded jar.
[INFO] Including org.apache.lucene:lucene-sandbox:jar:4.2.1 in the shaded jar.
[INFO] Including jakarta-regexp:jakarta-regexp:jar:1.4 in the shaded jar.
[INFO] Including org.apache.lucene:lucene-analyzers-common:jar:4.2.1 in the shaded jar.
[INFO] Including org.apache.lucene:lucene-codecs:jar:4.2.1 in the shaded jar.
[INFO] Including commons-lang:commons-lang:jar:2.6 in the shaded jar.
[INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded jar.
[INFO] Including commons-io:commons-io:jar:2.4 in the shaded jar.
[INFO] Replacing original artifact with shaded artifact.

On Sun, Apr 14, 2013 at 2:40 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> If you create a single JAR file out of multiple Lucene JAR files, use a
> tool like the Maven Shade plugin; otherwise, the required metadata files
> (META-INF/services) in the JAR files are not merged together correctly.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de

RE: DiskDocValuesFormat

Posted by Uwe Schindler <uw...@thetaphi.de>.

If you create a single JAR file out of multiple Lucene JAR files, use a tool like the Maven Shade plugin; otherwise, the required metadata files (META-INF/services) in the JAR files are not merged together correctly.
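
One way to check whether the merge worked is to list what the META-INF/services files on the classpath actually declare. The following is only a minimal sketch using the standard Java library; the class name ServiceFileCheck and its methods are hypothetical helpers, not part of Lucene, and the service name is the one from the error message in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists the provider class names that
// the META-INF/services files on the classpath declare for one service type.
class ServiceFileCheck {

  // Strips a trailing '#' comment and whitespace; returns null for blank lines.
  static String parseLine(String line) {
    int hash = line.indexOf('#');
    String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
    return name.isEmpty() ? null : name;
  }

  // Reads every META-INF/services/<serviceName> resource visible to the loader.
  static List<String> listProviders(ClassLoader loader, String serviceName)
      throws IOException {
    List<String> providers = new ArrayList<>();
    Enumeration<URL> urls = loader.getResources("META-INF/services/" + serviceName);
    while (urls.hasMoreElements()) {
      URL url = urls.nextElement();
      try (BufferedReader reader = new BufferedReader(
          new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
          String name = parseLine(line);
          if (name != null) {
            providers.add(name);
          }
        }
      }
    }
    return providers;
  }

  public static void main(String[] args) throws IOException {
    // The service name is taken from the error message in this thread.
    String service = "org.apache.lucene.codecs.DocValuesFormat";
    ClassLoader loader = ServiceFileCheck.class.getClassLoader();
    for (String provider : listProviders(loader, service)) {
      System.out.println(provider);
    }
  }
}
```

Run with the combined JAR on the classpath, the output should include a DiskDocValuesFormat entry next to the Lucene42 one. If it lists only the Lucene42 format, the service file from lucene-codecs was probably overwritten rather than concatenated. With Maven Shade, the ServicesResourceTransformer is the resource transformer meant to concatenate these files.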

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Wei Wang [mailto:welshwang@gmail.com]
> Sent: Sunday, April 14, 2013 11:30 PM
> To: java-user@lucene.apache.org
> Subject: Re: DiskDocValuesFormat
> 
> Hi Adrien,
> 
> The Lucene42Codec works well to generate the index with
> DiskDocValuesFormat. But when I tried to merge the index segments by
> calling:
> 
> IndexWriter iw = new IndexWriter(directory, iw_config);
> ...
> iw.forceMerge(1);
> 
> I got the following error message:
> 
> Caused by: java.lang.IllegalArgumentException: A SPI class of type
> org.apache.lucene.codecs.DocValuesFormat with name 'Disk' does not exist.
> You need to add the corresponding JAR file supporting this SPI to your
> classpath.The current classpath supports the following names: [Lucene42]
> 
> Any hint on this classpath problem? I have created a single jar file that has all
> necessary dependencies, such as lucene-codecs-4.2.0.jar. And I assume the
> indexing step works well, so Lucene already knows the format with name
> 'Disk'.
> 
> Thanks.

Re: DiskDocValuesFormat

Posted by Wei Wang <we...@gmail.com>.

Hi Adrien,

The Lucene42Codec works well to generate the index with
DiskDocValuesFormat. But when I tried to merge the index segments by
calling:

IndexWriter iw = new IndexWriter(directory, iw_config);
...
iw.forceMerge(1);

I got the following error message:

Caused by: java.lang.IllegalArgumentException: A SPI class of type
org.apache.lucene.codecs.DocValuesFormat with name 'Disk' does not exist.
You need to add the corresponding JAR file supporting this SPI to your
classpath.The current classpath supports the following names: [Lucene42]

Any hints on this classpath problem? I have created a single JAR file that
contains all necessary dependencies, such as lucene-codecs-4.2.0.jar. Since
the indexing step works, I assumed Lucene already knows the format named
'Disk'.

Thanks.

On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand <jp...@gmail.com> wrote:

> Hi Wei,
>
> On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang <we...@gmail.com> wrote:
> > I am trying to use DiskDocValuesFormat for a particular
> > BinaryDocValuesField. It seems there is no good examples showing how to do
> > this. The only hint I got from various docs and forums is set some codec in
> > IndexWriter. Could someone give a few lines of code snippet and show how to
> > set DiskDocValuesFormat?
>
> Lucene42Codec can be extended to specify the doc values format to use
> on a per-field basis. For example:
>
> final Codec codec = new Lucene42Codec() {
>   final Lucene42DocValuesFormat memoryDVFormat = new Lucene42DocValuesFormat();
>   final DiskDocValuesFormat diskDVFormat = new DiskDocValuesFormat();
>   @Override
>   public DocValuesFormat getDocValuesFormatForField(String field) {
>     if ("dv_mem".equals(field)) {
>       // use Lucene42 for "dv_mem"
>       return memoryDVFormat;
>     } else {
>       // use Disk otherwise
>       return diskDVFormat;
>     }
>   }
> };
>
> Then just pass this Codec instance to your IndexWriterConfig.
>
> --
> Adrien

Re: DiskDocValuesFormat

Posted by Wei Wang <we...@gmail.com>.

Hi Adrien,

Thanks for your example. Really helpful!

Wei

On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand <jp...@gmail.com> wrote:

> Hi Wei,
>
> On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang <we...@gmail.com> wrote:
> > I am trying to use DiskDocValuesFormat for a particular
> > BinaryDocValuesField. It seems there is no good examples showing how to do
> > this. The only hint I got from various docs and forums is set some codec in
> > IndexWriter. Could someone give a few lines of code snippet and show how to
> > set DiskDocValuesFormat?
>
> Lucene42Codec can be extended to specify the doc values format to use
> on a per-field basis. For example:
>
> final Codec codec = new Lucene42Codec() {
>   final Lucene42DocValuesFormat memoryDVFormat = new Lucene42DocValuesFormat();
>   final DiskDocValuesFormat diskDVFormat = new DiskDocValuesFormat();
>   @Override
>   public DocValuesFormat getDocValuesFormatForField(String field) {
>     if ("dv_mem".equals(field)) {
>       // use Lucene42 for "dv_mem"
>       return memoryDVFormat;
>     } else {
>       // use Disk otherwise
>       return diskDVFormat;
>     }
>   }
> };
>
> Then just pass this Codec instance to your IndexWriterConfig.
>
> --
> Adrien

Re: DiskDocValuesFormat

Posted by Adrien Grand <jp...@gmail.com>.

Hi Wei,

On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang <we...@gmail.com> wrote:
> I am trying to use DiskDocValuesFormat for a particular
> BinaryDocValuesField. It seems there is no good examples showing how to do
> this. The only hint I got from various docs and forums is set some codec in
> IndexWriter. Could someone give a few lines of code snippet and show how to
> set DiskDocValuesFormat?

Lucene42Codec can be extended to specify the doc values format to use
on a per-field basis. For example:

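import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.diskdv.DiskDocValuesFormat;  // in the lucene-codecs JAR
import org.apache.lucene.codecs.lucene42.Lucene42Codec;
import org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat;

final Codec codec = new Lucene42Codec() {
  // create each format once and reuse it for every field
  final Lucene42DocValuesFormat memoryDVFormat = new Lucene42DocValuesFormat();
  final DiskDocValuesFormat diskDVFormat = new DiskDocValuesFormat();

  @Override
  public DocValuesFormat getDocValuesFormatForField(String field) {
    if ("dv_mem".equals(field)) {
      // use Lucene42 for "dv_mem"
      return memoryDVFormat;
    } else {
      // use Disk otherwise
      return diskDVFormat;
    }
  }
};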

Then just pass this Codec instance to your IndexWriterConfig.

-- 
Adrien
