Posted to java-user@lucene.apache.org by Florian Buetow <fb...@mimecast.com> on 2017/06/30 01:45:26 UTC

Maintaining sorting order (stored fields vs DocValue fields) while upgrading Lucene version

Hi,

I am in the process of updating a large index from Lucene 4.x to 5.x and have two questions related to the sorting order.

1. Is it correct that stored fields can only be sorted on if they become a DocValue field in 5.x?
2. When "updating" stored fields to DocValue fields, is it required to update all documents in the index at the same time?

Thank you in advance for your help.

Best regards
Florian

Integer Range Query in Lucene 4.10.4 not working as expected.

Posted by andi rexha <a_...@hotmail.com>.
I have a numeric range query to perform in an index. I begin by indexing a document with a field value of "300". When I search for a range [100 TO 400] I get results from the search operation. Strangely enough, when I search for [100 TO 4000], I don't get any search results.


Here is a code snippet for the test I perform:


public static void main(String[] args) throws IOException {
    String fileName = args[0];
    File file = new File(fileName);
    FSDirectory directory = FSDirectory.open(file);
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_4_9_1, new WhitespaceAnalyzer());
    IndexWriter indexWriter = new IndexWriter(directory, conf);
    indexWriter.deleteAll();
    indexWriter.commit();

    // creating document
    Document doc = new Document();
    FieldType fieldType = new FieldType();
    fieldType.setIndexed(true);
    fieldType.setNumericType(NumericType.INT);
    IntField intField = new IntField("field1", 300, fieldType);
    doc.add(intField);
    indexWriter.addDocument(doc);
    indexWriter.commit();
    indexWriter.close();

    DirectoryReader directoryReader = DirectoryReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(directoryReader);

    // searching for numbers >= 100 and <= 400
    Query rangeQueryWorking = NumericRangeQuery.newIntRange("field1", 100, 400, true, true);
    TopDocs resultsWorking = indexSearcher.search(rangeQueryWorking, 10);

    // searching for numbers >= 100 and <= 4000
    Query rangeQueryNotWorking = NumericRangeQuery.newIntRange("field1", 100, 4000, true, true);
    TopDocs resultsNotWorking = indexSearcher.search(rangeQueryNotWorking, 10);

    // prints 1 as expected
    System.out.println(resultsWorking.totalHits);

    // prints 0 but expected 1
    System.out.println(resultsNotWorking.totalHits);
}



Can someone help me with this issue?


Thank you in advance!


Re: Maintaining sorting order (stored fields vs DocValue fields) while upgrading Lucene version

Posted by András Péteri <ap...@b2international.com>.
Hi,

Note that if you are using Lucene directly, 5.x introduced LUCENE-6064 [1] [2],
which adds checks to ensure that the sort field has a corresponding
DocValue of the expected type. Indexed fields can only be used for sorting
via an UninvertingReader, at a cost of increased heap usage [3]. Solr
handles the "index-only" cases transparently [4].

[1] https://issues.apache.org/jira/browse/LUCENE-6064
[2] https://github.com/apache/lucene-solr/commit/e696967770b350518fcd3f88050511349d5607a6#diff-9e0559ce8f7317732e59c3be337716a2R62
[3] https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.4/lucene/misc/src/java/org/apache/lucene/uninverting/UninvertingReader.java#L49
[4] https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter

Regards,
András


Re: Maintaining sorting order (stored fields vs DocValue fields) while upgrading Lucene version

Posted by Erick Erickson <er...@gmail.com>.
1> Is it correct that stored fields can only be sorted on if they become a
DocValue field in 5.x

No. Indexed-only fields can still be used to sort. DocValues are just more
efficient at load time and don't consume as much of the Java heap.
Essentially, the latter can be thought of as moving the "uninverted"
structure from heap to MMap space.

That said, I can't think of any _good_ reason to continue to sort on
indexed="true" docValues="false" fields. Use DocValues.
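
As a rough illustration of what "uninverting" means here, the following is a toy model in plain Java. It is not Lucene code and does not reflect Lucene's actual data structures; the class and method names are made up for this sketch. It only shows why sorting needs a per-document column (document ID to value) rather than the inverted view (value to document IDs), and that building the column at search time is the work DocValues avoid by storing it at index time.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: no Lucene classes are used, all names are hypothetical.
final class UninvertSketch {

    // Turns the inverted view (value -> document IDs) into a column
    // (document ID -> value). Documents without a value keep the default 0.
    static int[] uninvert(Map<Integer, List<Integer>> postings, int maxDoc) {
        int[] column = new int[maxDoc];
        for (Map.Entry<Integer, List<Integer>> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                column[docId] = entry.getKey();
            }
        }
        return column;
    }

    // Sorting compares per-document values, so it reads from the column.
    static Integer[] docIdsSortedByValue(int[] column) {
        Integer[] docIds = new Integer[column.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        Arrays.sort(docIds, Comparator.comparingInt(d -> column[d]));
        return docIds;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> postings = new TreeMap<>();
        postings.put(300, Arrays.asList(0, 3));
        postings.put(100, Arrays.asList(2));
        postings.put(4000, Arrays.asList(1));

        int[] column = uninvert(postings, 4);
        System.out.println(Arrays.toString(column));                      // [300, 4000, 100, 300]
        System.out.println(Arrays.toString(docIdsSortedByValue(column))); // [2, 0, 3, 1]
    }
}
```

In this toy, the array returned by uninvert lives on the Java heap and has to be rebuilt whenever the index is reopened, which is the cost referred to above.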

2> When "updating" stored fields to DocValue fields, is it required to
update all documents in the index at the same time?

Yes. I'm assuming here you're talking about changing the schema definition
to include docValues="true". In general I advocate re-indexing everything
when upgrading major versions. Technically, if you want to do some
"interesting" things with low-level Lucene, you can upgrade your index; Uwe
Schindler outlined the process. I copied what he said but don't understand
it ;).

I've seen some situations where people will define a _new_ field with both,
gradually re-index and when all the docs have been updated switch to using
the new field. That assumes that it's just impossible to reindex all at
once.
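
The gradual approach could be sketched roughly like this. It is a toy model in plain Java where documents are simple maps, not Lucene or Solr code; the field names "price" and "price_sort" and all method names are assumptions made for the illustration. The point is only the switch condition: keep sorting on the old field until every document carries the new one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: documents are plain maps, all names are hypothetical.
final class GradualMigrationSketch {
    static final String OLD_FIELD = "price";      // existing field
    static final String NEW_FIELD = "price_sort"; // new field added for sorting

    // Re-indexes at most batchSize documents that lack the new field.
    // Returns the number of documents updated in this batch.
    static int migrateBatch(List<Map<String, Long>> docs, int batchSize) {
        int updated = 0;
        for (Map<String, Long> doc : docs) {
            if (updated == batchSize) {
                break;
            }
            if (!doc.containsKey(NEW_FIELD)) {
                doc.put(NEW_FIELD, doc.get(OLD_FIELD));
                updated++;
            }
        }
        return updated;
    }

    // Only switch to the new field once every document has it.
    static String sortField(List<Map<String, Long>> docs) {
        for (Map<String, Long> doc : docs) {
            if (!doc.containsKey(NEW_FIELD)) {
                return OLD_FIELD;
            }
        }
        return NEW_FIELD;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> docs = new ArrayList<>();
        for (long value : new long[] {30L, 10L, 20L}) {
            Map<String, Long> doc = new HashMap<>();
            doc.put(OLD_FIELD, value);
            docs.add(doc);
        }
        migrateBatch(docs, 2);
        System.out.println(sortField(docs)); // price
        migrateBatch(docs, 2);
        System.out.println(sortField(docs)); // price_sort
    }
}
```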

The question I have to ask... Why upgrade just to 5x? Solr is releasing 7.0
very shortly. I can't think of a really good reason not to jump to 6x
unless you have heavy customizations and the like. Even in that case you'll
have to upgrade eventually. And if you wind up re-indexing everything
anyway, it seems like stopping at 5x is unnecessary.

Best,
Erick
