Posted to dev@lucene.apache.org by "Simon Willnauer (JIRA)" <ji...@apache.org> on 2011/06/09 11:42:58 UTC

[jira] [Created] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

DocValues type should be recorded in FNX file to early fail if user specifies incompatible type
-----------------------------------------------------------------------------------------------

                 Key: LUCENE-3186
                 URL: https://issues.apache.org/jira/browse/LUCENE-3186
             Project: Lucene - Java
          Issue Type: Improvement
          Components: core/index
    Affects Versions: 4.0
            Reporter: Simon Willnauer
            Assignee: Simon Willnauer
             Fix For: 4.0


Currently the segment merger fails if the docvalues type is not compatible across segments. We already catch this problem if somebody changes the values type for a field within one segment, but not across segments. In order to do that we should record the type in the fnx file along with the field numbers.

I marked this 4.0 since it should not block the landing on trunk.
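
As a rough illustration of the early-fail idea in the description above, the following Java sketch keeps one doc values type per field name and rejects a conflicting type as soon as it shows up. The names GlobalFieldTypes, ValueType and checkAndRecord are hypothetical and are not taken from any patch; how the mapping would be persisted in the fnx file is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a few of the doc values types discussed in this issue. */
enum ValueType { INT_32, FLOAT_32, BYTES_FIXED_STRAIGHT, BYTES_VAR_STRAIGHT }

/**
 * Hypothetical index-wide registry: remembers the first doc values type seen
 * for a field name and rejects a later, different type at indexing time.
 */
final class GlobalFieldTypes {
  private final Map<String, ValueType> typeByField = new HashMap<>();

  /** Records the type for a field, or throws if it conflicts with the recorded one. */
  synchronized void checkAndRecord(String field, ValueType type) {
    ValueType previous = typeByField.putIfAbsent(field, type);
    if (previous != null && previous != type) {
      throw new IllegalArgumentException(
          "field \"" + field + "\" was indexed as " + previous
              + ", cannot change to " + type);
    }
  }
}
```

With a registry like this, the second type for a field fails on the add call itself instead of at some later merge.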

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118497#comment-13118497 ] 

Simon Willnauer commented on LUCENE-3186:
-----------------------------------------

any comments on this?
                

[jira] [Updated] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3186:
------------------------------------

    Attachment: LUCENE-3186.patch

Here is an initial patch that promotes the following types:
 * all permutations of int8, int16, int32, int64 and packed ints
 * all permutations of float32 and float64
 * all permutations of var_deref_bytes, var_straight_bytes, fixed_deref_bytes, fixed_straight_bytes
 * all permutations of sorted_fixed_bytes and sorted_var_bytes

The general rule here is that var wins over fixed and straight wins over deref. For int types, variable ints win over all fixed types (following the rule that var wins over fixed).

If types from different groups are mixed, such as float32 and int32, this patch drops the docvalues field.

This patch is still very rough and has one rather critical nocommit in SegmentMerger: I am changing the FieldInfo for a field if I promote the type or drop the dv field, which essentially means that I need to write the FI after this has been done.
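
The promotion rules listed in this comment could be sketched roughly as follows. This is an illustration only, not the code from the attached patch: DvType and PromotionSketch are made-up names, the enum constants merely mirror the types named above, widening among fixed-width ints and floats is assumed from the discussion on this issue, and an empty Optional stands for "drop the docvalues field".

```java
import java.util.EnumSet;
import java.util.Optional;

/** Hypothetical stand-ins for the doc values types named in this comment. */
enum DvType {
  INT_8, INT_16, INT_32, INT_64, VAR_INTS,
  FLOAT_32, FLOAT_64,
  BYTES_FIXED_DEREF, BYTES_FIXED_STRAIGHT, BYTES_VAR_DEREF, BYTES_VAR_STRAIGHT,
  BYTES_FIXED_SORTED, BYTES_VAR_SORTED
}

final class PromotionSketch {
  private static final EnumSet<DvType> INTS = EnumSet.range(DvType.INT_8, DvType.VAR_INTS);
  private static final EnumSet<DvType> FLOATS = EnumSet.of(DvType.FLOAT_32, DvType.FLOAT_64);
  private static final EnumSet<DvType> BYTES =
      EnumSet.range(DvType.BYTES_FIXED_DEREF, DvType.BYTES_VAR_STRAIGHT);
  private static final EnumSet<DvType> SORTED =
      EnumSet.of(DvType.BYTES_FIXED_SORTED, DvType.BYTES_VAR_SORTED);

  /** Returns the promoted type, or empty if the two types cannot be reconciled. */
  static Optional<DvType> promote(DvType a, DvType b) {
    if (a == b) {
      return Optional.of(a);
    }
    if (INTS.contains(a) && INTS.contains(b)) {
      // Int constants are declared narrowest first with VAR_INTS last, so the
      // larger ordinal is the wider fixed type, or VAR_INTS if either side is var.
      return Optional.of(a.ordinal() > b.ordinal() ? a : b);
    }
    if (FLOATS.contains(a) && FLOATS.contains(b)) {
      return Optional.of(DvType.FLOAT_64);
    }
    if (BYTES.contains(a) && BYTES.contains(b)) {
      boolean var = isVar(a) || isVar(b);                // var wins over fixed
      boolean straight = isStraight(a) || isStraight(b); // straight wins over deref
      if (var) {
        return Optional.of(straight ? DvType.BYTES_VAR_STRAIGHT : DvType.BYTES_VAR_DEREF);
      }
      return Optional.of(straight ? DvType.BYTES_FIXED_STRAIGHT : DvType.BYTES_FIXED_DEREF);
    }
    if (SORTED.contains(a) && SORTED.contains(b)) {
      return Optional.of(DvType.BYTES_VAR_SORTED);
    }
    return Optional.empty(); // mixed groups, e.g. FLOAT_32 with INT_32
  }

  private static boolean isVar(DvType t) {
    return t == DvType.BYTES_VAR_DEREF || t == DvType.BYTES_VAR_STRAIGHT;
  }

  private static boolean isStraight(DvType t) {
    return t == DvType.BYTES_FIXED_STRAIGHT || t == DvType.BYTES_VAR_STRAIGHT;
  }
}
```

The sketch does not model fixed-length byte fields whose lengths differ between segments; that case would need the actual sizes, which the enum alone does not carry.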
                


[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116330#comment-13116330 ] 

Simon Willnauer commented on LUCENE-3186:
-----------------------------------------

This issue still remains and we need to solve it in order to move forward towards 4.0. Yet, I am still banging my head against the wall about how to really fix this.
Promoting values that are compatible should be doable: Int_32 and Int_8 should be promoted to Int_32, and likewise for float & double. Byte values are trickier. Var length vs. fixed length is still doable, but what if something is sorted? I don't think we should promote something to sorted in any case. The problem here is not just merging but also reading: I cannot pull a sorted source from a non-sorted DV field, which would cause confusion at the user level.

I think we should compromise here and try to promote what we can, like Ints, Floats and unsorted bytes (incompatible ones can simply go to variable length), and for Sorted, if something is not sorted we should just drop the entire field. SortedVar & SortedFixed can be promoted to SortedVar and everything should be fine IMO.

One other idea would be to remove the flexibility to set the actual type at the user level, decide at indexing time whether to use var vs. fixed and straight vs. deref, and move the sorting decision to the codec, i.e. if you want to sort then your codec needs to provide sorting, otherwise it's not sorted. We could then treat everything as bytes and convert at load time. For example, you say getSource(Int | Float | Bytes) and the codec decides how to convert each value to a float / int / byte and fails if that is not possible?
                


[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124061#comment-13124061 ] 

Michael McCandless commented on LUCENE-3186:
--------------------------------------------

Patch looks good Simon!  Thanks.
                


[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116334#comment-13116334 ] 

Michael McCandless commented on LUCENE-3186:
--------------------------------------------

Remember we are talking about the exception case here (app mis-uses
the API by mixing up types).

So I think a best-effort approach is fine: we "cast up" when we can,
but when we cannot we should silently drop the inconsistent values
(better, I think, than getting an unrecoverable exception on merge).

Having indexer/searcher decide what's best is an interesting
option... though, I think sorted or not should be specified in the
Field and not codec?  Ie, I think that's higher up than the codec (all
codecs should be able to sort, or not, my byte[] DV field).

                


[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120782#comment-13120782 ] 

Michael McCandless commented on LUCENE-3186:
--------------------------------------------

I hit several test failures w/ the patch, all from same exc, eg:

{noformat}
  ant test-core -Dtestcase=TestIndexWriterDelete -Dtestmethod=testDeleteAllSlowly -Dtests.seed=-8b03171493c8e71:-20c60f02979a36ec:-485ea5dac6836e33
{noformat}

I didn't dig...

Net/net this approach looks good!  Though I have to say it's hard to
follow all the numerous tied-together classes we now have for
DocValues... I find myself getting "lost" but I don't know how to
simplify it.

Random feedback:

  * About the nocommit in SegmentMerger: I think writing FIS after
    merge is OK?  In theory nothing should care?  Anything that needs
    FIS during merging should receive the object, not load it from
    the directory...

  * You can remove the TODO in DocValuesConsumer.merge :)

  * Make sure we mark TypePromoter as @lucene.internal.

  * If we hit exc inside FixedStraightBytesImpl.merge, we are still
    setting merge=true; is finish then called, up above?  (At which
    point we are going to try to write to the closed file).

  * Maybe rename that bool to "merged" or "didMerge" or something?
    "merge" sounds like it's an imperative command.

  * The jdocs for DocValuesConsumer.merge say "Merging segments out of
    order is not supported", but you just mean segments must arrive in
    sorted order right?  (Ie, TieredMP merges non-sequential segments,
    which we have been calling out-of-order merge).

  * Since DocValuesConsumer.merge ignores the incoming IndexDocValues
    (a MultiIndexDocValues), do we even need to pass that in...?

  * It's sort of confusing how we have a DocValuesConsumer.MergeState
    (holds one segment's reader) and the oal.index.codecs.MergeState
    (references all segments); maybe rename the former to
    SingleReaderMergeState?  SubMergeState?  (Something to indicate it
    just covers one sub reader at a time).

  * It looks like we don't handle the case of merging segs when a
    given field is always FixedStraightBytes but the size had changed
    from one seg to another?  (We throw IAE in
    FixedStraightBytesImpl.merge).  Are there other cases that will
    lead to "late binding" exc during merge?

  * In TypePromoter, instead of "promoted.flags & ~(1 << 3)" can you
    name that constant bit mask?  EG IS_BYTES or NOT_NUMERIC
    or something? 

  * With this patch, does this mean we can type-promote on the fly?
    Ie, if I make a SlowMRWrapper, and pull its perDocValues, we will
    present the promoted type (across all segments) correctly?  And
    when the user looks up the value for a certain docID, we promote
    it as needed?

  * Typo in comment in Floats.java: "only bulk merge is value type is
    the same otherwise size differs": change "is" to "if"

  * Can we rename Writer.setNextEnum -> Writer.setNextMergeEnum?
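
The named-constant suggestion for TypePromoter might look like the sketch below. The bit position comes from the expression quoted in the feedback, but the name IS_BYTES and its meaning are assumptions based on the wording of that item; the actual flag layout in the patch is not shown in this thread.

```java
/** Hypothetical illustration of naming the bit mask instead of writing ~(1 << 3) inline. */
final class FlagsSketch {
  // Assumed meaning of bit 3; the real TypePromoter bit assignments may differ.
  static final int IS_BYTES = 1 << 3;

  /** Clears the IS_BYTES bit and leaves all other flag bits untouched. */
  static int clearBytesFlag(int flags) {
    return flags & ~IS_BYTES;
  }
}
```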

                


[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046467#comment-13046467 ] 

Robert Muir commented on LUCENE-3186:
-------------------------------------

Do we really need to do this? I guess also looking at LUCENE-3187, I think I'm against this trend.

Shall we put analyzer classnames in there too? If we are going to put docvalues type and precision step, well then I want the stopwords file in the fnx file too!

At some point, if a user is going to shoot themselves in the foot, we simply cannot stop them, and I don't think it's our job to.




[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046602#comment-13046602 ] 

Michael McCandless commented on LUCENE-3186:
--------------------------------------------

I think there are a few separate questions here...

Today, on doc values branch, if you mix up your doc values, ie a field
"foo" is at first indexed as a FLOAT_32 and then later you change your
mind and later docs index field "foo" as BYTES_FIXED_STRAIGHT,
then this is bad news right now because everything will index fine,
you can close your IW, etc., but at some later time merges will hit
unrecoverable exceptions.  You'll have no choice but to fully rebuild
the index, which is rather awful.

However, this is true even for cases you would expect to work, eg say
"foo" was BYTES_FIXED_STRAIGHT but then later you decided you will
want to sort on this field and so you use BYTES_FIXED_SORTED.  (Simon:
this also results in exception I think...?).  Ideally we should do the
right thing here and "upgrade" the BYTES_FIXED_STRAIGHT to
BYTES_FIXED_SORTED (I think) -- Simon is there an issue open for this?

So, I think the first question here is: which cases should be merged
"properly" and which should be considered "an error"?  Probably we have
to work out the full matrix...

Then the second question is, for the "error" cases (if any!),
can/should we detect this up front, as you're indexing?

Then third question is, if we want to detect up front, do we do that
w/ fnx file or do we do that on init of IW (= no index change).




[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046604#comment-13046604 ] 

Robert Muir commented on LUCENE-3186:
-------------------------------------

{quote}
So, I think the first question here is: which cases should be merged
"properly" and which should be considered "an error"? Probably we have
to work out the full matrix...
{quote}

This is all implementation detail of docvalues that it must deal with during merging.
I think it should work out the "LCD" and merge to that.

This is no different than if I have a field with all 8-character terms and then I add a 10-character term:
sure, my impl/codec's encoding could internally rely upon the fact that all terms are 8 chars, but it must transparently change
its encoding to then support both 8- and 10-character terms and not throw an error.

If you mix up your doc values with ints and floats and bytes, isn't the least common denominator always bytes?
(just encode the int as 4 bytes or whatever).

So in other words, I think it's up to docvalues to change its encoding to support the LCD, which might mean
downgrading ints to bytes or whatever. My only opinion is that it should never 'create' data (this was my issue with fake norms,
let's not do that).
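
To make the "just encode the int as 4 bytes" remark concrete, here is a minimal sketch using java.nio.ByteBuffer. BytesLcdSketch and its method names are hypothetical, and big-endian order is simply ByteBuffer's default rather than anything this thread specifies.

```java
import java.nio.ByteBuffer;

/** Hypothetical helpers that downgrade numeric values to fixed-length byte arrays. */
final class BytesLcdSketch {
  /** Encodes an int as 4 bytes (big-endian, the ByteBuffer default). */
  static byte[] encodeInt(int value) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
  }

  /** Encodes a float as the 4 bytes of its IEEE 754 bit pattern. */
  static byte[] encodeFloat(float value) {
    return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
  }

  /** Reads back an int written by encodeInt. */
  static int decodeInt(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getInt();
  }

  /** Reads back a float written by encodeFloat. */
  static float decodeFloat(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getFloat();
  }
}
```

Both encodings are 4 bytes long, so the bytes alone do not say whether a value was originally an int or a float; a reader would need to know that from somewhere else.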




[jira] [Commented] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046481#comment-13046481 ] 

Simon Willnauer commented on LUCENE-3186:
-----------------------------------------

I think for this issue we can compute that info at IW open time. We can simply run through the FIs and prepopulate the info. I think this is better than redundantly storing this info.
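
A minimal sketch of such an open-time scan, assuming each segment's field infos can be viewed as a map from field name to doc values type name. OpenTimeScanSketch and collectTypes are hypothetical names, and keeping the first type seen per field is an arbitrary choice made for the illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical open-time scan that rebuilds the field-to-type mapping from existing segments. */
final class OpenTimeScanSketch {
  /** Walks every segment's field infos and keeps the first type seen for each field. */
  static Map<String, String> collectTypes(List<Map<String, String>> segmentFieldInfos) {
    Map<String, String> global = new HashMap<>();
    for (Map<String, String> segment : segmentFieldInfos) {
      for (Map.Entry<String, String> entry : segment.entrySet()) {
        global.putIfAbsent(entry.getKey(), entry.getValue());
      }
    }
    return global;
  }
}
```

Nothing extra is written to the index in this variant; the mapping lives only in memory and is rebuilt on every open.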



[jira] [Updated] (LUCENE-3186) DocValues type should be recorded in FNX file to early fail if user specifies incompatible type

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3186:
------------------------------------

    Attachment: LUCENE-3186.patch

attaching current state. I updated the latest patch to trunk and fixed several issues...


bq. You can remove the TODO in DocValuesConsumer.merge 

done

bq. Make sure we mark TypePromoter as @lucene.internal.

done

{quote}If we hit exc inside FixedStraightBytesImpl.merge, we are still
setting merge=true; is finish then called, up above? (At which
point we are going to try to write to the closed file).{quote}
No, this flag is only taken into account if the merge is successful. In the case of an exception we don't call finish. I renamed it to hasMerged.

{quote}The jdocs for DocValuesConsumer.merge say "Merging segments out of
order is not supported", but you just mean segments must arrive in
sorted order right? (Ie, TieredMP merges non-sequential segments,
which we have been calling out-of-order merge).{quote}

This is bogus (a leftover from the branch); I removed it.

{quote}Since DocValuesConsumer.merge ignores the incoming IndexDocValues
(a MultiIndexDocValues), do we even need to pass that in...?{quote} 
I fixed this too. I don't create this MIDV anymore and use the readers directly which I need anyway for type promotion.

{quote}
It looks like we don't handle the case of merging segs when a
given field is always FixedStraightBytes but the size had changed
from one seg to another? (We throw IAE in
FixedStraightBytesImpl.merge). Are there other cases that will
lead to "late binding" exc during merge?{quote}

FixedStraightBytes with different sizes are promoted to VarStraightBytes so this is not an issue.

{quote}With this patch, does this mean we can type-promote on the fly?
Ie, if I make a SlowMRWrapper, and pull its perDocValues, we will
present the promoted type (across all segments) correctly? And
when the user looks up the value for a certain docID, we promote
it as needed?{quote}

Actually yes! I implemented this in the patch but still need to test it.

The current patch has one problem: if you add three incompatible types A, B, C and segs with A & B get merged (and dropped) but segs containing C are not in that merge, then the type will be C eventually. But this seems OK to me.
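
The order dependence in the A, B, C case can be shown with a toy model. Assume, purely for illustration, that merging two segments with incompatible types drops the field (modelled as null) and that a segment without the field adopts the other side's type; MergeOrderSketch and mergeType are hypothetical names, not code from the patch.

```java
/** Toy model of how the surviving type depends on merge order when incompatible types are dropped. */
final class MergeOrderSketch {
  /** Returns the field's type after merging two segments, or null if the field is dropped. */
  static String mergeType(String a, String b) {
    if (a == null) {
      return b; // field absent on one side: keep the other side's type
    }
    if (b == null) {
      return a;
    }
    return a.equals(b) ? a : null; // incompatible types: drop the field
  }
}
```

In this model mergeType(mergeType("A", "B"), "C") gives "C", while mergeType(mergeType("A", "C"), "B") gives "B", so the surviving type depends on which segments happen to be merged first.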

still some nocommits left but I think we are getting close 
                


[jira] [Updated] (LUCENE-3186) DocValues type should be recored in FNX file to early fail if user specifies incompatible type

Posted by "Simon Willnauer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3186:
------------------------------------

    Attachment: LUCENE-3186.patch

Next iteration: added JavaDoc, removed all nocommits, and fixed all tests.

This version of the patch promotes incompatible variants to BYTES_VAR_STRAIGHT instead of dropping the data entirely. That way, at least no data is lost if somebody messes up their types. I think this is ready; if nobody objects, I am going to commit this tomorrow.
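
A rough sketch of the fallback behaviour described above, under the simplifying assumption that any two different types count as incompatible. The enum is an illustrative subset and the class and method names are hypothetical, not Lucene's actual API; only BYTES_VAR_STRAIGHT is named in this thread.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of cross-segment type promotion; not Lucene's actual API. */
final class DocValuesTypePromotion {

    /** Illustrative subset of value types; only BYTES_VAR_STRAIGHT is taken from the thread. */
    enum ValueType { INTS, FLOAT_64, BYTES_FIXED_STRAIGHT, BYTES_VAR_STRAIGHT }

    /** Same type is kept; any mismatch falls back to BYTES_VAR_STRAIGHT (assumption). */
    static ValueType promote(ValueType a, ValueType b) {
        return a == b ? a : ValueType.BYTES_VAR_STRAIGHT;
    }

    /** Folds the per-segment types of one field into a single promoted type. */
    static ValueType promoteAll(List<ValueType> segmentTypes) {
        if (segmentTypes.isEmpty()) {
            throw new IllegalArgumentException("at least one segment is required");
        }
        ValueType result = segmentTypes.get(0);
        for (ValueType type : segmentTypes.subList(1, segmentTypes.size())) {
            result = promote(result, type);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(promoteAll(Arrays.asList(ValueType.INTS, ValueType.INTS)));
        System.out.println(promoteAll(Arrays.asList(
            ValueType.INTS, ValueType.FLOAT_64, ValueType.BYTES_FIXED_STRAIGHT)));
    }
}
```

With a fallback like this the values survive the merge as variable-length bytes, and the promoted type does not depend on the order in which segments are merged.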
                


[jira] [Commented] (LUCENE-3186) DocValues type should be recored in FNX file to early fail if user specifies incompatible type

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046479#comment-13046479 ] 

Uwe Schindler commented on LUCENE-3186:
---------------------------------------

Hi Robert,

I am also not really happy with this trend. I just opened LUCENE-3187 to start a discussion. In my opinion we should improve documentation instead.



[jira] [Resolved] (LUCENE-3186) DocValues type should be recored in FNX file to early fail if user specifies incompatible type

Posted by "Simon Willnauer (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3186.
-------------------------------------

       Resolution: Fixed
    Lucene Fields: New,Patch Available  (was: New)

Committed to trunk in rev. 1181020.

Thanks!
                