Posted to dev@lucene.apache.org by "David Mark Nemeskey (Created) (JIRA)" <ji...@apache.org> on 2011/11/07 11:51:52 UTC

[jira] [Created] (LUCENE-3566) Parametrizing H1 and H2

Parametrizing H1 and H2
-----------------------

                 Key: LUCENE-3566
                 URL: https://issues.apache.org/jira/browse/LUCENE-3566
             Project: Lucene - Java
          Issue Type: Improvement
          Components: core/search
    Affects Versions: flexscoring branch
            Reporter: David Mark Nemeskey
            Assignee: David Mark Nemeskey
            Priority: Minor
             Fix For: flexscoring branch


The DFR normalizations {{H1}} and {{H2}} are parameter-free. This is in line with the [original article|http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.742], but not with the [thesis|http://theses.gla.ac.uk/1570/], where H2 accepts a {{c}} parameter, nor with [information-based models|http://dl.acm.org/citation.cfm?id=1835490], where H1 also accepts a {{c}} parameter.
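
For illustration only, here is a minimal sketch of what parametrized versions of the two normalizations might look like. It assumes the commonly cited forms (H1 scales {{tf}} by the ratio of average to actual document length, H2 scales it by the base-2 logarithm of one plus that ratio) and multiplies the ratio by {{c}} in both cases. The class and method names are hypothetical and are not taken from the patch; {{c = 1}} reproduces the parameter-free behaviour.

```java
/** Sketch of the DFR length normalizations H1 and H2 with a tunable c (hypothetical names). */
final class DfrNormalizationSketch {
    /** Assumed default: c = 1 gives the parameter-free variants. */
    static final double DEFAULT_C = 1.0;

    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** H1 (assumed form): tfn = tf * c * avgLen / docLen. */
    static double h1(double tf, double docLen, double avgLen, double c) {
        return tf * c * (avgLen / docLen);
    }

    /** H2 (assumed form): tfn = tf * log2(1 + c * avgLen / docLen). */
    static double h2(double tf, double docLen, double avgLen, double c) {
        return tf * log2(1.0 + c * avgLen / docLen);
    }

    public static void main(String[] args) {
        double tf = 3, docLen = 200, avgLen = 100;
        // A document twice the average length: both normalizations shrink tf.
        System.out.println("H1, c=1: " + h1(tf, docLen, avgLen, DEFAULT_C));
        System.out.println("H2, c=1: " + h2(tf, docLen, avgLen, DEFAULT_C));
        // A larger c weakens the length penalty.
        System.out.println("H2, c=7: " + h2(tf, docLen, avgLen, 7.0));
    }
}
```

A larger {{c}} reduces the penalty on long documents, which is why a tuned value can change ranking quality while the default keeps existing behaviour intact.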

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3566) Parametrizing H1 and H2

Posted by "David Mark Nemeskey (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3566:
----------------------------------------

    Attachment: LUCENE-3566.patch

Patch re-based on trunk.
                
> Parametrizing H1 and H2
> -----------------------
>
>                 Key: LUCENE-3566
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3566
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.0
>            Reporter: David Mark Nemeskey
>            Assignee: David Mark Nemeskey
>            Priority: Minor
>              Labels: score
>             Fix For: 4.0
>
>         Attachments: LUCENE-3566.patch, LUCENE-3566.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The DFR normalizations {{H1}} and {{H2}} are parameter-free. This is in line with the [original article|http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.742], but not with the [thesis|http://theses.gla.ac.uk/1570/], where H2 accepts a {{c}} parameter, nor with [information-based models|http://dl.acm.org/citation.cfm?id=1835490], where H1 also accepts a {{c}} parameter.



[jira] [Updated] (LUCENE-3566) Parametrizing H1 and H2

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3566:
--------------------------------

    Attachment: LUCENE-3566.patch

I thought we had done this already, but realized I forgot about it!

I added the Solr factory/parsing stuff to the patch. Will commit shortly.

                


[jira] [Commented] (LUCENE-3566) Parametrizing H1 and H2

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145360#comment-13145360 ] 

Robert Muir commented on LUCENE-3566:
-------------------------------------

+1, let's add these.

I didn't think H1 took params (the thesis says 'Therefore, the constant of C is 1 assuming H1', then defines it without C). Did the IB paper make a mistake?

Either way, it won't hurt anything to add the parameter; it's just confusing :)
                


[jira] [Updated] (LUCENE-3566) Parametrizing H1 and H2

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3566:
--------------------------------

    Affects Version/s:     (was: flexscoring branch)
                       4.0
        Fix Version/s:     (was: flexscoring branch)
                       4.0

Editing the fix version to 4.0: since the flexscoring branch was merged, I think we can safely do any scoring improvements in mainline trunk.

                


[jira] [Commented] (LUCENE-3566) Parametrizing H1 and H2

Posted by "David Mark Nemeskey (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145502#comment-13145502 ] 

David Mark Nemeskey commented on LUCENE-3566:
---------------------------------------------

bq. I didn't think H1 took params (the thesis says 'Therefore, the constant of C is 1 assuming H1', then defines it without C). Did the IB paper make a mistake?

Good question. Perhaps it was a mistake; however, according to my colleague, who experimented with the IB method in our own engine and proposed adding the parameter to Lucene, a well-chosen {{c}} can improve the results. That is hardly surprising; still, as long as we have defaults, it shouldn't be a problem. :)
                


[jira] [Resolved] (LUCENE-3566) Parametrizing H1 and H2

Posted by "Robert Muir (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3566.
---------------------------------

    Resolution: Fixed

Thanks David!
                


[jira] [Commented] (LUCENE-3566) Parametrizing H1 and H2

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145509#comment-13145509 ] 

Robert Muir commented on LUCENE-3566:
-------------------------------------

Yeah, I agree... maybe in the patch we can expose the parameter to the factory in Solr (DFRSimilarityFactory has a param-parsing method for Normalization that is reused by IB, too)?
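
As a rough sketch of the idea, and not the actual DFRSimilarityFactory code, a shared parsing helper could read the normalization name plus an optional {{c}} and fall back to a default when {{c}} is absent. Everything here is an assumption made for illustration: the class name, the {{normalization}} and {{c}} keys, the use of a plain string map, and the default of 1.

```java
import java.util.Map;

/** Hypothetical holder for a parsed normalization setting; not a Lucene or Solr class. */
final class NormalizationParamsSketch {
    /** Assumed default so that configurations without c keep working. */
    static final float DEFAULT_C = 1f;

    final String name;
    final float c;

    private NormalizationParamsSketch(String name, float c) {
        this.name = name;
        this.c = c;
    }

    /** Parses "normalization" (H1 or H2) and an optional "c" from a string map. */
    static NormalizationParamsSketch parse(Map<String, String> params) {
        String name = params.get("normalization");
        if (!"H1".equals(name) && !"H2".equals(name)) {
            throw new IllegalArgumentException("Unsupported normalization: " + name);
        }
        String rawC = params.get("c");
        // Missing c falls back to the default instead of failing.
        float c = (rawC == null) ? DEFAULT_C : Float.parseFloat(rawC);
        return new NormalizationParamsSketch(name, c);
    }

    public static void main(String[] args) {
        NormalizationParamsSketch tuned = parse(Map.of("normalization", "H2", "c", "7"));
        NormalizationParamsSketch plain = parse(Map.of("normalization", "H1"));
        System.out.println(tuned.name + " c=" + tuned.c);
        System.out.println(plain.name + " c=" + plain.c);
    }
}
```

Keeping the parsing in one place means both the DFR and IB factories would pick up the new parameter without duplicating the default.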
                


[jira] [Updated] (LUCENE-3566) Parametrizing H1 and H2

Posted by "David Mark Nemeskey (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3566:
----------------------------------------

    Lucene Fields: New,Patch Available  (was: New)
    


[jira] [Updated] (LUCENE-3566) Parametrizing H1 and H2

Posted by "David Mark Nemeskey (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mark Nemeskey updated LUCENE-3566:
----------------------------------------

    Attachment: LUCENE-3566.patch

Patch.
                