Posted to dev@lucene.apache.org by "Julian Atkinson (JIRA)" <ji...@apache.org> on 2010/05/21 12:49:17 UTC

[jira] Created: (LUCENE-2475) Incorrect Bounding Box calculation results in the exclusion of valid data locations

Incorrect Bounding Box calculation results in the exclusion of valid data locations
-----------------------------------------------------------------------------------

                 Key: LUCENE-2475
                 URL: https://issues.apache.org/jira/browse/LUCENE-2475
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/spatial
    Affects Versions: 3.0, 2.9.1
            Reporter: Julian Atkinson


I have found a scenario where some of my location data is not being returned.  The calculated distance between my search origin and the data is well within my search radius but the data is not being returned. 

I have traced this to what I think is an error in calculating the bounding box that is used to determine the Shape for the CartesianShapeFilter in CartesianPolyFilterBuilder.getBoxShape().

The boundary box calculated by LLRect.createBox() is incorrect.  The box returned is a box that fits WITHIN the search circle, where the four corners of the box intersect the circle line. This creates 4 regions where data points are not included - these are regions that are in the circle but outside the box.

What is required is a bounding box that fully CONTAINS the search circle.  As a side effect you would end up with 4 regions outside of the circle but inside the box.  These could return data that are not real hits, but those can be filtered out by a more precise distance comparison.
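To illustrate the difference between the two boxes, here is a minimal sketch. It is not the contrib/spatial code: it assumes flat planar coordinates in miles with the search origin at (0, 0), and every class and method name in it is hypothetical. It shows a box inscribed in the circle missing a valid point, and a containing box followed by an exact distance check returning only real hits.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: planar geometry, not the contrib/spatial implementation. */
final class BoundingBoxSketch {

    /** Half side length of the square whose four corners lie on the circle. */
    static double inscribedHalfSide(double radius) {
        return radius / Math.sqrt(2.0);
    }

    /** Half side length of the smallest square that fully contains the circle. */
    static double containingHalfSide(double radius) {
        return radius;
    }

    /** True if (x, y) lies in the square of the given half side centred on the origin. */
    static boolean inBox(double x, double y, double halfSide) {
        return Math.abs(x) <= halfSide && Math.abs(y) <= halfSide;
    }

    /** Exact second-pass check: true if (x, y) is within radius of the origin. */
    static boolean inCircle(double x, double y, double radius) {
        return Math.hypot(x, y) <= radius;
    }

    /** Two-phase filter: cheap containing-box test first, exact distance test second. */
    static List<double[]> search(List<double[]> points, double radius) {
        List<double[]> hits = new ArrayList<double[]>();
        double half = containingHalfSide(radius);
        for (double[] p : points) {
            if (inBox(p[0], p[1], half) && inCircle(p[0], p[1], radius)) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double radius = 10.0;
        // Inside the circle but outside the inscribed box: lost if the box is too small.
        System.out.println(inBox(9.5, 0.0, inscribedHalfSide(radius)));
        // Inside the containing box but outside the circle: removed by the distance check.
        System.out.println(inBox(9.0, 9.0, containingHalfSide(radius))
                && !inCircle(9.0, 9.0, radius));
    }
}
```

The containing box over-selects in its four corners, which is why the exact distance test is kept as a second pass.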

I will attach a test class that covers the issue in more detail, along with a proposed fix: a one-liner in LLRect.java.

I would appreciate it if someone could verify my findings.  All my data tests pass with this fix, but one test case in Lucene 3.0.0, TestCartesian.testAntiM(), fails and I can't figure out why.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2475) Incorrect Bounding Box calculation results in the exclusion of valid data locations

Posted by "Julian Atkinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870176#action_12870176 ] 

Julian Atkinson commented on LUCENE-2475:
-----------------------------------------

OK, so ignoring the whole Boundary Box story and looking deeper into the code with my test case I noticed that a different bestFit value was being determined by the CartesianTierPlotter. 

I get a value of 13 for the test that passes (radius 32 miles) and 14 for the test that fails (radius 31 miles).  To me this means we end up searching in the wrong tier.

Looking at CartesianTierPlotter.bestFit(), I see that the line below divides the passed-in miles value by 2:

>>> double r = miles / 2.0; 

I'm guessing r is meant to be a radius, but the miles parameter is already a radius: that of my search circle.

This affects the calculation of the best-fit box width (called corner in the code) and the resulting bestFit, or tierId.

If I change this to not divide by 2 - my issue test case passes - as do all my other tests.
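The effect of the extra division can be sketched in isolation. The corner formula below is the one used in bestFit(); everything else is an assumption made for illustration: the circumference constant (about 24,901 miles at the equator), the tier mapping ceil(log2(circumference / corner)), and all names. The absolute tier numbers it produces are therefore not expected to match the 13 and 14 reported above. What it does show is that corner scales linearly with r, so halving r halves corner and moves the result exactly one tier finer.

```java
/** Illustrative only: the tier mapping is an assumption, not the Lucene source. */
final class BestFitSketch {

    // Assumed circumference of the earth in miles (hypothetical constant for this sketch).
    static final double CIRCUMFERENCE_MILES = 24901.0;

    /** Corner width for a search circle of radius r, using the formula from bestFit(). */
    static double corner(double r) {
        return r - Math.sqrt(Math.pow(r, 2) / 2.0d);
    }

    /** Hypothetical tier: number of doublings needed to reach corner-sized boxes. */
    static int tierFor(double r) {
        double times = CIRCUMFERENCE_MILES / corner(r);
        return (int) Math.ceil(Math.log(times) / Math.log(2.0));
    }

    public static void main(String[] args) {
        double miles = 32.0;
        System.out.println("radius used as given: tier " + tierFor(miles));
        System.out.println("radius halved first:  tier " + tierFor(miles / 2.0));
    }
}
```

Under these assumptions, halving the radius always selects a tier one level finer than the search circle calls for, which would be consistent with ending up in the wrong tier.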

Again, I'd appreciate it if someone who knows the code could comment and confirm my finding, or tell me I'm crazy!

Thx




[jira] Commented: (LUCENE-2475) Incorrect Bounding Box calculation results in the exclusion of valid data locations

Posted by "Julian Atkinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869969#action_12869969 ] 

Julian Atkinson commented on LUCENE-2475:
-----------------------------------------

I've done some more investigation and I think the dimensions of the Bounding Box are not the issue.  If I add a data point outside the box but within the circle, it is returned as a hit.

// add the following to my test case data set and this is returned - see attachment
addPoint(writer, "outside box in circle", 52.6695404, 4.8471904);

Presumably this is because the original shape is extended to include neighboring boxes in CartesianPolyFilterBuilder.getShapeLoop()?

So although the bounding box fix makes sense logically, it is irrelevant to my actual problem.
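If that conjecture is right, the behaviour can be pictured with a simple uniform grid. The sketch below is not the getShapeLoop() logic. It assumes square cells of a fixed size on a flat plane, and all names are hypothetical. A point matches when its cell is one of the cells touched by the box, so a point slightly outside the box can still be returned.

```java
/** Illustrative only: a uniform planar grid, not the CartesianPolyFilterBuilder logic. */
final class GridCoverSketch {

    /** Index of the grid cell containing coordinate v, for cells of the given size. */
    static long cellIndex(double v, double cellSize) {
        return (long) Math.floor(v / cellSize);
    }

    /** True if (x, y) falls in any cell touched by the box [minX, maxX] x [minY, maxY]. */
    static boolean inCoveredCells(double x, double y,
                                  double minX, double minY,
                                  double maxX, double maxY,
                                  double cellSize) {
        long cx = cellIndex(x, cellSize);
        long cy = cellIndex(y, cellSize);
        return cx >= cellIndex(minX, cellSize) && cx <= cellIndex(maxX, cellSize)
            && cy >= cellIndex(minY, cellSize) && cy <= cellIndex(maxY, cellSize);
    }

    public static void main(String[] args) {
        // Box spans -7..7 on both axes; with 5-unit cells the touched cells span -10..10.
        System.out.println(inCoveredCells(9.5, 0.0, -7.0, -7.0, 7.0, 7.0, 5.0));
        System.out.println(inCoveredCells(10.5, 0.0, -7.0, -7.0, 7.0, 7.0, 5.0));
    }
}
```

With coarse cells the covered area can be noticeably larger than the box itself, which would explain a hit outside the box without saying anything about why a point inside the box is missed.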

I also noticed the hit that I am missing is just WITHIN the dimensions of the boundary box shape - so now I really don't understand why it is not being matched.

I'll continue to look into this but any help from someone more familiar with the code would be appreciated.  




[jira] Commented: (LUCENE-2475) Incorrect Bounding Box calculation results in the exclusion of valid data locations

Posted by "Nicolas Helleringer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907959#action_12907959 ] 

Nicolas Helleringer commented on LUCENE-2475:
---------------------------------------------

Hi Julian

Your problem should be solved by the work discussed here: https://issues.apache.org/jira/browse/LUCENE-2359



[jira] Updated: (LUCENE-2475) Incorrect Bounding Box calculation results in the exclusion of valid data locations

Posted by "Julian Atkinson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Atkinson updated LUCENE-2475:
------------------------------------

    Attachment: BoundingBoxCalucationIssueTest.java

Attachment with a test case and proposed fix



[jira] Commented: (LUCENE-2475) Incorrect Bounding Box calculation results in the exclusion of valid data locations

Posted by "Julian Atkinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870277#action_12870277 ] 

Julian Atkinson commented on LUCENE-2475:
-----------------------------------------

I got pen and paper out and worked out the calculation being done in CartesianTierPlotter.bestFit().

>>>  double corner = r - Math.sqrt(Math.pow(r, 2) / 2.0d);

I ended up with the same formula, and it definitely expects the radius of the search circle as its parameter.

There is therefore no need to divide the miles parameter by 2.

BTW the formula can be simplified to:

//corner is the width/height of the box that fits between the arc of the search circle 
//and a corner of the boundary box containing the search circle
double corner = r - r/Math.sqrt(2);
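The simplification holds for any non-negative r, because sqrt(r*r / 2) equals r / sqrt(2). A small check, with hypothetical class and method names, that evaluates both forms side by side:

```java
/** Compares the formula from bestFit() with the simplified form for a few radii. */
final class CornerFormulaSketch {

    /** Formula as it appears in bestFit(). */
    static double cornerOriginal(double r) {
        return r - Math.sqrt(Math.pow(r, 2) / 2.0d);
    }

    /** Simplified form; equal to the original for r >= 0. */
    static double cornerSimplified(double r) {
        return r - r / Math.sqrt(2);
    }

    public static void main(String[] args) {
        double[] radii = {1.0, 15.5, 16.0, 31.0, 32.0};
        for (double r : radii) {
            System.out.printf("r=%.1f original=%.9f simplified=%.9f%n",
                    r, cornerOriginal(r), cornerSimplified(r));
        }
    }
}
```

Both forms reduce to r * (1 - 1/sqrt(2)), roughly 0.293 * r, which is the gap between the point where the arc crosses the diagonal and the corner of the containing box.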





[jira] Updated: (LUCENE-2475) Incorrect Bounding Box calculation results in the exclusion of valid data locations

Posted by "Julian Atkinson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Atkinson updated LUCENE-2475:
------------------------------------

    Attachment: test.html

Adding a Google Map to help visualise the problem. 

The bounding box and my search point location (center) are shown as red dots.
The blue dot is the location of the hit I am expecting to get but don't. In my real data there are many others around it.

The yellow dot is the location I added that is outside the box but inside the search circle. This led me to conclude that the Bounding Box is not the issue.
