Posted to dev@asterixdb.apache.org by Riyafa Abdul Hameed <ri...@apache.org> on 2017/10/07 08:46:02 UTC

Result order changed in parallel and less parallel execution

Dear all,

I have this query in my spatial implementation[1]:

use GeoJSON;
SELECT VALUE {"Length": st_length(geo.myGeometry),
"Boundary":st_boundary(geo.myGeometry)} FROM Geometries geo WHERE
geometry_type(geo.myGeometry)="LineString" OR
geometry_type(geo.myGeometry)="MultiLineString";


When this query is run in the SqlppExecutionTest[2] class which says
"Runs the SQL++ runtime tests with the storage parallelism." the result
obtained is:

{ "Length": 0.004058119099397876, "Boundary":
{"type":"MultiPoint","coordinates":[[-69.1991349,-12.6006222],[-69.1975081,-12.6010968]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}
}
{ "Length": 8.48528137423857, "Boundary":
{"type":"MultiPoint","coordinates":[[1,2],[7,8]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}
}
{ "Length": 78.9292222699217, "Boundary":
{"type":"MultiPoint","coordinates":[[10,10],[10,40],[40,40],[30,10]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}
}
{ "Length": 0.0031622776601655037, "Boundary":
{"type":"MultiPoint","coordinates":[[-113.98,39.198],[-113.981,39.195]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}
}


When this query is run in the SqlppExecutionLessParallelismIT[3] which says
"Runs the SQL++ runtime tests with less parallelism on node controllers
than using all the cores." the result obtained is:

{ "Length": 8.48528137423857, "Boundary":
{"type":"MultiPoint","coordinates":[[1,2],[7,8]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}
}
{ "Length": 0.004058119099397876, "Boundary":
{"type":"MultiPoint","coordinates":[[-69.1991349,-12.6006222],[-69.1975081,-12.6010968]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}
}
{ "Length": 0.0031622776601655037, "Boundary":
{"type":"MultiPoint","coordinates":[[-113.98,39.198],[-113.981,39.195]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}
}
{ "Length": 78.9292222699217, "Boundary":
{"type":"MultiPoint","coordinates":[[10,10],[10,40],[40,40],[30,10]],"crs":{"type":"name","properties":{"name":"EPSG:4326"}}}
}


The order of the results is different, so one of the tests fails. I
don't understand the reason for the difference. Any help or
explanation would be highly appreciated.

[1]
https://github.com/riyafa/asterixdb/blob/geometry/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/geojson/datatype/primitive.9.query.sqlpp
[2]
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/runtime/SqlppExecutionTest.java
[3]
https://github.com/apache/asterixdb/blob/master/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/runtime/SqlppExecutionLessParallelismIT.java

Thank you,
Sincerely,
Riyafa

Re: Result order changed in parallel and less parallel execution

Posted by Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>.
Hi Steven,

Thank you for pointing this out to me.

Sincerely,
Riyafa

On 7 October 2017 at 19:39, Steven Jacobs <sj...@ucr.edu> wrote:

> Hi Riyafa,
> SQL++ query results have no inherent ordering. To guarantee an
> order, you need to use an ORDER BY clause in the query.
> Steven



-- 
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: riyafa.12@cse.mrt.ac.lk
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>

Re: Result order changed in parallel and less parallel execution

Posted by Steven Jacobs <sj...@ucr.edu>.
Hi Riyafa,
SQL++ query results have no inherent ordering. To guarantee an
order, you need to use an ORDER BY clause in the query.
Steven
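
As a sketch of what that could look like for the query in this thread, the
version below adds an ORDER BY on the computed length. The choice of sort key
is an assumption made for illustration, not something stated in the thread;
any expression that is distinct for every result record would work (a primary
key is usually the safest choice, but the key field of Geometries is not shown
here).

```sql
use GeoJSON;
/* Same query as in the original post, with an explicit ORDER BY.
   Sorting on the computed length is an assumption for illustration. */
SELECT VALUE {"Length": st_length(geo.myGeometry),
"Boundary": st_boundary(geo.myGeometry)}
FROM Geometries geo
WHERE geometry_type(geo.myGeometry)="LineString" OR
geometry_type(geo.myGeometry)="MultiLineString"
ORDER BY st_length(geo.myGeometry);
```

The four lengths in the results above are all distinct, so an ascending sort
on length should give the same record order under both test configurations,
and a single expected-result file could then match either run.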
