Posted to dev@sis.apache.org by Martin Desruisseaux <ma...@geomatys.fr> on 2013/08/30 00:32:29 UTC
Current work: AbstractIdentifiedObject
Hello all
Here is a quick update on current work:
"IdentifiedObject" is a type defined by the ISO 19111 specification.
"AbstractIdentifiedObject" [1] is the proposed SIS implementation. This
class is intended to be the base class of almost everything related to
Coordinate Reference Systems (GeographicCRS, ProjectedCRS, etc.). For
this reason, it would be an important class of SIS.
The problem that "IdentifiedObject" tries to resolve is that the same
map projections are often known by different names and identifiers
depending on the providers. For example "Oblique Mercator" and "Hotine
Oblique Mercator" (in EPSG naming) are two different projections. But
"Oblique Mercator" (not Hotine) in EPSG naming is also called "Hotine
Oblique Mercator Azimuth Center" by ESRI, while "Hotine Oblique
Mercator" (EPSG naming) is called "Hotine Oblique Mercator Azimuth
Natural Origin" by ESRI. In summary, it is not sufficient to know the
name of a map projection. We also need to know who chose that name (the
"authority").
So IdentifiedObject manages:
* A primary name (whatever SIS chooses as our preferred naming)
* An arbitrary number of aliases, together with their authorities
* An arbitrary number of identifiers, typically primary keys in a
database (e.g. "EPSG:4326"), again with the authority that defines
each identifier.
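To make the three items above concrete, here is a minimal sketch in Java. It is not the Apache SIS API: the class name NamedObjectSketch and all of its methods are hypothetical, and it simplifies by allowing only one alias and one identifier per authority. The projection names are the ones from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified illustration; this is NOT the Apache SIS class. */
final class NamedObjectSketch {
    private final String primaryAuthority;
    private final String primaryName;
    /** Alias names keyed by the authority that chose them (one per authority here). */
    private final Map<String, String> aliases = new LinkedHashMap<>();
    /** Identifier codes keyed by authority, for example "EPSG" -> "4326". */
    private final Map<String, String> identifiers = new LinkedHashMap<>();

    NamedObjectSketch(String primaryAuthority, String primaryName) {
        this.primaryAuthority = primaryAuthority;
        this.primaryName = primaryName;
    }

    NamedObjectSketch addAlias(String authority, String name) {
        aliases.put(authority, name);
        return this;
    }

    NamedObjectSketch addIdentifier(String authority, String code) {
        identifiers.put(authority, code);
        return this;
    }

    /** Returns the name that the given authority uses for this object, if known. */
    Optional<String> nameFor(String authority) {
        if (primaryAuthority.equals(authority)) {
            return Optional.of(primaryName);
        }
        return Optional.ofNullable(aliases.get(authority));
    }

    /** A name matches only together with the authority that defines it. */
    boolean isKnownAs(String authority, String name) {
        return nameFor(authority).map(name::equals).orElse(false);
    }

    /** Returns the identifier in "AUTHORITY:code" form, if the authority defines one. */
    Optional<String> identifierFor(String authority) {
        return Optional.ofNullable(identifiers.get(authority)).map(code -> authority + ":" + code);
    }

    public static void main(String[] args) {
        NamedObjectSketch hotine = new NamedObjectSketch("EPSG", "Hotine Oblique Mercator")
                .addAlias("ESRI", "Hotine Oblique Mercator Azimuth Natural Origin");
        System.out.println(hotine.nameFor("ESRI").orElse("(unknown)"));
        // "Oblique Mercator" in EPSG naming is a different projection.
        System.out.println(hotine.isKnownAs("EPSG", "Oblique Mercator"));
    }
}
```

The point of the sketch is that a lookup always takes the pair (authority, name), never the name alone.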
Martin
[1]
https://builds.apache.org/job/sis-jdk7/site/apidocs/org/apache/sis/referencing/AbstractIdentifiedObject.html
Re: Current work: AbstractIdentifiedObject
Posted by Martin Desruisseaux <ma...@geomatys.fr>.
Hello Travis
There is a caching mechanism planned. When a CRS is requested, we first
look it up in a ConcurrentHashMap. Only if no CRS exists in the map for
that code do we fetch it from the database and put it in the map. I don't
know, however, whether such caching works well with Hadoop...
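As a rough sketch of that lookup order (map first, database only on a miss), something like the following would do. The class name CrsCacheSketch and the loader function are hypothetical stand-ins, not SIS code, and computeIfAbsent assumes Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical sketch of a code-to-CRS cache; V stands for whatever CRS type is used. */
final class CrsCacheSketch<V> {
    private final ConcurrentMap<String, V> cache = new ConcurrentHashMap<>();
    /** Stand-in for the database query that creates a CRS from its code. */
    private final Function<String, V> loader;

    CrsCacheSketch(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader only if the code is not in the map yet. */
    V get(String code) {
        return cache.computeIfAbsent(code, loader);
    }

    /** Number of distinct codes loaded so far. */
    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        CrsCacheSketch<String> crsCache = new CrsCacheSketch<>(code -> {
            System.out.println("Fetching " + code + " from the database");
            return "CRS[" + code + "]";
        });
        crsCache.get("EPSG:4326");   // triggers the loader
        crsCache.get("EPSG:4326");   // served from the map
        System.out.println(crsCache.size());
    }
}
```

With this approach only the few codes that an application actually uses end up in memory.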
Martin
On 30/08/13 12:56, Travis L Pinney wrote:
> Hi Martin,
>
> Thanks for the info. Maybe the tradeoff is to be able to load a few
> EPSGs in memory for something like a Map process in Hadoop. 17-18 MB
> seems small to me, maybe I am wrong... but a possible alternative is
> to load the data into a Java data structure that can be quickly loaded
> using a HashMap-like object where the key is the EPSG code.
Re: Current work: AbstractIdentifiedObject
Posted by Travis L Pinney <tr...@gmail.com>.
Hi Martin,
Thanks for the info. Maybe the tradeoff is to be able to load a few
EPSGs in memory for something like a Map process in Hadoop. 17-18 MB
seems small to me, maybe I am wrong... but a possible alternative is
to load the data into a Java data structure that can be quickly loaded
using a HashMap-like object where the key is the EPSG code.
On Fri, Aug 30, 2013 at 4:30 AM, Martin Desruisseaux
<ma...@geomatys.fr> wrote:
> On 30/08/13 02:30, Travis L Pinney wrote:
>
>> How large would the database be?
>
>
> Hard to say, since it depends a lot on the internals of the database engine.
> Looking at the space used on disk for EPSG 7.09, I have 17 MB for Derby and
> 18 MB for HSQL.
>
> The EPSG database defines about 5000 referencing systems, but applications
> will typically use only a few of them, maybe 5. This is 0.1% of the
> database content (the problem is that everyone may use a different 0.1%). I
> think it would be unfortunate to load such a big database in memory for
> using only 0.1% of it, which is why I tend to prefer a disk-based solution in
> the particular case of EPSG. Of course, other kinds of data would benefit
> more from a memory-based solution.
>
> Martin
>
Re: Current work: AbstractIdentifiedObject
Posted by Martin Desruisseaux <ma...@geomatys.fr>.
On 30/08/13 02:30, Travis L Pinney wrote:
> How large would the database be?
Hard to say, since it depends a lot on the internals of the database
engine. Looking at the space used on disk for EPSG 7.09, I have 17 MB
for Derby and 18 MB for HSQL.
The EPSG database defines about 5000 referencing systems, but
applications will typically use only a few of them, maybe 5. This is
0.1% of the database content (the problem is that everyone may use a
different 0.1%). I think it would be unfortunate to load such a big
database in memory for using only 0.1% of it, which is why I tend to
prefer a disk-based solution in the particular case of EPSG. Of course,
other kinds of data would benefit more from a memory-based solution.
Martin
Re: Current work: AbstractIdentifiedObject
Posted by Travis L Pinney <tr...@gmail.com>.
How large would the database be?
On Thu, Aug 29, 2013 at 8:19 PM, Martin Desruisseaux <
martin.desruisseaux@geomatys.fr> wrote:
> On 30/08/13 02:11, Travis L Pinney wrote:
>
>> +1 on the approach you used for GeoTK and what is proposed for SIS. Would
>> it be possible to have it work "in memory" without having to write out to
>> disk for a use case like Hadoop?
>
> Yes, both Derby and HSQL allow an "in-memory database". However, given the
> memory consumption of this relatively large database and the cost of
> recreating the tables on system start, I don't know if the advantages would
> outweigh the drawbacks... But it would be the user's choice anyway.
>
> Martin
Re: Current work: AbstractIdentifiedObject
Posted by Adam Estrada <es...@gmail.com>.
Thanks, Chris! I have heard nothing but good things about Shark :)
Adam
On Thu, Aug 29, 2013 at 11:51 PM, Mattmann, Chris A (398J) <
chris.a.mattmann@jpl.nasa.gov> wrote:
> http://spark.incubator.apache.org/
> https://github.com/amplab/shark/wiki
>
>
> Shark is a lightning fast SQL implementation built on top of Apache
> Spark (originating out of the Berkeley AMP Lab)
>
> Cheers,
> Chris
Re: Current work: AbstractIdentifiedObject
Posted by Travis L Pinney <tr...@gmail.com>.
I have heard good things about it also. It sounds like a good
candidate for a generalized geospatial datastore.
On Fri, Aug 30, 2013 at 4:21 AM, Martin Desruisseaux
<ma...@geomatys.fr> wrote:
> Thanks Chris. I didn't know about Shark. I think this is definitely a
> target to put on our list.
>
> Martin
Re: Current work: AbstractIdentifiedObject
Posted by Martin Desruisseaux <ma...@geomatys.fr>.
Thanks Chris. I didn't know about Shark. I think this is definitely a
target to put on our list.
Martin
On 30/08/13 05:51, Mattmann, Chris A (398J) wrote:
> http://spark.incubator.apache.org/
> https://github.com/amplab/shark/wiki
>
>
> Shark is a lightning fast SQL implementation built on top of Apache
> Spark (originating out of the Berkeley AMP Lab)
>
> Cheers,
> Chris
Re: Current work: AbstractIdentifiedObject
Posted by "Mattmann, Chris A (398J)" <ch...@jpl.nasa.gov>.
http://spark.incubator.apache.org/
https://github.com/amplab/shark/wiki
Shark is a lightning fast SQL implementation built on top of Apache
Spark (originating out of the Berkeley AMP Lab)
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----Original Message-----
From: Adam Estrada <es...@gmail.com>
Reply-To: "dev@sis.apache.org" <de...@sis.apache.org>
Date: Thursday, August 29, 2013 8:41 PM
To: "dev@sis.apache.org" <de...@sis.apache.org>
Subject: Re: Current work: AbstractIdentifiedObject
>Interesting about Shark. Can you post the link and how you think it could
>be used?
Re: Current work: AbstractIdentifiedObject
Posted by Adam Estrada <es...@gmail.com>.
Interesting about Shark. Can you post the link and how you think it could
be used?
On Thu, Aug 29, 2013 at 9:19 PM, Mattmann, Chris A (398J) <
chris.a.mattmann@jpl.nasa.gov> wrote:
> We should try and integrate with shark from the amp lab...
Re: Current work: AbstractIdentifiedObject
Posted by "Mattmann, Chris A (398J)" <ch...@jpl.nasa.gov>.
We should try and integrate with shark from the amp lab...
Sent from my iPhone
On Aug 29, 2013, at 5:26 PM, "Martin Desruisseaux" <ma...@geomatys.fr> wrote:
> On 30/08/13 02:11, Travis L Pinney wrote:
>> +1 on the approach you used for GeoTK and what is proposed for SIS. Would
>> it be possible to have it work "in memory" without having to write out to disk
>> for a use case like Hadoop?
>
> Yes, both Derby and HSQL allow an "in-memory database". However, given the
> memory consumption of this relatively large database and the cost of
> recreating the tables on system start, I don't know if the advantages would
> outweigh the drawbacks... But it would be the user's choice anyway.
>
> Martin
Re: Current work: AbstractIdentifiedObject
Posted by Martin Desruisseaux <ma...@geomatys.fr>.
On 30/08/13 02:11, Travis L Pinney wrote:
> +1 on the approach you used for GeoTK and what is proposed for SIS. Would
> it be possible to have it work "in memory" without having to write out to disk
> for a use case like Hadoop?
Yes, both Derby and HSQL allow an "in-memory database". However, given the
memory consumption of this relatively large database and the cost of
recreating the tables on system start, I don't know if the advantages
would outweigh the drawbacks... But it would be the user's choice
anyway.
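For reference, the choice between disk and memory is mostly a matter of the JDBC URL with these two engines. The sketch below only builds the URL strings; the class EpsgJdbcUrlSketch, its method and the database name "EPSG" are hypothetical, while the URL prefixes are the documented Derby and HSQLDB forms. Opening a connection would additionally require the corresponding driver on the classpath.

```java
/** Hypothetical helper that builds JDBC URLs for an embedded EPSG database. */
final class EpsgJdbcUrlSketch {
    enum Engine { DERBY, HSQL }

    /**
     * Returns a JDBC URL for the given engine.
     *
     * @param engine    the embedded database engine
     * @param inMemory  true for a database kept only in memory, false for one stored on disk
     * @param name      database name (in-memory) or path (on disk)
     */
    static String url(Engine engine, boolean inMemory, String name) {
        switch (engine) {
            case DERBY:
                // ";create=true" asks Derby to create the database if it does not exist yet.
                return (inMemory ? "jdbc:derby:memory:" : "jdbc:derby:") + name + ";create=true";
            case HSQL:
                return (inMemory ? "jdbc:hsqldb:mem:" : "jdbc:hsqldb:file:") + name;
            default:
                throw new AssertionError(engine);
        }
    }

    public static void main(String[] args) {
        System.out.println(url(Engine.DERBY, true, "EPSG"));
        System.out.println(url(Engine.HSQL, false, "EPSG"));
    }
}
```

An in-memory database starts empty in each new JVM, which is where the cost of recreating the tables on system start comes from.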
Martin
Re: Current work: AbstractIdentifiedObject
Posted by Travis L Pinney <tr...@gmail.com>.
Hi Martin,
+1 on the approach you used for GeoTK and what is proposed for SIS. Would
it be possible to have it work "in memory" without having to write out to disk
for a use case like Hadoop?
Thanks,
Travis
On Thu, Aug 29, 2013 at 7:53 PM, Martin Desruisseaux <
martin.desruisseaux@geomatys.fr> wrote:
> Hello Adam
>
> On 30/08/13 00:55, Adam Estrada wrote:
>
>> Thanks a lot, Martin. Where do you envision the database of identifier
>> codes living? I know in GDAL, we typically read from a directory full of
>> CSVs [1] that hold several thousand (not sure of the exact number right
>> now) codes along with their transformations.
>
>
> Much (maybe most) of that information is derived from the EPSG database
> [1]. GDAL extracted some information from the EPSG tables as CSV files.
> Indeed, the first row of some files contains EPSG column names. The EPSG
> database contains definitions for about 5000 Coordinate Reference Systems.
>
> In Geotk - and in what is proposed for SIS - we do not use such CSV files.
> Instead, we use a real EPSG database. The EPSG SQL scripts for creating the
> database are embedded in the JAR file (we are allowed to redistribute
> them), and the database is created the first time that the library is used.
> The database engine is the user's choice - it would be Derby by default (an
> Apache project), but it also works on HSQL, PostgreSQL and MS-Access.
>
> In Geotk, information not related to EPSG (for example projection names
> used by ESRI) was hard-coded in Java. For SIS, I would like to store it
> in the database too. The drawback is that a database would soon become
> somewhat mandatory for many SIS uses. However, I think that a database
> could hardly be avoided anyway for most medium or advanced uses, and this
> can be made transparent for the user if we default to some embedded
> database like Derby or HSQL.
>
> What do you think?
>
> Martin
>
>
> [1] http://www.epsg.org/ - click on "geodetic dataset"
>
>
Re: Current work: AbstractIdentifiedObject
Posted by Martin Desruisseaux <ma...@geomatys.fr>.
Hello Adam
On 30/08/13 00:55, Adam Estrada wrote:
> Thanks a lot, Martin. Where do you envision the database of identifier
> codes living? I know in GDAL, we typically read from a directory full of
> CSVs [1] that hold several thousand (not sure of the exact number right
> now) codes along with their transformations.
Much (maybe most) of that information is derived from the EPSG
database [1]. GDAL extracted some information from the EPSG tables as
CSV files. Indeed, the first row of some files contains EPSG column names.
The EPSG database contains definitions for about 5000 Coordinate
Reference Systems.
In Geotk - and in what is proposed for SIS - we do not use such CSV files.
Instead, we use a real EPSG database. The EPSG SQL scripts for creating
the database are embedded in the JAR file (we are allowed to
redistribute them), and the database is created the first time that the
library is used. The database engine is the user's choice - it would be
Derby by default (an Apache project), but it also works on HSQL,
PostgreSQL and MS-Access.
In Geotk, information not related to EPSG (for example projection names
used by ESRI) was hard-coded in Java. For SIS, I would like to store
it in the database too. The drawback is that a database would soon
become somewhat mandatory for many SIS uses. However, I think that a
database could hardly be avoided anyway for most medium or advanced
uses, and this can be made transparent for the user if we default to
some embedded database like Derby or HSQL.
What do you think?
Martin
[1] http://www.epsg.org/ - click on "geodetic dataset"
Re: Current work: AbstractIdentifiedObject
Posted by Adam Estrada <es...@gmail.com>.
Thanks a lot, Martin. Where do you envision the database of identifier
codes living? I know in GDAL, we typically read from a directory full of
CSVs [1] that hold several thousand (not sure of the exact number right
now) codes along with their transformations.
Adam
[1] https://svn.osgeo.org/gdal/trunk/gdal/data/
On Thu, Aug 29, 2013 at 6:32 PM, Martin Desruisseaux <
martin.desruisseaux@geomatys.fr> wrote:
> Hello all
>
> Here is a quick update on current work:
>
> "IdentifiedObject" is a type defined by the ISO 19111 specification.
> "AbstractIdentifiedObject" [1] is the proposed SIS implementation. This
> class is intended to be the base class of almost everything related to
> Coordinate Reference Systems (GeographicCRS, ProjectedCRS, etc.). For this
> reason, it would be an important class of SIS.
>
> The problem that "IdentifiedObject" tries to resolve is that the same map
> projections are often known by different names and identifiers depending on
> the providers. For example "Oblique Mercator" and "Hotine Oblique Mercator"
> (in EPSG naming) are two different projections. But "Oblique Mercator" (not
> Hotine) in EPSG naming is also called "Hotine Oblique Mercator Azimuth
> Center" by ESRI, while "Hotine Oblique Mercator" (EPSG naming) is called
> "Hotine Oblique Mercator Azimuth Natural Origin" by ESRI. In summary, it is
> not sufficient to know the name of a map projection. We also need to know
> who chose that name (the "authority").
>
> So IdentifiedObject manages:
>
> * A primary name (whatever SIS chooses as our preferred naming)
> * An arbitrary number of aliases, together with their authorities
> * An arbitrary number of identifiers, typically primary keys in a
> database (e.g. "EPSG:4326"), again with the authority that defines
> each identifier.
>
>
> Martin
>
>
> [1] https://builds.apache.org/job/sis-jdk7/site/apidocs/org/apache/sis/referencing/AbstractIdentifiedObject.html
>
>