Posted to dev@sis.apache.org by Martin Desruisseaux <ma...@geomatys.fr> on 2013/08/30 00:32:29 UTC

Current work: AbstractIdentifiedObject

Hello all

Here is a quick update on current work:

"IdentifiedObject" is a type defined by the ISO 19111 specification. 
"AbstractIdentifiedObject" [1] is the proposed SIS implementation. This 
class is intended to be the base class of almost everything related to
Coordinate Reference Systems (GeographicCRS, ProjectedCRS, etc.). For
this reason, it will be an important class in SIS.

The problem that "IdentifiedObject" tries to solve is that the same
map projections are often known by different names and identifiers
depending on the provider. For example "Oblique Mercator" and "Hotine
Oblique Mercator" (in EPSG naming) are two different projections. But
"Oblique Mercator" (not Hotine) in EPSG naming is also called "Hotine
Oblique Mercator Azimuth Center" by ESRI, while "Hotine Oblique
Mercator" (EPSG naming) is called "Hotine Oblique Mercator Azimuth
Natural Origin" by ESRI. In summary, it is not sufficient to know the
name of a map projection. We also need to know who chose that name (the
"authority").

So IdentifiedObject manages:

  * A primary name (whatever SIS chooses as its preferred naming)
  * An arbitrary number of aliases, together with their authorities
  * An arbitrary number of identifiers, typically primary keys in a
    database (e.g. "EPSG:4326"), again with the authority that defines
    each identifier.
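
To make the three items above concrete, here is a minimal sketch in Java. It is not the SIS API: the class name IdentifiedObjectSketch, the nested Scoped type and the methods addAlias, addIdentifier and nameFor are hypothetical names chosen for illustration only. The projection names come from the example above, and "WGS 84" is the EPSG name of the CRS identified by "EPSG:4326".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not the SIS API) of what an identified object keeps track of. */
final class IdentifiedObjectSketch {
    /** A name or code paired with the authority that defined it. */
    static final class Scoped {
        final String authority;
        final String value;

        Scoped(String authority, String value) {
            this.authority = authority;
            this.value = value;
        }

        @Override public String toString() {
            return authority + ":" + value;
        }
    }

    private final Scoped primaryName;
    private final List<Scoped> aliases = new ArrayList<>();
    private final List<Scoped> identifiers = new ArrayList<>();

    IdentifiedObjectSketch(String authority, String name) {
        primaryName = new Scoped(authority, name);
    }

    IdentifiedObjectSketch addAlias(String authority, String name) {
        aliases.add(new Scoped(authority, name));
        return this;
    }

    IdentifiedObjectSketch addIdentifier(String authority, String code) {
        identifiers.add(new Scoped(authority, code));
        return this;
    }

    /** Returns the name used by the given authority, or null if that authority has none. */
    String nameFor(String authority) {
        if (primaryName.authority.equals(authority)) {
            return primaryName.value;
        }
        for (Scoped alias : aliases) {
            if (alias.authority.equals(authority)) {
                return alias.value;
            }
        }
        return null;
    }

    List<Scoped> identifiers() {
        return identifiers;
    }

    public static void main(String[] args) {
        // The same projection, named differently by two authorities.
        IdentifiedObjectSketch projection =
                new IdentifiedObjectSketch("EPSG", "Oblique Mercator")
                        .addAlias("ESRI", "Hotine Oblique Mercator Azimuth Center");
        System.out.println(projection.nameFor("EPSG"));
        System.out.println(projection.nameFor("ESRI"));

        // An identifier is a code scoped by the authority that defines it.
        IdentifiedObjectSketch crs =
                new IdentifiedObjectSketch("EPSG", "WGS 84").addIdentifier("EPSG", "4326");
        System.out.println(crs.identifiers());
    }
}
```

The point of the sketch is only that a name lookup takes the authority as a parameter, so "Oblique Mercator" cannot be confused between EPSG and ESRI.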


     Martin


[1] 
https://builds.apache.org/job/sis-jdk7/site/apidocs/org/apache/sis/referencing/AbstractIdentifiedObject.html

Re: Current work: AbstractIdentifiedObject

Posted by Martin Desruisseaux <ma...@geomatys.fr>.

Hello Travis

A caching mechanism is planned. When a CRS is requested, we first look
in a ConcurrentHashMap. Only if the map has no CRS for that code do we
fetch it from the database and put it in the map. I don't know,
however, whether such caching works well with Hadoop...
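
The lookup order described above could be sketched roughly as follows. This is an illustration under assumptions, not the planned SIS code: CrsCacheSketch and its loader function are hypothetical names, the "database" is replaced by a lambda, and computeIfAbsent requires Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache: look in the map first, query the database only on a miss. */
final class CrsCacheSketch<V> {
    private final ConcurrentMap<String, V> cache = new ConcurrentHashMap<>();

    /** Stands in for the (slow) database lookup; an assumption of this sketch. */
    private final Function<String, V> loader;

    CrsCacheSketch(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for the code, loading and storing it if absent. */
    V get(String code) {
        // ConcurrentHashMap applies the loader at most once per missing key.
        return cache.computeIfAbsent(code, loader);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        CrsCacheSketch<String> crsCache = new CrsCacheSketch<>(code -> {
            System.out.println("loading " + code + " from the database");
            return "CRS[" + code + "]";
        });
        crsCache.get("EPSG:4326"); // miss: the loader runs
        crsCache.get("EPSG:4326"); // hit: served from the map
        System.out.println(crsCache.size());
    }
}
```

Such a map lives inside one JVM, so every process that needs a CRS would fill its own copy.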

     Martin


On 30/08/13 12:56, Travis L Pinney wrote:
> Hi Martin,
>
> Thanks for the info. Maybe the tradeoff is to be able to load a few
> EPSG definitions in memory for something like a Map process in Hadoop.
> 17-18 MB seems small to me, maybe I am wrong... but a possible
> alternative is to load the data into a Java data structure that can be
> quickly loaded, using a HashMap-like object where the key is the EPSG
> code.

Re: Current work: AbstractIdentifiedObject

Posted by Travis L Pinney <tr...@gmail.com>.

Hi Martin,

Thanks for the info. Maybe the tradeoff is to be able to load a few
EPSG definitions in memory for something like a Map process in Hadoop.
17-18 MB seems small to me, maybe I am wrong... but a possible
alternative is to load the data into a Java data structure that can be
quickly loaded, using a HashMap-like object where the key is the EPSG
code.






On Fri, Aug 30, 2013 at 4:30 AM, Martin Desruisseaux
<ma...@geomatys.fr> wrote:
> On 30/08/13 02:30, Travis L Pinney wrote:
>
>> How large would the database be?
>
>
> Hard to say, since it depends a lot on the internals of the database engine.
> Looking at the space used on disk for EPSG 7.09, I have 17 MB for Derby and
> 18 MB for HSQL.
>
> The EPSG database defines about 5000 referencing systems, but applications
> will typically use only very few of them, maybe 5. This is 0.1% of the
> database content (the problem is that everyone may use a different 0.1%). I
> think it would be unfortunate to load such a big database in memory to
> use only 0.1% of it, which is why I tend to prefer a disk-based solution in
> the particular case of EPSG. Of course, other kinds of data would benefit
> more from a memory-based solution.
>
>     Martin
>

Re: Current work: AbstractIdentifiedObject

Posted by Martin Desruisseaux <ma...@geomatys.fr>.

On 30/08/13 02:30, Travis L Pinney wrote:
> How large would the database be?

Hard to say, since it depends a lot on the internals of the database
engine. Looking at the space used on disk for EPSG 7.09, I have 17 MB
for Derby and 18 MB for HSQL.

The EPSG database defines about 5000 referencing systems, but
applications will typically use only very few of them, maybe 5. This is
0.1% of the database content (the problem is that everyone may use a
different 0.1%). I think it would be unfortunate to load such a big
database in memory to use only 0.1% of it, which is why I tend to
prefer a disk-based solution in the particular case of EPSG. Of course,
other kinds of data would benefit more from a memory-based solution.

     Martin

Re: Current work: AbstractIdentifiedObject

Posted by Travis L Pinney <tr...@gmail.com>.

How large would the database be?


On Thu, Aug 29, 2013 at 8:19 PM, Martin Desruisseaux <
martin.desruisseaux@geomatys.fr> wrote:

> On 30/08/13 02:11, Travis L Pinney wrote:
>
>> +1 on the approach you used for GeoTK and what is proposed for SIS. Would
>> it be possible to have it work "in memory" without having to write out to
>> disk for a use case like Hadoop?
>
> Yes, both Derby and HSQL allow an "in-memory database". However, given the
> memory consumption for this relatively large database and the cost of
> recreating the tables on system start, I don't know if the advantages would
> outweigh the drawbacks... But it would be the user's choice anyway.
>
>         Martin
>
>

Re: Current work: AbstractIdentifiedObject

Posted by Adam Estrada <es...@gmail.com>.

Thanks, Chris! I have heard nothing but good things about Shark :)

Adam


On Thu, Aug 29, 2013 at 11:51 PM, Mattmann, Chris A (398J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> http://spark.incubator.apache.org/
> https://github.com/amplab/shark/wiki
>
>
> Shark is a lightning fast SQL implementation built on top of Apache
> Spark (originating out of the Berkeley AMP Lab)
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
> -----Original Message-----
> From: Adam Estrada <es...@gmail.com>
> Reply-To: "dev@sis.apache.org" <de...@sis.apache.org>
> Date: Thursday, August 29, 2013 8:41 PM
> To: "dev@sis.apache.org" <de...@sis.apache.org>
> Subject: Re: Current work: AbstractIdentifiedObject
>
> >Interesting about Shark. Can you post the link and how you think it could
> >be used?
> >
> >
> >On Thu, Aug 29, 2013 at 9:19 PM, Mattmann, Chris A (398J) <
> >chris.a.mattmann@jpl.nasa.gov> wrote:
> >
> >> We should try and integrate with shark from the amp lab...
> >>
> >> Sent from my iPhone
> >>
> >> On Aug 29, 2013, at 5:26 PM, "Martin Desruisseaux" <
> >> martin.desruisseaux@geomatys.fr> wrote:
> >>
> >> > On 30/08/13 02:11, Travis L Pinney wrote:
> >> >> +1 on the approach you used for GeoTK and what is proposed for SIS.
> >> >> Would it be possible to have it work "in memory" without having to
> >> >> write out to disk for a use case like Hadoop?
> >> >
> >> > Yes, both Derby and HSQL allow an "in-memory database". However, given
> >> > the memory consumption for this relatively large database and the cost
> >> > of recreating the tables on system start, I don't know if the
> >> > advantages would outweigh the drawbacks... But it would be the user's
> >> > choice anyway.
> >> >
> >> >        Martin
> >> >
> >>
>
>

Re: Current work: AbstractIdentifiedObject

Posted by Travis L Pinney <tr...@gmail.com>.

I have heard good things about it also. It sounds like a good
candidate for a generalized geospatial datastore.

On Fri, Aug 30, 2013 at 4:21 AM, Martin Desruisseaux
<ma...@geomatys.fr> wrote:
> Thanks Chris. I didn't know about Shark. I think this is definitely a
> target to put on our list.
>
>     Martin
>
>
> On 30/08/13 05:51, Mattmann, Chris A (398J) wrote:
>
>> http://spark.incubator.apache.org/
>> https://github.com/amplab/shark/wiki
>>
>>
>> Shark is a lightning fast SQL implementation built on top of Apache
>> Spark (originating out of the Berkeley AMP Lab)
>>
>> Cheers,
>> Chris
>
>

Re: Current work: AbstractIdentifiedObject

Posted by Martin Desruisseaux <ma...@geomatys.fr>.

Thanks Chris. I didn't know about Shark. I think this is definitely a
target to put on our list.

     Martin


On 30/08/13 05:51, Mattmann, Chris A (398J) wrote:
> http://spark.incubator.apache.org/
> https://github.com/amplab/shark/wiki
>
>
> Shark is a lightning fast SQL implementation built on top of Apache
> Spark (originating out of the Berkeley AMP Lab)
>
> Cheers,
> Chris

Re: Current work: AbstractIdentifiedObject

Posted by "Mattmann, Chris A (398J)" <ch...@jpl.nasa.gov>.

http://spark.incubator.apache.org/
https://github.com/amplab/shark/wiki


Shark is a lightning fast SQL implementation built on top of Apache
Spark (originating out of the Berkeley AMP Lab)

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Adam Estrada <es...@gmail.com>
Reply-To: "dev@sis.apache.org" <de...@sis.apache.org>
Date: Thursday, August 29, 2013 8:41 PM
To: "dev@sis.apache.org" <de...@sis.apache.org>
Subject: Re: Current work: AbstractIdentifiedObject

>Interesting about Shark. Can you post the link and how you think it could
>be used?
>
>
>On Thu, Aug 29, 2013 at 9:19 PM, Mattmann, Chris A (398J) <
>chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> We should try and integrate with shark from the amp lab...
>>
>> Sent from my iPhone
>>
>> On Aug 29, 2013, at 5:26 PM, "Martin Desruisseaux" <
>> martin.desruisseaux@geomatys.fr> wrote:
>>
>> > On 30/08/13 02:11, Travis L Pinney wrote:
>> >> +1 on the approach you used for GeoTK and what is proposed for SIS.
>> >> Would it be possible to have it work "in memory" without having to
>> >> write out to disk for a use case like Hadoop?
>> >
>> > Yes, both Derby and HSQL allow an "in-memory database". However, given
>> > the memory consumption for this relatively large database and the cost
>> > of recreating the tables on system start, I don't know if the
>> > advantages would outweigh the drawbacks... But it would be the user's
>> > choice anyway.
>> >
>> >        Martin
>> >
>>

Re: Current work: AbstractIdentifiedObject

Posted by Adam Estrada <es...@gmail.com>.

Interesting about Shark. Can you post the link and how you think it could
be used?


On Thu, Aug 29, 2013 at 9:19 PM, Mattmann, Chris A (398J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> We should try and integrate with shark from the amp lab...
>
> Sent from my iPhone
>
> On Aug 29, 2013, at 5:26 PM, "Martin Desruisseaux" <
> martin.desruisseaux@geomatys.fr> wrote:
>
> > On 30/08/13 02:11, Travis L Pinney wrote:
> >> +1 on the approach you used for GeoTK and what is proposed for SIS.
> >> Would it be possible to have it work "in memory" without having to
> >> write out to disk for a use case like Hadoop?
> >
> > Yes, both Derby and HSQL allow an "in-memory database". However, given
> > the memory consumption for this relatively large database and the cost
> > of recreating the tables on system start, I don't know if the advantages
> > would outweigh the drawbacks... But it would be the user's choice anyway.
> >
> >        Martin
> >
>

Re: Current work: AbstractIdentifiedObject

Posted by "Mattmann, Chris A (398J)" <ch...@jpl.nasa.gov>.

We should try and integrate with shark from the amp lab...

Sent from my iPhone

On Aug 29, 2013, at 5:26 PM, "Martin Desruisseaux" <ma...@geomatys.fr> wrote:

> On 30/08/13 02:11, Travis L Pinney wrote:
>> +1 on the approach you used for GeoTK and what is proposed for SIS. Would
>> it be possible to have it work "in memory" without having to write out to
>> disk for a use case like Hadoop?
>
> Yes, both Derby and HSQL allow an "in-memory database". However, given the
> memory consumption for this relatively large database and the cost of
> recreating the tables on system start, I don't know if the advantages would
> outweigh the drawbacks... But it would be the user's choice anyway.
> 
>        Martin
>

Re: Current work: AbstractIdentifiedObject

Posted by Martin Desruisseaux <ma...@geomatys.fr>.

On 30/08/13 02:11, Travis L Pinney wrote:
> +1 on the approach you used for GeoTK and what is proposed for SIS. Would
> it be possible to have it work "in memory" without having to write out to
> disk for a use case like Hadoop?

Yes, both Derby and HSQL allow an "in-memory database". However, given
the memory consumption for this relatively large database and the cost
of recreating the tables on system start, I don't know if the
advantages would outweigh the drawbacks... But it would be the user's
choice anyway.
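
For reference, with both engines an in-memory database is selected through the JDBC URL. The sketch below only builds the URLs; the class name InMemoryUrls and the database name "epsg" are hypothetical, and actually connecting (for example with DriverManager.getConnection(url)) assumes the corresponding driver JAR is on the classpath.

```java
/** Hypothetical helper showing the JDBC URL forms for in-memory databases. */
final class InMemoryUrls {
    private InMemoryUrls() {
    }

    /** Derby: the "memory" subprotocol; ";create=true" creates the database if absent. */
    static String derby(String name) {
        return "jdbc:derby:memory:" + name + ";create=true";
    }

    /** HSQLDB: the "mem" database type keeps all data in the JVM heap. */
    static String hsqldb(String name) {
        return "jdbc:hsqldb:mem:" + name;
    }

    public static void main(String[] args) {
        System.out.println(derby("epsg"));
        System.out.println(hsqldb("epsg"));
    }
}
```

In both cases the tables would have to be recreated in every new JVM, which is the startup cost mentioned above.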

         Martin

Re: Current work: AbstractIdentifiedObject

Posted by Travis L Pinney <tr...@gmail.com>.

Hi Martin,

+1 on the approach you used for GeoTK and what is proposed for SIS. Would
it be possible to have it work "in memory" without having to write out to
disk for a use case like Hadoop?

Thanks,
Travis




On Thu, Aug 29, 2013 at 7:53 PM, Martin Desruisseaux <
martin.desruisseaux@geomatys.fr> wrote:

> Hello Adam
>
> On 30/08/13 00:55, Adam Estrada wrote:
>
>> Thanks a lot, Martin. Where do you envision the database of identifier
>> codes living? I know in GDAL, we typically read from a directory full of
>> CSV files [1] that holds several thousand (not sure of the exact number
>> right now) codes along with their transformations.
>>
>
> Much (maybe most) of that information is derived from the EPSG database
> [1]. GDAL extracted some information from the EPSG tables as CSV files.
> Indeed, the first row of some files contains EPSG column names. The EPSG
> database contains definitions for about 5000 Coordinate Reference Systems.
>
> In Geotk - and in what is proposed for SIS - we do not use such CSV files.
> Instead, we use a real EPSG database. The EPSG SQL scripts for creating the
> database are embedded in the JAR file (we are allowed to redistribute
> them), and the database is created the first time that the library is used.
> The database engine is the user's choice - it would be Derby by default (an
> Apache project), but it also works with HSQL, PostgreSQL and MS-Access.
>
> In Geotk, information not related to EPSG (for example projection names
> used by ESRI) was hard-coded in Java. For SIS, I would like to store it
> in the database too. The drawback is that a database would soon become
> somewhat mandatory for many SIS usages. However, I think that a database
> could hardly be avoided anyway for most medium or advanced usages, and this
> can be made transparent for the user if we default to some embedded
> database like Derby or HSQL.
>
> What do you think?
>
>     Martin
>
>
> [1] http://www.epsg.org/ - click on "geodetic dataset"
>
>

Re: Current work: AbstractIdentifiedObject

Posted by Martin Desruisseaux <ma...@geomatys.fr>.

Hello Adam

On 30/08/13 00:55, Adam Estrada wrote:
> Thanks a lot, Martin. Where do you envision the database of identifier
> codes living? I know in GDAL, we typically read from a directory full of
> CSV files [1] that holds several thousand (not sure of the exact number
> right now) codes along with their transformations.

Much (maybe most) of that information is derived from the EPSG
database [1]. GDAL extracted some information from the EPSG tables as
CSV files. Indeed, the first row of some files contains EPSG column
names. The EPSG database contains definitions for about 5000 Coordinate
Reference Systems.

In Geotk - and in what is proposed for SIS - we do not use such CSV
files. Instead, we use a real EPSG database. The EPSG SQL scripts for
creating the database are embedded in the JAR file (we are allowed to
redistribute them), and the database is created the first time that the
library is used. The database engine is the user's choice - it would be
Derby by default (an Apache project), but it also works with HSQL,
PostgreSQL and MS-Access.

In Geotk, information not related to EPSG (for example projection names
used by ESRI) was hard-coded in Java. For SIS, I would like to store
it in the database too. The drawback is that a database would soon
become somewhat mandatory for many SIS usages. However, I think that a
database could hardly be avoided anyway for most medium or advanced
usages, and this can be made transparent for the user if we default to
some embedded database like Derby or HSQL.

What do you think?

     Martin


[1] http://www.epsg.org/ - click on "geodetic dataset"

Re: Current work: AbstractIdentifiedObject

Posted by Adam Estrada <es...@gmail.com>.

Thanks a lot, Martin. Where do you envision the database of identifier
codes living? I know in GDAL, we typically read from a directory full of
CSV files [1] that holds several thousand (not sure of the exact number
right now) codes along with their transformations.

Adam

[1] https://svn.osgeo.org/gdal/trunk/gdal/data/




On Thu, Aug 29, 2013 at 6:32 PM, Martin Desruisseaux <
martin.desruisseaux@geomatys.fr> wrote:

> Hello all
>
> Here is a quick update on current work:
>
> "IdentifiedObject" is a type defined by the ISO 19111 specification.
> "AbstractIdentifiedObject" [1] is the proposed SIS implementation. This
> class is intended to be the base class of almost everything related to
> Coordinate Reference Systems (GeographicCRS, ProjectedCRS, etc.). For this
> reason, it will be an important class in SIS.
>
> The problem that "IdentifiedObject" tries to solve is that the same map
> projections are often known by different names and identifiers depending on
> the provider. For example "Oblique Mercator" and "Hotine Oblique Mercator"
> (in EPSG naming) are two different projections. But "Oblique Mercator" (not
> Hotine) in EPSG naming is also called "Hotine Oblique Mercator Azimuth
> Center" by ESRI, while "Hotine Oblique Mercator" (EPSG naming) is called
> "Hotine Oblique Mercator Azimuth Natural Origin" by ESRI. In summary, it is
> not sufficient to know the name of a map projection. We also need to know
> who chose that name (the "authority").
>
> So IdentifiedObject manages:
>
>  * A primary name (whatever SIS chooses as its preferred naming)
>  * An arbitrary number of aliases, together with their authorities
>  * An arbitrary number of identifiers, typically primary keys in a
>    database (e.g. "EPSG:4326"), again with the authority that defines
>    each identifier.
>
>
>     Martin
>
>
> [1] https://builds.apache.org/job/sis-jdk7/site/apidocs/org/apache/sis/referencing/AbstractIdentifiedObject.html
>
>