Posted to dev@sis.apache.org by Marc Le Bihan <ml...@numericable.fr> on 2015/01/04 22:03:38 UTC

Shapefile checkings

Hello,

    I committed a change that links the Shapefile files more tightly to the DBase file. 

    The Shapefile class could disappear. The way to read Features from a shapefile is now: 

    InputFeatureStream is = new InputFeatureStream(shapeFile, databaseFile);  // Shapefile (.shp) and Database (.dbf) files.
    Feature feature = is.readFeature(); // null is returned when the end of the shapefile / database are reached.

    Inside the map of the Feature, the object values now have the SQL type associated with their column type: Date, Integer, Double, Float or String.
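
    The read loop implied by this contract can be sketched as follows. InputFeatureStream itself lives in sis-shapefile; FeatureSource below is a hypothetical stand-in that models only the part described in this mail: readFeature() returns the next feature, or null once the end of the shapefile / database is reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the "read until null" contract described above.
public class ReadLoopSketch {

    /** Hypothetical stand-in for InputFeatureStream. */
    interface FeatureSource {
        Map<String, Object> readFeature();   // null at end of stream
    }

    /** Drains the source into a list, following the null-terminated contract. */
    static List<Map<String, Object>> readAll(FeatureSource source) {
        List<Map<String, Object>> features = new ArrayList<>();
        Map<String, Object> feature;
        while ((feature = source.readFeature()) != null) {
            features.add(feature);
        }
        return features;
    }
}
```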

    I expect it to work. According to the available unit tests it doesn't cause trouble, but... these tests do not check deeply for the problems that can exist.
    1) What checks can I perform on values read from the shapefile (not the database part, but the shapefile part) to ensure that I am not reading garbage?
    2) Can we assume Shapefile entry #4 = Record #4 in the Database? BTW, I will add a new constructor to InputFeatureStream that takes an SQL request of the form “SELECT * FROM <database> WHERE <single condition>“ to limit the Features to a database condition.
    3) Deleted records are still not taken into account in the Database. If one is encountered, the record should be skipped (but isn't yet). I hope that when this happens there is no Shapefile entry associated with the deleted record.

    4) Shall I put all the exceptions in a public package instead of their internal package? I think it would be better.

Regards,

Marc Le Bihan

Re: Shapefile checkings

Posted by johann sorel <jo...@geomatys.com>.
Hello,

Here is more information on the shapefile structure and the link between files:
http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
Page 24 and 25.

p.24 SHX : The I’th record in the index file (shx) stores the offset and 
content length for the I’th record in the main file (shp).
p.25 DBF : The record order must be the same as the order of shape 
features in the main (*.shp) file.

So the order must be consistent between all three files: shx, shp and dbf.
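
Following the whitepaper's description (p.24), the lookup of the I'th record in the main file via the index can be sketched like this. The .shx file has a 100-byte header, then one 8-byte entry per record: a big-endian offset and content length, both counted in 16-bit words from the start of the .shp file. ShxIndex below is an illustrative helper, not existing SIS code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: locate the i'th shape record in the .shp file using the .shx index.
public class ShxIndex {

    private static final int HEADER_SIZE = 100;  // .shx header, same layout as .shp
    private static final int ENTRY_SIZE  = 8;    // offset (int32) + content length (int32)

    /** Byte offset of record i (0-based) in the .shp file. */
    static long shpByteOffset(ByteBuffer shx, int i) {
        shx.order(ByteOrder.BIG_ENDIAN);
        int offsetInWords = shx.getInt(HEADER_SIZE + i * ENTRY_SIZE);
        return offsetInWords * 2L;               // 16-bit words to bytes
    }

    /** Byte length of the content of record i (0-based). */
    static long contentByteLength(ByteBuffer shx, int i) {
        shx.order(ByteOrder.BIG_ENDIAN);
        int lengthInWords = shx.getInt(HEADER_SIZE + i * ENTRY_SIZE + 4);
        return lengthInWords * 2L;
    }
}
```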

Hope this helps


Johann Sorel
Geomatys





Re: Shapefile checkings

Posted by Martin Desruisseaux <ma...@geomatys.fr>.
Le 05/01/15 10:27, Marc LE BIHAN a écrit :
> Ok. With all this information, I think we cannot release yet the shapefile
> part of the Apache SIS API.
Many thanks for this feedback, Marc.

So the question to the rest of the PMC is: do we omit sis-shapefile
from the Apache SIS 0.5 release? The advantage would be to remove any
pressure from Marc, so he would have a full release cycle (maybe 6
months) to review the sis-shapefile module.


> For DataStoreException, it's OK if subclasses can be created.
> But in the end, DataStoreException will have subclasses representing
> various problems coming from SQL, JPA, and other data source causes.
> In my mind, as long as the exceptions thrown by methods can be summarized
> as DataStoreException when the caller doesn't want to know more, it's OK.
> He catches a more specific exception if he really wants to. And whenever
> he wants to, he can: he only has to change the exception name in his
> catch clause.

Fine. The door is wide open for adding whatever DataStoreException
subclasses we need. Maybe the main difficulty will be to identify a set
of exceptions that are not too specific to some storage format.


> For stream():
> I will study the stream() implementation in JDK 8.

Thanks. After a second look, there are many more abstract methods than I
thought. But an easier way to get started may be to implement a
Spliterator instead:

http://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html

Then if we need a stream, we could use:

http://docs.oracle.com/javase/8/docs/api/java/util/stream/StreamSupport.html#stream-java.util.function.Supplier-int-boolean-

The reason why I suggest looking at those interfaces is that they are
designed for parallelization. Even if we do not parallelize the
Shapefile reader now, other formats may, and future Shapefile
implementations may too.
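
A minimal sketch of this Spliterator-then-StreamSupport approach: implement tryAdvance() over a record source whose read() returns null at the end, and let the JDK build the Stream. RecordSource is a hypothetical stand-in for the shapefile reader, not an existing SIS interface.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Sketch: a Spliterator over a null-terminated record source, wrapped as a Stream.
public class FeatureSpliteratorSketch {

    /** Hypothetical stand-in for the shapefile reader. */
    interface RecordSource<T> {
        T read();   // next record, or null at end
    }

    static <T> Stream<T> stream(RecordSource<T> source) {
        Spliterator<T> split = new Spliterators.AbstractSpliterator<T>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                T record = source.read();
                if (record == null) {
                    return false;                // end of file: stop the stream
                }
                action.accept(record);
                return true;
            }
        };
        return StreamSupport.stream(split, false);   // false = sequential for now
    }
}
```

A parallel stream later only requires a smarter trySplit() and passing true to StreamSupport.stream().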

    Martin


Re: Shapefile checkings

Posted by Marc LE BIHAN <ml...@gmail.com>.
Ok. With all this information, I think we cannot yet release the
shapefile part of the Apache SIS API.
I have to read the shx content and handle deleted records correctly.
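
The deleted-record check mentioned here is small: in DBase III, every record starts with a one-byte flag, 0x20 (space) for a valid record and 0x2A ('*') for a deleted one. A reader should skip records whose flag is '*'. Sketch (illustrative helper, not existing SIS code):

```java
// Sketch of the DBase III deleted-record flag check.
public class DbfRecordFlag {

    static final byte VALID   = 0x20;   // ' ' : record is active
    static final byte DELETED = 0x2A;   // '*' : record is marked as deleted

    /** True if the record whose first byte is recordFlag should be skipped. */
    static boolean isDeleted(byte recordFlag) {
        return recordFlag == DELETED;
    }
}
```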

For DataStoreException, it's OK if subclasses can be created.
But in the end, DataStoreException will have subclasses representing
various problems coming from SQL, JPA, and other data source causes.
In my mind, as long as the exceptions thrown by methods can be summarized
as DataStoreException when the caller doesn't want to know more, it's OK.
He catches a more specific exception if he really wants to. And whenever
he wants to, he can: he only has to change the exception name in his
catch clause.

For stream():
I will study the stream() implementation in JDK 8.

Marc.


Re: Shapefile checkings

Posted by Martin Desruisseaux <ma...@geomatys.fr>.
Hello Marc

Thanks for the reply.

Le 05/01/15 09:50, Marc LE BIHAN a écrit :
> InputFeatureStream:
> I guessed you wanted a stream, but a Reader could be better.
> Or anything else. What would you prefer?

I was thinking about java.util.stream (new in JDK 8 - but I have some
ideas for JDK 6/7 compatibility):

http://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html

While the Javadoc shows a lot of methods, many of them are static or have
a default implementation, so the number of methods to implement may not
be so high.


> Control the values:
> How can I check whether a shapefile's content is correct? Do you have
> some basic checks to suggest?

I'm not sure I understand what we mean by correct. Do we mean checking
whether a value read from a Shapefile entry/record is of the expected type?


> Shapefile - Database
> Currently one MappedByteBuffer reads the shapefile,
> while another one reads the Database.
> The first record of the shapefile matches the first record of the database,
> the second record of the shapefile matches the second record of the
> database.
> Is it always true?

I just asked our Shapefile specialist. He said not necessarily: there is
a third file, with the ".shx" extension, which makes the link between
the database entries and the shapefile entries.


> SQLException and DataStoreException:
> (...snip...)
>
> But if you convert an SQLException to a DataStoreException:
> throw new DataStoreException(sqlException.getMessage(), sqlException);
> I fear the caller will have no way to react programmatically to the real
> cause of a trouble. To allow the programmer to keep reacting to it, we
> should set vendor codes in the SQLException subclasses and in DataStoreException.

SQLException indeed has room for an SQL state or vendor code. We could
also define DataStoreException subclasses - we have not done that yet
only because we were waiting for more use-case experience.

I agree with the goal of allowing the caller to react to the cause of
the trouble. The question is: are SQLException subclasses the best way
to achieve this goal if no code outside the sis-shapefile module would
catch those exceptions? Are the alternatives (SQL state, vendor code,
DataStoreException subclasses) worth exploring?

    Martin



Re: Shapefile checkings

Posted by Marc LE BIHAN <ml...@gmail.com>.
InputFeatureStream:
I guessed you wanted a stream, but a Reader could be better.
Or anything else. What would you prefer?

Control the values:
How can I check whether a shapefile's content is correct? Do you have
some basic checks to suggest?

Shapefile - Database
Currently one MappedByteBuffer reads the shapefile,
while another one reads the Database.
The first record of the shapefile matches the first record of the database,
the second record of the shapefile matches the second record of the
database.
Is it always true?

SQLException and DataStoreException:
I agree that top-level functions should limit themselves to SQLException;
this would still allow a caller to specifically catch the exception
subclass he wants, in case he really wants to.

But if you convert an SQLException to a DataStoreException:
throw new DataStoreException(sqlException.getMessage(), sqlException);
I fear the caller will have no way to react programmatically to the real
cause of a trouble. To allow the programmer to keep reacting to it, we
should set vendor codes in the SQLException subclasses and in
DataStoreException. Otherwise, only the message will carry the cause of
the trouble, and code would be reduced to doing this:
if (e.getMessage().contains("No more record")) {
    ...
} else if (e.getMessage().contains("connection closed")) {
    ...
}

I made sure to give the caller a chance to react to any specific problem
he wishes, and to avoid hiding problems behind a simple
catch (SQLException e) that globalizes them all. If you force developers
to summarize everything into a single DataStoreException, you will lose
this ability.
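
The contrast Marc describes can be sketched with exception subclasses. The class names below are illustrative, not actual SIS API: a caller who cares can catch the specific subclass, while a caller who doesn't still catches only the base class, with no string matching on messages.

```java
// Sketch: hypothetical DataStoreException subclasses vs. message matching.
public class ExceptionHierarchySketch {

    static class DataStoreException extends Exception {
        DataStoreException(String message) { super(message); }
    }
    /** Hypothetical subclass: no more records to read. */
    static class NoMoreRecordException extends DataStoreException {
        NoMoreRecordException(String message) { super(message); }
    }
    /** Hypothetical subclass: the underlying connection was closed. */
    static class ConnectionClosedException extends DataStoreException {
        ConnectionClosedException(String message) { super(message); }
    }

    /** The caller reacts by naming the exception in a catch clause. */
    static String react(DataStoreException e) {
        try {
            throw e;
        } catch (NoMoreRecordException end) {
            return "end of data";             // specific reaction
        } catch (ConnectionClosedException closed) {
            return "reconnect";               // another specific reaction
        } catch (DataStoreException other) {
            return "generic failure";         // caller who doesn't want to know more
        }
    }
}
```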

Marc.



Re: Shapefile checkings

Posted by Martin Desruisseaux <ma...@geomatys.fr>.
Hello Marc

Le 04/01/15 22:03, Marc Le Bihan a écrit :
> The Shapefile class could disappear. The way to read Features from a shapefile is now : 
>
> InputFeatureStream is = new InputFeatureStream(shapeFile, databaseFile);  // Shapefile (.shp) and Database (.dbf) files.
> Feature feature = is.readFeature(); // null is returned when the end of the shapefile / database are reached.

Why does InputFeatureStream extend java.io.InputStream? What is the
meaning of the byte stream? In the source code, the methods reading
bytes just throw an UnsupportedOperationException, so why extend
InputStream at all if almost all of its methods are unsupported?


> 1) What checks can I perform on values read from the shapefile (not the database part, but the shapefile part) to ensure that I am not reading garbage?

Could you give an example of what you mean?


> 2) Can we assume Shapefile entry #4 = Record #4 in the Database

I have not yet read the Shapefile specification. What is the difference
between an entry and a record?


> 4) Shall I put all the exceptions in a public package instead of their internal package? I think it would be better.

We could put some of them in an "org.apache.sis.storage.shapefile.sql"
package. However, those exceptions are specific to the SIS implementation
of DBase3; other JDBC drivers never throw them. Consequently, any generic
code designed for arbitrary databases (PostgreSQL, Oracle, etc.) cannot
depend on those exceptions. Since we try to write Apache SIS for generic
databases, this means that even SIS would not try to catch them.

Furthermore, all those exceptions extend SQLException, which is not the
kind of exception that we would handle most of the time. The higher-level
API is rather DataStore, which defines a DataStoreException. SQLException
and IOException are lower-level mechanics, which would appear in the
"caused by" property of DataStoreException.
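
This "caused by" wrapping can be sketched as follows. DataStoreException here is a local stand-in for the SIS class (only the message-plus-cause constructor is assumed); the point is that a caller can still reach the SQL state of the underlying SQLException through getCause(), without the low-level exceptions being public.

```java
import java.sql.SQLException;

// Sketch: low-level SQLException carried as the cause of a higher-level exception.
public class CauseUnwrapSketch {

    /** Local stand-in for org.apache.sis.storage.DataStoreException. */
    static class DataStoreException extends Exception {
        DataStoreException(String message, Throwable cause) { super(message, cause); }
    }

    /** Wraps the low-level exception, preserving it in the "caused by" property. */
    static DataStoreException wrap(SQLException e) {
        return new DataStoreException(e.getMessage(), e);
    }

    /** SQL state of the underlying cause, or null if the cause is not SQL-related. */
    static String sqlStateOf(DataStoreException e) {
        Throwable cause = e.getCause();
        return (cause instanceof SQLException) ? ((SQLException) cause).getSQLState() : null;
    }
}
```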

Given the above, I do not expect any "catch" clause for those exceptions
outside the Shapefile module. So what would be the use case for making
those exceptions public?

    Martin