Posted to dev@sis.apache.org by Martin Desruisseaux <ma...@geomatys.com> on 2015/11/20 15:14:33 UTC

License of new test Shapefiles?

Hello

I noticed new test files in the following directory:

    storage/sis-shapefile/src/test/resources/org/apache/sis/storage/shapefile/

I noticed a NOTES.md file (thanks), but I did not see license
information in that file. What is the license of those new
"DEPARTEMENT" files (and of the other, older files too)?
Is it a license compatible with the Apache License?

Another issue is that "DEPARTEMENT.SHP" is 3 Mb, which is a lot for
just a test file (if it were data needed for application execution, it
would be a different story). Would it be possible to have a much smaller
test file? We rarely need more than a few kb for testing, except if we
want to test scalability (but I think that discussion about scalability
could be another thread). It may also help to avoid licensing issues if
we use a subset small enough to be considered "fair use". For
example in the GR3DF97A.txt test file that I committed this week, I
extracted only a few lines. The resulting test file is only 2 kb.

Small test files are appreciated because they are included in the ZIP
files that make up official Apache releases (the JARs deployed on Maven
Central are not official releases, just a convenience), and also because
they keep the history smaller (the SVN is mirrored on GitHub, so even if
we delete the file, its weight will stay in the history).

    Martin



Re: License of new test Shapefiles?

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Hello Marc

On 21/11/15 12:53, Marc Le Bihan wrote:

> It's really boring to debate about the source of each test file,
> especially when they come from public organizations or governments
> and are freely downloadable by everyone, but I understand that you
> have legacy quarrels that force you to take care of everything, even
> if it causes your project not to benefit from as many information
> sources as others. I want to see the day when someone will attack
> any Apache project because it has a file inside its test resources...
> Really, on what checks are you losing time?!

Freely downloadable does not mean compatible with the Apache license. For
example the EPSG database is freely downloadable, but we have still not
received the permission to include it in SIS
(https://issues.apache.org/jira/browse/LEGAL-183). Note that the license
that you found for the Shapefiles has a clause ("reuse is however subject
to respecting the integrity of the information and of the data",
translated from French) which is similar to an EPSG clause against which
the Apache legal team has raised objections. In my experience with
LEGAL-183, if the user cannot freely modify the data, it is not
compatible with the Apache license. It may nevertheless be included in
SIS, but we need to ask the legal team to grant us an exception. This is
what I'm trying to do with LEGAL-183. But we can obviously ask for such an
exception only for data that are important enough.

Another example: the "datum shift grid" for transforming coordinates from
the old French system to the new French system is freely downloadable
from the French mapping agency (http://www.ign.fr). Nevertheless, it is
not included in the Debian distribution because of its redistribution
conditions (which are very similar to the EPSG conditions).

Data or software licensed under the GPL is also a well-known example of
freely downloadable things that we cannot include in SIS.


> 1) Many files that come from open data have a large amount of real case
> data, and among this data, you have a lot of interesting cases. For
> example, the DEPARTEMENT.SHP shapefile had the Finistère département
> inside, a feature created with a three-part polygon.
> Useful to challenge some displays or calculations. The size of the
> file was only 3 MB. An update or a pull returns it in 0.1 seconds for
> anyone having an ADSL connection.
> Whoever wants to do this testing will have to create a Shapefile
> himself, I think. Thanks.

In addition to Shapefile, other modules like GeoTIFF or NetCDF could
also have big test files. The total size of test data grows very quickly
in geospatial libraries. Testing interesting cases like this three-part
polygon is important, but it can be done just as well with a Shapefile
trimmed to contain only the interesting cases. This is what I did with
other kinds of test data (e.g. NetCDF) in GeoAPI and SIS. If a 3 Mb test
file is committed for each interesting case in every module, we will
have problems.
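
As a rough illustration of such trimming, the Java sketch below copies only
selected records out of a .shp main file. It is not SIS code and not the
procedure actually used for the other test files; the class name
ShapefileTrimmer and the method keepRecords are hypothetical. It relies on
the layout described in the ESRI Shapefile technical description: a 100-byte
header with the file length (in 16-bit words, big-endian) at byte 24,
followed by records each prefixed by a big-endian record number and content
length. The sketch leaves the bounding box in the header unchanged and does
not touch the .shx and .dbf companion files, which would need to be trimmed
consistently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of Apache SIS. */
final class ShapefileTrimmer {
    private static final int HEADER_LENGTH = 100;   // fixed size of the .shp header
    private static final int FILE_CODE = 9994;      // magic number at byte 0

    /** Returns a copy of the .shp content keeping only the records whose number is in 'wanted'. */
    static byte[] keepRecords(byte[] shp, Set<Integer> wanted) {
        ByteBuffer in = ByteBuffer.wrap(shp);        // ByteBuffer is big-endian by default
        if (shp.length < HEADER_LENGTH || in.getInt(0) != FILE_CODE) {
            throw new IllegalArgumentException("Not a .shp main file.");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(shp, 0, HEADER_LENGTH);            // header copied as-is (bounding box not recomputed)
        int position = HEADER_LENGTH;
        int nextNumber = 1;                          // kept records are renumbered from 1
        while (position + 8 <= shp.length) {
            int recordNumber = in.getInt(position);
            int contentWords = in.getInt(position + 4);   // length in 16-bit words
            if (contentWords < 0 || contentWords > (shp.length - position - 8) / 2) {
                throw new IllegalArgumentException("Truncated record at byte " + position);
            }
            int contentBytes = 2 * contentWords;
            if (wanted.contains(recordNumber)) {
                byte[] header = ByteBuffer.allocate(8)
                        .putInt(nextNumber++).putInt(contentWords).array();
                out.write(header, 0, 8);
                out.write(shp, position + 8, contentBytes);
            }
            position += 8 + contentBytes;
        }
        byte[] result = out.toByteArray();
        ByteBuffer.wrap(result).putInt(24, result.length / 2);   // new file length, in words
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("Usage: ShapefileTrimmer <in.shp> <out.shp> <record number>...");
            return;
        }
        Set<Integer> wanted = new HashSet<>();
        for (int i = 2; i < args.length; i++) {
            wanted.add(Integer.valueOf(args[i]));
        }
        byte[] trimmed = keepRecords(Files.readAllBytes(Paths.get(args[0])), wanted);
        Files.write(Paths.get(args[1]), trimmed);
        System.out.println("Wrote " + trimmed.length + " bytes.");
    }
}
```

Record numbers start at 1 in a Shapefile, so keeping for example only the
record of one multi-part polygon would yield a file of a few kb at most,
depending on the number of vertices.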

By coincidence, a discussion started today on another Apache mailing
list about removing a 103 Mb file committed accidentally, which is
causing them issues with GitHub. Another project took the opportunity
to request the removal of a 20 Mb binary file from their repository
too. They raised (among others) the same concern as I did: the cost
imposed on anyone who clones the project history.

It is okay to have some big test files, but we can make them optional
and keep them outside the main repository. We even have an SVN directory
for that (although it is not yet used, nor part of any SIS download):

    http://svn.apache.org/repos/asf/sis/data/

So we could start a separate thread about how to handle big files. I'm
not against them. I just suggest that we 1) make sure that we are allowed
by Apache rules to copy them, 2) find the right place for them and 3)
favour data files that are likely to be used in more than one test.
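
One possible way to keep big files optional is for a test to look them up
in a local checkout of the data directory and to skip itself when they are
absent. The Java sketch below shows the idea only; the class
OptionalTestData and the system property name "example.test.data" are
hypothetical and do not correspond to any existing SIS convention.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical lookup of optional, large test files kept outside the main repository. */
final class OptionalTestData {
    /** Hypothetical system property pointing to a local checkout of the data directory. */
    static final String PROPERTY = "example.test.data";

    /** Returns the requested file if the property is set and the file exists, or empty otherwise. */
    static Optional<Path> find(String relativePath) {
        String root = System.getProperty(PROPERTY);
        if (root == null) {
            return Optional.empty();             // data directory not configured: test is skipped
        }
        Path file = Paths.get(root).resolve(relativePath);
        return Files.isRegularFile(file) ? Optional.of(file) : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> shp = find("Shapefiles/DEPARTEMENT.SHP");
        if (shp.isPresent()) {
            System.out.println("Running large-file test on " + shp.get());
        } else {
            System.out.println("Large test data not available, test skipped.");
        }
    }
}
```

With this approach a default build never needs the big files, while a
developer who has checked them out can enable the extra tests with
something like -Dexample.test.data=/path/to/data.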

(Note: if we decide to bring back the DEPARTEMENT.SHP shapefile in the
above-cited "data" directory, we need to make sure to use "svn copy" in
order to not impose the file weight on the Apache server twice. I would
volunteer to do this operation if this is what people want).


> The allowed duration of unit tests is 0.1 second x 100 tests = 10
> seconds. Only in-memory tests, nothing else. If the test uses any
> file, any external resource, or any setup that takes time, it
> should no longer be classified as a unit test. Otherwise, as you did,
> you attempt to discard tests one way or another because you feel (and
> you are right) that they take too much time for the simple mvn clean
> install that you just want to do.

I'm not yet too concerned about build time. This is something that
can be easily revisited in the future, for example using Maven profiles.
I was rather concerned about the size of committed files, because
committing them is (in principle) irreversible: those files will stay in
the history and be part of any Git clone even after we delete them.

    Martin



Re: INFRA issue created - please avoid commits for now

Posted by Martin Desruisseaux <ma...@geomatys.com>.
On 06/12/15 14:36, Martin Desruisseaux wrote:
> I'm not completely sure if what infra is looking for would allow us to
> clean the history even months after the fact. I will try to read the
> infra emails again more carefully and report back on this question.

From my understanding of the infra emails, they may have a solution for
this kind of situation in the future. But in the meantime adding new
commits will probably not make a difference, provided that we do not
refer to the deleted files in new commits.

    Martin



Re: INFRA issue created - please avoid commits for now

Posted by Martin Desruisseaux <ma...@geomatys.com>.
On 07/12/15 11:14, Marc Le Bihan wrote:
> How long will it take to resume commits and updates on JDK8?

There is no delay - it can be now.

The proposed delay was for the merges with JDK7 and other branches, not
for commits on JDK8 branch.

    Martin



Re: INFRA issue created - please avoid commits for now

Posted by Marc Le Bihan <ml...@gmail.com>.
How long will it take to resume commits and updates on JDK8?

I would like to receive the updates from the JDK 8 branch as soon as
possible: I think the push --force can be done later.
I am a bit worried about having been cut off from updates for 15 days,
because if I commit without updating, problems will surely come quickly.
And the day you try to do the merge, you might run into trouble.

-----Original Message-----
From: Martin Desruisseaux
Sent: Monday, December 07, 2015 10:23 AM
To: dev@sis.apache.org
Subject: Re: INFRA issue created - please avoid commits for now

Hello Marc

Yes, the plan is to resume commits on JDK8 like before.

The reason for avoiding the merge with other branches is that the binary
files are only in the history of the JDK8 branch for now. If we want to
avoid bringing this history into trunk, we need to do the "push --force"
on the JDK8 branch first.

    Martin


On 07/12/15 04:48, Marc Le Bihan wrote:

> If I commit now, I will commit my work in a position that in SVN is
> "15 days ago", because I have not been receiving updates from
> this branch for 15 days?
> Isn't that, for me, committing into the void, if I don't receive any
> updates in exchange?
>
> Resume all the things you have done with branch JDK 8 please.
> Let us return to normal work (commit and updates), and later you will
> see if you really do something with git or not.


Re: INFRA issue created - please avoid commits for now

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Hello Marc

Yes, the plan is to resume commits on JDK8 like before.

The reason for avoiding the merge with other branches is that the binary
files are only in the history of the JDK8 branch for now. If we want to
avoid bringing this history into trunk, we need to do the "push --force"
on the JDK8 branch first.

    Martin


On 07/12/15 04:48, Marc Le Bihan wrote:

> If I commit now, I will commit my work in a position that in SVN is
> "15 days ago", because I have not been receiving updates from
> this branch for 15 days?
> Isn't that, for me, committing into the void, if I don't receive any
> updates in exchange?
>
> Resume all the things you have done with branch JDK 8 please.
> Let us return to normal work (commit and updates), and later you will
> see if you really do something with git or not.


Re: INFRA issue created - please avoid commits for now

Posted by Marc Le Bihan <ml...@gmail.com>.
I don't understand what we are doing.
If I commit now, I will commit my work in a position that in SVN is "15 days
ago", because I have not been receiving updates from this branch
for 15 days?
Isn't that, for me, committing into the void, if I don't receive any updates
in exchange?

Resume all the things you have done with branch JDK 8 please.
Let us return to normal work (commit and updates), and later you will see if
you really do something with git or not.

Marc.

-----Original Message-----
From: Martin Desruisseaux
Sent: Sunday, December 06, 2015 6:32 PM
To: dev@sis.apache.org
Subject: Re: INFRA issue created - please avoid commits for now

Let us resume commits on the JDK8 branch since we closed INFRA-10826
anyway. But we would not merge with other branches or trunk yet. If we
move to Git, I would like to try a "push --force" before those merges.

Note that if we move to Git, SVN would not be completely abandoned. It
is still required for the Apache release process since SVN is better
suited than Git for binary files (because the history weight stays on the
server side only). We may also keep
http://svn.apache.org/repos/asf/sis/data/ for the same reason. We also
have the web site and the IP review pages which can stay on SVN for now.

    Martin


On 06/12/15 15:06, Marc Le Bihan wrote:
> I think: 1) Ignore the 3 Mb binary file and resume activity. It will
> allow me to update and commit my work one last time.
>
> 2) Ask infra to switch the SIS project from SVN to Git
> 3) Abandon SVN.


Re: INFRA issue created - please avoid commits for now

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Let us resume commits on the JDK8 branch since we closed INFRA-10826
anyway. But we would not merge with other branches or trunk yet. If we
move to Git, I would like to try a "push --force" before those merges.

Note that if we move to Git, SVN would not be completely abandoned. It
is still required for the Apache release process since SVN is better
suited than Git for binary files (because the history weight stays on the
server side only). We may also keep
http://svn.apache.org/repos/asf/sis/data/ for the same reason. We also
have the web site and the IP review pages which can stay on SVN for now.

    Martin


On 06/12/15 15:06, Marc Le Bihan wrote:
> I think: 1) Ignore the 3 Mb binary file and resume activity. It will
> allow me to update and commit my work one last time.
>
> 2) Ask infra to switch the SIS project from SVN to Git
> 3) Abandon SVN.


Re: INFRA issue created - please avoid commits for now

Posted by Marc Le Bihan <ml...@gmail.com>.
I think:
1) Ignore the 3 Mb binary file and resume activity.
It will allow me to update and commit my work one last time.

2) Ask infra to switch the SIS project from SVN to Git 

3) Abandon SVN.

-----Original Message-----
From: Martin Desruisseaux 
Sent: Sunday, December 06, 2015 2:36 PM 
To: dev@sis.apache.org 
Subject: Re: INFRA issue created - please avoid commits for now 

Update on INFRA-10826 front:

Someone on infrastructure@apache.org looked at INFRA-10826 (or at
similar issues - from the discussion that I saw on the mailing list, at
least 3 other projects are facing similar problems). They are still
looking into how to erase such files, but it's trickier than initially
thought. They tried a filter that claimed to have removed the files, but
they still show up on GitHub.

I closed INFRA-10826 as "will not fix". However in my understanding,
investigation on the infrastructure side continues anyway for other
projects. For example they are looking into removing JAR files from the
Lucene history. But I do not know if and when infra will have a solution.

So the short-term alternatives for us are:

  * Ignore the 3 Mb binary file and resume activity. We should be
    careful not to commit other big files. I'm not completely sure if
    what infra is looking for would allow us to clean the history even
    months after the fact. I will try to read the infra emails again
    more carefully and report back on this question.
  * Ask infra to switch the SIS project from SVN to Git and do a "push
    --force" ourselves before resuming activity.

What would people prefer?

    Martin



Re: INFRA issue created - please avoid commits for now

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Update on INFRA-10826 front:

Someone on infrastructure@apache.org looked at INFRA-10826 (or at
similar issues - from the discussion that I saw on the mailing list, at
least 3 other projects are facing similar problems). They are still
looking into how to erase such files, but it's trickier than initially
thought. They tried a filter that claimed to have removed the files, but
they still show up on GitHub.

I closed INFRA-10826 as "will not fix". However in my understanding,
investigation on the infrastructure side continues anyway for other
projects. For example they are looking into removing JAR files from the
Lucene history. But I do not know if and when infra will have a solution.

So the short-term alternatives for us are:

  * Ignore the 3 Mb binary file and resume activity. We should be
    careful not to commit other big files. I'm not completely sure if
    what infra is looking for would allow us to clean the history even
    months after the fact. I will try to read the infra emails again
    more carefully and report back on this question.
  * Ask infra to switch the SIS project from SVN to Git and do a "push
    --force" ourselves before resuming activity.

What would people prefer?

    Martin



Re: INFRA issue created - please avoid commits for now

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Hello Marc

Someone on infrastructure@apache.org told me that he would look at the
issue. I have not heard back yet. I will send an email now to ask.

    Martin


On 06/12/15 11:39, Marc Le Bihan wrote:
> Hello,
>
>    I would like to commit now the ability to do direct access in a
> Shapefile, using the .SHX file content that often comes with the .SHP
> file.
>
> The
> https://svn.apache.org/repos/asf/sis/branches/JDK8/
> repository is still "locked". 15 days have passed since the
> creation of INFRA-10826 and the infra team is not doing anything.
> https://issues.apache.org/jira/browse/INFRA-10826
>
> Can we resume the use of the
> https://svn.apache.org/repos/asf/sis/branches/JDK8/ repository now?
> Or at least have a date? If you tell me to come back in one month to
> do my commit, I won't check every day.
>
> Regards,
>
> Marc.


Re: INFRA issue created - please avoid commits for now

Posted by Marc Le Bihan <ml...@gmail.com>.
Hello,

    I would like to commit now the ability to do direct access in a
Shapefile, using the .SHX file content that often comes with the .SHP file.

The
https://svn.apache.org/repos/asf/sis/branches/JDK8/
repository is still "locked". 15 days have passed since the creation of
INFRA-10826 and the infra team is not doing anything.
https://issues.apache.org/jira/browse/INFRA-10826

Can we resume the use of the https://svn.apache.org/repos/asf/sis/branches/JDK8/
repository now?
Or at least have a date? If you tell me to come back in one month to do my
commit, I won't check every day.

Regards,

Marc.

-----Original Message-----
From: Martin Desruisseaux
Sent: Friday, November 27, 2015 11:12 AM
To: dev@sis.apache.org
Subject: Re: INFRA issue created - please avoid commits for now

Thanks. Actually I have been thinking for a little while about proposing
to migrate SIS to Git. But my intent was to wait for the debate on
board@apache.org to settle down, in the hope of avoiding putting
additional pressure on them. In the meantime, I commit my work on
https://github.com/Geomatys/sis/tree/JDK8 (I will delete that branch
when we are back on Apache's SVN or Git).

    Martin


On 27/11/15 10:52, Marc LE BIHAN wrote:
> I can wait for the INFRA issue to be completed. No real emergency.
>
> For Git, I believe other developers and you may like it, and I would
> enjoy it too.
> If one day we fully migrate to Git, it will be nice, but whenever you
> want to complete the migration. No hurry for me.
>
> Marc.


Re: INFRA issue created - please avoid commits for now

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Thanks. Actually I have been thinking for a little while about proposing
to migrate SIS to Git. But my intent was to wait for the debate on
board@apache.org to settle down, in the hope of avoiding putting
additional pressure on them. In the meantime, I commit my work on
https://github.com/Geomatys/sis/tree/JDK8 (I will delete that branch
when we are back on Apache's SVN or Git).

    Martin


On 27/11/15 10:52, Marc LE BIHAN wrote:
> I can wait for the INFRA issue to be completed. No real emergency.
>
> For Git, I believe other developers and you may like it, and I would enjoy
> it too.
> If one day we fully migrate to Git, it will be nice, but whenever you
> want to complete the migration. No hurry for me.
>
> Marc.


Re: INFRA issue created - please avoid commits for now

Posted by Marc LE BIHAN <ml...@gmail.com>.
I can wait for the INFRA issue to be completed. No real emergency.

For Git, I believe other developers and you may like it, and I would enjoy
it too.
If one day we fully migrate to Git, it will be nice, but whenever you
want to complete the migration. No hurry for me.

Marc.

2015-11-27 10:49 GMT+01:00 Martin Desruisseaux <
martin.desruisseaux@geomatys.com>:

> Hello Marc
>
> I do not know how long it will take before INFRA takes the task. I
> suspect that the INFRA team is waiting for decisions from the Apache
> board about the foundation's policy regarding history rewriting in Git
> (they have been extensively debating this topic for a few weeks on
> board@apache.org). I also think that it wasn't clear that this issue was
> blocking commits on SIS (probably I should not have flagged this issue
> as "minor"). I will add a comment clarifying that point in the hope that
> it reaches the infra team.
>
> In the meantime, we could create a temporary SVN branch for continuing
> the work. If the branch is not mirrored on Git, I think that we will not
> have any problem when the branch is merged to JDK8.
>
> Another possible solution (but it may take more time than the SVN
> branch) would be to ask the infra team to fully migrate SIS to Git, if
> we have a consensus on this list for that request. In my understanding
> of the policy that seems to be emerging, we would be allowed to "push
> --force" on branches (but not on master).
>
> What would be your preference?
>
>     Martin
>
>
> On 27/11/15 07:37, Marc Le Bihan wrote:
> > Hello!
> >
> >    How long shall commits be avoided in the JDK8 branch?
> >    I have run a VisualVM session (a profiler) on the DBase III code to
> > examine why it was taking 310 seconds (5 minutes) to load and read the
> > whole DBase part of the shapefile of French city outlines (36,500+
> > polygons to read).
> >    Mostly it was some log.debug calls that weren't properly guarded by
> > isLoggable(Level) checks, and a few other optimizations were useful.
> >
> >    I succeeded in bringing this time down to 1.3 seconds, which is
> > better, and I would like to do a commit.
> >
> > Regards,
> >
> > Marc.
>
>

Re: INFRA issue created - please avoid commits for now

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Hello Marc

I do not know how long it will take before INFRA takes the task. I
suspect that the INFRA team is waiting for decisions from the Apache
board about the foundation's policy regarding history rewriting in Git
(they have been extensively debating this topic for a few weeks on
board@apache.org). I also think that it wasn't clear that this issue was
blocking commits on SIS (probably I should not have flagged this issue
as "minor"). I will add a comment clarifying that point in the hope that
it reaches the infra team.

In the meantime, we could create a temporary SVN branch for continuing
the work. If the branch is not mirrored on Git, I think that we will not
have any problem when the branch is merged to JDK8.

Another possible solution (but it may take more time than the SVN
branch) would be to ask the infra team to fully migrate SIS to Git, if
we have a consensus on this list for that request. In my understanding
of the policy that seems to be emerging, we would be allowed to "push
--force" on branches (but not on master).

What would be your preference?

    Martin


On 27/11/15 07:37, Marc Le Bihan wrote:
> Hello!
>
>    How long shall commits be avoided in the JDK8 branch?
>    I have run a VisualVM session (a profiler) on the DBase III code to
> examine why it was taking 310 seconds (5 minutes) to load and read the
> whole DBase part of the shapefile of French city outlines (36,500+
> polygons to read).
>    Mostly it was some log.debug calls that weren't properly guarded by
> isLoggable(Level) checks, and a few other optimizations were useful.
>
>    I succeeded in bringing this time down to 1.3 seconds, which is better,
> and I would like to do a commit.
>
> Regards,
>
> Marc.


Re: INFRA issue created - please avoid commits for now

Posted by Marc Le Bihan <ml...@gmail.com>.
Hello !

    How long shall commits be avoided in the JDK8 branch?
    I have run a VisualVM session (a profiler) on the DBase III code to
examine why it was taking 310 seconds (5 minutes) to load and read the whole
DBase part of the shapefile of French city outlines (36,500+ polygons to
read).
    Mostly it was some log.debug calls that weren't properly guarded by
isLoggable(Level) checks, and a few other optimizations were useful.
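
To illustrate the kind of guard meant here, below is a minimal Java sketch
using java.util.logging. It is not the actual DBase III code: the class
GuardedLogging, the logger name "example.dbase" and the describe method are
hypothetical, and Level.FINE stands in for the debug level. The point is only
that, without the guard, the message is built for every record even when the
level is disabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration, not the actual DBase III reader. */
final class GuardedLogging {
    private static final Logger LOGGER = Logger.getLogger("example.dbase");

    /** Counts how many times the costly message was actually built. */
    static int messagesBuilt = 0;

    /** Stands in for an expensive description of a record. */
    static String describe(int recordNumber) {
        messagesBuilt++;
        return "Read DBase record #" + recordNumber;
    }

    /** Unguarded: the message is built for every record, even when FINE is disabled. */
    static void readUnguarded(int records) {
        for (int i = 0; i < records; i++) {
            LOGGER.fine(describe(i));
        }
    }

    /** Guarded: the message is built only when the FINE level is actually loggable. */
    static void readGuarded(int records) {
        for (int i = 0; i < records; i++) {
            if (LOGGER.isLoggable(Level.FINE)) {
                LOGGER.fine(describe(i));
            }
        }
    }

    public static void main(String[] args) {
        LOGGER.setLevel(Level.INFO);          // FINE messages are disabled
        readUnguarded(1000);
        System.out.println("Messages built without guard: " + messagesBuilt);
        messagesBuilt = 0;
        readGuarded(1000);
        System.out.println("Messages built with guard:    " + messagesBuilt);
    }
}
```

On Java 8 and later, the Logger.fine(Supplier<String>) overload achieves a
similar effect without an explicit isLoggable check.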

    I succeeded in bringing this time down to 1.3 seconds, which is better,
and I would like to do a commit.

Regards,

Marc.

-----Original Message-----
From: Martin Desruisseaux
Sent: Monday, November 23, 2015 7:16 PM
To: dev@sis.apache.org
Subject: INFRA issue created - please avoid commits for now

Hello all

I created the ticket for the infra team:
https://issues.apache.org/jira/browse/INFRA-10826

We should not commit anything on the JDK8 branch until this issue is
resolved (either fixed or closed as "will not fix").

    Thanks,

        Martin



INFRA issue created - please avoid commits for now

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Hello all

I created the ticket for the infra team:
https://issues.apache.org/jira/browse/INFRA-10826

We should not commit anything on the JDK8 branch until this issue is
resolved (either fixed or closed as "will not fix").

    Thanks,

        Martin



Re: Proposed edition of project history

Posted by Martin Desruisseaux <ma...@geomatys.com>.
On 22/11/15 19:48, Mattmann, Chris A (3980) wrote:
> No need for VOTE, Martin, just file the ticket, you have an iCLA
> on file, we trust you to move forward.

Thanks Chris. I went ahead with steps 1 and 2:

 1. "Resurrected" the DEPARTEMENT.* files in the
    http://svn.apache.org/repos/asf/sis/data/Shapefiles/ directory.
 2. Squashed the shapefile addition and deletion into a single commit on
    a temporary clone: https://github.com/desruisseaux/sis/commits/JDK8

I will wait until tonight (European time) in case there is discussion.
If there is no objection tonight, then I will file an issue on
http://issues.apache.org/jira/browse/INFRA (after creating a fresh new
squash). We would need to abstain from committing anything on the JDK8
branch until the INFRA issue is resolved.

    Martin



Re: Proposed edition of project history

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
No need for VOTE, Martin, just file the ticket, you have an iCLA
on file, we trust you to move forward.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Martin Desruisseaux <ma...@geomatys.com>
Organization: Geomatys
Reply-To: "dev@sis.apache.org" <de...@sis.apache.org>
Date: Sunday, November 22, 2015 at 10:46 AM
To: "dev@sis.apache.org" <de...@sis.apache.org>
Subject: Proposed edition of project history

>Hello all
>
>About the 3 Mb of binary files which were added and removed on the JDK8
>branch, I would like to propose the following actions. Of course they
>are not critical - it is just "would be nice if we can". Is there any
>comment or objection?
>
> 1.
>
>    "Resurrect" the DEPARTEMENT.* files in the
>    http://svn.apache.org/repos/asf/sis/data/Shapefiles directory with a
>    LICENSE file containing the text found by Marc. Since those files
>    are already in SVN history anyway, they should not cost anything
>    significant to the server if we use the proper SVN command. Since
>    the "sis/data" directory is not part of any distribution, nor is it
>    mirrored on any Git clone, it would give us time for revisiting
>    the licensing question and how to handle big test files.
>
> 2.
>
>    Create a clone of the JDK8 branch on GitHub and squash commit
>    ca14680192 with 57b5204f92. This would hopefully remove reference to
>    the 3 Mb file.
>
> 3.
>
>    Open an INFRA issue similar to [1] asking the infra team to make a
>    "push --force" of the JDK8 branch on
>    the git://git.apache.org/sis.git repository and its
>    http://github.com/apache/sis mirror. Since only a branch is affected
>    (not the trunk or master), this would be hopefully less problematic
>    regarding the policy of history immutability.
>
> 4.
>
>    After 3 is resolved, we can do the usual merge of JDK8 with other
>    branches and trunk/master.
>
>I would volunteer to do those actions if people agree. Since this
>touches the project history, do we need a vote after discussion?
>
>    Martin
>
>[1] https://issues.apache.org/jira/browse/INFRA-10731
>


Proposed edition of project history

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Hello all

About the 3 Mb of binary files which were added and removed on the JDK8
branch, I would like to propose the following actions. Of course they
are not critical - it is just "would be nice if we can". Is there any
comment or objection?

 1.

    "Resurrect" the DEPARTEMENT.* files in the
    http://svn.apache.org/repos/asf/sis/data/Shapefiles directory with a
    LICENSE file containing the text found by Marc. Since those files
    are already in SVN history anyway, they should not cost anything
    significant to the server if we use the proper SVN command. Since
    the "sis/data" directory is not part of any distribution, nor is it
    mirrored on any Git clone, it would give us time for revisiting
    the licensing question and how to handle big test files.

 2.

    Create a clone of the JDK8 branch on GitHub and squash commit
    ca14680192 with 57b5204f92. This would hopefully remove reference to
    the 3 Mb file.

 3.

    Open an INFRA issue similar to [1] asking the infra team to make a
    "push --force" of the JDK8 branch on
    the git://git.apache.org/sis.git repository and its
    http://github.com/apache/sis mirror. Since only a branch is affected
    (not the trunk or master), this would be hopefully less problematic
    regarding the policy of history immutability.

 4.

    After 3 is resolved, we can do the usual merge of JDK8 with other
    branches and trunk/master.

I would volunteer to do those actions if people agree. Since this
touches the project history, do we need a vote after discussion?

    Martin

[1] https://issues.apache.org/jira/browse/INFRA-10731


Re: License of new test Shapefiles?

Posted by Marc Le Bihan <ml...@gmail.com>.
The removal is done.
I think I may have found an agreement for free public use of this
shapefile.

(from INSEE: "Redistribution of the products available on this site
The publications and data made available on this site can be consulted and
downloaded free of charge; unless otherwise specified, they may be reused,
including for commercial purposes, without a license and without payment of
royalties other than those collected by the copyright collecting and
distribution societies governed by Title II of Book III of the Intellectual
Property Code; reuse is however subject to respecting the integrity of the
information and of the data and to precise citation of the sources.").

It's really boring to debate about the source of each test file, especially
when they come from public organizations or governments and are freely
downloadable by everyone, but I understand that you have legacy
quarrels that force you to take care of everything, even if it causes your
project not to benefit from as many information sources as others. I want
to see the day when someone will attack any Apache project because it has a
file inside its test resources... Really, on what checks are you losing
time?!

The file was told to cause a size problem too, so currently the best thing 
to do was to delete it. But you see the beginning of my message : I strongly 
disagree with this idea.

Because this point raises two problems:
1) Many files that come from open data have a large amount of real case data,
and among this data, you have a lot of interesting cases. For example, the
DEPARTEMENT.SHP shapefile had the Finistère département inside, a feature
created with a three-part polygon.
Useful to challenge some displays or calculations. The size of the file was
only 3 MB. An update or a pull returns it in 0.1 seconds for anyone having an
ADSL connection.
Whoever wants to do this testing will have to create a Shapefile himself,
I think. Thanks.

2) We may have tests that take a long time to execute. We should create
another profile that calls failsafe instead of surefire, so that the
integration tests are run from time to time.
I have often encountered this problem in the team I work in: if there are
100 tests running in 1m30s overall, one person will use -DskipTests=true
(when you don't find an @Ignore on the test, or its whole source code
commented out), and another will find a way to make the surefire module
declare that some tests "have already been run, no need to reattempt". I
encountered this behavior in projects: "Skipping execution of surefire
because it has already been run for this configuration". It is tricky,
because when you change a module you might have to do a clean to be sure
that your tests will run. I was really surprised to see that. It's unsafe
and it has already trapped me.

The allowed duration of unit tests is 0.1 second x 100 tests = 10 seconds.
Only in-memory tests, nothing else. If a test uses any file, any external
resource, or any setup that takes time, it should no longer be classified
as a unit test. Otherwise, as you did, you end up trying to discard tests
one way or another because you feel (and you are right) that they take too
much time for the simple mvn clean install that you just want to run.

Any test that takes more than 0.1 second should be run by failsafe on
demand with:
mvn integration-test

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-failsafe-plugin</artifactId>
                <version>${maven-surefire-plugin.version}</version>

                <executions>
                    <execution>
                        <id>integration-test</id>
                        <goals>
                            <goal>integration-test</goal>
                            <goal>verify</goal>
                        </goals>
                    </execution>
                </executions>

                <configuration>
                    <skipTests>${skip.integration.tests}</skipTests>
                </configuration>
            </plugin>

(you set up a convention to name your integration tests with IT instead of
Test, for example)
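
As a rough illustration of the separate profile mentioned above, here is a
minimal sketch. The profile id "integration-tests" is a hypothetical name,
and the plugin version is assumed to be managed elsewhere (for example in
pluginManagement). The **/*IT.java pattern is one of Failsafe's default
includes and is repeated here only to make the naming convention explicit.

```xml
<!-- Sketch only: the profile id "integration-tests" is an assumption. -->
<profile>
    <id>integration-tests</id>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-failsafe-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>integration-test</goal>
                            <goal>verify</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <!-- Only classes whose name ends with IT are run here. -->
                    <includes>
                        <include>**/*IT.java</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</profile>
```

With such a profile, a plain mvn clean install would run only the surefire
unit tests, while something like mvn verify -P integration-tests would also
run the slower ones.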

3) It's really hard to stay on your project, Martin.
You have been nearly the only committer on it for a year.
How can it attract new people to help if you are always refusing everything?

Regards,

Marc.

-----Original Message-----
From: Martin Desruisseaux
Sent: Friday, November 20, 2015 4:26 PM
To: dev@sis.apache.org
Subject: Re: License of new test Shapefiles?

On 20/11/15 15:26, Marc LE BIHAN wrote:
> The best thing I have to do is to remove these files soon and adapt the
> test with another existing dbf file.

Thanks!

    Martin


Re: License of new test Shapefiles?

Posted by Martin Desruisseaux <ma...@geomatys.com>.
On 20/11/15 15:26, Marc LE BIHAN wrote:
> The best thing I have to do is to remove these files soon and adapt the
> test with another existing dbf file.

Thanks!

    Martin


Is history cleanup allowed on Apache SVN/GIT repositories?

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Hello all

Maybe this is not a serious issue, but I'm asking just in case. Is the
Apache INFRA team able (or allowed) to remove accidentally committed
files from the SVN/GIT history? The intent would be to reduce the weight on
Apache servers (and I presume backups), and the download time for those who
clone the project history from e.g. GitHub. Since the history editing
would be on a branch and not on trunk, would it be considered okay?

This is not a criticism of anyone - this is just an attempt to be kind to
people who have to deal with the project history. If people agree, I
would create an issue on https://issues.apache.org/jira/browse/INFRA/.
If there are objections, I guess that it is okay to leave things as-is.

    Martin


Re: License of new test Shapefiles?

Posted by Marc LE BIHAN <ml...@gmail.com>.
The best thing I have to do is to remove these files soon and adapt the
test with another existing dbf file.

Regards,

Marc.

2015-11-20 15:14 GMT+01:00 Martin Desruisseaux <
martin.desruisseaux@geomatys.com>:

> Hello
>
> I noticed new test files in the following directory:
>
>
> storage/sis-shapefile/src/test/resources/org/apache/sis/storage/shapefile/
>
> I noticed a NOTES.md file (thanks), but I did not saw licence
> information in that file. What are the license of those new
> "DEPARTEMENT" files? (and the license for the other, older files too?)
> Is it a licence compatible with the Apache License?
>
> An other issue is that "DEPARTEMENT.SHP" is 3 Mb, which is a lot for
> just a test file (if it was data needed for application execution, it
> would be a different story). Would it be possible to have a much smaller
> test file? We rarely need more than a few kb for testing, except if we
> want to test scalability (but I think that discussion about scalability
> could be another thread). It may also help to avoid licensing issues if
> we use a subset small enough for being considered "fair use". For
> example in the GR3DF97A.txt test file that I committed this week, I
> extracted only a few lines. The resulting test file is only 2 kb.
>
> Small test files are appreciated because they are included in the ZIP
> files that make official Apache releases (the JAR deployed on Maven
> Central are not official releases - just commodity), and also for making
> history smaller (the SVN is mirrored on GitHub, so even if we delete the
> file, its weight will stay in the history).
>
>     Martin
>
>
>