Posted to dev@sis.apache.org by Kelsea Flores <kj...@dons.usfca.edu> on 2018/06/26 23:04:48 UTC

Kelsea Flores introduction

Hi, my name is Kelsea Flores and I am a senior at the University of San
Francisco. I will be graduating in December 2018 with a Bachelor’s in
Computer Science. I am currently looking to gain professional experience as
a software engineer to add to the academic experiences I’ve had so far.



Could anyone help me find work that would be suitable for me to contribute
to? Many of the projects that I have worked on so far have been centered
around data structures, searching, and sorting. In my Software Development
class, I worked individually to build a search engine in Java. The first part
of this four-part project was to write a Java program that processes all HTML
files in a directory and its subdirectories, cleans and parses the HTML
into words, and builds an inverted index to store the mapping from words to
the documents and positions within those documents where those words were
found. The second part of the project was to support exact search and
partial search by parsing a query file, generating a sorted list of search
results from the inverted index, and writing those results to a JSON file.
The third part consisted of extending part two to support multithreading by
making a thread-safe inverted index and using a work queue to build and
search an inverted index using multiple threads. The final part of the
project was to support building the index from the web instead of a
directory of text files using multithreading, an inverted index, sockets,
and HTTP. I should mention that I don’t have any experience working on an
open source project. I read over the brief project description and the list
of features as well, and I have never worked with geodetic data structures
or geographic metadata. If we find a good opportunity, I’m planning to work
on this project for 35+ hours per week during my summer break (until August
21st).
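
For readers unfamiliar with the data structure described above, here is a minimal sketch of an inverted index with exact and partial (prefix) search. It is not the code from the class project: the class name InvertedIndexSketch, its method names, and the tokenization rule are assumptions made for illustration, and this version is single-threaded and does not write JSON.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative inverted index: word -> document -> positions. All names are hypothetical. */
final class InvertedIndexSketch {
    private final TreeMap<String, Map<String, List<Integer>>> index = new TreeMap<>();

    /** Splits the text into lowercase words and records each word's 1-based position. */
    void addDocument(String document, String text) {
        int position = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) continue;   // a leading separator yields an empty token
            position++;
            index.computeIfAbsent(word, w -> new TreeMap<>())
                 .computeIfAbsent(document, d -> new ArrayList<>())
                 .add(position);
        }
    }

    /** Exact search: documents and positions where the word occurs (empty map if none). */
    Map<String, List<Integer>> exactSearch(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), Map.of());
    }

    /** Partial search: documents containing at least one word that starts with the prefix. */
    SortedSet<String> partialSearch(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        SortedSet<String> documents = new TreeSet<>();
        // In a sorted map, all keys sharing a prefix are contiguous and start at tailMap(prefix).
        for (Map.Entry<String, Map<String, List<Integer>>> e : index.tailMap(p).entrySet()) {
            if (!e.getKey().startsWith(p)) break;
            documents.addAll(e.getValue().keySet());
        }
        return documents;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("a.html", "Quad trees index points");
        index.addDocument("b.html", "An index of indexed words");
        System.out.println(index.exactSearch("index"));
        System.out.println(index.partialSearch("ind"));
    }
}
```

The sorted map is one possible design choice: it makes prefix search a contiguous range scan. A thread-safe variant, as in the third part of the project, would additionally need to guard reads and writes, for example with a read/write lock.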



Would anyone be willing to mentor me as I learn how to contribute to ASF
projects? I’m reviewing the newcomers documentation, but it would be really
helpful to have some extra support since this may be pretty different from
what I’ve done in my classes so far.

Thanks for taking the time to read this and I look forward to hearing from
you.

Kelsea

Re: Kelsea Flores introduction

Posted by Kelsea Flores <kj...@dons.usfca.edu>.
Hi Martin,

Thanks for the ideas!

I just wanted to let you know that I'm still working on getting my
environment set up. Once it is set up, I will get back to you with any
questions I have.

Best,
Kelsea

On Thu, Jun 28, 2018 at 7:56 AM, Martin Desruisseaux <
martin.desruisseaux@geomatys.com> wrote:

> On 27/06/2018 at 20:25, Kelsea Flores wrote:
>
> Thank you for proposing so many different projects for me to work on!
> Martin, I'm interested in working on the QuadTree index. I will continue to
> look over the wiki pages and Java code more thoroughly.
>
> Thanks. There is a list of possible improvements that may be considered
> for the index:
>
>    - Work with an arbitrary number of dimensions (the version currently
>    in SIS is restricted to two dimensions).
>    - Work with an arbitrary Coordinate Reference System (the version
>    currently in SIS is restricted to latitudes and longitudes).
>    - More compact in-memory representation (the version currently in SIS
>    creates more objects than necessary).
>    - Work in memory or on a file on disk for a large index.
>    - Safe for multi-threading.
>
> They are just proposals; of course it is fine if you can pick up only a
> few of them.
>
>     Thanks!
>
>         Martin
>
>
>

Re: Kelsea Flores introduction

Posted by Martin Desruisseaux <ma...@geomatys.com>.
On 27/06/2018 at 20:25, Kelsea Flores wrote:

> Thank you for proposing so many different projects for me to work on!
> Martin, I'm interested in working on the QuadTree index. I will
> continue to look over the wiki pages and Java code more thoroughly. 
>
Thanks. There is a list of possible improvements that may be considered
for the index:

  * Work with an arbitrary number of dimensions (the version currently
    in SIS is restricted to two dimensions).
  * Work with an arbitrary Coordinate Reference System (the version
    currently in SIS is restricted to latitudes and longitudes).
  * More compact in-memory representation (the version currently in SIS
    creates more objects than necessary).
  * Work in memory or on a file on disk for a large index.
  * Safe for multi-threading.

They are just proposals; of course it is fine if you can pick up only a
few of them.
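
To illustrate the first proposal in the list above (an arbitrary number of dimensions), here is a minimal sketch of how a node could choose among its 2^n children by using one bit per dimension. The class ChildIndexSketch and the method childIndex are hypothetical names, not part of Apache SIS, and the sketch assumes fewer than 31 dimensions so that the result fits in an int.

```java
/** Hypothetical helper showing child selection in a 2^n-tree; not part of Apache SIS. */
final class ChildIndexSketch {
    /**
     * Returns the index (0 to 2^n - 1) of the child that contains the point.
     * Bit i of the result is set when point[i] >= center[i].
     */
    static int childIndex(double[] center, double[] point) {
        if (center.length != point.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        int index = 0;
        for (int i = 0; i < point.length; i++) {
            if (point[i] >= center[i]) {
                index |= (1 << i);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Two dimensions behave like a quadtree (4 children), three like an octree (8 children).
        System.out.println(childIndex(new double[] {0, 0}, new double[] {1, -1}));
        System.out.println(childIndex(new double[] {0, 0, 0}, new double[] {-1, -1, 5}));
    }
}
```

With two dimensions this reduces to the usual four quadrants, so the same code path would serve the existing two-dimensional case.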

    Thanks!

        Martin



Re: Kelsea Flores introduction

Posted by Kelsea Flores <kj...@dons.usfca.edu>.
Hi Martin and Johann,

Thank you for proposing so many different projects for me to work on!
Martin, I'm interested in working on the QuadTree index. I will continue to
look over the wiki pages and Java code more thoroughly.

I look forward to hearing back from you and Alexis Manin.

Kelsea

On Wed, Jun 27, 2018 at 2:04 AM, johann sorel <jo...@geomatys.com>
wrote:

> Hello,
>
> Or working on new datastore providers.
>
> Since we have the Feature and query API running I think it would be great
> to have a GeoPackage provider.
> http://www.opengeospatial.org/standards/geopackage
> https://www.geopackage.org
>
> Johann Sorel
>
>
> On 27/06/2018 10:55, Martin Desruisseaux wrote:
>
>> Hello Kelsea, and welcome!
>>
>> Looking at your experience, you seem familiar with indexing. Apache SIS
>> has a QuadTree index which is currently orphaned. The index Java code is
>> located at:
>>
>>      storage/sis-storage/src/main/java/org/apache/sis/index/tree/
>>
>> If you are interested in working on an index system, we can make a
>> plan for possible improvements to that code. It may be easier to
>> do that next week because Alexis Manin will be back from vacation and he
>> may have input on this topic. In the meantime, it may be worth taking a
>> look at the QuadTree [1] and R-Tree [2] pages on Wikipedia.
>>
>> Alternatively, if you would like to try something new and have an
>> inclination for mathematics, we still have some map projections to
>> implement [3]. This work is more "mechanical", with very specific
>> classes to extend and methods to implement for each set of formulas. I
>> will expand on this topic if there is interest.
>>
>> Another possibility is to continue the work on JavaFX components for
>> giving a Graphical User Interface to SIS (this work was started by other
>> students).
>>
>> For information, Hao is doing a Google Summer of Code project which has
>> some similarities with your previous work. But instead of scanning
>> HTML files, Hao's work scans GeoTIFF, netCDF and some other geospatial
>> files. And instead of collecting words, Hao's work collects information
>> structured like a form, with a clear "title" field, an "author" field, a
>> "geographic extent" field, etc., which enables searches by title, author,
>> etc. If you are interested in this topic, we can try to coordinate in
>> such a way that your work and Hao's are complementary.
>>
>> Do you have a preference for any of those alternatives?
>>
>>      Martin
>>
>>
>> [1] https://en.wikipedia.org/wiki/Quadtree
>> [2] https://en.wikipedia.org/wiki/R-tree
>> [3] https://issues.apache.org/jira/browse/SIS-212
>>
>>
>>
>

Re: Kelsea Flores introduction

Posted by johann sorel <jo...@geomatys.com>.
Hello,

Or working on new datastore providers.

Since we have the Feature and query API running I think it would be 
great to have a GeoPackage provider.
http://www.opengeospatial.org/standards/geopackage
https://www.geopackage.org

Johann Sorel

On 27/06/2018 10:55, Martin Desruisseaux wrote:
> Hello Kelsea, and welcome!
>
> Looking at your experience, you seem familiar with indexing. Apache SIS
> has a QuadTree index which is currently orphaned. The index Java code is
> located at:
>
>      storage/sis-storage/src/main/java/org/apache/sis/index/tree/
>
> If you are interested in working on an index system, we can make a
> plan for possible improvements to that code. It may be easier to
> do that next week because Alexis Manin will be back from vacation and he
> may have input on this topic. In the meantime, it may be worth taking a
> look at the QuadTree [1] and R-Tree [2] pages on Wikipedia.
>
> Alternatively, if you would like to try something new and have an
> inclination for mathematics, we still have some map projections to
> implement [3]. This work is more "mechanical", with very specific
> classes to extend and methods to implement for each set of formulas. I
> will expand on this topic if there is interest.
>
> Another possibility is to continue the work on JavaFX components for
> giving a Graphical User Interface to SIS (this work was started by other
> students).
>
> For information, Hao is doing a Google Summer of Code project which has
> some similarities with your previous work. But instead of scanning
> HTML files, Hao's work scans GeoTIFF, netCDF and some other geospatial
> files. And instead of collecting words, Hao's work collects information
> structured like a form, with a clear "title" field, an "author" field, a
> "geographic extent" field, etc., which enables searches by title, author,
> etc. If you are interested in this topic, we can try to coordinate in
> such a way that your work and Hao's are complementary.
>
> Do you have a preference for any of those alternatives?
>
>      Martin
>
>
> [1] https://en.wikipedia.org/wiki/Quadtree
> [2] https://en.wikipedia.org/wiki/R-tree
> [3] https://issues.apache.org/jira/browse/SIS-212
>
>


Re: Kelsea Flores introduction

Posted by Martin Desruisseaux <ma...@geomatys.com>.
Hello Kelsea, and welcome!

Looking at your experience, you seem familiar with indexing. Apache SIS
has a QuadTree index which is currently orphaned. The index Java code is
located at:

    storage/sis-storage/src/main/java/org/apache/sis/index/tree/

If you are interested in working on an index system, we can make a
plan for possible improvements to that code. It may be easier to
do that next week because Alexis Manin will be back from vacation and he
may have input on this topic. In the meantime, it may be worth taking a
look at the QuadTree [1] and R-Tree [2] pages on Wikipedia.
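
As background for the quadtree idea mentioned above, here is a minimal sketch of a two-dimensional point quadtree with insertion and rectangular search. It is written from the general textbook description and is not the code under org/apache/sis/index/tree/. The class name QuadTreeSketch, the leaf capacity of 4, and the depth limit of 16 are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative point quadtree over a square region. Not the Apache SIS implementation. */
final class QuadTreeSketch {
    private static final int CAPACITY = 4;    // points per leaf before splitting (arbitrary)
    private static final int MAX_DEPTH = 16;  // stops endless splitting on duplicate points

    private final double minX, minY, size;
    private final int depth;
    private final List<double[]> points = new ArrayList<>();
    private QuadTreeSketch[] children;        // null while this node is a leaf

    QuadTreeSketch(double minX, double minY, double size) {
        this(minX, minY, size, 0);
    }

    private QuadTreeSketch(double minX, double minY, double size, int depth) {
        this.minX = minX; this.minY = minY; this.size = size; this.depth = depth;
    }

    /** Adds a point; returns false if it lies outside the region covered by this tree. */
    boolean insert(double x, double y) {
        if (x < minX || x >= minX + size || y < minY || y >= minY + size) return false;
        add(x, y);
        return true;
    }

    private void add(double x, double y) {
        if (children == null) {
            if (points.size() < CAPACITY || depth >= MAX_DEPTH) {
                points.add(new double[] {x, y});
                return;
            }
            split();
        }
        children[childIndex(x, y)].add(x, y);
    }

    /** Bit 0 selects the east half, bit 1 the north half. */
    private int childIndex(double x, double y) {
        double half = size / 2;
        return (x >= minX + half ? 1 : 0) | (y >= minY + half ? 2 : 0);
    }

    /** Turns this leaf into an inner node and pushes its points down to the four children. */
    private void split() {
        double half = size / 2;
        children = new QuadTreeSketch[] {
            new QuadTreeSketch(minX,        minY,        half, depth + 1),
            new QuadTreeSketch(minX + half, minY,        half, depth + 1),
            new QuadTreeSketch(minX,        minY + half, half, depth + 1),
            new QuadTreeSketch(minX + half, minY + half, half, depth + 1)
        };
        for (double[] p : points) {
            children[childIndex(p[0], p[1])].add(p[0], p[1]);
        }
        points.clear();
    }

    /** Returns all points inside the query rectangle (bounds inclusive). */
    List<double[]> query(double qMinX, double qMinY, double qMaxX, double qMaxY) {
        List<double[]> result = new ArrayList<>();
        collect(qMinX, qMinY, qMaxX, qMaxY, result);
        return result;
    }

    private void collect(double qMinX, double qMinY, double qMaxX, double qMaxY,
                         List<double[]> result) {
        // Skip the whole subtree when the query rectangle does not touch this node.
        if (qMaxX < minX || qMinX > minX + size || qMaxY < minY || qMinY > minY + size) return;
        for (double[] p : points) {
            if (p[0] >= qMinX && p[0] <= qMaxX && p[1] >= qMinY && p[1] <= qMaxY) result.add(p);
        }
        if (children != null) {
            for (QuadTreeSketch child : children) {
                child.collect(qMinX, qMinY, qMaxX, qMaxY, result);
            }
        }
    }

    public static void main(String[] args) {
        QuadTreeSketch tree = new QuadTreeSketch(0, 0, 100);
        double[][] samples = {{10, 10}, {20, 20}, {60, 60}, {70, 10}, {10, 80}, {55, 55}};
        for (double[] p : samples) tree.insert(p[0], p[1]);
        System.out.println(tree.query(50, 50, 100, 100).size());
    }
}
```

The pruning test in collect is what makes a search cheaper than scanning every point: subtrees whose square does not touch the query rectangle are never visited.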

Alternatively, if you would like to try something new and have an
inclination for mathematics, we still have some map projections to
implement [3]. This work is more "mechanical", with very specific
classes to extend and methods to implement for each set of formulas. I
will expand on this topic if there is interest.

Another possibility is to continue the work on JavaFX components for
giving a Graphical User Interface to SIS (this work was started by other
students).

For information, Hao is doing a Google Summer of Code project which has
some similarities with your previous work. But instead of scanning
HTML files, Hao's work scans GeoTIFF, netCDF and some other geospatial
files. And instead of collecting words, Hao's work collects information
structured like a form, with a clear "title" field, an "author" field, a
"geographic extent" field, etc., which enables searches by title, author,
etc. If you are interested in this topic, we can try to coordinate in
such a way that your work and Hao's are complementary.

Do you have a preference for any of those alternatives?

    Martin


[1] https://en.wikipedia.org/wiki/Quadtree
[2] https://en.wikipedia.org/wiki/R-tree
[3] https://issues.apache.org/jira/browse/SIS-212