You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@couchdb.apache.org by Luca Matteis <lm...@gmail.com> on 2012/06/05 05:59:51 UTC

Best stack for querying spatial locations

Hello all,

I have a scenario where I'm working with a friend of mine that
maintains a bunch of spatial data in a MS Access database. I want to
put this data online, as a web-site, and allow people to query it
using an interface and a RESTful API. So I thought CouchDB + GeoCouch
would be perfect for this.

The issue is, how do I upload the MS Access data to my online Couch
instance? I want the process to be seamless - in the sense that my
friend will continue making modifications to his MS Access db, and
then he should "upload" it to the couch using some tool.

Do I need to write extra code to parse the MS Access into JSON and
then upload that to my couch? Or is there an easier way? Dunno, maybe
upload it the MS Access as an attachment and parse it with client side
JavaScript?

Also the MS Access is quite large: ~2gb

Thanks for any advice,

Luca

Re: Best stack for querying spatial locations

Posted by Luca Matteis <lm...@gmail.com>.
On Tue, Jun 5, 2012 at 11:38 AM, Miles Fidelman
<mf...@meetinghouse.net> wrote:
> - what kinds of spatial data?

He's storing something called "missions". Basically the location
people have been to collect certain types of plants. So his data
contains lat-lng coordinates and some other information of the mission
(such as the name of the collector and date).

> - is one person updating the data, or more?

For now just him. He changes the MS Access db and that's it. But with
Couch I can expect to plug it to other systems so that other people
could be adding it as well in the future - but for now it's not
needed.

> - does the database live on a single PC, or on a server?

It lives on his local PC.

> - how frequently is it updated?

Probably around once a month or so.

> - is it preferable for updates to propagate live to the online serve, or is
> a periodic dump enough (or preferred)?

Given the fact that it's only once a month, a periodic dump is
probably more efficient.

> - what kinds of queries are going to be made against the data?

The idea is that the UI will be built using Google Maps. Some layers
are going to appear to let users filter the data. Like to layers for
countries and maybe other areas with specific climate information, so
that users can filter on that. The logic to define these "layers" I
expect is going to be on the client, so Couch doesn't need to worry
about that, it only needs to answer with the coordinates given a
specific area (which GeoCouch does perfectly already, right?).

Also simple queries like "give me all the missions with these date
ranges". So Couch does this quite easily, all I need is to build a
view.

> - how many queries per unit of time?

I'm not sure, but I suspect very little. Not expecting high traffic at
all. But it needs to be responsive and I'm confident GeoCouch is quite
fast for the simple queries mentioned above.

> - RESTful API implies a liklihood of combining the data with other
> data/services - so what kinds of things will people be doing?

Yes indeed. One of the reasons I want to use Couch is because of its
REST capabilities. We don't have any plans for now to make the data
available to other services, but the idea that we can in the future is
what gives us confidence to go ahead with Couch.

> When I hear "geospatial stack" I don't generally think MS Access and
> CouchDB.  I tend to think more:
> - Google Maps & KML, and/or
> - GeoServer + PostGIS, and/or
> - OpenLayers, and/or
> - ESRI ArcGIS (for really serious commercial capabilities)

Yes, We tried ArcGis in fact. But it's a very complicated piece of
software which is very hard to customize. It also appeared to be quite
slow with our ~2gb points. So we decided that a customized solution
might be best.

> Or, you might think about something more like:
> - MS Access -> GeoServer (w/ some RESTful glue) -> CouchDB -> user facing

What's a GeoServer? How is it different than CouchDB + GeoCouch?

> Or it might make sense to keep some stuff in a geodatabase, other stuff in
> Couch, and mash stuff together in browser-side code.

What do you mean by "geodatabase". Like ArcGIS?

Thanks for this, answering these questions made the situation a bit
clearer for me as well.

Let me know what you think would be the best way to go. I still think
that maybe going with CouchDB + GeoCouch would be the simplest
solution given the queries that we need to make against the system,
but there might be something else which is 100 times better.

Re: Best stack for querying spatial locations

Posted by Miles Fidelman <mf...@meetinghouse.net>.
Luca Matteis wrote:
> Hello all,
>
> I have a scenario where I'm working with a friend of mine that
> maintains a bunch of spatial data in a MS Access database. I want to
> put this data online, as a web-site, and allow people to query it
> using an interface and a RESTful API. So I thought CouchDB + GeoCouch
> would be perfect for this.
>
> The issue is, how do I upload the MS Access data to my online Couch
> instance? I want the process to be seamless - in the sense that my
> friend will continue making modifications to his MS Access db, and
> then he should "upload" it to the couch using some tool.
>
> Do I need to write extra code to parse the MS Access into JSON and
> then upload that to my couch? Or is there an easier way? Dunno, maybe
> upload it the MS Access as an attachment and parse it with client side
> JavaScript?
>
> Also the MS Access is quite large: ~2gb
>

I think you've jumped a couple of design steps:
- what kinds of spatial data?
- is one person updating the data, or more?
- does the database live on a single PC, or on a server?
- how frequently is it updated?
- is it preferable for updates to propagate live to the online serve, or 
is a periodic dump enough (or preferred)?
- what kinds of queries are going to be made against the data?
- how many queries per unit of time?
- RESTful API implies a liklihood of combining the data with other 
data/services - so what kinds of things will people be doing?

When I hear "geospatial stack" I don't generally think MS Access and 
CouchDB.  I tend to think more:
- Google Maps & KML, and/or
- GeoServer + PostGIS, and/or
- OpenLayers, and/or
- ESRI ArcGIS (for really serious commercial capabilities)

Depending on the details of the problem at hand (the answers to the 
above questions), it could be that:
- MS Access -> <some glue> -> CouchDB/GeoCouch might be a perfectly good 
solution

Or, you might think about something more like:
- MS Access -> GeoServer (w/ some RESTful glue) -> CouchDB -> user 
facing i/f

Or it might make sense to keep some stuff in a geodatabase, other stuff 
in Couch, and mash stuff together in browser-side code.

Your problem statement is way too amorphous at this point.

Miles Fidelman

-- 
In theory, there is no difference between theory and practice.
In practice, there is.   .... Yogi Berra



Re: Best stack for querying spatial locations

Posted by Luca Matteis <lm...@gmail.com>.
On Tue, Jun 5, 2012 at 11:11 AM, Dave Cottlehuber <da...@muse.net.nz> wrote:
> Not necessarily, you could easily export the 2GB file, massage it to
> filter out records already stored & current in CouchDB, and then
> push a _bulk_docs with the new ones.

But the couch is located somewhere online. To check if the records are
already online I would have to query for each document, no? Is there
an easier way to check if documents have already been uploaded to an
online couch (not stored on the same computer where the upload is
being done).

Thanks,
Luca

Re: Best stack for querying spatial locations

Posted by Dave Cottlehuber <da...@muse.net.nz>.
On 5 June 2012 10:38, Luca Matteis <lm...@gmail.com> wrote:
> On Tue, Jun 5, 2012 at 9:46 AM, Dave Cottlehuber <da...@muse.net.nz> wrote:
>> The trite answer would be to rewrite the Access DB into Couch with a nice
>> couchapp front end for your friend's local copy,  & then summon replication
>> to keep a public web version updated.
>
> This is quite hard. My friend (colleague actually) is used to MS
> Access and it would be really cumbersome to build a "web" version of
> Access that contains similar functionalities. He's used to maintaining
> his data in Access, and I don't think I can change that.
>
>> There's a tool to dump a DB into JSON -
>> https://s3.amazonaws.com/divconq/bin/jsonutils/JSONUtils_dotnet_v1001.zip
>
> Awesome! I was looking for something like that. However this means
> that he'll need to upload his ~2gb of data every time he makes
> changes, no?

Not necessarily, you could easily export the 2GB file, massage it to
filter out records already stored & current in CouchDB, and then
push a _bulk_docs with the new ones.

Given that 5 minutes of curiousity & googling found those posts, I am sure that
you will be able to find something that meets your needs more closely :-).

It shouldn't be too hard to roll our own using an ODBC driver + some JSON
wrapping. If I were doing it myself, I'd likely use perl, DBD::ODBC and then
Store::CouchDB.

A+
Dave

Re: Best stack for querying spatial locations

Posted by Luca Matteis <lm...@gmail.com>.
On Tue, Jun 5, 2012 at 9:46 AM, Dave Cottlehuber <da...@muse.net.nz> wrote:
> The trite answer would be to rewrite the Access DB into Couch with a nice
> couchapp front end for your friend's local copy,  & then summon replication
> to keep a public web version updated.

This is quite hard. My friend (colleague actually) is used to MS
Access and it would be really cumbersome to build a "web" version of
Access that contains similar functionalities. He's used to maintaining
his data in Access, and I don't think I can change that.

> There's a tool to dump a DB into JSON -
> https://s3.amazonaws.com/divconq/bin/jsonutils/JSONUtils_dotnet_v1001.zip

Awesome! I was looking for something like that. However this means
that he'll need to upload his ~2gb of data every time he makes
changes, no?

Re: Best stack for querying spatial locations

Posted by Dave Cottlehuber <da...@muse.net.nz>.
On 5 June 2012 05:59, Luca Matteis <lm...@gmail.com> wrote:
> Hello all,
>
> I have a scenario where I'm working with a friend of mine that
> maintains a bunch of spatial data in a MS Access database. I want to
> put this data online, as a web-site, and allow people to query it
> using an interface and a RESTful API. So I thought CouchDB + GeoCouch
> would be perfect for this.
>
> The issue is, how do I upload the MS Access data to my online Couch
> instance? I want the process to be seamless - in the sense that my
> friend will continue making modifications to his MS Access db, and
> then he should "upload" it to the couch using some tool.
>
> Do I need to write extra code to parse the MS Access into JSON and
> then upload that to my couch? Or is there an easier way? Dunno, maybe
> upload it the MS Access as an attachment and parse it with client side
> JavaScript?

The trite answer would be to rewrite the Access DB into Couch with a nice
couchapp front end for your friend's local copy,  & then summon replication
to keep a public web version updated.

More practically I think you'll need 2 things;

#1 map the MS Access relational model into doc oriented JSON
#2 set up an export that uses this map to transform that data

You're not the first person to look at Access->JSON:

There's a tool to dump a DB into JSON -
https://s3.amazonaws.com/divconq/bin/jsonutils/JSONUtils_dotnet_v1001.zip

And the authors wrote a series of blog posts on it:
http://www.divconq.com/2010/export-a-microsoft-access-database-to-json/

There will also be libraries to do this via ODBC for python perl ruby etc.
Depending on data complexity it might be quicker to roll your own.

> Also the MS Access is quite large: ~2gb

Large for Access, not for Couch.

A+
Dave