Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/10/14 19:55:08 UTC

[Nutch Wiki] Update of "GeoPosition" by MatthiasJaekle

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by MatthiasJaekle:
http://wiki.apache.org/nutch/GeoPosition

The comment on the change is:
New Version developed

------------------------------------------------------------------------------
  Download at http://nutch.eventax.com/
  
  == Query Syntax ==
+ 
+ hostel de:70174 (70174 is a zip code in Stuttgart, Germany)
  
  restaurant position:n52e10.0r10
  
@@ -24, +26 @@

  
  Specify r in kilometers. Values larger than 3000 km might not work.
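
A minimal sketch of how a term such as position:n52e10.0r10 could be split into latitude, longitude and radius. The class name PositionTerm and the exact grammar (n, then e, then r, with negative numbers assumed to stand for south and west) are assumptions for illustration, not the plugin's own code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the value of a position: query term (not the plugin's own class). */
final class PositionTerm {
    // Assumed grammar: n<latitude>e<longitude>r<radius in km>; negative values for south/west.
    private static final Pattern SYNTAX = Pattern.compile(
        "n(-?\\d+(?:\\.\\d+)?)e(-?\\d+(?:\\.\\d+)?)r(\\d+(?:\\.\\d+)?)");

    final double north;    // latitude in degrees
    final double east;     // longitude in degrees
    final double radiusKm; // search radius in kilometers

    private PositionTerm(double north, double east, double radiusKm) {
        this.north = north;
        this.east = east;
        this.radiusKm = radiusKm;
    }

    /** Parses the text after "position:", e.g. "n52e10.0r10". */
    static PositionTerm parse(String value) {
        Matcher m = SYNTAX.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized position term: " + value);
        }
        return new PositionTerm(Double.parseDouble(m.group(1)),
                                Double.parseDouble(m.group(2)),
                                Double.parseDouble(m.group(3)));
    }

    /** The page warns that radii above 3000 km might not work. */
    boolean radiusWithinTestedRange() {
        return radiusKm <= 3000;
    }
}
```

For example, PositionTerm.parse("n52e10.0r10") yields north = 52.0, east = 10.0 and radiusKm = 10.0.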
  
+ If zip data for other countries is supplied to the system, a search for the country's ISO 3166 code followed by a colon and a local zip code should return results.
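
A sketch of recognizing such country:zip terms, assuming two-letter lower-case ISO 3166 codes and a set of enabled countries taken from geoposition.zips.use. ZipTerm and parseOrNull are hypothetical names; zip codes are kept as text because they can have leading zeros.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: recognizes "<iso3166>:<zip>" query terms such as "de:70174". */
final class ZipTerm {
    final String country; // lower-case ISO 3166 code (assumed two letters)
    final String zip;     // local zip code, kept as text

    private ZipTerm(String country, String zip) {
        this.country = country;
        this.zip = zip;
    }

    /** Returns null when the token is not a zip term for one of the enabled countries. */
    static ZipTerm parseOrNull(String token, Set<String> enabledCountries) {
        int colon = token.indexOf(':');
        if (colon <= 0 || colon == token.length() - 1) {
            return null; // missing country prefix or missing zip part
        }
        String country = token.substring(0, colon).toLowerCase(Locale.ROOT);
        String zip = token.substring(colon + 1);
        if (!enabledCountries.contains(country)) {
            return null; // e.g. "position:..." or a country without zip data
        }
        return new ZipTerm(country, zip);
    }
}
```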
+ 
+ 
  == Config File Options ==
  
- == geoPosition.step ==
+ === geoPosition.step ===
  
  The accuracy with which positions are stored can be changed in the config file. The default is 1000 m.
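
A hypothetical illustration of what a storage accuracy of 1000 m could mean: a coordinate, assumed here to be given in meters, is rounded to the nearest multiple of the step before indexing. How the plugin applies geoPosition.step internally is not described on this page, and StepQuantizer is an invented name.

```java
/** Hypothetical sketch of a configurable storage accuracy (geoPosition.step), in meters. */
final class StepQuantizer {
    private final long stepMeters;

    StepQuantizer(long stepMeters) {
        if (stepMeters <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.stepMeters = stepMeters;
    }

    /** Rounds a coordinate in meters to the nearest multiple of the step. */
    long quantize(double meters) {
        return Math.round(meters / stepMeters) * stepMeters;
    }
}
```

With a step of 1000, quantize(4123456.0) returns 4123000. A coarser step leaves fewer distinct values in the index at the cost of precision.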
  
- == geoPosition.Domain2PositionFile ==
+ === geoPosition.Domain2PositionFile ===
  
  Default filename: conf/geodata.txt
  
@@ -42, +47 @@

  
  http://www.germany.de/berlin 52.1234 9.9876
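
A sketch of reading the Domain2PositionFile, assuming whitespace-separated lines of URL, north and east as in the example entry, and assuming that the longest configured URL prefix decides a page's position. Domain2Position, load and lookup are hypothetical names, not the plugin's API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for the Domain2PositionFile (conf/geodata.txt by default). */
final class Domain2Position {

    /** Parses lines of the assumed form "<url> <north> <east>"; blank lines are skipped. */
    static Map<String, double[]> load(Reader in) {
        Map<String, double[]> positions = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                String[] parts = line.split("\\s+");
                if (parts.length != 3) {
                    throw new IllegalArgumentException("Malformed line: " + line);
                }
                positions.put(parts[0], new double[] {
                    Double.parseDouble(parts[1]), Double.parseDouble(parts[2]) });
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return positions;
    }

    /** Assumed rule: the longest configured URL that prefixes pageUrl wins; null if none does. */
    static double[] lookup(Map<String, double[]> positions, String pageUrl) {
        String best = null;
        for (String prefix : positions.keySet()) {
            if (pageUrl.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? null : positions.get(best);
    }
}
```

With the Berlin entry loaded, lookup(positions, "http://www.germany.de/berlin/mitte.html") returns {52.1234, 9.9876}.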
  
+ === geoposition.zips.dir ===
+ 
+ Default dir: zip/
+ 
+ The directory lies within the conf directory and should contain, for each country, a file with the center position of each of that country's zip codes.
+ Each file name consists of the country's ISO 3166 code followed by .geo.txt; the file for Germany, for example, is de.geo.txt.
+ You can simply add one file per country.
+ 
+ === geoposition.zips.use ===
+ 
+ ISO 3166 codes of the countries whose zip data should be used while searching, separated by semicolons.
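
A sketch of how geoposition.zips.use and geoposition.zips.dir could be combined into the list of zip files to load. ZipFiles.resolve is a hypothetical helper, and trimming and lower-casing the codes are assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: maps geoposition.zips.use to files named <code>.geo.txt in the zip directory. */
final class ZipFiles {
    static List<Path> resolve(String confDir, String zipsDir, String zipsUse) {
        List<Path> files = new ArrayList<>();
        for (String code : zipsUse.split(";")) {
            String iso = code.trim().toLowerCase(Locale.ROOT);
            if (iso.isEmpty()) {
                continue; // tolerate empty entries such as "de;;at"
            }
            files.add(Paths.get(confDir, zipsDir, iso + ".geo.txt"));
        }
        return files;
    }
}
```

For example, resolve("conf", "zip", "de;at") yields the paths conf/zip/de.geo.txt and conf/zip/at.geo.txt.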
+ 
+ 
  == Internal Documentation ==
  
- == Our Earth ==
+ === Our Earth ===
  
- The plugin assumes that the earth is a globe with 6367km in radius. Calculations get a maximum error of around 0.3
+ The plugin assumes that the earth is a sphere with a radius of 6367 km. Calculations have a maximum error of around 0.3%.
  
  The plugin further assumes that the sea level is the same throughout the whole world. This should not make the error significantly larger.
  
- == Coordinate system ==
+ === Coordinate system ===
  
  To avoid CPU-intensive calculations of sine, cosine and tangent, all geographic coordinates are transformed before indexing into a 3-D Cartesian system with the axes x, y and z.
  
@@ -62, +81 @@

  
  y is 90 degrees to x and 90 degrees to z.
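
A sketch of the transformation under the usual convention, which is an assumption here because part of the axis definition is not shown in this diff: z points to the North Pole, x to latitude 0 and longitude 0, and y to latitude 0 and longitude 90 degrees east. It uses the 6367 km radius stated above. GeoToCartesian is a hypothetical name. The straight-line (chord) distance between two transformed points needs only a square root and can be converted back to a surface distance when required.

```java
/** Hypothetical sketch: geographic coordinates to x, y, z on a sphere of radius 6367 km. */
final class GeoToCartesian {
    static final double EARTH_RADIUS_KM = 6367.0;

    /** Returns {x, y, z} in km; assumed axes: z to the North Pole, x to 0N 0E, y to 0N 90E. */
    static double[] toXYZ(double northDeg, double eastDeg) {
        double lat = Math.toRadians(northDeg);
        double lon = Math.toRadians(eastDeg);
        double cosLat = Math.cos(lat);
        return new double[] {
            EARTH_RADIUS_KM * cosLat * Math.cos(lon),
            EARTH_RADIUS_KM * cosLat * Math.sin(lon),
            EARTH_RADIUS_KM * Math.sin(lat)
        };
    }

    /** Straight-line distance between two transformed points; no trigonometry needed. */
    static double chordKm(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** Converts a chord length back to the distance along the sphere's surface. */
    static double arcKm(double chordKm) {
        return 2 * EARTH_RADIUS_KM * Math.asin(chordKm / (2 * EARTH_RADIUS_KM));
    }
}
```

For a 10 km radius, chord and surface distance differ by roughly a millimeter, so comparing straight-line distances is accurate enough at that scale.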
  
- == Storeing ==
+ === Storing ===
  
  The coordinates are stored but not indexed in their polar form (north, east). They are indexed but not stored in their Cartesian form (posX, posY, posZ). If available, the elevation above sea level is stored and indexed in meters (elevation).
  
- == Searching ==
+ === Searching ===
  
  Searches are done by putting a cube around the search point and fetching everything between the minimum and maximum values in each direction. In a 2-D view, this means that everything within a square, instead of a circle, is fetched. A hit in a corner of the square can be up to about 41% farther away than the requested radius (√2 ≈ 1.41), and the square covers about 27% more area than the circle (4/π ≈ 1.27). But it is fast. Distance ranking should reduce this problem in the future.
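
A minimal sketch of the cube test, assuming points already converted to x, y, z in kilometers; CubeFilter is a hypothetical name. In the index, each axis check would presumably be a range query on posX, posY or posZ.

```java
/** Hypothetical sketch of the cube filter: three independent min/max checks, one per axis. */
final class CubeFilter {
    final double[] min = new double[3];
    final double[] max = new double[3];

    /** Builds a cube with half-width radiusKm around the search point. */
    CubeFilter(double[] center, double radiusKm) {
        for (int i = 0; i < 3; i++) {
            min[i] = center[i] - radiusKm;
            max[i] = center[i] + radiusKm;
        }
    }

    /** True if the point lies inside the cube, which may include corner hits beyond the radius. */
    boolean contains(double[] p) {
        for (int i = 0; i < 3; i++) {
            if (p[i] < min[i] || p[i] > max[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the center at the origin and a 10 km radius, the point (9, 9, 0) passes the cube test although it is about 12.7 km away, which is the corner error described above. A final exact distance check on the candidate hits could remove such false positives.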
  
@@ -81, +100 @@

        * Caching Range Queries
        * Using Field Cache
        * Using Hit Collector http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/HitCollector.html
- 
-   * Enabling users to add locations in well know ways, e.g. zip:12345.
    * Also using distance for ranking.
    * Using whois information to get positions, if useful.
    * Parsing addresses on websites to get positions.
  
- -- MatthiasJaekle - 14 Oct 2004
+ -- MatthiasJaekle - 14 Oct 2005