Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/04/09 03:14:54 UTC

[Nutch Wiki] Update of "GeoPosition" by ChiragChaman

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by ChiragChaman:
http://wiki.apache.org/nutch/GeoPosition

------------------------------------------------------------------------------
- == GeoPosition Plugin ==
  
+ = Plugin: geoPosition =
+ 
- The GeoPosition Plugin enables local searches. The plugin parses geographical meta tags (geo.position, DC.coverage.spatial and ICBM). If there are no coordinates in the document, coordinates can be loaded from conf/geodata.txt file.
+ The geoPosition Plugin enables local searches. The plugin parses geographical meta tags (geo.position, DC.coverage.spatial and ICBM). If there are no coordinates in the document, coordinates can be loaded from conf/geodata.txt file.
  
  Download at http://nutch.eventax.com/
  
- === Query Syntax ===
+ == Query Syntax ==
  
- {{{restaurant position:n52e10.0r10
+ restaurant position:n52e10.0r10
+ 
  hotel position:s-52e10.0r10
+ 
  party position:n52.023w10.0r100
+ 
- politics position:n52e10.0r100 }}}
+ politics position:n52e10.0r100
  
  Possible identifiers are n, s, e, w and r.
  
@@ -21, +25 @@

  
  Give r in kilometers. Values larger than 3000 km might not work.
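
To make the query syntax concrete, here is a minimal Python sketch that splits a position term into its identifier/value pairs. It is not the plugin's parser: the function name `parse_position` is hypothetical, and the sketch deliberately does not interpret signs or hemispheres, because the rules for that are not shown in this diff.

```python
import re

# Hypothetical helper, not part of the plugin: one identifier letter
# (n, s, e, w or r) followed by an optionally signed decimal number.
_TOKEN = re.compile(r"([nsewr])(-?\d+(?:\.\d+)?)")

def parse_position(term):
    """Return a dict mapping each identifier in the term to its float value."""
    prefix = "position:"
    if not term.startswith(prefix):
        raise ValueError("not a position term: " + term)
    body = term[len(prefix):]
    pairs = _TOKEN.findall(body)
    # Reject empty input and anything that is not purely identifier/value pairs.
    if not pairs or "".join(k + v for k, v in pairs) != body:
        raise ValueError("malformed position term: " + term)
    return {k: float(v) for k, v in pairs}
```

For example, `parse_position("position:n52.023w10.0r100")` yields the values for n, w and r, with r being the radius in kilometers.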
  
- === Config File Options ===
+ == Config File Options ==
  
- ==== geoPosition.step ====
+ == geoPosition.step ==
  
  The accuracy with which positions are stored can be changed in the config file. Default is 1000m.
  
- ==== geoPosition.Domain2PositionFile ====
+ == geoPosition.Domain2PositionFile ==
  
  Default filename: conf/geodata.txt
  
@@ -35, +39 @@

  
  Example:
  
- {{{
  http://www.berlin.de 52.1234 9.9876
+ 
  http://www.germany.de/berlin 52.1234 9.9876
- }}}
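
As an illustration of the file format in the example above, the following Python sketch reads such lines into a lookup table. It assumes that each line holds a URL followed by two whitespace-separated numbers (taken here as north, then east) and that the longest matching URL prefix wins; both points are assumptions, and `parse_geodata` and `lookup` are hypothetical names, not plugin functions.

```python
def parse_geodata(text):
    """Parse 'URL north east' lines into a dict of URL -> (north, east)."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and lines that do not have three fields
        url, north, east = parts
        entries[url] = (float(north), float(east))
    return entries

def lookup(entries, url):
    """Return the position of the longest URL prefix that matches, else None.

    Longest-prefix matching is an assumption about the matching rule.
    """
    best = None
    for prefix in entries:
        if url.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return entries[best] if best is not None else None
```

With the two example lines loaded, `lookup(entries, "http://www.berlin.de/hotels")` would return `(52.1234, 9.9876)`.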
  
- === Internal Documentation ===
+ == Internal Documentation ==
  
- ==== Our Earth ====
+ == Our Earth ==
+ 
- The plugin assumes that the earth is a globe with 6367km in radius. Calculations get a maximum error of around 0.3%.
+ The plugin assumes that the earth is a globe with 6367km in radius. Calculations get a maximum error of around 0.3%.
  
  The plugin further assumes that the sea level is the same throughout the whole world. This should not make the error significantly larger.
  
- ==== Coordinate system ====
+ == Coordinate system ==
  
  To avoid cpu-consuming calculations of sine, cosine and tangent, all the geographic coordinates are transformed before indexing to a 3-D-System with x, y, z.
  
@@ -59, +63 @@

  
  y is 90 degrees to x and 90 degrees to z.
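
The transformation to x, y, z can be sketched in Python as follows. The axis orientation is an assumption (x through 0° north / 0° east, z through the north pole, y perpendicular to both), since the axis definitions are not shown in this diff; only the 6367 km radius comes from the text. The name `to_cartesian` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6367.0  # sphere radius stated in the text

def to_cartesian(north_deg, east_deg, radius_km=EARTH_RADIUS_KM):
    """Convert degrees north/east to (x, y, z) in km on a sphere.

    Assumed orientation: x through (0 N, 0 E), y through (0 N, 90 E),
    z through the north pole.
    """
    lat = math.radians(north_deg)
    lon = math.radians(east_deg)
    x = radius_km * math.cos(lat) * math.cos(lon)
    y = radius_km * math.cos(lat) * math.sin(lon)
    z = radius_km * math.sin(lat)
    return x, y, z
```

Sine and cosine are evaluated once per document at indexing time, so a search only has to compare plain numbers.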
  
- ==== Storing ====
+ == Storing ==
+ 
  The coordinates are also stored, but not indexed, in their polar version (north, east). In their cartesian version (posX, posY, posZ) they are indexed but not stored. If available, the elevation above sea level is stored and indexed in meters (elevation).
  
- ==== Searching ====
+ == Searching ==
  
- Searches are done by putting a cube around the point of search and searching all stuff between min and max values in each direction. In a 2D view this means, all stuff within a square instead of a circle is fetched. This means a maximum fault of nearly 40% in the distance, hits retrieved from. But it is fast. Distance ranking should minimize this problem in future.
+ Searches are done by putting a cube around the point of search and searching all stuff between min and max values in each direction. In a 2D view this means, all stuff within a square instead of a circle is fetched. This means a maximum fault of nearly 40% in the distance, hits retrieved from. But it is fast. Distance ranking should minimize this problem in future.
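
The cube search can be sketched in Python as a per-axis range check on cartesian coordinates in km. The function names are hypothetical and the optional exact filter is an addition for illustration, not something the plugin is stated to do. In the 2D view, a corner of the square lies at sqrt(2) ≈ 1.41 times the search radius, which is consistent with the error of roughly 40% mentioned above.

```python
import math

def in_cube(center, point, r_km):
    """Cheap pre-filter: every coordinate lies within r_km of the center."""
    return all(abs(p - c) <= r_km for p, c in zip(point, center))

def straight_line_km(a, b):
    """Straight-line distance between two cartesian points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def search(center, points, r_km, exact=False):
    """Return the points inside the cube; optionally drop those beyond r_km."""
    hits = [p for p in points if in_cube(center, p, r_km)]
    if exact:
        # Illustrative second pass, assumed here and not described in the text.
        hits = [p for p in hits if straight_line_km(center, p) <= r_km]
    return hits
```

A point offset by (10, 10, 0) from the center passes the cube test for r = 10 even though it is about 14.1 km away; the exact pass would discard it.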
  
- ----
+ == Running search engines using this plugin ==
  
- Running search engines using this plugin
+   * This plugin is used for local searches at http://www.umkreisfinder.de/.
  
-     * This plugin is used for local searches at http://www.umkreisfinder.de/.
+ == To do ==
  
- To do
+   * Checking the existing implementation.
+   * Speeding up all the stuff
+       * Caching Range Queries
+       * Using Field Cache
+       * Using Hit Collector http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/HitCollector.html
  
+   * Enabling users to add locations in well-known ways, e.g. zip:12345.
+   * Also using distance for ranking.
+   * Using whois information to get positions, if useful.
+   * Parsing addresses on websites to get positions.
-     * Checking the existing implementation.
-     * Speeding up all the stuff
-           o Caching Range Queries
-           o Using Field Cache http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/FieldCache.html#getInts(org.apache.lucene.index.IndexReader,%20java.lang.String)
-           o Using Hit Collector http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/HitCollector.html
  
+ -- MatthiasJaekle - 14 Oct 2004
-     * Enabling users to add locations in well know ways, e.g. zip:12345.
-     * Also using distance for ranking.
-     * Using whois informations to get positions, if useful.
-     * Parsing addresses on websites to get positions.
  
- -- MatthiasJaekle - 14 Oct 2004 
-