Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2015/12/23 05:59:43 UTC

[Tika Wiki] Update of "GeoTopicParser" by MadhavSharan

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "GeoTopicParser" page has been changed by MadhavSharan:
https://wiki.apache.org/tika/GeoTopicParser?action=diff&rev1=10&rev2=11

  usage: lucene-geo-gazetteer
   -b,--build <gazetteer file>           The Path to the Geonames
                                         allCountries.txt
+  -c,--count <number of results>        Number of best results to be
+                                        returned for one location
   -h,--help                             Print this message.
   -i,--index <directoryPath>            The path to the Lucene index
                                         directory to either create or read
+  -json,--json                          Formats output in well defined json
+                                        structure
   -s,--search <set of location names>   Location names to search the
                                         Gazetteer for
+  -server,--server                      Launches Geo Gazetteer Service
+ 
  }}}
  
  You will now need to build a Gazetteer using the Geonames.org dataset. Instructions are provided below. Note that you will need at least 1.2 GB of disk space to build the Lucene index for the Gazetteer.
@@ -44, +50 @@

  You can verify that the Gazetteer build worked by searching, e.g., for Pasadena and/or Texas:
  
  {{{
- $ lucene-geo-gazetteer -s Pasadena Texas
+ $ lucene-geo-gazetteer -s Pasadena Texas -json
+ {"Texas":[{"name":"Texas","countryCode":"US","admin1Code":"TX","admin2Code":"","latitude":31.25044,"longitude":-99.25061}],"Pasadena":[{"name":"Pasadena","countryCode":"US","admin1Code":"CA","admin2Code":"037","latitude":34.14778,"longitude":-118.14452}]}
- [
- {"Texas" : [
- "Texas",
- "-91.92139",
- "18.05333"
- ]},
- {"Pasadena" : [
- "Pasadena",
- "-74.06446",
- "4.6964"
- ]}
- ]
  }}}
  
+ Now you need to start the REST service of lucene-geo-gazetteer, which Tika uses internally:
+ 
+ {{{
+ $ lucene-geo-gazetteer -server
+ }}}
+ 
+ You can verify that the REST API is responding by searching, e.g., for Pasadena and/or Texas:
+ 
+ {{{
+ $ curl "http://localhost:8765/api/search?s=Pasadena&s=Texas"
+ {"Texas":[{"name":"Texas","countryCode":"US","admin1Code":"TX","admin2Code":"","latitude":31.25044,"longitude":-99.25061}],"Pasadena":[{"name":"Pasadena","countryCode":"US","admin1Code":"CA","admin2Code":"037","latitude":34.14778,"longitude":-118.14452}]}
+ }}}
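
The same request can be issued from a script. The following is a minimal Python sketch, not part of lucene-geo-gazetteer or Tika: the helper names `build_search_url` and `search` are hypothetical, and the base URL is simply copied from the curl example above, so adjust it if your service listens elsewhere. It only shows that each location name is passed as a repeated `s` query parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: host, port and path are taken verbatim from the curl example.
BASE_URL = "http://localhost:8765/api/search"


def build_search_url(names, base_url=BASE_URL):
    """Return the search URL with one `s` parameter per location name."""
    query = urlencode([("s", name) for name in names])
    return f"{base_url}?{query}"


def search(names, base_url=BASE_URL, timeout=10):
    """Query a running service and return the decoded JSON response.

    This needs `lucene-geo-gazetteer -server` to be running already.
    """
    with urlopen(build_search_url(names, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the URL does not contact the service.
print(build_search_url(["Pasadena", "Texas"]))
# prints: http://localhost:8765/api/search?s=Pasadena&s=Texas
```

Calling `search(["Pasadena", "Texas"])` while the service is running should return the same structure that curl prints above.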
+ 
- Note that we used the convenience script `lucene-geo-gazetteer` which assumes that you created an indexed named geoIndex in the $HOME/src/lucene-geo-gazetter/geoIndex directory. We could have also used the pure Java command line to search. The return from the Gazetteer is a JSON List of JSON Object structures in which the structure is a key->JSON List map. The key is the location name given and the List is a list of closest match (by Edit Distance) in the Gazetteer for that name, followed by Latitude, and Longitude of that location.
+ Note that we used the convenience script `lucene-geo-gazetteer`, which assumes that you created an index named geoIndex in the $HOME/src/lucene-geo-gazetter/geoIndex directory. We could also have used the pure Java command line to search. The return from the Gazetteer is a JSON Object that maps each key to a List of Objects. Each key is a location name that was given, and its List holds the most popular location objects in the Gazetteer for that name.
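
To make the return structure concrete, here is a small Python sketch that reads the sample response shown above and extracts one coordinate pair per location name. The function name `best_coordinates` is hypothetical, and two things are assumptions rather than documented behavior: that the first entry in each list is the preferred match, and that a name without matches yields an empty list.

```python
import json

# Sample response copied from the search example above.
SAMPLE = (
    '{"Texas":[{"name":"Texas","countryCode":"US","admin1Code":"TX",'
    '"admin2Code":"","latitude":31.25044,"longitude":-99.25061}],'
    '"Pasadena":[{"name":"Pasadena","countryCode":"US","admin1Code":"CA",'
    '"admin2Code":"037","latitude":34.14778,"longitude":-118.14452}]}'
)


def best_coordinates(response_text):
    """Map each queried name to (latitude, longitude) of its first match."""
    results = json.loads(response_text)
    coords = {}
    for name, matches in results.items():
        # Assumption: the first list entry is the preferred match, and a
        # name with no match comes back as an empty list (skipped here).
        if matches:
            first = matches[0]
            coords[name] = (first["latitude"], first["longitude"])
    return coords


print(best_coordinates(SAMPLE))
# prints: {'Texas': (31.25044, -99.25061), 'Pasadena': (34.14778, -118.14452)}
```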
  
  == Installing and downloading an NER model ==