Posted to solr-user@lucene.apache.org by Kehan Harman <ke...@gaiaresources.com.au> on 2019/02/26 13:54:03 UTC

Suggester autocomplete for address information

Hi All,

I'm new to Solr and the community, so feel free to ignore or remove this if
it is the wrong mailing list for this query.

I'm trying to build an autocomplete using a Solr index for addresses in a
format similar to:

123 Smith Street, KEMPSEY, NSW 2440

I'm looking to have these addresses suggest values to users based on their
input with some spellchecking capability.

My documents look like this:

{
  "id": "ANSW718363409",
  "table": "ADDRESS_DEFAULT_GEOCODE",
  "address": "123-127 SMITH STREET, KEMPSEY NSW 2440",
  "address_location": "-31.07321967,152.84505473",
  "address_latitude": -31.07322,
  "address_longitude": 152.84506,
  "locality_pid": "NSW2119",
  "locality_latitude": -31.060476,
  "locality_longitude": 152.84819,
  "suburb_postcode": "KEMPSEY NSW 2440",
  "number_first": 123,
  "number_last": 127,
  "street_number": "123-127",
  "street_name": "SMITH",
  "street_type_code": "STREET",
  "locality_name": "KEMPSEY",
  "state_name": "NEW SOUTH WALES",
  "state_abbreviation": "NSW",
  "postcode": "2440",
  "_version_": 1626515771141128204
}

These are Australian addresses extracted from
https://data.gov.au/dataset/ds-dga-19432f89-dc3a-4ef3-b943-5326ef1dbecc/details

My managed schema has the following fields. I'm using the managed schema from
the example configset *sample_techproducts_configs*, with some additional
fields added through the Schema API:

<field name="address" type="text_en" multiValued="false" indexed="true" stored="true"/>
<field name="address_latitude" type="float" multiValued="false" indexed="true" stored="true"/>
<field name="address_location" type="location" multiValued="false" indexed="true" stored="true"/>
<field name="address_longitude" type="float" multiValued="false" indexed="true" stored="true"/>
<field name="building_name" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="filename" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="flat_number" type="int" multiValued="false" indexed="true" stored="true"/>
<field name="flat_type_code" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="foo" type="string" indexed="true" stored="true"/>
<field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
<field name="index_id" type="strings"/>
<field name="level_number" type="int" multiValued="false" indexed="true" stored="true"/>
<field name="locality_latitude" type="float" multiValued="false" indexed="true" stored="true"/>
<field name="locality_location" type="location" multiValued="false" indexed="true" stored="true"/>
<field name="locality_longitude" type="float" multiValued="false" indexed="true" stored="true"/>
<field name="locality_name" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="locality_pid" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="number_first" type="int" multiValued="false" indexed="true" stored="true"/>
<field name="number_first_suffix" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="number_last" type="int" multiValued="false" indexed="true" stored="true"/>
<field name="number_last_suffix" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="postcode" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="state_abbreviation" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="state_name" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="street_name" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="street_number" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="street_type_code" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="suburb_postcode" type="text_en" multiValued="false" indexed="true" stored="true"/>
<field name="table" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="type" type="string" multiValued="false" indexed="true" stored="true"/>

The search component / requestHandler are defined as follows.

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suburb</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suburb_postcode</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">true</str>
  </lst>
  <lst name="suggester">
    <str name="name">address</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">address</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Please let me know if you need any more information in order to answer this.
Thanks,
Kehan

Re: Suggester autocomplete for address information

Posted by Kehan Harman <ke...@gaiaresources.com.au>.
I'd like to clarify that what I am looking for is the right field type for
the address field that will suggest values as follows for the input:
Input:
"123 SM"
Suggestions:

   - 123-127 SMITH STREET, KEMPSEY NSW 2440
   - 123 SMYTHE STREET, RANDOM PLACE 9999


In addition to this, I want the search to also provide results if I
simply enter the postcode (4 digits here in Oz) as follows:

Input:
"2440"

Suggestions:

   - 123-127 SMITH STREET, KEMPSEY NSW 2440
   - 120 SMITH STREET, KEMPSEY NSW 2440
   - 65 SMITH STREET, KEMPSEY NSW 2440
   - 2440 ANOTHER RANDOM ROAD, RANDOM PLACE 9999


In short, I would like it to try to match the beginning of the address
first and, if that fails, fall back to later parts of the string such as
suburb, state and postcode.
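
The fallback behaviour described above resembles what Solr's infix-based
lookups are meant for. As an untested sketch only, an extra suggester entry
inside the existing "suggest" searchComponent might look like the following.
The suggester name address_infix and the indexPath value are made up for
illustration, and text_general is assumed to be the field type of that name
shipped with sample_techproducts_configs.

```xml
<!-- Hypothetical extra suggester; goes inside <searchComponent name="suggest"> -->
<lst name="suggester">
  <str name="name">address_infix</str>
  <!-- Infix lookup: matches token prefixes anywhere in the address -->
  <str name="lookupImpl">BlendedInfixLookupFactory</str>
  <!-- position_linear is meant to favour matches nearer the start of the text -->
  <str name="blenderType">position_linear</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">address</str>
  <!-- Tokenizing field type (assumed), so that "2440" can match on its own -->
  <str name="suggestAnalyzerFieldType">text_general</str>
  <!-- Assumed directory name for this suggester's own index -->
  <str name="indexPath">address_infix_suggester</str>
  <!-- Not built at startup; would need a request with suggest.build=true -->
  <str name="buildOnStartup">false</str>
</lst>
```

Such a suggester would be selected per request with
suggest.dictionary=address_infix. As far as I know, infix lookups do not do
fuzzy matching, so this would not cover the spellchecking part on its own.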

The field type that I'm currently using as the basis of these suggestions
is as follows:


<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>

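Since text_en removes stop words and applies Porter stemming, both of which
are aimed at English prose rather than street addresses, a plainer analysis
chain may be worth comparing against. A minimal, untested sketch follows; the
name text_address is hypothetical and is not part of the sample configset.

```xml
<!-- Hypothetical field type: tokenizes and lowercases, nothing else -->
<fieldType name="text_address" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split the address into word and number tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Make matching case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A type like this could be referenced from a suggester's
suggestAnalyzerFieldType in place of "string", which treats the whole value as
a single token.
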
Thanks,
Kehan



-- 
*------------------------------------*
Kehan Harman
Gaia Resources
p +61 8 92277309
m +61 406872510
w www.gaiaresources.com.au
e kehan.harman@gaiaresources.com.au
t @kehan <http://twitter.com/kehan>
g kehh <http://github.com/kehh>

I acknowledge the traditional custodians of the lands and waters where we
live and work, and pay my respects to elders past, present and future.