Posted to solr-user@lucene.apache.org by SolrUser1543 <os...@gmail.com> on 2014/05/18 21:20:26 UTC
Index / Query IP Address as number.
This question has been raised here a few times, but no final solution was
provided.
I am using a combination of ClassicTokenizer and WordDelimiterFilterFactory in
my query / index chain.
As a result, an IP address like 192.168.1.3 is indexed as:
192 - pos1
168 - pos2
1 - pos3
3 - pos4
19216813 - pos5
So searching for a similar but different address, like 192.168.1.4, returns
the wrong item, because the first three positions all match.
So the question is: what is the best way to index / query an IP address as a
number while still using ClassicTokenizer and WordDelimiter?
Actually, I would like to have the IP as a single number, without breaking it
into parts (that is, have only 19216813).
Thanks.
--
View this message in context: http://lucene.472066.n3.nabble.com/Index-Query-IP-Address-as-number-tp4136760.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index / Query IP Address as number.
Posted by SolrUser1543 <os...@gmail.com>.
I don't have autoGeneratePhraseQueries set to true. I tried both false and
true for it, but nothing changed.
Capture.JPG <http://lucene.472066.n3.nabble.com/file/n4136971/Capture.JPG>
The same chain is defined for both query and index:
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="0"
            splitOnNumerics="1"
            stemEnglishPossessive="0"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="1"
            catenateAll="1"
            preserveOriginal="0"/>
  </analyzer>
</fieldType>
--
View this message in context: http://lucene.472066.n3.nabble.com/Index-Query-IP-Address-as-number-tp4136760p4136971.html
Re: Index / Query IP Address as number.
Posted by Jack Krupansky <ja...@basetechnology.com>.
What are you using for your default query operator, and do you have
autoGeneratePhraseQueries set to "true" for your field type?
I mean, a query for 192.168.1.4 shouldn't match 192.168.1.3 - unless you
have autoGeneratePhraseQueries set to "false" (the default).
-- Jack Krupansky
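A rough way to see the difference described above is a toy model in Python. It is only a sketch, not Solr's actual query parser: it assumes the default operator is OR (the thread does not say), it treats the analyzed IP as its four numeric parts plus the catenated form, and the function names are made up for illustration.

```python
def analyze(ip):
    """Toy stand-in for the chain in the thread: numeric parts plus catenated form."""
    parts = ip.split(".")
    return parts, "".join(parts)


def matches_as_or_terms(query_ip, doc_ip):
    """Each query token is an independent optional clause (assumed OR operator)."""
    q_parts, q_cat = analyze(query_ip)
    d_parts, d_cat = analyze(doc_ip)
    doc_terms = set(d_parts) | {d_cat}
    # One shared token is enough for the document to match.
    return any(term in doc_terms for term in q_parts + [q_cat])


def matches_as_phrase(query_ip, doc_ip):
    """All query parts must appear in the document, adjacent and in order."""
    q_parts, _ = analyze(query_ip)
    d_parts, _ = analyze(doc_ip)
    n = len(q_parts)
    return any(d_parts[i:i + n] == q_parts for i in range(len(d_parts) - n + 1))


print(matches_as_or_terms("192.168.1.4", "192.168.1.3"))  # True: 192, 168 and 1 are shared
print(matches_as_phrase("192.168.1.4", "192.168.1.3"))    # False: the last part differs
```

Under these assumptions, the unwanted match comes from the shared tokens 192, 168 and 1, and it disappears once the parts have to match as an ordered phrase.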
Re: Index / Query IP Address as number.
Posted by Jack Krupansky <ja...@basetechnology.com>.
Consider an update processor - either raw Java or a snippet of JavaScript
with the stateless script update processor. The update processor could be
hard-coded or take parameters as to which source value to examine and what
field to output. It could use a simple regex to extract only IP addresses.
And then you could output to multiple fields - one for the raw string for
wildcard matches, say, and one as an integer for proximity or range checks.
-- Jack Krupansky
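The extraction logic such an update processor would need can be sketched outside Solr. The Python below is only an illustration of the regex-plus-two-outputs idea, not an update processor itself; the function name `extract_ips` and the exact pattern are assumptions, and only IPv4 is handled.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits that are not part of a longer
# dotted or numeric run (so "1.2.3.4.5" and "1234324" are ignored).
IPV4_CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)")


def extract_ips(text):
    """Return (raw_string, integer_value) pairs for each valid IPv4 address in text."""
    results = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            address = ipaddress.IPv4Address(candidate)
        except ValueError:
            # Looks like an address but is not one, e.g. an octet above 255.
            continue
        results.append((candidate, int(address)))
    return results


print(extract_ips("test test 12/12/2001 12345 192.168.1.1 1234324"))
# [('192.168.1.1', 3232235777)]
```

The raw string could feed a field used for wildcard matches and the integer a numeric field for range checks, as suggested above. A text with several addresses yields several pairs, so both target fields would need to be multi-valued.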
-----Original Message-----
From: SolrUser1543
Sent: Monday, May 19, 2014 3:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Index / Query IP Address as number.
I have a text field containing a large piece of mixed text, like:
test test 12/12/2001 12345 192.168.1.1 1234324
I need to create a copy field that captures only the IP addresses from the
text (there may be more than one).
What is the best way to do this?
I don't see any option to stop WordDelimiter from breaking up the IP, so as
an alternative I will use a copy field.
Re: Index / Query IP Address as number.
Posted by SolrUser1543 <os...@gmail.com>.
I have a text field containing a large piece of mixed text, like:
test test 12/12/2001 12345 192.168.1.1 1234324
I need to create a copy field that captures only the IP addresses from the
text (there may be more than one).
What is the best way to do this?
I don't see any option to stop WordDelimiter from breaking up the IP, so as
an alternative I will use a copy field.
--
View this message in context: http://lucene.472066.n3.nabble.com/Index-Query-IP-Address-as-number-tp4136760p4136974.html
Re: Index / Query IP Address as number.
Posted by Walter Underwood <wu...@wunderwood.org>.
Use a PatternReplaceCharFilterFactory to map the periods to empty strings, then use a KeywordTokenizer and a string field type. If you want to sort it or do range queries, you might use an integer field.
wunder
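One thing to check before relying on the period-stripped form is that different addresses can collapse to the same digits. The Python sketch below (function names are illustrative, and this is not Solr code) compares that form with a 32-bit packing of the four octets, which stays unique and sorts in address order.

```python
def strip_periods(ip):
    """What mapping periods to empty strings produces for a single address."""
    return ip.replace(".", "")


def to_int(ip):
    """Pack the four octets into one unsigned 32-bit number."""
    value = 0
    for octet in ip.split("."):
        value = (value << 8) | int(octet)
    return value


# Two different addresses, one period-stripped token.
print(strip_periods("192.168.1.3"))  # 19216813
print(strip_periods("19.216.81.3"))  # 19216813

# The packed form keeps them apart and preserves ordering for range queries.
print(to_int("192.168.1.3") != to_int("19.216.81.3"))  # True
print(to_int("192.168.1.3") < to_int("192.168.1.4") < to_int("192.168.2.0"))  # True
```

Packed values for addresses from 128.0.0.0 upward exceed 2147483647, so they do not fit a signed 32-bit integer field; a long-typed field is probably the safer choice if the numeric form is indexed.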