Posted to solr-user@lucene.apache.org by SolrUser1543 <os...@gmail.com> on 2014/05/18 21:20:26 UTC

Index / Query IP Address as number.

This question has been raised here a few times, but no final solution was
provided.

I'm using a combination of ClassicTokenizer and WordDelimiterFilterFactory in
my query / index chain.

As a result, an IP like 192.168.1.3 is indexed as:

192 - pos1
168 - pos2 
1    - pos3
3    - pos4 
19216813 - pos5


So searching for a similar but different address like 192.168.1.4 returns the
wrong item, because the first three positions all match.

So the question is: what is the best way to index / query an IP as a number
while still using ClassicTokenizer and WordDelimiterFilter?


Actually, I would like to have the IP as a number, without breaking it into
parts (that is, have only 19216813).

Thanks.






--
View this message in context: http://lucene.472066.n3.nabble.com/Index-Query-IP-Address-as-number-tp4136760.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Index / Query IP Address as number.

Posted by SolrUser1543 <os...@gmail.com>.
I don't have autoGeneratePhraseQueries set to true. I tried both false and
true for it, but nothing changed.

Capture.JPG <http://lucene.472066.n3.nabble.com/file/n4136971/Capture.JPG>  

The same chain is defined for both query and index:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.ClassicTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                splitOnCaseChange="0"
                splitOnNumerics="1"
                stemEnglishPossessive="0"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="1"
                catenateAll="1"
                preserveOriginal="0"/>
      </analyzer>
      <!-- query analyzer uses the same chain; it was not included in this post -->
    </fieldType>




Re: Index / Query IP Address as number.

Posted by Jack Krupansky <ja...@basetechnology.com>.
What are you using for your default query operator, and do you have 
autoGeneratePhraseQueries set to "true" for your field type?

I mean, a query for 192.168.1.4 shouldn't match 192.168.1.3 - unless you 
have autoGeneratePhraseQueries set to "false" (the default).

-- Jack Krupansky

-----Original Message----- 
From: SolrUser1543
Sent: Sunday, May 18, 2014 3:20 PM
To: solr-user@lucene.apache.org
Subject: Index / Query IP Address as number.



Re: Index / Query IP Address as number.

Posted by Jack Krupansky <ja...@basetechnology.com>.
Consider an update processor - either raw Java or a snippet of JavaScript 
with the stateless script update processor. The update processor could be 
hard-coded or take parameters as to which source value to examine and what 
field to output. It could use a simple regex to extract only IP addresses. 
And then you could output to multiple fields - one for the raw string for 
wildcard matches, say, and one as an integer for proximity or range checks.
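
One way the stateless script update processor mentioned above might be wired
into solrconfig.xml is sketched below. This is only a sketch under
assumptions: the chain name, the script file name (extract-ips.js) and the
three parameter names and field names are hypothetical, and the script itself,
which would hold the regex and the string-to-integer conversion, is not shown.

```xml
<!-- Sketch only: chain name, script name, params and field names are hypothetical -->
<updateRequestProcessorChain name="extract-ips">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <!-- Script file expected in the core's conf/ directory -->
    <str name="script">extract-ips.js</str>
    <!-- Values made available to the script, so the fields are not hard-coded -->
    <lst name="params">
      <str name="sourceField">text</str>
      <str name="ipStringField">ip_s</str>
      <str name="ipNumberField">ip_l</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected per update request (for example with the
update.chain request parameter) or set as the default for the update handler.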

-- Jack Krupansky

-----Original Message----- 
From: SolrUser1543
Sent: Monday, May 19, 2014 3:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Index / Query IP Address as number.



Re: Index / Query IP Address as number.

Posted by SolrUser1543 <os...@gmail.com>.
I have a text field containing a large piece of mixed text, like:

test test 12/12/2001 12345 192.168.1.1 1234324


I need to create a copy field that captures only the IPs from the text
(there may be more than one IP).

What would be the best way to do this?

I don't see any option to stop WordDelimiter from breaking down the IP, so as
an alternative I will use a copy field.





Re: Index / Query IP Address as number.

Posted by Walter Underwood <wu...@wunderwood.org>.
Use a PatternReplaceCharFilterFactory to map the periods to empty strings, then use a KeywordTokenizer and a string field type. If you want to sort it or do range queries, you might use an integer field.
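
A minimal sketch of that chain in schema.xml might look like the following.
The field type name (ip_flat) and field name (ip) are hypothetical, and the
sketch assumes the field holds only an IP address. It uses solr.TextField
rather than solr.StrField, because StrField does not run an analysis chain.

```xml
<!-- Sketch only: "ip_flat" and "ip" are hypothetical names -->
<fieldType name="ip_flat" class="solr.TextField" omitNorms="true">
  <analyzer>
    <!-- Remove the periods before tokenizing: 192.168.1.3 becomes 19216813 -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\." replacement=""/>
    <!-- Emit the whole remaining input as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="ip" type="ip_flat" indexed="true" stored="true"/>
```

One caveat with simply dropping the periods: distinct addresses can collapse
to the same token (1.11.1.1 and 11.1.1.1 both become 11111), so an exact
match on the flattened value is not always unambiguous.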

wunder
