You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Jacob Rhoden <ja...@me.com> on 2013/11/16 01:36:23 UTC

Efficient IP address location lookup

Hi Guys,

It occurs to me that someone may have done this before and be willing to share, or may just be interested in helping work out it.

Assuming a database table where the partition key is the first component of a users IPv4 address, i.e. (ip=100.0.0.1, part=100) and the remaining three parts of the IP address become a 24bit integer.

create table location(
    part int,
    start bigint,
    end bigint,
    country text,
    city text,
    primary key (part, start, end));

// range 100.0.0.0 - 100.0.0.10
insert into location (part, start, end, country, city) values(100,0,10,'AU','Melbourne’);

// range 100.0.0.11 - 100.0.0.200
insert into location (part, start, end, country, city) values(100,11,200,'US','New York’);

// range 100.0.0.201-100.0.0.255
insert into location (part, start, end, country, city) values(100,201,255,'UK','London');

What is the appropriate way to then query this? While the following is possible:

select * from location where part=100 and start<=30

What I need to do, is this, which seems not allowed. What is the correct way to query this?

select * from location where part=100 and start<=30 and end>=30

Or perhaps I’m going about this all wrong? Thanks!

Re: Efficient IP address location lookup

Posted by "Laing, Michael" <mi...@nytimes.com>.
This approach is similar to Janne's.

But I used a shard as an example to make more even rows, and just converted
each IP to an int.

-- put this in <file> and run using 'cqlsh -f <file>

DROP KEYSPACE jacob_test;

CREATE KEYSPACE jacob_test WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor' : 1
};

USE jacob_test;

CREATE TABLE location (
    shard int,
    start bigint,
    end bigint,
    country text,
    city text,
    PRIMARY KEY (shard, start)
);

-- shard is calculated as start % 12

-- range 100.0.0.0 - 100.0.0.10 == 1677721600 - 1677721610
INSERT INTO location (shard, start, end, country, city) VALUES
(4,1677721600,1677721610,'AU','Melbourne');

-- range 100.0.0.11 - 100.0.0.200
INSERT INTO location (shard, start, end, country, city) VALUES
(3,1677721611,1677721800,'US','New York');

-- range 100.0.0.201-100.0.0.255
INSERT INTO location (shard, start, end, country, city) VALUES
(1,1677721801,1677721855,'UK','London');

-- where is IP 100.0.0.30?
SELECT * FROM location WHERE shard IN (0,1,2,3,4,5,6,7,8,9,10,11) AND start
<= 1677721630 LIMIT 1;

-- returns:

-- shard | start      | city     | country | end
---------+------------+----------+---------+------------
--     3 | 1677721611 | New York |      US | 1677721800

--(1 rows)

-- app should check that 'end' value is >= IP
-- alternatively fill in ranges with 'unknown', as previously suggested



On Sat, Nov 16, 2013 at 3:48 AM, Janne Jalkanen <Ja...@ecyrd.com>wrote:

> Idea:
>
> Put only range end points in the table with primary key (part, remainder)
>
> insert into location (part, remainder, city) values (100,10,Sydney) //
> 100.0.0.1-100.0.0.10 is Sydney
> insert into location (part, remainder, city) values (100,50,Melbourne) //
> 100.0.0.11-100.0.0.5 is Melb
>
> then look up (100.0.0.30) as
>
> select * from location where part=100 and remainder >= 30 limit 1
>
> For nonused ranges just put in an empty city or some other known value :)
>
> /Janne
>
> On Nov 16, 2013, at 04:51 , Jacob Rhoden <ja...@me.com> wrote:
>
>
> On 16 Nov 2013, at 1:47 pm, Jon Haddad <jo...@jonhaddad.com> wrote:
>
> Instead of determining your table first, you should figure out what you
> want to ask Cassandra.
>
>
> Thanks Jon, Perhaps I should have been more clear. I need to efficiently
> look up the location of an IP address.
>
> On Nov 15, 2013, at 4:36 PM, Jacob Rhoden <ja...@me.com> wrote:
>
> Hi Guys,
>
> It occurs to me that someone may have done this before and be willing to
> share, or may just be interested in helping work out it.
>
> Assuming a database table where the partition key is the first component
> of a users IPv4 address, i.e. (ip=100.0.0.1, part=100) and the remaining
> three parts of the IP address become a 24bit integer.
>
> create table location(
>     part int,
>     start bigint,
>     end bigint,
>     country text,
>     city text,
>     primary key (part, start, end));
>
> // range 100.0.0.0 - 100.0.0.10
> insert into location (part, start, end, country, city)
> values(100,0,10,'AU','Melbourne’);
>
> // range 100.0.0.11 - 100.0.0.200
> insert into location (part, start, end, country, city)
> values(100,11,200,'US','New York’);
>
> // range 100.0.0.201-100.0.0.255
> insert into location (part, start, end, country, city)
> values(100,201,255,'UK','London');
>
>
> What is the appropriate way to then query this? While the following is
> possible:
>
> select * from location where part=100 and start<=30
>
>
> What I need to do, is this, which seems not allowed. What is the correct
> way to query this?
>
> select * from location where part=100 and start<=30 and end>=30
>
>
> Or perhaps I’m going about this all wrong? Thanks!
>
>
>
>

Re: Efficient IP address location lookup

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
Idea:

Put only range end points in the table with primary key (part, remainder)

insert into location (part, remainder, city) values (100,10,Sydney) // 100.0.0.1-100.0.0.10 is Sydney
insert into location (part, remainder, city) values (100,50,Melbourne) // 100.0.0.11-100.0.0.5 is Melb

then look up (100.0.0.30) as

select * from location where part=100 and remainder >= 30 limit 1

For nonused ranges just put in an empty city or some other known value :)

/Janne

On Nov 16, 2013, at 04:51 , Jacob Rhoden <ja...@me.com> wrote:

> 
> On 16 Nov 2013, at 1:47 pm, Jon Haddad <jo...@jonhaddad.com> wrote:
>> Instead of determining your table first, you should figure out what you want to ask Cassandra.
> 
> Thanks Jon, Perhaps I should have been more clear. I need to efficiently look up the location of an IP address.
> 
>> On Nov 15, 2013, at 4:36 PM, Jacob Rhoden <ja...@me.com> wrote:
>> 
>>> Hi Guys,
>>> 
>>> It occurs to me that someone may have done this before and be willing to share, or may just be interested in helping work out it.
>>> 
>>> Assuming a database table where the partition key is the first component of a users IPv4 address, i.e. (ip=100.0.0.1, part=100) and the remaining three parts of the IP address become a 24bit integer.
>>> 
>>> create table location(
>>>     part int,
>>>     start bigint,
>>>     end bigint,
>>>     country text,
>>>     city text,
>>>     primary key (part, start, end));
>>> 
>>> // range 100.0.0.0 - 100.0.0.10
>>> insert into location (part, start, end, country, city) values(100,0,10,'AU','Melbourne’);
>>> 
>>> // range 100.0.0.11 - 100.0.0.200
>>> insert into location (part, start, end, country, city) values(100,11,200,'US','New York’);
>>> 
>>> // range 100.0.0.201-100.0.0.255
>>> insert into location (part, start, end, country, city) values(100,201,255,'UK','London');
>>> 
>>> What is the appropriate way to then query this? While the following is possible:
>>> 
>>> select * from location where part=100 and start<=30
>>> 
>>> What I need to do, is this, which seems not allowed. What is the correct way to query this?
>>> 
>>> select * from location where part=100 and start<=30 and end>=30
>>> 
>>> Or perhaps I’m going about this all wrong? Thanks!
>> 


Re: Efficient IP address location lookup

Posted by Jacob Rhoden <ja...@me.com>.
> On 16 Nov 2013, at 1:47 pm, Jon Haddad <jo...@jonhaddad.com> wrote:
> Instead of determining your table first, you should figure out what you want to ask Cassandra.

Thanks Jon, Perhaps I should have been more clear. I need to efficiently look up the location of an IP address.

>> On Nov 15, 2013, at 4:36 PM, Jacob Rhoden <ja...@me.com> wrote:
>> 
>> Hi Guys,
>> 
>> It occurs to me that someone may have done this before and be willing to share, or may just be interested in helping work out it.
>> 
>> Assuming a database table where the partition key is the first component of a users IPv4 address, i.e. (ip=100.0.0.1, part=100) and the remaining three parts of the IP address become a 24bit integer.
>> 
>> create table location(
>>     part int,
>>     start bigint,
>>     end bigint,
>>     country text,
>>     city text,
>>     primary key (part, start, end));
>> 
>> // range 100.0.0.0 - 100.0.0.10
>> insert into location (part, start, end, country, city) values(100,0,10,'AU','Melbourne��);
>> 
>> // range 100.0.0.11 - 100.0.0.200
>> insert into location (part, start, end, country, city) values(100,11,200,'US','New York��);
>> 
>> // range 100.0.0.201-100.0.0.255
>> insert into location (part, start, end, country, city) values(100,201,255,'UK','London');
>> 
>> What is the appropriate way to then query this? While the following is possible:
>> 
>> select * from location where part=100 and start<=30
>> 
>> What I need to do, is this, which seems not allowed. What is the correct way to query this?
>> 
>> select * from location where part=100 and start<=30 and end>=30
>> 
>> Or perhaps I��m going about this all wrong? Thanks!
> 

Re: Efficient IP address location lookup

Posted by Jon Haddad <jo...@jonhaddad.com>.
Instead of determining your table first, you should figure out what you want to ask Cassandra.

What do you want to look up your data by?  For each query you may need to store the data multiple times, which is perfectly reasonable and is recommended.

On Nov 15, 2013, at 4:36 PM, Jacob Rhoden <ja...@me.com> wrote:

> Hi Guys,
> 
> It occurs to me that someone may have done this before and be willing to share, or may just be interested in helping work out it.
> 
> Assuming a database table where the partition key is the first component of a users IPv4 address, i.e. (ip=100.0.0.1, part=100) and the remaining three parts of the IP address become a 24bit integer.
> 
> create table location(
>     part int,
>     start bigint,
>     end bigint,
>     country text,
>     city text,
>     primary key (part, start, end));
> 
> // range 100.0.0.0 - 100.0.0.10
> insert into location (part, start, end, country, city) values(100,0,10,'AU','Melbourne’);
> 
> // range 100.0.0.11 - 100.0.0.200
> insert into location (part, start, end, country, city) values(100,11,200,'US','New York’);
> 
> // range 100.0.0.201-100.0.0.255
> insert into location (part, start, end, country, city) values(100,201,255,'UK','London');
> 
> What is the appropriate way to then query this? While the following is possible:
> 
> select * from location where part=100 and start<=30
> 
> What I need to do, is this, which seems not allowed. What is the correct way to query this?
> 
> select * from location where part=100 and start<=30 and end>=30
> 
> Or perhaps I’m going about this all wrong? Thanks!