Convert IP address for Amazon Redshift
PostgreSQL offers network address types to store IP address. You can define the IP column as inet
type, then simply insert the human readable IP address string (e.g. “192.168.1.1”) into the table.
You can also store the IP as an integer
, then use inet
to cast it to string:
-- Returns 192.168.1.1
select '0.0.0.0'::inet + 3232235777;
As you can see from the above sql example, we convert the integer 3232235777
back to the string format "192.168.1.1"
.
Even though it is based on Postgres 8, Amazon Redshift does not support network address types. We have to store the IP address into a BIGINT
column, and we can’t use inet
to cast the bigint to the human readable format.
In this post, I will show you how to convert IP address into an integer before inserting into the Redshift database, and also how to cast the integer representation of an IP address back into the string format when retrieving the data back from Redshift.
Convert IP Address in Ruby
Most programming languages include libraries that allow you to manipulate IP address. These libraries are probably more reliable and feature-rich. So the 1st and probably the easiest way is make use of those libraries in the language you are familiar with, and do the conversion there before sending and/or after receiving the data from Redshift.
In Ruby, you can use ipaddr to convert IP address from string to integer and vise versa:
require 'ipaddr'
ip = IPAddr.new('192.168.1.1')
ip.to_i # => 3232235777
ip2 = IPAddr.new(3232235777, Socket::AF_INET)
ip2.to_s # => "192.168.1.1"
Note that to convert integer into string, you need to specify the address family which is Socket::AF_INET
for IPv4.
Convert IP Address in SQL
However, there are times when you have to use sql language in Redshift directly to do this job. I recently come into this situation, where I need to use Redshift’s UNLOAD
command to export data into AWS S3 for better performance, so I have to come up with a solution by using pure SQL language.
To cast a IP string into bigint
when inserting the data into the column, you can do:
-- Returns 3232235777
select (split_part('192.168.1.1', '.', 1)::bigint << 24) +
(split_part('192.168.1.1', '.', 2)::bigint << 16) +
(split_part('192.168.1.1', '.', 3)::bigint << 8) +
(split_part('192.168.1.1', '.', 4)::bigint);
There’s a similar solution like this above in StackOverflow.
To convert bigint back to human-readable notations, you can do:
-- Returns 192.168.1.1
select ((3232235777 >> 24) & x'FF'::int)::varchar(3) || '.' ||
((3232235777 >> 16) & x'FF'::int)::varchar(3) || '.' ||
((3232235777 >> 8) & x'FF'::int)::varchar(3) || '.' ||
(3232235777 & x'FF'::int)::varchar(3);
There are bitwise operations going on here, but I won’t explain them to you in details. You can read more from wikipedia.
Finally, it’s interesting to see how to write these IP address conversations in Ruby without using any library. It turns out to be just as simple as these following few lines of code:
# convert IPv4 address from integer to string
ip_int = 3232235777
ip = [24, 16, 8].map { |i| (ip_int >> i) & 0xff }.join('.') # => "192.168.1.1"
# convert from string to integer
ip_addr = '192.168.1.1'
ip_addr.split('.').inject(0) { |i, s| i << 8 | s.to_i } # => 3232235777