Install Gisgraphy on Ubuntu 12.04 from Scratch

Gisgraphy is a free, open source geocoding and web services solution. It is a great alternative to Google’s geocoding API, which has many usage limits. Gisgraphy can return very relevant geocoding results, since it combines both the GeoNames and OpenStreetMap datasets. Besides geocoding, Gisgraphy can be used for reverse geocoding, street search, find nearby, full-text search, and address parsing. I’d recommend you go to their demo site and try it!

Here, I’ll show you step by step how to install Gisgraphy 3.0 on your local machine running Ubuntu 12.04. I’ll use Java JDK 7, PostgreSQL 9.1 and PostGIS 1.5. (Note that Gisgraphy 3.0 does NOT support PostGIS 2.0.) The official site does provide an installation guide, but it is somewhat out of date.

1. Install Java SDK

1.1 Install oracle-jdk7

Run the following commands inside of terminal to install jdk7:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-jdk7-installer

You can check if you successfully installed java by running:

java -version # should get java version "1.7.0_21" or something like that
javac -version # should get javac 1.7.0_21 or something like that

1.2 Set up Java environment

Open your ~/.bashrc file using vim or any other text editor (sudo is not needed for a file in your own home directory):

vim ~/.bashrc

Add the following line at the end of file:

export JAVA_HOME=/usr/lib/jvm/java-7-oracle

Reload the settings by running:

source ~/.bashrc

Now you can check whether the setting has taken effect by running:

echo $JAVA_HOME # should return /usr/lib/jvm/java-7-oracle

2. Install PostgreSQL 9.1 and PostGIS 1.5

2.1 Install PostgreSQL and PostGIS

Run the following commands to install PostgreSQL and PostGIS:

sudo apt-get install python-software-properties
sudo add-apt-repository ppa:pitti/postgresql
sudo apt-get update
sudo apt-get install libpq-dev
sudo apt-get install postgresql
sudo apt-get install postgresql-9.1-postgis

Check if postgresql is successfully installed:

psql -V

2.2 Create password for user: postgres

Enter postgres console with username ‘postgres’:

sudo -u postgres psql

Then, inside of postgres, run the following command to change password of ‘postgres’:

ALTER USER postgres PASSWORD 'yourpassword';
-- quit the postgres console
\q

2.3 Create database, language and postgis function

All of the following commands use the ‘postgres’ user with the password you just created.

# create the database
psql -U postgres -h 127.0.0.1 -c "CREATE DATABASE gisgraphy ENCODING = 'UTF8';"

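# create language (PL/pgSQL is installed by default since PostgreSQL 9.0, so this may only report that it already exists)
createlang -U postgres -h 127.0.0.1 plpgsql gisgraphy

# create postgis functions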
psql -U postgres -h 127.0.0.1 -d gisgraphy -f /usr/share/postgresql/9.1/contrib/postgis-1.5/postgis.sql

psql -U postgres -h 127.0.0.1 -d gisgraphy -f /usr/share/postgresql/9.1/contrib/postgis-1.5/spatial_ref_sys.sql

After all the settings are done, restart the server by running:

sudo /etc/init.d/postgresql restart

3. Linux file limit settings

To avoid messages like “Too many open files” when Solr opens a large number of files, you must increase the limit on the maximum number of open files. Open a terminal and edit the limits.conf file using vim:

sudo vim /etc/security/limits.conf

Add the following 2 lines to the file, taking care not to leave out the * mark:

* hard nofile 20000
* soft nofile 20000
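
These limits normally apply to new login sessions, so log out and back in before checking them. As a small sketch (Ruby is not part of the Gisgraphy setup itself, I just use it for the check), Process.getrlimit reports the soft and hard open-file limits that the current process sees:

```ruby
# Report the open-file limits seen by the current process.
# Process.getrlimit returns a [soft, hard] pair.
soft_limit, hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)

puts "soft nofile limit: #{soft_limit}"
puts "hard nofile limit: #{hard_limit}"
```

Running ulimit -n in the shell shows the same soft limit.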

That’s it, everything is now set up. Next, we are going to install the Gisgraphy server.

4. Install Gisgraphy

4.1 Download Gisgraphy

Download Gisgraphy from here.

Open your terminal, go to your directory where the file is downloaded, and unzip it:

unzip gisgraphy-3.0-beta2.zip

mv gisgraphy-3.0-beta2 gisgraphy

4.2 Initialize tables

After that, we need to create tables:

cd gisgraphy/

psql -Upostgres -d gisgraphy -h 127.0.0.1 -f ./sql/create_tables.sql

Then, add default user:

psql -Upostgres -d gisgraphy -h 127.0.0.1 -f ./sql/insert_users.sql

The above command creates two default users: one is admin, with the password admin, and the other one is user.

4.3 Settings

In order to make the server run, we need to fill in the postgres password in the jdbc.properties file. Inside the gisgraphy directory:

vim webapps/ROOT/WEB-INF/classes/jdbc.properties

Open the jdbc.properties file, and fill in the jdbc.password field with your password. Make sure not to leave any space after ‘=’:

jdbc.username=postgres
jdbc.password=yourpassword
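
A stray space is easy to miss in an editor. Here is a minimal Ruby sketch that lists the keys whose value starts or ends with whitespace. It assumes the file only contains simple key=value lines, and the helper name keys_with_stray_spaces is my own invention, not something that ships with Gisgraphy:

```ruby
# Hypothetical helper: list the keys in a .properties text whose value
# starts or ends with whitespace (e.g. "jdbc.password= secret").
def keys_with_stray_spaces(properties_text)
  properties_text.each_line.map do |line|
    line = line.chomp
    # Skip blank lines and comment lines
    next if line.strip.empty? || line.lstrip.start_with?("#", "!")

    key, value = line.split("=", 2)
    next if value.nil?

    key.strip if value != value.strip
  end.compact
end

sample = <<~PROPS
  jdbc.username=postgres
  jdbc.password= yourpassword
PROPS

p keys_with_stray_spaces(sample)
```

To check the real file, pass it File.read("webapps/ROOT/WEB-INF/classes/jdbc.properties") from inside the gisgraphy directory.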

Then it’s pretty much done. The last thing is to set up the environment in the env.properties file, which is also under the webapps/ROOT/WEB-INF/classes/ directory.

There are 3 parameters that I think are worth a look:

 importer.geonamesfilesToDownload=US.zip
 importer.openstreetmapfilesToDownload=US.tar.bz2
 googleMapAPIKey=yourkey

I’m only interested in data for the USA, so I set geonamesfilesToDownload and openstreetmapfilesToDownload to download only the US data. This saves a lot of space. What’s more, the googleMapAPIKey can be used to show a map in the demo server. You can get one from Google’s API console.

All the other settings are described in the documentation.

4.4 Run the server

OK, now it’s time to run the server. Make the launch script executable, and then run it.

chmod +x launch.sh
./launch.sh

Now you should be able to visit the http://localhost:8080/mainMenu.html page. The next step is to go through the wizard on the main page to download the dataset. Yeah!

Linode vs Digital Ocean Performance Benchmarks

Linode has recently increased the CPU from 4 cores to 8 cores, and also doubled the memory of all their plans.

To be honest, I really don’t know how 8 cores could be fully used by a website that runs on their lower 1GB or 2GB plans. I really wish they had upgraded to SSD disks instead; I think that’s the real bottleneck.

Digital Ocean is becoming a real competitor; its low-priced $5 and $10 server options with SSD disks make it stand out.

I purchased and benchmarked 3 servers:

DigitalOcean1G: 1 Core CPU, 1GB RAM, 30GB SSD, $10 /month

DigitalOcean2G: 2 Cores CPU, 2GB RAM, 40GB SSD, $20 /month

Linode 1G: 8 Cores CPU, 1GB RAM, 24GB Storage, $20 /month

All servers run a fresh install of the Ubuntu 12.04 x64 server edition. The Digital Ocean servers are both located in New York, whereas the Linode server is located in Atlanta. The test script is from ServerBear.

Detailed results can be found at the links below:

DigitalOcean1G: http://serverbear.com/benchmark/2013/04/15/77Yz6LDBTg7Iofxu

DigitalOcean2G: http://serverbear.com/benchmark/2013/04/15/0NH1vFxtjGBme8Ze

Linode 1G: http://serverbear.com/benchmark/2013/04/15/BgVR1lhaq7ENOCUA

UnixBench results

DigitalOcean1G:
UnixBench (w/ all processors) 1387.1
UnixBench (w/ one processor) 1386.6

DigitalOcean2G:
UnixBench (w/ all processors) 1873.1
UnixBench (w/ one processor) 1183.7

Linode1G:
UnixBench (w/ all processors) 1860.7
UnixBench (w/ one processor) 491.4

UnixBench gives us a basic score of the system’s performance. I’m really surprised that Linode’s 8 cores didn’t perform as well as I expected. To give you an idea of how bad it is, below is the result from one of my cheap 4-core dedicated servers from OVH:

UnixBench (w/ all processors) 4017.1
UnixBench (w/ one processor) 1603.1

At least you can see that’s how it looks, when every 1 core is REALLY 1 core :)
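
To put a number on that, here is a small Ruby sketch that divides each all-processor score by the single-processor score. This is my own arithmetic on the scores above, not part of the ServerBear output:

```ruby
# UnixBench scores copied from the results above: [all processors, one processor].
scores = {
  "DigitalOcean1G" => [1387.1, 1386.6],
  "DigitalOcean2G" => [1873.1, 1183.7],
  "Linode1G"       => [1860.7, 491.4],
  "OVH dedicated"  => [4017.1, 1603.1]
}

# Ratio of the multi-process score to the single-process score.
scaling = scores.transform_values { |all, one| (all / one).round(2) }

scaling.each { |server, ratio| puts format("%-15s %.2fx", server, ratio) }
```

The ratio comes out at roughly 1.0 for DigitalOcean1G, 1.58 for DigitalOcean2G, 3.79 for Linode1G and 2.51 for the OVH box. So Linode’s 8 cores together deliver less than 4 times the score of one of them, and that single-core score is less than half of either Digital Ocean server’s.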

IOPS FIO results

DigitalOcean1G:
Read IOPS 4444.0
Read Bandwidth 17.7 MB/second
Write IOPS 2295.0
Write Bandwidth 9.1 MB/second

DigitalOcean2G:
Read IOPS 3838.0
Read Bandwidth 15.3 MB/second
Write IOPS 2572.0
Write Bandwidth 10.2 MB/second

Linode1G:
Read IOPS 776.0
Read Bandwidth 3.1 MB/second
Write IOPS 624.0
Write Bandwidth 2.4 MB/second

FIO provides a view of the system’s I/O performance. Without SSD, Linode performed poorly, as I expected. But the result is about average for existing VPSs.
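
The IOPS and bandwidth figures are two views of the same measurement. As a rough check, this Ruby sketch divides each reported bandwidth by its IOPS to get the amount of data moved per operation. It assumes 1 MB = 1000 KB, which is my assumption and not something stated in the results:

```ruby
# FIO figures copied from the results above: [IOPS, bandwidth in MB/second].
results = {
  "DigitalOcean1G read"  => [4444.0, 17.7],
  "DigitalOcean1G write" => [2295.0, 9.1],
  "DigitalOcean2G read"  => [3838.0, 15.3],
  "DigitalOcean2G write" => [2572.0, 10.2],
  "Linode1G read"        => [776.0, 3.1],
  "Linode1G write"       => [624.0, 2.4]
}

# Data moved per I/O operation, in KB (assuming 1 MB = 1000 KB).
kb_per_op = results.transform_values { |iops, mb_per_second| mb_per_second * 1000 / iops }

kb_per_op.each { |label, kb| puts format("%-22s %.2f KB per operation", label, kb) }
```

Every row comes out at a little under 4 KB per operation, which suggests a small-block test where the number of operations per second matters more than raw throughput. That reading is my inference from the numbers.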

Conclusion

Yes, these raw performance results don’t mean everything. But not everyone can resist the temptation from lower price with better performance.

Validate Attachment File Size and Type in Rails

Uploading a file is a common action on websites, and the CarrierWave gem provides a very simple and flexible way to do that in a Rails application.

In practice, you may only want to allow users to upload files within a limited size or of certain types. For example, only image files (extensions .jpg, .jpeg, .gif, .png) with a maximum size of 5 MB are allowed. Below, I will show you how to validate file size and file extensions.

The validations should be implemented on both the front end (client side) and the back end (model level). Here I assume you have already created a User model with a string field :avatar to mount the CarrierWave uploader, following this guide.

1. Client Side Validation

The front end side (client side) is implemented using jQuery.

First, we need to create a file field inside of your form, using Rails’ file field helper:

 <%= f.file_field :avatar,:onchange =>"validateFiles(this);",
 :data => {
 :max_file_size => 5.megabytes
 }%>

Here, the onchange event is triggered when a file is selected, and the validateFiles method is called. Notice that I create a data attribute “max_file_size” to store the maximum allowed file size. You can change this value to suit your needs.

Then, we need to implement the validateFiles method. Put the following JavaScript code inside your .js file:

function validateFiles(inputFile) {
  var maxExceededMessage = "This file exceeds the maximum allowed file size (5 MB)";
  var extErrorMessage = "Only image file with extension: .jpg, .jpeg, .gif or .png is allowed";
  var allowedExtension = ["jpg", "jpeg", "gif", "png"];

  var extName;
  var maxFileSize = $(inputFile).data('max-file-size');
  var sizeExceeded = false;
  var extError = false;

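  $.each(inputFile.files, function() {
    // Compare each selected file against the limit stored in data-max-file-size
    if (this.size && maxFileSize && this.size > parseInt(maxFileSize, 10)) { sizeExceeded = true; }
    // Lower-case the extension so that "photo.JPG" is accepted as well
    extName = this.name.split('.').pop().toLowerCase();
    if ($.inArray(extName, allowedExtension) === -1) { extError = true; }
  });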
  if (sizeExceeded) {
    window.alert(maxExceededMessage);
    $(inputFile).val('');
  };

  if (extError) {
    window.alert(extErrorMessage);
    $(inputFile).val('');
  };
}

Basically, this code checks whether the input file is over the size limit and whether its extension is included in the allowedExtension array. An alert window pops up when a validation fails, and the input field is cleared.

You can customize the error messages by changing the values of maxExceededMessage and extErrorMessage. You might also want to change the allowed file extensions by changing the allowedExtension array.

That’s it for the client side!

2. Model Level Validation

Now, we need to add the same validation at the model level. For the file extension check, you just need to uncomment the extension_white_list method that already exists inside your uploader class:

class AvatarUploader < CarrierWave::Uploader::Base

... ...

  def extension_white_list
    %w(jpg jpeg gif png)
  end
end

Furthermore, the file size validation is done with a custom Rails validator. Inside the User model, add the following code:

class User < ActiveRecord::Base
  ... ...
  validate :avatar_size_validation

  ... ...
  private

  def avatar_size_validation
    errors.add(:avatar, "should be less than 5MB") if avatar.size > 5.megabytes
  end
end
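
To see the two rules in isolation, here is a plain-Ruby sketch of the same checks that runs without Rails or CarrierWave. The method name upload_errors and its arguments are made up for this illustration; they are not part of either library:

```ruby
# Plain-Ruby sketch of the two rules, independent of Rails and CarrierWave.
# All names here are hypothetical.
allowed_extensions = %w[jpg jpeg gif png]
max_file_size = 5 * 1024 * 1024 # 5 MB, the same value as 5.megabytes in Rails

def upload_errors(filename, size_in_bytes, allowed_extensions, max_file_size)
  errors = []
  # File.extname returns ".JPG" for "photo.JPG"; strip the dot and lower-case it
  extension = File.extname(filename).delete(".").downcase
  errors << "extension is not allowed" unless allowed_extensions.include?(extension)
  errors << "should be less than 5MB" if size_in_bytes > max_file_size
  errors
end

p upload_errors("photo.JPG", 1_000_000, allowed_extensions, max_file_size) # passes both rules
p upload_errors("notes.pdf", 6_000_000, allowed_extensions, max_file_size) # fails both rules
```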

And that’s it!

Date, Time, DateTime in Ruby and Rails

There are three classes in Ruby that handle date and time. Date and DateTime both come from the date library. Time is a core class, and requiring the time library adds extra methods to it, such as Time.parse.

Both DateTime and Time can be used to handle year, month, day, hour, min, sec attributes. But under the hood, the Time class stores an integer that represents the number of seconds since the Epoch. We also call it unix time.

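The Time class has some limits. Firstly, older Ruby versions could only represent dates between 1970 and 2038. Since Ruby 1.9.2, Time stores nanoseconds since the Epoch in a 63-bit integer that covers 1823-11-12 to 2116-02-20, and falls back to a slower Bignum or Rational representation outside that range. Secondly, the time zone is limited to UTC, the system’s local time zone in ENV['TZ'], or (since Ruby 1.9.2) a fixed UTC offset.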
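
A quick way to see how Time behaves at the edges is to try a few values in irb. This is a minimal sketch that assumes a Ruby newer than 1.9.2 and uses only core methods:

```ruby
# Values far outside the 1823..2116 window are still accepted on current Rubies.
far_past   = Time.utc(1700, 1, 1)
far_future = Time.utc(3000, 1, 1)

puts far_past.year
puts far_future.year

# The unix timestamp is a plain integer and is not capped at 32 bits.
puts far_future.to_i

# Unix time 0 is the Epoch itself.
puts Time.at(0).utc.strftime("%Y-%m-%d %H:%M:%S")
```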

What’s more, Rails provides a really good time class called ActiveSupport::TimeWithZone. This class is similar to Ruby’s Time class, with added support for time zones.

One thing worth noting is that, by default, Rails converts times to UTC whenever it writes to or reads from the database, no matter what time zone you set in the configuration file. You can use `<attribute_name>_before_type_cast` to get the original value stored in the database. For example, for created_at:

object.created_at_before_type_cast

Below are some useful code snippets that I use most often to deal with dates and times.

1. Time

# Get current time using the time zone of current local system
Time.now

# Get current time using the time zone of UTC
Time.now.utc

# Get the unix timestamp of current time => 1364046539
Time.now.to_i

# Convert from unix timestamp back to time form
Time.at(1364046539)

# Format the time as a string; with a US Eastern local time zone this returns => "March 23, 2013 at 09:48 AM"
Time.at(1364046539).strftime("%B %e, %Y at %I:%M %p")

For the Time class, I prefer to convert it to a unix timestamp, because the integer form can be easily stored, indexed or ordered. It is also useful when the distance between two times is more important than the actual time, like in tweets, where it’s better to show ‘1 minute ago’ instead of the actual time.
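
As a minimal plain-Ruby sketch of that idea, the hypothetical helper below (time_ago is my own name) turns the difference between two unix timestamps into a short phrase. It is deliberately simplified; in a Rails view you would normally reach for the built-in helpers instead:

```ruby
# Hypothetical helper: describe how long ago a unix timestamp was,
# relative to another unix timestamp (defaults to the current time).
def time_ago(timestamp, now = Time.now.to_i)
  # Treat timestamps in the future as "just now"
  seconds = [now - timestamp, 0].max
  case seconds
  when 0...60        then "less than a minute ago"
  when 60...3600     then "#{seconds / 60} minute(s) ago"
  when 3600...86_400 then "#{seconds / 3600} hour(s) ago"
  else                    "#{seconds / 86_400} day(s) ago"
  end
end

posted_at = 1364046539
puts time_ago(posted_at, posted_at + 60)   # one minute later
puts time_ago(posted_at, posted_at + 7200) # two hours later
```

Because both values are integers, the whole calculation is a subtraction and an integer division.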

More format strings can be found in Ruby’s Time documentation.

2. Time with Zone (ActiveSupport::TimeWithZone)

TimeWithZone instances implement the same API as Ruby Time instances.

# Set the time zone of the TimeWithZone instance
Time.zone = 'Central Time (US & Canada)'

# Get current time using the time zone you set
Time.zone.now

# Convert from unix timestamp back to time format using the time zone you set
Time.zone.at(1364046539)

# Convert from unix timestamp back to time format using the time zone you set,
#  and the required string format => "03/23/13 09:48 AM"
Time.at(1364046539).in_time_zone("Eastern Time (US & Canada)").strftime("%m/%d/%y %I:%M %p")
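
For comparison, here is a sketch of the closest equivalent in plain Ruby without Rails. It uses a fixed UTC offset rather than a named zone, so you have to know yourself that US Eastern time was on daylight time (-04:00) on that date:

```ruby
# Plain Ruby, no Rails: render a unix timestamp at a fixed UTC offset.
# "-04:00" is US Eastern daylight time, which applied on 2013-03-23.
timestamp = 1364046539

eastern = Time.at(timestamp).getlocal("-04:00")
puts eastern.strftime("%m/%d/%y %I:%M %p") # => "03/23/13 09:48 AM"
```

A fixed offset does not follow daylight saving changes, which is exactly the bookkeeping that the named zones in ActiveSupport::TimeWithZone take care of.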

Rails also provides a lot of very useful helper methods that read like plain English.

# Get the date time of n day, week, month, year ago
1.day.ago
2.days.ago
1.week.ago
3.months.ago
1.year.ago

# beginning of or end of the day, week, month ...
Time.now.beginning_of_day
30.days.ago.end_of_day
1.week.ago.end_of_month

# feel free to use those methods from Time class
1.week.ago.beginning_of_day.to_i

You can find more methods by checking the doc.

Time distance

Rails also provides time distance methods in ActionView::Helpers to get the Twitter-style time format:

# inside of your .erb view files

diff = Time.now.to_i - 1.hour.ago.to_i
distance_of_time_in_words(diff)

distance_of_time_in_words_to_now(1.hour.ago)

Use customized time zone by user

For Rails application, you can set the default time zone under /config/application.rb

# /config/application.rb
config.time_zone = 'Central Time (US & Canada)'

To get a list of time zone names supported by Rails, you can use

ActiveSupport::TimeZone.all.map(&:name)

Normally, we would like to provide a form for user to choose their desired time zone. You can create a string field (e.g. :time_zone), and the form can be implemented as

<%= f.time_zone_select :time_zone %>

# use US time zone only, with default
<%= f.time_zone_select :time_zone, ActiveSupport::TimeZone.us_zones, :default => "Pacific Time (US & Canada)" %>

To make the user’s time zone setting work, we can use the use_zone method, which overrides Time.zone locally inside the supplied block.

To use this method, we can add an around_filter inside ApplicationController, as suggested by Railscasts, like this (on Rails 4 and later the same callback is named around_action):

# /app/controllers/application_controller.rb

around_filter :user_time_zone, if: :current_user

private

  def current_user
    @current_user ||= User.find(session[:user_id]) if session[:user_id]
  end
  helper_method :current_user

  def user_time_zone(&block)
    Time.use_zone(current_user.time_zone, &block)
  end

3. Date and DateTime

For most cases, the Time class with the time zone support of Rails’ ActiveSupport is sufficient. But sometimes, when you just need a string with the year, month and day, the Date class is still worth a try.

For example, in one of my applications, we use date strings as keys to store count information in Redis. To generate a list of date strings, I use

# Generate date strings for the last 30 days (the ... range excludes today)
days_str = (30.days.ago.to_date...Date.today).map{ |date| date.strftime("%Y:%m:%d") }
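
The three-dot range is easy to overlook, so here is a small plain-Ruby sketch of the difference. A fixed date stands in for Date.today (my choice, to keep the result reproducible), and the range is shortened to three days:

```ruby
require "date"

# A fixed end date stands in for Date.today so the result is reproducible.
today = Date.new(2013, 3, 23)
start = today - 3

# Three dots exclude the end of the range, so `today` itself is left out.
without_today = (start...today).map { |date| date.strftime("%Y:%m:%d") }

# Two dots include it.
with_today = (start..today).map { |date| date.strftime("%Y:%m:%d") }

p without_today
p with_today
```

If the counter for the current day should be included in the list, use the two-dot form.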

Time, Date, DateTime are all interchangeable by using to_time, to_date, to_datetime methods

# Convert DateTime to Time
DateTime.parse('March 3rd 2013 04:05:06 AM').to_time.class # => Time

# Convert Time to Date
1.day.ago.to_date.class # => Date

Quick Markdown Syntax Reference by Example

Tools and Steps

I use Markdown to write my blog; it’s simple and productive.

I use Sublime Text 2 for text editing. It’s powerful and easy to use.

My steps are very simple: use the Markdown Preview plugin to preview my post, then paste the converted HTML code into WordPress. Done!

Sample Code for Basic Markdown Syntax

This piece of sample code is written in Markdown; it includes the basic styles and syntax of Markdown that I normally use to write my blog.

It’s simple and straightforward; I believe you can learn it within 5 minutes. Hope you can write a very nice blog :D

1. Heading or Title

Use # for header 1, ## for header 2, ### for header 3, and so on.

Under the header you can type your paragraph; it will be wrapped in a <p> tag.

2. Links

  1. Use [Text](http link) syntax to insert a link or anchor for the text, like this: Steven Yue’s Blog
  2. Use <http link> syntax, which gives you this: https://stevenyue.com

3. Add Bold, Italic to Emphasize Words

Use asterisks around text or words to emphasize them.

  • Syntax like *text*, gives you Italic style of emphasis.
  • Syntax like **text**, gives you Bold style of emphasis.

4. Blockquote

Put > in front of a paragraph to get a blockquote. Indenting the paragraph with a tab or four spaces instead gives you a preformatted block with padding around it:

Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).

Also use backticks around some words inside a paragraph to mark them as inline code.

5. List

  • Unordered lists use hyphens - or * before the text. Note: you can combine lists with emphasis to make the display better.
  • Ordered lists use numbers like 1., 2., … instead of -. If you put blank lines between items, you’ll get <p> tags for the list item text. Note: Don’t forget to indent the items.

6. Code Highlight

Want highlighting for different programming languages? Oh, that’s complicated. Each platform has its own way to do that, so it’s better to use whatever yours provides.

For example, in WordPress, you can use sourcecode language="ruby" with [ ] around it to wrap your code.

7. Reference

Check Daring Fireball, where I learned a lot.

Also, I’ve attached the Markdown source code of this post for your reference. Remember to replace the file’s .doc suffix with .md in order to use it.

Check here for best display of this post

Download Sample Code Here

Jekyll vs WordPress, Why I prefer wordpress

Jekyll is becoming popular these days. It gives people the impression that it’s simple and fast for blogging. Besides, with the support of GitHub Pages, its attraction now comes not only from saving time, but from saving resources (money) as well.

But after two days of learning and experimenting, I finally gave up. I think it’s better to just keep writing in WordPress. Below are some of my reasons:

I just Wanna Write

Yes, the reason is simple. I just want:

  • A place to leave my thoughts, my opinions, my experience.
  • A place where my writing can be easily shared with people around the world.

Content is key for me, with maybe some decoration for my site occasionally, but I’m not that interested in pursuing so-called full control of everything.

I put all my stuff on wordpress.com now, which satisfies most of my needs: a huge number of themes and lots of useful widgets. Of course, it’s annoying that you are unable to add some super plugins or customize styles. But do you really need them, or is it just temporary curiosity that drives you?

Control or Management?

Jekyll is a simple, blog-aware, static site generator. It’s a generator, not a CMS (sometimes I even feel that it’s unfair to compare Jekyll to WordPress).
By default, WordPress gives you a full-featured admin panel out of the box, in which you can

  • Add/delete/schedule posts, and assign privacy
  • Modify tags and categories
  • Monitor visitor statistics
  • Control comments

And more… all in one place, and visually. Jekyll, however, leaves all of these for you to configure. Yes, there are a bunch of solutions and plugins, but you need time and patience to get all of this done.

Command vs Visualize

It’s hard to conclude which one is better. Commands are fast and controllable, while a visual interface is easy to learn and use. As a programmer, I like typing commands; it’s cool, and it makes you look like an expert. That’s one of the reasons Jekyll attracts me. But I’m lazy too, and sometimes life is just easier and more wonderful if you move your mouse or finger around and enjoy the beauty of colorized tables, charts and graphs.

Back to Writing Again

Some say that one of the reasons they moved to Jekyll is their love of Markdown. Some question how user-friendly writing in WordPress is. I use Markdown too: I write my posts in Sublime Text 2 and use a plugin to convert the Markdown to HTML. It’s not hard.

Besides, Jekyll does not handle Markdown syntax seamlessly. There is still a lot to take care of to get the Markdown styles parsed well. So for both systems, you still need to spend time to get a good display.

Input and Output

In order to move to Jekyll, I need to convert all the posts from WordPress to Jekyll. The conversion is fast, but that doesn’t mean you can use the posts right away. There’s still some cleanup and modification needed for things like syntax highlighting, tags, categories and so on.

After the data migration, you need to design your own templates or modify templates from others to suit your needs. You need to design or modify the layout, create pages, and so on. During this time, you may run into errors, bugs and weird problems. You have to consult the documentation and read tutorials to understand and fix them. It’s a real headache.

Luckily, there are quick out-of-the-box solutions, like Jekyll Bootstrap, and GitHub Pages for hosting. But there are limitations on using GitHub, like the lack of support for Jekyll plugins. Meanwhile, GitHub Pages will show you a 404 page if there’s any mistake inside your blog, and sometimes the mistake is really hard to find.

Is it worth so much work? I don’t think so, considering there might be more ahead. Jekyll is still at version 0.11.2; who can guarantee that there will not be major changes or big bug fixes?

Final Thoughts

Jekyll is still young, and it’s light and fast. I like some of its features, like version control using git, Markdown support, YAML-formatted config and so on, but on second thought, I gave up.

But I do believe that after a while, there will be more mature solutions for out-of-the-box blogging, along with some easy-to-use configuration and deployment tools for it.

Obtain RSS Feed url from Google Reader Using Ruby Nokogiri Gem

I’ve been using Google Reader for years, and I think it’s by far the best RSS reader that I’ve ever used.

So far, I’ve collected lots of valuable RSS subscriptions, and there are also many good-quality ones recommended by Google.

For various reasons, I want to fetch the urls from all those sources, and it can easily be done using Ruby’s Nokogiri gem.

First you need to export the sources to an OPML file. Go to Reader settings -> Import/Export and choose to download your subscriptions.

Then run this ruby script.

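require 'nokogiri'

# Parse the exported OPML file; the block form closes the file automatically
doc = File.open("google_rss_feed.xml") { |f| Nokogiri::XML(f) }

# Print the feed url of every outline element that has one
doc.xpath("//outline").each { |x| puts x['xmlUrl'] unless x['xmlUrl'].nil? }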

Finally, it will output all the urls for you. Nice and easy.
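
If Nokogiri is not installed, the same extraction can be sketched with REXML, the XML library that ships with Ruby. This is a swapped-in alternative, not the script I actually ran. The OPML sample and its feed URLs are made-up placeholders, with folders nested the way an exported subscription list is assumed to look:

```ruby
require "rexml/document"

# A made-up OPML sample; the feed URLs are placeholders.
opml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <opml version="1.0">
    <body>
      <outline title="Tech" text="Tech">
        <outline text="Example A" type="rss" xmlUrl="http://example.com/a.xml"/>
        <outline text="Example B" type="rss" xmlUrl="http://example.org/b.xml"/>
      </outline>
    </body>
  </opml>
XML

doc = REXML::Document.new(opml)

# Folder outlines have no xmlUrl attribute, so compact drops their nils.
urls = REXML::XPath.match(doc, "//outline")
                   .map { |outline| outline.attributes["xmlUrl"] }
                   .compact

puts urls
```

For a real export, replace the sample string with File.read of the downloaded file.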