Tracking SQL queries in Rails

When developing or optimizing Rails applications, one of the major concerns is about the underlying SQL queries. We ask questions like:

How many SQL queries have been triggered after certain request(s)?
How long does it take to finish the SQL query?
Which SQL query spends most of the time?
Is there any duplicate query triggered in different places of the code?
Which SQL query happens more frequent than the others?

and so on. Oftentimes, we figure them out by simply checking the log file, but if we want to analyze one request that turns out to be complicate or if we want to track many requests altogether, it could be hard for us to find out all the information we are interested and filter out those we don’t.

We can use tools like rack-mini-profiler, but it’s hard to use it when there is no html page rendering (e.g. json, ajax response), or when we want to get an aggregate result across multiple requests.

To answer some of the questions, I created a simple gem called sql_tracker. In this post, I will show you how sql_tracker works, and its features that may help you to track and analyze SQL queries.

Tracking

Basically, sql_tracker uses instrumentation API to track SQL queries that are called from Active Record, by subscribing to sql.active_record hook:

ActiveSupport::Notifications.subscribe('sql.active_record') do |_name, _start, _finish, _id, payload|
 sql_query = payload[:sql]
end

Instead of using a block, you can provide an instance object, that implements call method:

class Handler
  def call(_name, _start, _finish, _id, payload)
    sql_query = payload[:sql]
  end
end

ActiveSupport::Notifications.subscribe('sql.active_record', Handler.new)

sql_tracker initializes and starts to track the queries when a Rails process runs, and stops the tracking when the Rails process exits.

Filtering

You may not be interested in listening to all queries. For example, you may only want to track SELECT type of queries, or you may want to track queries called from within certain folders.

By default, sql_tracker tracks and records all four commands (SELECT, INSERT, UPDATE, DELETE), but it allows you to change this by using tracked_sql_command setting variable:

SqlTracker::Config.tracked_sql_command = %w(SELECT)

Note that this setting is case insensitive, you can use select if you prefer.

You can also change the tracked paths/folders, by using tracked_paths setting:

SqlTracker::Config.tracked_paths = %w(app/controllers/api)

By default, sql_tracker tracks only app and lib folders to ignore unrelated queries called from tests for example.

sql_tracker uses Ruby’s caller method with Rails’ backtrace_cleaner to achieve filtering by paths. The caller method returns an array of the current execution stack, and the backtrace_cleaner helps to clean up the stack information using regex. The simplified code looks like this:

Rails.backtrace_cleaner.add_silencer do |line|
  line !~ %r{^(#{tracked_paths.join('|')})\/}
end

cleaned_trace = Rails.backtrace_cleaner.clean(caller)

# if cleaned_trace is empty, then the caller of this query is not included in tracked_paths
return false if cleaned_trace.empty?

Grouping

sql_tracker tries to normalize and group similar queries by replacing specific values with xxx. For example, if you make a request that triggers a SQL query:

SELECT users.* FROM users WHERE users.id = 1

And then there is another query:

SELECT users.* FROM users WHERE users.id = 2

sql_tracker will group them into one query:

SELECT users.* FROM users WHERE users.id = xxx

Underneath, it’s using regular expressions to redact the SQL queries. For example, it tries to find and replace values after comparison operators, such as =, <>, >, <, or even BETWEEN ... AND ....

After they are grouped, it becomes easier to aggregate the queries, and calculate the total count and average durations etc.

Storing

sql_tracker keeps all tracking data into memory, and exports the data into a JSON file when the Rails process exits. All data contains in a Hash, and the format looks like this:

{
  key1: {
    sql: 'SELECT users.* FROM users ...',
    count: 1,
    duration: 0.34,
    source: ['apps/model/user.rb:57:in ...', ...]
  },
  key2: { ... },
  ... ...
}

where the keys are md5 of normalized sql queries, and the values are the full sql query and some statistics.

By default, the file will be saved into tmp folder under Rails root directory, but you can modify it in your config:

SqlTracker::Config.output_path = File.join(Rails.root.to_s, 'my_folder')

If you are using app server like puma, and set more than one worker, you will see more than one JSON output file within the output folder, since each worker will track and save the data separately.

Reporting

Finally, we can generate a report by using one or more JSON dump files:

sql_tracker tmp/sql_tracker-*.json

The report looks like this:

==================================
Total Unique SQL Queries: 24
==================================
Count | Avg Time (ms)   | SQL Query                                                                                                 | Source
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8     | 0.33            | SELECT `users`.* FROM `users` WHERE `users`.`id` = xxx LIMIT 1                                            | app/controllers/users_controller.rb:125:in `create'
      |                 |                                                                                                           | app/controllers/projects_controller.rb:9:in `block in update'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
4     | 0.27            | SELECT `projects`.* FROM `projects` WHERE `projects`.`user_id` = xxx AND `projects`.`id` = xxx LIMIT 1    | app/controllers/projects_controller.rb:4:in `update'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2     | 0.27            | UPDATE `projects` SET `updated_at` = xxx WHERE `projects`.`id` = xxx                                      | app/controllers/projects_controller.rb:9:in `block in update'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2     | 1.76            | SELECT projects.* FROM projects WHERE projects.priority BETWEEN xxx AND xxx ORDER BY created_at DESC      | app/controllers/projects_controller.rb:35:in `index'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
... ...

It shows you the total count of each query, the average time of each query, and which line of source code calls the query. From this report, you can get an idea of which query happens the most, and where it happens. If similar query happens a lot, it’s probably a good starting point to add cache on it. And if there are a lot of places in the source code that trigger the query, it may be a good spot to refactoring the code.

The current report is still quite simple, but I’ve already found it’s very helpful for me to locate the part to improve in my use cases.

I like using sql_tracker in testing for profiling. Normally, I pick some controller tests or integration tests to run, and then after the tests are finished, run sql_tracker tmp/sql_tracker-*.json against newly dumped data to verify the queries.

Conclusion

You can find the code and more details at: https://github.com/steventen/sql_tracker. And of course, there are still a lot to improve. I hope you can use it and like it, and feel free to ask questions, report bugs and send PRs.