Tracking SQL queries in Rails
When developing or optimizing Rails applications, one of the major concerns is about the underlying SQL queries. We ask questions like:
- How many SQL queries have been triggered after certain request(s)?
- How long does it take to finish the SQL query?
- Which SQL query spends most of the time?
- Is there any duplicate query triggered in different places of the code?
- Which SQL query happens more frequent than the others?
and so on. Oftentimes, we figure them out by simply checking the log file, but if we want to analyze one request that turns out to be complicate or if we want to track many requests altogether, it could be hard for us to find out all the information we are interested and filter out those we don’t.
We can use tools like rack-mini-profiler, but it’s hard to use it when there is no html page rendering (e.g. json, ajax response), or when we want to get an aggregate result across multiple requests.
To answer some of the questions, I created a simple gem called sql_tracker. In this post, I will show you how sql_tracker
works, and its features that may help you to track and analyze SQL queries.
Tracking
Basically, sql_tracker
uses instrumentation API to track SQL queries that are called from Active Record
, by subscribing to sql.active_record
hook:
ActiveSupport::Notifications.subscribe('sql.active_record') do |_name, _start, _finish, _id, payload|
sql_query = payload[:sql]
end
Instead of using a block
, you can provide an instance object, that implements call
method:
class Handler
def call(_name, _start, _finish, _id, payload)
sql_query = payload[:sql]
end
end
ActiveSupport::Notifications.subscribe('sql.active_record', Handler.new)
sql_tracker
initializes and starts to track the queries when a Rails process runs, and stops the tracking when the Rails process exits.
Filtering
You may not be interested in listening to all queries. For example, you may only want to track SELECT
type of queries, or you may want to track queries called from within certain folders.
By default, sql_tracker
tracks and records all four commands (SELECT
, INSERT
, UPDATE
, DELETE
), but it allows you to change this by using tracked_sql_command
setting variable:
SqlTracker::Config.tracked_sql_command = %w(SELECT)
Note that this setting is case insensitive, you can use select
if you prefer.
You can also change the tracked paths/folders, by using tracked_paths
setting:
SqlTracker::Config.tracked_paths = %w(app/controllers/api)
By default, sql_tracker
tracks only app
and lib
folders to ignore unrelated queries called from tests for example.
sql_tracker
uses Ruby’s caller method with Rails’ backtrace_cleaner
to achieve filtering by paths. The caller
method returns an array of the current execution stack, and the backtrace_cleaner
helps to clean up the stack information using regex. The simplified code looks like this:
Rails.backtrace_cleaner.add_silencer do |line|
line !~ %r{^(#{tracked_paths.join('|')})\/}
end
cleaned_trace = Rails.backtrace_cleaner.clean(caller)
# if cleaned_trace is empty, then the caller of this query is not included in tracked_paths
return false if cleaned_trace.empty?
Grouping
sql_tracker
tries to normalize and group similar queries by replacing specific values with xxx
. For example, if you make a request that triggers a SQL query:
SELECT users.* FROM users WHERE users.id = 1
And then there is another query:
SELECT users.* FROM users WHERE users.id = 2
sql_tracker
will group them into one query:
SELECT users.* FROM users WHERE users.id = xxx
Underneath, it’s using regular expressions to redact the SQL queries. For example, it tries to find and replace values after comparison operators, such as =
, <>
, >
, <
, or even BETWEEN ... AND ...
.
After they are grouped, it becomes easier to aggregate the queries, and calculate the total count and average durations etc.
Storing
sql_tracker
keeps all tracking data into memory, and exports the data into a JSON file when the Rails process exits. All data contains in a Hash
, and the format looks like this:
{
key1: {
sql: 'SELECT users.* FROM users ...',
count: 1,
duration: 0.34,
source: ['apps/model/user.rb:57:in ...', ...]
},
key2: { ... },
... ...
}
where the keys are md5
of normalized sql queries, and the values are the full sql query and some statistics.
By default, the file will be saved into tmp
folder under Rails root
directory, but you can modify it in your config:
SqlTracker::Config.output_path = File.join(Rails.root.to_s, 'my_folder')
If you are using app server like puma
, and set more than one worker, you will see more than one JSON output file within the output folder, since each worker will track and save the data separately.
Reporting
Finally, we can generate a report by using one or more JSON dump files:
sql_tracker tmp/sql_tracker-*.json
The report looks like this:
==================================
Total Unique SQL Queries: 24
==================================
Count | Avg Time (ms) | SQL Query | Source
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8 | 0.33 | SELECT `users`.* FROM `users` WHERE `users`.`id` = xxx LIMIT 1 | app/controllers/users_controller.rb:125:in `create'
| | | app/controllers/projects_controller.rb:9:in `block in update'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
4 | 0.27 | SELECT `projects`.* FROM `projects` WHERE `projects`.`user_id` = xxx AND `projects`.`id` = xxx LIMIT 1 | app/controllers/projects_controller.rb:4:in `update'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2 | 0.27 | UPDATE `projects` SET `updated_at` = xxx WHERE `projects`.`id` = xxx | app/controllers/projects_controller.rb:9:in `block in update'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2 | 1.76 | SELECT projects.* FROM projects WHERE projects.priority BETWEEN xxx AND xxx ORDER BY created_at DESC | app/controllers/projects_controller.rb:35:in `index'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
... ...
It shows you the total count of each query, the average time of each query, and which line of source code calls the query. From this report, you can get an idea of which query happens the most, and where it happens. If similar query happens a lot, it’s probably a good starting point to add cache on it. And if there are a lot of places in the source code that trigger the query, it may be a good spot to refactoring the code.
The current report is still quite simple, but I’ve already found it’s very helpful for me to locate the part to improve in my use cases.
I like using sql_tracker
in testing for profiling. Normally, I pick some controller tests or integration tests to run, and then after the tests are finished, run sql_tracker tmp/sql_tracker-*.json
against newly dumped data to verify the queries.
Conclusion
You can find the code and more details at: https://github.com/steventen/sql_tracker. And of course, there are still a lot to improve. I hope you can use it and like it, and feel free to ask questions, report bugs and send PRs.