We identified a single slow request in our moderation module: retrieving the json containing the entries to be moderated, feeding our react app.
So in short our datamodel looks as follows:
Class Topic
have_many :entries
have_many :dynamic_attributes
Class Entry
belongs_to :topic
has_many :entry_values
Class EntryValue
belongs_to :entry
belongs_to :dynamic_attribute
So in short: topics
have a set of (dynamic) attributes that can be entered. An entry_value
is the value entered for a dynamic_attribute
and those are grouped in a complete entry
.
In the moderation, our moderators verify that an entry (collection of entry-values) is appropriate, and have to option to edit or add missing information.
So in our controller we do something like
@entries = @q.result(distinct: true).page params[:page]
We are using ransack
to filter/search on our entries. So we know we need entry-values, and possibly their dynamic attributes when building the json, so what is the best approach here? Use eager_load
or includes
? So what better way to decide than actually test this? So I temporarily changed the controller code as follows:
preload_option = params[:pre].try(:to_i)
if preload_option == 0
@entries = @q.result(distinct: true).page params[:page]
elsif preload_option == 1
@entries = @q.result(distinct: true).eager_load(:entry_values).page params[:page]
else
@entries = @q.result(distinct: true).includes(:entry_values => :dynamic_attribute).page params[:page]
end
Note:
- I can only
eager_load
one level deep
- with
includes
I can immediately fetch all dynamic attributes for the entry-values.
But which will prove to be more beneficial?
So then I ran a small benchmark script:
require 'benchmark'
require 'rest-client'
n = 10
Benchmark.bm do |x|
x.report("normal ") do
n.times { RestClient.get("http://admin.lvh.me:3013/moderation/projects/11/entries.json", {}) }
end
x.report("eager ") do
n.times { RestClient.get("http://admin.lvh.me:3013/moderation/projects/11/entries.json?pre=1", {}) }
end
x.report("include") do
n.times { RestClient.get("http://admin.lvh.me:3013/moderation/projects/11/entries.json?pre=2", {}) }
end
end
(note: for testing purposes I also disabled the need to authenticate, so I could easily fetch the jsons and time and compare)
This first run gave me the following results:
normal 0.016362 0.005723 0.022085 ( 35.440171)
eager 0.010036 0.004336 0.014372 ( 28.632490)
include 0.012550 0.004061 0.016611 ( 29.173778)
Ok. Not the kind of improvement I had hoped. Also nice to notice that eager_load
in this case is more efficient than using the includes
(which seemed a little counter-intuitive maybe).
I had recently changed a small part of the code, because in the moderation we also wanted to be able to edit fields that were not entered, and before we only had to retrieve entered values (:entry_values
) so I presume that maybe there I fucked up the performance. Before we called entry.valid_entry_values
which looked like
def valid_entry_values
entry_values.sorted.select do |ev|
da = ev.dynamic_attribute
da.attribute_type != 'item' || (da.attribute_type == 'item' && !ev.item_content_type.nil?)
end
end
and I replaced it with the following, adding empty entry-values
to be filled in:
def entry_values_with_empty
result = []
self.topic.dynamic_attributes.each do |da|
ee = entry_values.find_by(dynamic_attribute_id: da.id)
if ee.nil? || (da.attribute_type == 'item' && ee.item_content_type.nil?)
ee = entry_values.build(dynamic_attribute: da)
end
result << ee
end
# check if we have entry-values not yet in the list
# (e.g. from another topic when the entry was moved, and add those too)
self.entry_values.each do |entry_value|
if result.select{|ee| ee.id == entry_value.id}.count == 0
result << entry_value
end
end
result
end
So what happens if we switch back to the old valid_entry_values
: how does that change performance?
I ran my small benchmark script again, and got the following results:
normal 0.015264 0.005280 0.020544 ( 33.283069)
eager 0.009901 0.004359 0.014260 ( 17.145350)
include 0.013153 0.004032 0.017185 ( 17.621856)
Wow! Now the eager_load
or includes
really seem to pay off. Also: almost the same speed improvement. Ok.
So if we check the entry_values_with_empty
more closely, the implementation is somewhat naive: for each dynamic-attribute
it will attempt to find the corresponding entry-value
,
except ... we use a query each time for each dynamic attribute, for each entry ... Mmmmmm. Let's see if we can improve this:
def entry_values_with_empty
result = []
self.topic.dynamic_attributes.each do |da|
ee = entry_values.detect{|ev| ev.dynamic_attribute_id == da.id}
if ee.nil? || (da.attribute_type == 'item' && ee.item_content_type.nil?)
ee = entry_values.build(dynamic_attribute: da)
end
result << ee
end
# check if we have entry-values not yet in the list
# (e.g. from another topic when the entry was moved, and add those too)
self.entry_values.each do |entry_value|
if result.select{|ee| ee.id == entry_value.id}.count == 0
result << entry_value
end
end
result
end
Notice: we only changed one line, replacing the find_by
with a detect
.
This will, instead of launching a new query, iterate over the already retrieved array of entry_values
. But does this make any difference?
Launching my small test script now returns the following:
normal 0.016051 0.005418 0.021469 ( 16.448649)
eager 0.009454 0.003858 0.013312 ( 22.479142)
include 0.012419 0.003872 0.016291 ( 14.236868)
NICE! **fireworks**
Not what I expected to see at all. A little baffled that the normal case is improved that much, and that the eager_load
does not improve it (on the contrary). We have now found our optimal combination: improving the entry_values_with_empty
and adding the includes
will give the best performance.
Is this what you would have expected? Bottomline remains: it helps to measure (in Dutch we say: meten is weten
which rhymes)