Archive
words and words and words
Technology ruby on rails optimization
[RAILS] optimizing a slow request

We identified a single slow request in our moderation module: retrieving the json containing the entries to be moderated, feeding our react app.

So in short our datamodel looks as follows:

Class Topic 
  have_many :entries
  have_many :dynamic_attributes

Class Entry
  belongs_to :topic
  has_many :entry_values

Class EntryValue
  belongs_to :entry
  belongs_to :dynamic_attribute 

So in short: topics have a set of (dynamic) attributes that can be entered. An entry_value is the value entered for a dynamic_attribute and those are grouped in a complete entry.

In the moderation, our moderators verify that an entry (collection of entry-values) is appropriate, and have to option to edit or add missing information.

So in our controller we do something like

@entries = @q.result(distinct: true).page params[:page]

We are using ransack to filter/search on our entries. So we know we need entry-values, and possibly their dynamic attributes when building the json, so what is the best approach here? Use eager_load or includes? So what better way to decide than actually test this? So I temporarily changed the controller code as follows:

        preload_option = params[:pre].try(:to_i)
        if preload_option == 0
          @entries = @q.result(distinct: true).page params[:page]
        elsif preload_option == 1
          @entries = @q.result(distinct: true).eager_load(:entry_values).page params[:page]
        else
          @entries = @q.result(distinct: true).includes(:entry_values => :dynamic_attribute).page params[:page]
        end

Note:

  • I can only eager_load one level deep
  • with includes I can immediately fetch all dynamic attributes for the entry-values.

But which will prove to be more beneficial?

So then I ran a small benchmark script:

require 'benchmark'
require 'rest-client'

n = 10
Benchmark.bm do |x|
  x.report("normal ") do
    n.times { RestClient.get("http://admin.lvh.me:3013/moderation/projects/11/entries.json", {}) }
  end
  x.report("eager  ") do
    n.times { RestClient.get("http://admin.lvh.me:3013/moderation/projects/11/entries.json?pre=1", {}) }
  end
  x.report("include") do
    n.times { RestClient.get("http://admin.lvh.me:3013/moderation/projects/11/entries.json?pre=2", {}) }
  end
end

(note: for testing purposes I also disabled the need to authenticate, so I could easily fetch the jsons and time and compare)

This first run gave me the following results:

    normal   0.016362   0.005723   0.022085 ( 35.440171)
    eager    0.010036   0.004336   0.014372 ( 28.632490)
    include  0.012550   0.004061   0.016611 ( 29.173778) 

Ok. Not the kind of improvement I had hoped. Also nice to notice that eager_load in this case is more efficient than using the includes (which seemed a little counter-intuitive maybe).

I had recently changed a small part of the code, because in the moderation we also wanted to be able to edit fields that were not entered, and before we only had to retrieve entered values (:entry_values) so I presume that maybe there I fucked up the performance. Before we called entry.valid_entry_values which looked like

  def valid_entry_values
    entry_values.sorted.select do |ev|
      da = ev.dynamic_attribute
      da.attribute_type != 'item' || (da.attribute_type == 'item' && !ev.item_content_type.nil?)
    end
  end

and I replaced it with the following, adding empty entry-values to be filled in:


  def entry_values_with_empty
    result = []
    self.topic.dynamic_attributes.each do |da|
      ee = entry_values.find_by(dynamic_attribute_id: da.id)
      if ee.nil? || (da.attribute_type == 'item' && ee.item_content_type.nil?)
        ee = entry_values.build(dynamic_attribute: da)
      end
      result << ee
    end
    # check if we have entry-values not yet in the list
    # (e.g. from another topic when the entry was moved, and add those too)
    self.entry_values.each do |entry_value|
      if result.select{|ee| ee.id == entry_value.id}.count == 0
        result << entry_value
      end
    end
   
    result
  end

So what happens if we switch back to the old valid_entry_values : how does that change performance?

I ran my small benchmark script again, and got the following results:

    normal   0.015264   0.005280   0.020544 ( 33.283069)
    eager    0.009901   0.004359   0.014260 ( 17.145350)
    include  0.013153   0.004032   0.017185 ( 17.621856)

Wow! Now the eager_load or includes really seem to pay off. Also: almost the same speed improvement. Ok.

So if we check the entry_values_with_empty more closely, the implementation is somewhat naive: for each dynamic-attribute it will attempt to find the corresponding entry-value, except ... we use a query each time for each dynamic attribute, for each entry ... Mmmmmm. Let's see if we can improve this:

  def entry_values_with_empty
    result = []
    self.topic.dynamic_attributes.each do |da|
      ee = entry_values.detect{|ev| ev.dynamic_attribute_id == da.id}
      if ee.nil? || (da.attribute_type == 'item' && ee.item_content_type.nil?)
        ee = entry_values.build(dynamic_attribute: da)
      end
      result << ee
    end
    # check if we have entry-values not yet in the list
    # (e.g. from another topic when the entry was moved, and add those too)
    self.entry_values.each do |entry_value|
      if result.select{|ee| ee.id == entry_value.id}.count == 0
        result << entry_value
      end
    end

    result
  end

Notice: we only changed one line, replacing the find_by with a detect. This will, instead of launching a new query, iterate over the already retrieved array of entry_values. But does this make any difference?

Launching my small test script now returns the following:

    normal   0.016051   0.005418   0.021469 ( 16.448649)
    eager    0.009454   0.003858   0.013312 ( 22.479142)
    include  0.012419   0.003872   0.016291 ( 14.236868)

NICE! **fireworks** Not what I expected to see at all. A little baffled that the normal case is improved that much, and that the eager_load does not improve it (on the contrary). We have now found our optimal combination: improving the entry_values_with_empty and adding the includes will give the best performance.

Is this what you would have expected? Bottomline remains: it helps to measure (in Dutch we say: meten is weten which rhymes)

More ...
Technology ruby on rails rspec
[RSPEC] Cleaning up orphaned attachments when running specs

So when running the specs we also create a lot of fake attachments, but they are never cleaned up. Which is probably obvious, because we never actually destroy the models (containing the attachments), but truncate the database or rollback the transactions.

So I tried to find a way to 1) automatically/more easily clean up those dummy attachments, and 2) make sure it works when using parallel:specs. And over my different projects, where in some I use different gems to manage my attachments.

In one project, I am using paperclip and there I took the following approach. In the initializer config/initializers/paperclip.rb I wrote

  Paperclip::UriAdapter.register
  if Rails.env.production?
    Paperclip::Attachment.default_options.merge!(
      hash_secret: ENV.fetch('SECRET_KEY_BASE'),
      s3_protocol: :https,
      url: ':s3_domain_url',
      path: "/:class/:attachment/:id/:style/:hash.:extension",
      storage: :s3,
      s3_credentials: { .. }
    )
  elsif Rails.env.development?
    Paperclip::Attachment.default_options.merge!(
      url: "/system/:class/:attachment/:id/:style/:hash.:extension",
      hash_secret: Rails.application.credentials.secret_key_base
    )
  elsif Rails.env.test? || Rails.env.cucumber?
    Paperclip::Attachment.default_options.merge!(
      url: "/spec_#{ENV['TEST_ENV_NUMBER']}/:class/:attachment/:id/:style/:hash.:extension",
      hash_secret: Rails.application.credentials.secret_key_base
    )
  end

and then in rspec rails_helper.rb I can add the following piece of code

  config.after(:suite) do
    FileUtils.remove_dir(File.join(Rails.root, 'public', "spec_#{ENV['TEST_ENV_NUMBER']}"), true)
  end

In another projects I am using carrier_wave and there it is a little more difficult, but it amounts to the same approach. In CarrierWave we create different uploaders, and each have their own configuration. In my project I first iterate over all uploaders in my own code-base, and explicitly require one uploader from our own shared gem (between different projects). So we add an initializer config/carrierwave_clean_spec_attachments.rb (or whatever name you prefer) to override the path when in test mode:

if Rails.env.test? || Rails.env.cucumber?
  Dir["#{Rails.root}/app/uploaders/*.rb"].each { |file| require file }
  require 'document_uploader'

  CarrierWave::Uploader::Base.descendants.each do |klass|
    next if klass.anonymous?
    klass.class_eval do
      def cache_dir
        "#{Rails.root}/spec/support/uploads_#{ENV['TEST_ENV_NUMBER']}/tmp"
      end

      def store_dir
        "#{Rails.root}/spec/support/uploads_#{ENV['TEST_ENV_NUMBER']}/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
      end
    end
  end
end

and then in my rails_helper.rb I can then add the following statement:

config.after(:suite) do
  FileUtils.rm_rf(Dir["#{Rails.root}/spec/support/uploads_#{ENV['TEST_ENV_NUMBER']}"])
end

How do you do this? Do you use another gem for storage/attachments and how do you solve it? E.g. when using ActiveStorage ?

More ...