Stratus3D

A blog on software engineering by Trevor Brown

Effective Debugging of Memory Leaks in Ruby

Recently at work I was tasked with finding a memory leak in a legacy Ruby application I was unfamiliar with. We had been monitoring the application’s memory usage for several weeks and it was clear that there was a slow memory leak somewhere in the application. When I began to look at the source code I realized it I didn’t know where to begin looking for the memory leak. The application was using a framework and several gems that I was unfamiliar with and was structured different than other Ruby projects I had been working on. I also found that I really missed Erlang’s excellent runtime_tools that I rely on for diagnosing issues in Erlang applications. After a lot of trial and error I finally found and fixed the memory leak. In this blog post I share the steps I devised to find the memory leak in the unfamiliar source code.

Steps to Find a Memory Leak

The list of steps below assumes you have determined there is a memory leak in your Ruby application but don’t know what is causing it. Your first instinct might be to reach for some memory profiling tool and begin looking at memory allocation over time. Some of these steps might seem like a waste of time, but in practice I’ve found them to the most of effective use of time as I often overlook simple things. These first two steps focus on gems. Often times third-party code is more widely used and pitfalls and memory leaks are more widely known.

1. Check for any unused gems in the Gemfile and remove them

There are numerous tools out there to help you find memory leaks in Ruby applications, but with a large codebase even the best tools still produce a lot of noise. If you find even one unused gem it will greatly reduce the amount of code you need to analyze to find the memory leak. Seldom have I not found at least one or two unused gems when reviewing the Gemfile of a legacy Ruby application. Removing unused gems sometimes has the added benefit of reducing overall memory usage.

2. Check the issue tracker of each gem still present in the Gemfile for reports of memory leaks

A gem may contain memory leaks that have already been reported on the gem’s issue tracker or mailing list. If you find a ticket or thread that describes something similar to the leak you are experiencing you may have found your memory leak. If a newer version of the gem contains a fix for the memory leak upgrade to the latest version. If you found an ticket but a fix is not available you may have to work with the maintainers of the gem to get it fixed or fork the project and fix it yourself. If you don’t find someone on the issue tracker describing your issue the changelogs will show if any released versions of the gem contain fixes for memory leaks. Even if a leak does not seem like the one you are experiencing it’s best upgrade to a version that doesn’t have any known leaks. If you do not find the source of the leak continue to step 3.

3. Run Rubocop with the rubocop-performance extension

This isn’t likely to find the cause of the memory leak, but it will alert you to general performance issues, which may give you clues as to where the leak is. If there are any Rubocop performance warnings correct the code and see if the memory leak is still present. The memory leak will likely still be present. If it is continue to step 4.

4. Visually review the Ruby code for possible memory leaks

Read through the application code and look for class attributes that grow in size, arrays that grow but never shrink, and long lived objects. Creating a memory leak is pretty easy. Fix any obvious issues. Don’t spend a ton of time on this; just read through the code quickly and look for any obvious issues. On applications with very large codebases you may need to skip this step as it will be too time consuming. If you haven’t found the cause of the leak continue to step 5.

5. If you still haven’t found the issue, use Ruby’s ObjectSpace class to find the leak

Now things get a little more involved. Follow the steps in the sections below to profile memory usage.

Profiling Memory Usage

We can use Ruby’s ObjectSpace class to dump the heap formatted as JSON. Doing so will allow us to see how memory was allocated at the time the heap was generated. A single dump may be enough to diagnose the cause of the leak but slow leaks will likely not be apparent in a single dump. As stated in Peter Wagenet’s excellent blog post on skylight.io comparing three dumps is best because it allows you to find memory that is allocated but not freed between the second and third dumps.

Generating Memory Dumps

Generating a memory dump is quite easy with the ObjectSpace class. To generate an accurate dump, we first start tracing object allocations, run the garbage collection so we get more accurate data, then dump all the allocations to file as NDJSON. Here is a helper function that does it all:

def dump_memory(file)
  # Start tracing object allocations
  ObjectSpace.trace_object_allocations_start
  # Run garbage collection to get a more accurate dump
  GC.start
  # Dump memory to file
  ObjectSpace.dump_all(output: file)
end

Now all that we need to do is invoke this function three times while our application is running to get the memory allocation dumps we need.

Hooking into the application

Since generating a memory dump is as easy as invoking the function above there are many different ways you can invoke the function. The three methods I use most often are listed below.

Add Directly to the Source Code

For debugging small applications or command line scripts adding dump_memory calls directly to the source code may be easiest. Just look for several places in the source code where dumping memory would make sense. The further apart in time the dump_memory calls are made the easier it will be to identify memory leaks.

Pry

If you already are using pry and have binding.pry calls placed in your code you can just invoke the dump_memory function from the Pry shell.

Rack

For larger applications it might not be clear when or where to dump memory, and it may take more time for a slow leak to manifest. For the most flexibility and ease of use add a simple Rack endpoint that invokes the function. If you are using something like Rack::URLMap adding a new endpoint is easy:

run Rack::URLMap.new(
  'dump-memory' -> MemoryEndpoint.new
  )

Then define the MemoryEndpoint class:

class MemoryEndpoint
  def call(env)
    # Open temp file
    time_str = Time.now.strftime('%Y-%m-%d_%H-%M-%S')
    File.open("tmp/dump-#{time_str}.json", 'w') do |file|
      # Use the helper function we defined above
      dump_memory(file)
    end
    [200, { 'Content-Type' => 'text/plain' }, ['Dumped Memory']]
  end
end

Now whenever you send a request to the /dump-memory endpoint the memory will be dumped to a file in the application’s current directory.

Finding Leaks by Comparing Dumps

Using one of the methods listed above generate at least three memory dumps. Once you have three or more dumps you can use the script below to generate a diff of three dumps. The script was created by Peter Wagenet at Skylight.io and taken from that same blog post I mentioned earlier.

Memory Dump Diff Script

#!/usr/bin/env ruby

require 'set'
require 'json'

if ARGV.length != 3
  puts "Usage: detect_leaks [FIRST.json] [SECOND.json] [THIRD.json]"
  exit 1
end

first_addrs = Set.new
third_addrs = Set.new

# Get a list of memory addresses from the first dump
File.open(ARGV[0], "r").each_line do |line|
  parsed = JSON.parse(line)
  first_addrs << parsed["address"] if parsed && parsed["address"]
end

# Get a list of memory addresses from the last dump
File.open(ARGV[2], "r").each_line do |line|
  parsed = JSON.parse(line)
  third_addrs << parsed["address"] if parsed && parsed["address"]
end

diff = []

# Get a list of all items present in both the second and
# third dumps but not in the first.
File.open(ARGV[1], "r").each_line do |line|
  parsed = JSON.parse(line)
  if parsed && parsed["address"]
    if !first_addrs.include?(parsed["address"]) && third_addrs.include?(parsed["address"])
      diff << parsed
    end
  end
end

# Group items
diff.group_by do |x|
  [x["type"], x["file"], x["line"]]
end.map do |x,y|
  # Collect memory size
  [x, y.count, y.inject(0){|sum,i| sum + (i['bytesize'] || 0) }, y.inject(0){|sum,i| sum + (i['memsize'] || 0) }]
end.sort do |a,b|
  b[1] <=> a[1]
end.each do |x,y,bytesize,memsize|
  # Output information about each potential leak
  puts "Retained #{y} #{x[0]} objects of size #{bytesize}/#{memsize} at: #{x[1]}:#{x[2]}"
end

# Also output total memory usage, because why not?
memsize = diff.inject(0){|sum,i| sum + (i['memsize'] || 0) }
bytesize = diff.inject(0){|sum,i| sum + (i['bytesize'] || 0) }
puts "\n\nTotal Size: #{bytesize}/#{memsize}"

Run this script and pass in the three dump files as arguments. The script will output lines like this:

Retained 77 STRING objects of size 10947/10961 at: ~/.asdf/installs/ruby/2.6.6/lib/ruby/gems/ruby-2.6.6/gems/actionview-4.1.6/lib/action_view/template.rb:297
Retained 15 DATA objects of size 0/51808 at: ~/.asdf/installs/ruby/2.6.6/lib/ruby/gems/ruby-2.6.6/gems/actionview-4.1.6/lib/action_view/template.rb:297
Retained 1 DATA objects of size 0/0 at: ~/.asdf/installs/ruby/2.6.6/lib/ruby/gems/ruby-2.6.6/gems/puma-2.7.1/lib/puma/client.rb:35
...

Each line contains information on objects that were allocated between the first and second dumps, but not freed between the second and third dumps. Objects that are allocated and not freed are possible memory leaks.

Review the output and look at the type and number of retained objects as well as where the objects came from. It may be obvious from the output where the memory leak is but don’t be discouraged if it is not immediately clear. Look for objects that are retained in higher numbers than you would expect, objects from files that are seldom used, or files that should not be creating new objects at all. If you think you’ve found the source of the leak see if you can confirm the leak by either fixing it or removing the leaky code altogether.

It may take a while for slow leak to manifest as a large number of objects in the diff output from this script. If you aren’t able to find the source of the leak you may need to increase the time between the memory dumps to make the leak more obvious. You may need to generate memory dumps several times before you are able to identify the leak in the diff output.

Conclusion

To make best use of our time it is best to resort to profiling only when we have exhausted the easier methods of finding the leak. Even with the ObjectSpace class memory profiling is time consuming. With this list of steps we start with the things that are most likely to help us find the source of the memory leak, like checking for gems with known memory leaks, before moving on to memory profiling.