Software Development Team: Garbage Collection for Rust: The Finalizer Frontier
Software Development Team
About
Events
Jobs
People
Projects
Publications
Software
Talks
Cite this paper as:
Jacob Hughes, Laurence Tratt,
Garbage Collection for Rust: The Finalizer Frontier,
OOPSLA 2025, Pages 3588 - 3614, October 2025.
Also available as:
DOI
PDF (arxiv)
See also:
Experiment
DOI
).
Garbage Collection for Rust: The Finalizer Frontier
Jacob Hughes
and
Laurence Tratt
October, 2025
Abstract
Rust is a non-Garbage Collected (GCed) language, but the lack of GC
makes expressing data-structures that require shared ownership awkward,
inefficient, or both. In this paper we explore a new design for, and
implementation of, GC in Rust, called
Alloy
. Unlike previous approaches to GC in
Rust,
Alloy
allows existing Rust
destructors to be automatically used as GC finalizers: this makes
Alloy
integrate better with existing Rust code
than previous solutions but introduces surprising soundness and
performance problems.
Alloy
provides
novel solutions for the core problems:
finalizer safety
analysis
rejects unsound destructors from automatically being
reused as finalizers;
finalizer elision
optimises away
unnecessary finalizers; and
premature finalizer prevention
ensures that finalizers are only run when it is provably safe to do
so.
Introduction
Amongst the ways one can classify programming languages are whether they are Garbage Collected
(GCed) or not: GCed languages enable implicit memory management; non-GCed languages require
explicit memory management (e.g
's
malloc
free
functions). Rust's use of affine types [
25
, p. 5] and
ownership does not fit within this classification: it is not GCed but it has implicit scope-based
memory management. Most portions of Rust programs are as succinct as a GCed equivalent, but
ownership is too inflexible to express
shared ownership
for data-structures that require multiple
owners (e.g. doubly linked lists). Workarounds such as reference counting impose an extra
burden on the programmer, make mistakes more likely, and often come with a performance
penalty.
In an attempt to avoid such problems, there are now a number of GCs for Rust (e.g. [
11
14
32
33
]).
Most introduce a user-visible type
Gc
which takes a value
of type
and moves
to the 'GC
heap'. The
Gc
value itself is a wrapper around a pointer to
on the GC heap.
Gc
can be
cloned
(i.e. duplicated) and
dereferenced
to a value of type
&T
(i.e. a type-safe pointer) at
will by the user. When no
Gc
wrappers pointing to
can be found, indirectly or directly,
from the program's
roots
(e.g. variables on the stack), then the GC heap memory for
can be
reclaimed.
It has proven hard to find a satisfying design and implementation for a GC for Rust, as perhaps
suggested by the number of attempts to do so. We identify two fundamental challenges for
GC for Rust: how to give
Gc
an idiomatic and complete API; and how to make
finalizers
(i.e. the code that is run just before a value is collected by the GC) safe, performant, and
ergonomic.
struct GcNode { value: u8, nbr: Option
impl Drop for GcNode { fn drop(&mut self) { println!("drop {}", self.value); } }
fn main() {
let mut gc1 = Gc::new(RefCell::new(GcNode { value: 1, nbr: None } ));
gc1.borrow_mut().nbr = Some(gc1);
let gc2 = Gc::new(RefCell::new(GcNode { value: 2, nbr: None } ));
gc2.borrow_mut().nbr = Some(gc2);
gc1 = gc2;
force_gc();
println!("{} {}", gc1.borrow().value, gc1.borrow().nbr.unwrap().borrow().value);
Listing 1.
An
Alloy
example, showing
Gc
and destructors as finalizers. We create a type
GcNode
which
models a graph: it stores an 8 bit integer value and a possibly-null
reference (via Rust's standard
Option
type) to a neighbouring node
line 1
). We add a normal Rust destructor which
Alloy
is able to use as a
finalizer when
GcNode
is used inside
Gc
line 2
).
Inside
main
we create the first GCed node in the graph (
line 5
).
We use Rust's normal
RefCell
type to allow the node to be mutated
(using the
RefCell::borrow_mut
method which dynamically checks
for mutation that would undermine Rust's static rules)
and a cycle created directly back to itself (
line 6
). We then create a second cyclic graph (
lines 7
and
), immediately assigning it to the
gc1
variable (
line 9
):
this copies, rather than moves, the
Gc
This causes the first cyclic graph
GcNode{value: 1, ..}
to no longer be reachable, so after forcing a collection (
line 10
) that node
can be collected. Its finalizer is then scheduled to be run, causing
drop 1
to be printed out at a later point; when it has completed the GC
heap memory can be reclaimed. The print statement outputs
2 2
line
11
).
In this paper we introduce
Alloy
, a new GC for Rust: an example of its use is shown in Listing
Alloy
uses
conservative
garbage collection (i.e. treating each reachable machine word as a potential
pointer), which naturally solves the API challenge. However, the finalization challenge is much
more involved: the causes of this challenge, and our solutions to it, occupy the bulk of this
paper.
Normal Rust code uses
destructors
(i.e. code which is run just before a value is reclaimed by Rust's
implicit memory management) extensively. Although finalizers and destructors may seem to be
synonyms, existing GCs for Rust cannot reuse destructors as finalizers: the latter must be manually
implemented for each type that needs it. Unfortunately, even this is trickier than it appears: it is not
possible to implement a finalizer for
Gc
if
is an external library; some parts of destructors are
automatically created by the Rust compiler, but hand-written finalizers must duplicate those parts
manually; and users can accidentally cause a type's finalizer to be run more than once. In short,
finalization in existing GCs for Rust is unpalatable.
GCs for Rust are not alone in requiring manually written finalizers. In a close cousin to our work, a
GC proposal for C++, the reuse of destructors as finalizers was ruled out due to seemingly
insoluble problems [
, p. 32], which we divide into four categories: (1) some safe destructors are
not safe finalizers; (2) finalizers can be run prematurely; (3) running finalizers on the same
thread as a paused mutator can cause race conditions and deadlocks; (4) and finalizers are
prohibitively slower than destructors. All are, at least to some degree, classical GC problems; all are
exacerbated in some way by Rust; and none, with the partial exception of #2, has existing
solutions.
We show that it is possible to reuse most Rust destructors as finalizers in a satisfying way. We
introduce novel solutions to the long-standing problems this implies by making use of some of Rust's
unusual static guarantees. We thus gain a better GC for Rust
and
solutions to open GC problems. Our
solutions, in order, are: (1)
finalizer safety analysis
extends Rust's static analyses to reject
programs whose destructors are not provably safe to be used as finalizers; (2)
premature finalizer
prevention
automatically inserts fences to prevent the GC from being 'tricked' into collecting
values before they are dead; (3) we run finalizers on a separate thread; and (4) and
finalizer
elision
statically optimises away finalizers if the underlying destructor duplicates the GC's
work.
Alloy
as an implementation is necessarily tied to Rust, though most of the novel techniques in this
paper rely on general properties of affine types and ownership. While we do not wish to claim generality
without evidence, it seems likely that many of the techniques in this paper will generalise to other
ownership-based languages, as and when such emerge.
Although
Alloy
is not production ready, its performance is already reasonable: when we control for
the (admittedly somewhat slow) conservative GC (
BDWGC
Alloy
currently uses, the performance of
Alloy
varies from 0.74× to, in the worst case, 1.17× that of reference counting.
Alloy
is also sufficiently
polished (e.g. good quality error messages) in other ways for it to: show a plausible path forwards for
those who may wish to follow it; and to allow others to evaluate whether GC for Rust is a good idea or
not.
This paper is divided into four main parts: GC and Rust background (Section
);
Alloy
's basic
design (Section
); destructor and finalizer challenges and solutions (Sections 4 to 7); and
evaluation (Section
). The first three parts have the challenge that our work straddles two
areas that can seem mutually exclusive: GC and Rust. We have tried to provide sufficient
material for readers expert in one of these areas to gain adequate familiarity with the other,
without boring either, but we encourage readers to skip material they are already comfortable
with.
Background
2.1
The Challenges of Shared Ownership in Rust
Rust uses affine types and
ownership
to statically guarantee that: a value has a single owner (e.g. a
variable); an owner can
move
(i.e. permanently transfer the ownership of) a value to another owner;
and when a value's owner goes out of scope, the value's destructor is run and its backing
memory reclaimed. An owner can pass
references
to a value to other code, subject to the
following static restrictions: there can be multiple immutable references ('
') to a value or a single
mutable reference ('
&mut
'); and references cannot outlast the owner. These rules allow many
Rust programs to be as succinct as their equivalents in GCed languages. This suggests that
the search for a good GC for Rust may be intellectually stimulating but of little practical
value.
However, there are many programs which need to express data structures which do not fit into the
restrictions of affine types and ownership. These are often described as 'cyclic data-structures', but in this
paper we use the more abstract term 'shared ownership', which includes, but is not limited to, cyclic
data-structures.
A common way of expressing shared ownership is to use the reference counting type
Rc
from
Rust's standard library. For many data-structures, this is a reasonable solution, but some forms of shared
ownership require juggling strong and weak counts. This complicates programs (see Listing
) and can
cause problems when values live for shorter or longer than intended.
struct RcNode { value: u8, nbr: Option
impl Drop for RcNode { fn drop(&mut self) { println!("drop {}", self.value); } }
fn main() {
let mut rc1 = Rc::new(RefCell::new(RcNode{value: 1, nbr: None}));
rc1.borrow_mut().nbr = Some(Rc::downgrade(&rc1));
let rc2 = Rc::new(RefCell::new(RcNode{value: 2, nbr: None}));
rc2.borrow_mut().nbr = Some(Rc::downgrade(&rc2));
rc1 = Rc::clone(&rc2);
println!("{} {}", rc1.borrow().value,
rc1.borrow().nbr.as_ref().unwrap().upgrade().unwrap().borrow().value);
Listing 2.
A version of Listing
using Rust's standard reference counting type
Rc
. To avoid
memory leaks we use
weak
references between nodes (
line 1
). We again create two cyclic
graphs (
lines 5
) using
Rc::downgrade
to create weak references (
lines 6
and
). Since
Rc
is
not copyable, we must use a manual
clone
call to have both the
rc1
and
rc2
variables point to
the same cyclic graph (
line 9
). Accessing a neighbour node becomes a delicate dance requiring
upgrading the weak reference (
line 11
). The need to downgrade
Rc
to
Weak
and upgrade
(which may fail, hence the
unwrap
) back to
Rc
creates significant extra complexity relative
to Listing
: compare line 11 in Listing
to
lines 10
11
above.
A different solution is to store values in a vector and use integer indices into that vector. Such indices
are morally closer to machine pointers than normal Rust references: the indices can become stale,
dangle, or may never have been valid in the first place. The programmer must also manually
deal with issues such as detecting unused values, compaction, and so on. In other words,
the programmer ends up writing a partial GC themselves. A variant of this idea are
arenas
which gradually accumulate multiple values but free all of them in one go: values can no
longer be reclaimed too early, though arenas tend to unnecessarily increase the lifetime of
values.
A type-based approach is
GhostCell
35
], which uses
branding
to statically ensure that at any given
point only one part of a program can access a shared ownership data-structure. This necessarily excludes
common use cases where multiple owners (e.g. in different threads) need to simultaneously access
disjoint parts of a data-structure.
Although it is easily overlooked, some workarounds (e.g.
Rc
) rely on using
unsafe
Rust
(i.e. parts of the language, often involving pointers, that are not fully statically checked by the
compiler). Pragmatically, we assume that widely used code, even if technically unsafe, has
been pored over sufficiently that it is trustworthy in practise. However, 'new' solutions that a
programmer implements using unsafe Rust are unlikely to immediately reach the same level of
trustworthiness.
While we do not believe that every Rust program would be improved by GC, the variety of
workarounds already present in Rust code, and the difficultly of creating new ones, suggests that there is
a subset that would benefit from GC.
2.2
GC Terminology
GC is a venerable field and has accumulated terminology that can seem unfamiliar or unintuitive. We mostly
use the same terminology as Jones et al [
19
],
the major parts of which we define here.
A program which uses GC is split between the
mutator
(the user's program) and the
collector
(the GC
itself). At any given point in time, a thread is either running as a mutator or a collector. In our context, all
threads run as a collector at least sometimes (for reasons that will become apparent later, some threads
always run as a collector). Tracing and reclamation is performed during a
collection
phase. Our
collections always
stop-the-world
, where all threads running mutator code are paused while collection
occurs.
tracing
GC is one that scans memory looking for reachable values from a program's roots: values,
including cycles of values, that are not reachable from the roots can then be
reclaimed
. In contrast, a pure
reference counting GC does not scan memory, and thus cannot free values that form a cycle. Increasingly,
GC implementations make use of multiple techniques (see [
]) but, for simplicity's sake, we assume that
implementations wholly use one technique or another except otherwise stated. For brevity, we use 'GC'
as a short-hand for 'tracing GC'; when we deal with other kinds of GC (e.g. reference counting), we
explicitly name them.
We refer to memory which is allocated via
Gc
as being on the
GC heap
. We use the term 'GC value'
to refer both to the pointer wrapped in a
Gc
and the underlying value on the GC heap, even though
multiple pointers / wrappers can refer to a single value on the GC heap, unless doing so would lead to
ambiguity.
We use '
Alloy
' to refer to the combination of: our extension to the Rust language; our modifications to
the
rustc
compiler; and our integration of the Boehm-Demers-Weiser GC (
BDWGC
) into the runtime of
programs compiled with our modified
rustc
Alloy
: Design and Implementation
In this section we outline
Alloy
's basic design and implementation choices – the rest of the paper then
goes into detail on the more advanced aspects.
3.1
Basic Design
Alloy
provides a
Gc
type that exposes an API modelled on the
Rc
type from Rust's
standard library, because
Rc
: is conceptually similar to
Gc
; widely used in Rust code, and
its API familiar; and that API reflects long-term experience about what Rust programmers
need.
When a user calls
Gc::new(v)
, the value
is moved to the GC heap: the
Gc
value returned
to the user is a simple wrapper around a pointer to
's new address. The same underlying
GCed value may thus have multiple, partly or wholly overlapping, references active at any
point. To avoid undermining Rust's ownership system, dereferencing a
Gc
produces an
immutable (i.e. '
') reference to the underlying value. If the user wishes to mutate the underlying
value, they must use other Rust types that enable
interior mutability
(e.g.
RefCell
or
Mutex
).
One feature that
Alloy
explicitly supports is the ability in Rust to cast references to raw pointers and
back again. This can occur in two main ways.
Gc
can be dereferenced to
&T
which can
then, as with any other reference, be converted to *const T (i.e. a C-esque pointer to T).
Gc
also supports the common Rust functions (
into_raw
and
from_raw
) which wrap the
value-to-pointer conversion in a slightly higher-level API. The ability to convert references to raw
pointers is used in many places (e.g. Rust's standard C Foreign Function Interface (FFI)).
We believe that a viable GC for Rust must allow the same conversions, but doing so has a
profound impact because Rust allows raw pointers to be converted to the integer type
usize
and
back
Having acknowledged that pointers can be 'disguised' as integers, it is then inevitable that
Alloy
must
be a conservative GC: if a machine word's integer value, when considered as a pointer, falls within a
GCed block of memory, then that block itself is considered reachable (and is transitively scanned). Since a
conservative GC cannot know if a word is really a pointer, or is a random sequence of bits that happens
to be the same as a valid pointer, this over-approximates the
live set
(i.e. the blocks that the GC will not
reclaim). Typically the false detection rate is very low (see e.g. a Java study which measures it at under
0.01% of the live set [
28
]).
Conservative GC occupies a grey zone in programming language semantics: in most languages, and
most compiler's internal semantics, conservative GC is, formally speaking, unsound; and
furthermore some languages (including Rust) allow arbitrary 'bit fiddling' on pointers, temporarily
obscuring the address they are referring to. Despite this, conservative GC is widely used,
including in the two most widespread web browsers: Chrome uses it in its Blink rendering
engine [
] and Safari uses it in its JavaScript VM JavaScriptCore [
26
]. Even in 2025,
we lack good alternatives to conservative GC: there is no cross-platform API for precise GC; and while
some compilers such as LLVM provide some support for GC features [
23
], we have found them
incomplete and buggy. Despite the potential soundness worries, conservative GC thus remains a widely
used technique.
Conservative GC enables
Alloy
to make a useful ergonomic improvement over most other GCs
for Rust whose
Gc
is only
cloneable
. Such types can be duplicated, but doing so requires
executing arbitrary user code. To make the possible run-time cost of this clear, Rust has no
direct syntax for cloning: users must explicitly call
Rc::clone(&v)
to duplicate a value
. In
contrast, since
Alloy
's
Gc
is just a wrapper around a pointer it is not just cloneable but
also
copyable
: duplication only requires copying bytes (i.e. no arbitrary user code need be
executed). Copying is implied by assignment (i.e.
w = v
), reducing the need for explicit
cloning
This is not just a syntactic convenience but also reflects an underlying semantic difference: duplicating a
Gc
in
Alloy
is is a cheaper and simpler operation than most other GCs for Rust which which tend to
rely, at least in part, on reference counting.
There is one notable limitation of
Gc
's API relative to
Rc
. The latter, by definition, knows how
many references there are to the underlying data, allowing the value stored inside it to be mutably
borrowed at run-time if there is only a single reference to it (via
get_mut
and
make_mut
). In contrast,
Gc
cannot know how many references there are to the underlying data. As we shall see in Section
some Rust programs are built around the performance advantages of this API (e.g. turning 'copy on
write' into just 'write' in some important cases).
3.2
Basic Implementation
The most visible aspect of
Alloy
is its fork, and extension of, the standard Rust compiler
rustc
. We
forked
rustc
1.79.0, adding or changing approximately 5,500 Lines of Code (LoC) in the core compiler,
and adding approximately 2,250 LoC of tests.
Alloy
uses
BDWGC
] as the underlying conservative GC, because it is the most widely ported
conservative GC we know of. We use
BDWGC
's
GC_set_finalize_on_demand(1)
API, which causes
finalizers to be run on their own thread.
We had to make some minor changes to
BDWGC
to suit our situation. First, we disabled
BDWGC
's
parallel collector because it worsens
Alloy
's performance. It is unclear to us why this happens:
we observe significant lock contention within
BDWGC
during GC collections, but have not
correlated this with a cause. Second,
BDWGC
cannot scan pointers stored in thread locals
because these are platform dependent. Fortunately,
rustc
uses LLVM's thread local storage
implementation, which stores such pointers in the
PT_TLS
segment of the ELF binary: we modified
BDWGC
to scan this ELF segment during each collection. Third,
BDWGC
dynamically intercepts
thread creation calls so that it can can scan their stacks, but (for bootstrapping reasons) is
unable to do so in our context: we explicitly changed
Alloy
to register new threads with
BDWGC
Destructors and Finalizers
In many GCed languages, 'destructor' and 'finalizer' are used as synonyms, as both terms refer to code
run when a value's lifetime has ended. In existing GCs for Rust, these two terms refer to completely
different hierarchies of code (i.e. destructors and finalizers are fundamentally different). In
Alloy
, in
contrast, a reasonable first approximation is that finalizers are a strict subset of destructors. In this
section we pick apart these differences, before describing the challenges of using destructors as
finalizers.
When a value in Rust is
dropped
(i.e. at the point its owner goes out of lexical scope) its destructor is
automatically run. Rust's destructors enable a style of programming that originated in C++ called RAII
(Resource Acquisition Is Initialization) [
30
, Section 14.4]: when a value is dropped, the underlying
resources it possesses (e.g. file handles or heap memory) are released. Destructors are used frequently in
Rust code (to give a rough idea: approximately 15% of source-level types in our benchmark suite have
destructors).
Rust destructors are formed of two parts, run in the following order: a user-defined
drop method
; and
automatically inserted
drop glue
. Drop methods are optional and users can provide one for a type by
implementing the
Drop
trait's
drop
method. Drop glue recursively calls destructors of contained types
(e.g. fields in a struct). Although it is common usage to conflate 'destructor' in Rust with drop methods,
drop glue is an integral part of a Rust destructor: we therefore use 'destructor' as the umbrella term for
both drop methods and drop glue.
When considering finalizers for a GC for Rust, there are several layers of design choices. We will
shortly see that finalizers cause a number of challenges (Section
4.1
) and one choice would be to forbid
finalizers entirely. However, this would mean that one could not sensibly embed types that have
destructors in a
Gc
. While Rust does not always call destructors, the situations where this occurs are
best considered 'exceptional': not calling destructors from
Gc
would completely undermine
reasonable programmer expectations. Because of this,
Alloy
, and indeed virtually all GCs for Rust,
support finalizers in some form.
However, existing GCs force distinct notions of destructors and finalizers onto the programmer. Where
the former have the
Drop
trait, the latter typically have a
Finalize
trait. If a user type needs to be
finalized then the user must provide an implementation of the
Finalize
trait. However, doing so
introduces a number of problems: (1) external libraries are unlikely to provide finalizers, so they must be
manually implemented by each consumer; (2) Rust's
orphan rule
27
, Section 6.12] prevents one
implementing traits for types defined in external libraries (i.e. unless a library's types were designed to
support
Gc
, those types cannot be directly GCed); (3) one cannot automatically replicate drop glue
for finalizers; and (4) one cannot replicate
rustc
's refusal to allow calls to the equivalent of
Drop::drop
Programmers can work around problems #1 and #2 in various ways. For example, they can wrap
external library types in
newtypes
(zero-cost wrappers) and implement finalizers on those instead [
20
Section 19.3]. Doing so is tedious but not conceptually difficult.
Problem #3 has partial solutions: for example, [
14
] uses the
Trace
macro to generate
finalizer glue
(the
finalizer equivalent of drop glue) for struct fields. This runs into an unsolvable variant of problem #2:
types in external libraries will not implement this trait and cannot be recursively scanned for finalizer
glue.
Problem #4 is impossible to solve in Rust as-is. One cannot define a function that can never be called —
what use would such a function have? A possible partial solution might seem to be for the
finalize
method take ownership of the value, but
Drop::drop
does not do so because that would not allow drop
glue to be run afterwards.
4.1
General Challenges When Using Destructors as Finalizers
We have stated as our aim that
Alloy
should use destructors as finalizers. Above we explained some
Rust-specific challenges — but there are several non-Rust-specific challenges too! Fundamentally, finalizers
and destructors have different, and sometimes incompatible, properties. The best guide to these differences,
and the resulting problems, is Boehm [
], supplemented by later work on support for GC in the C++
specification [
An obvious difference between destructors and finalizers is when both are run. While C++ and Rust define precisely when a destructor will be
run
finalizers run at an unspecified point in time. This typically happens at some point after the
equivalent destructor would run, though a program may exit before any given finalizer is
run
There are, however, two situations which invert this. First, if a thread exits due to an error, and the
program is either not compiled with unwinding, or the thread has crossed a non-unwinding ABI
boundary, then destructors might not be run at all, where a GC will naturally run the equivalent
finalizers: we do not dwell on this, as both behaviours are reasonable in their different contexts. Second,
and more surprisingly, it is possible for finalizers in non-error situations to run
prematurely
, that is before
the equivalent destructor [
, section 3.4].
A less obvious difference relates to where destructors and finalizers are run. Destructors run in the
same thread as the last owner of a value. However, running finalizers in the same thread as the last owner
of the value can lead to race conditions [
24
] and deadlocks [
, section 3.3] if a finalizer tries to
access a resource that the mutator expects to have exclusive access too. When such problems
affect destructors in normal Rust code, it is the clear result of programmer error, since they
should have taken into account the predictable execution point of destructors. However, since
finalizers do not have a predictable execution point, there is no way to safely access shared
resources if they are run on the same thread. The only way to avoid this is to run finalizers
on a non-mutator thread — but not all Rust types / destructors are safe to run on another
thread.
There are several additional differences such as: finalizers can reference other GCed values
that are partly, or wholly, 'finalized' and may have had their backing memory reused; and
finalizers can
resurrect
values by copying the reference passed to the finalizer and storing it
somewhere.
Over time, finalizers have thus come to be viewed with increasing suspicion. Java, for example, has
deprecated, and intends eventually removing, per-type finalizers: instead it has introduced deliberately
less flexible per-object 'cleaners', whose API prevents problems such as object resurrection and per-class
finalization [
13
].
4.2
The Challenge of Finalizers for
Alloy
At this point we hope to have convinced the reader that: a viable GC for Rust needs to be able to use
existing destructors as finalizers whenever possible; but that finalizers, even in existing GCs, cause
various problems.
It is our belief that some problems with finalizers are fundamental. For example, finalizers inevitably
introduce latency between the last use of a value and its finalization.
Some problems with finalizers are best considered the accidental artefacts of older designs.
Java's cleaners, for example, can be thought of as a more restrictive version of finalizers that
allow most common use-cases but forbid by design many dangerous use cases. For example,
per-class/struct finalization can easily be replaced by per-object/value finalization; and object
resurrection can be prevented if object access requires a level of indirection.
Alloy
benefits from
our better shared understanding of such problems and the potential solutions: it trivially
addresses per-object/value finalization (
Gc::new_unfinalizable
function turns finalization off
for specific values) and, as we shall see later, via only slightly more involved means, object
resurrection.
However, that leaves many problems that are potentially in the middle: they are not obviously
fundamental, but there are not obvious fixes for them either. In our context, where we wish to use
destructors as finalizers, four problems have hitherto been thought insoluble [
, p. 32]: (1) finalizers are
prohibitively slower than destructors; (2) finalizers can run prematurely; (3) running finalizers on the
same thread as a paused mutator can cause race conditions and deadlocks; (4) some safe destructors are
not safe finalizers.
Fortunately for us, Rust's unusual static guarantees, suitably expanded by
Alloy
, allow us to
address each problem in novel, satisfying, ways. In the following section, we tackle these
problems in the order above, noting that we tackle problems #1 and #2 separately, and #3 and #4
together.
Finalizer Elision
As we shall see in Section
, there is a correlation between the number of finalizers that are run and
overhead from GC (with a worst case, albeit a definite outlier, in our experiment of 3.35× slowdown). In
this section we show how to reduce the number of finalizers that are run, which helps reduce this
overhead.
A variety of factors contribute to the finalizer performance overhead, including: a queue of finalizers
must be maintained, whereas destructors can be run immediately; finalizers run some time after the last
access of a value, making cache misses more likely; and finalizers can cause values (including values they
own) to live for longer (e.g. leading to increased memory usage and marking overhead). Most of these
factors are inherent to any GC and our experience of using and working on
BDWGC
– a mature, widely
used GC – does not suggest that it is missing optimisations which would overcome all of this
overhead.
Instead, whenever possible,
Alloy
elides
finalizers so that they do not need to be run at all. We are able
to do this because: (1)
BDWGC
is responsible for all allocations and will, if necessary GC allocations even
if they are not directly wrapped in a
Gc
; and (2) many Rust destructors only free memory which
BDWGC
would, albeit with some latency, do anyway.
Consider the standard Rust type
Box
which heap allocates space for a value; when a
Box
value
is dropped, the heap allocation will be freed. We can then make two observations. First,
Box
's drop
method solely consists of a
deallocate
call. Second, while we informally say that
Box
allocates on
the 'heap' and
Gc
allocates on the 'GC heap', all allocations in
Alloy
are made through
BDWGC
and
stored in the same heap.
When used as a finalizer,
Box
's drop method is thus unneeded, as the underlying memory will be
freed by
BDWGC
anyway. This means that there is no need to run a finalizer for a type such as
Gc
at all, and the finalizer can be statically elided. However, we cannot elide a
finalizer for a type such as
Gc
because
Rc
's drop method must be run for the
reference count to be decremented. As this shows, we must consider the complete destructor, and
not just the top-level drop method, when deciding whether a corresponding finalizer can be
elided.
function NeedsFinalizer(T):
if Impls(T, Drop) and not Impls(T, DropMethodFinalizerElidable) then
return true;
for each field ∈ T do
if NeedsFinalizer(field) then
return true;
return false;
Algorithm 1.
Finalizer Elision
impl
pub fn new(value: T) -> Self {
if needs_finalizer::
else { Gc
...
Listing 3.
A simplified view of how finalizers are elided inside
Alloy
. The new compiler
intrinsic
needs_finalizer
returns true if a finalizer is required for a type. The
Gc
type uses this
intrinsic to ensure that the value is registered as requiring a finalizer. With optimisations turned
on, this seemingly dynamic, branching code will be turned into static, branchless code.
5.1
Implementing Finalizer Elision
Finalizer elision statically determines which type's destructors do not require corresponding finalizers
and elides them. It does so conservatively, and deals correctly with drop glue.
As shown in Algorithm
, any type which implements the
Drop
trait requires finalization unless it also
implements the new
DropMethodFinalizerElidable
marker trait
(i.e. a trait without methods). This
trait can be used by a programmer to signify that a type's drop method need not be called if the type is
placed inside a
Gc
. The 'Drop' part of the trait name is deliberate (i.e. it is not a simplification of
'destructor') as it allows the programmer to reason about a type locally (i.e. without considering drop
glue or concrete type paramaters). If the type has a transitively reachable field whose type implements
the
Drop
trait but not the
DropMethodFinalizerElidable
trait, then then the top-level type still
requires finalization.
Even though neither normal Rust destructors or
Alloy
finalizers are guaranteed to run, a program
whose destructors or finalizers never run would probably not be usable (leaking resources such as
memory, deadlocking, and so on). We therefore make
DropMethodFinalizerElidable
an unsafe trait,
because implementing it inappropriately is likely to lead to undesired – though not incorrect! –
behaviour at run-time.
Alloy
modifies the standard Rust library to implement
DropMethodFinalizerElidable
on the following types:
Box
Vec
RawVec
VecDeque
LinkedList
BTreeMap
BTreeSet
HashMap
HashSet
RawTable
and
BinaryHeap
. Fortunately, not only are these types' drop methods compatible with
DropMethodFinalizerElidable
, but they are extensively used in real Rust code: they enable significant
numbers of finalizers to be elided.
Listing
shows the new const compiler intrinsic
needs_finalizer
we added to implement
Algorithm
. The intrinsic is evaluated at compile-time: its result can be inlined into
Gc::new
, allowing
the associated conditional to be removed too. In other words – compiler optimisations allowing – the
'does this specific type require a finalizer?' check has no run-time overhead.
Premature Finalizer Prevention
Most of us assume that finalizers are always run later than the equivalent destructor would have run, but
they can sometimes run before [
, section 3.4], undermining soundness. Such premature finalization is
also possible in
Alloy
as described thus far (see Listing
). In this section we show how to prevent
premature finalization.
There are two aspects to premature finalization. First, language specifications often do not define, or do
not precisely define, when the earliest point that a value can be finalized is. While this means
that, formally, there is no 'premature' finalization, it seems unlikely that language designers
anticipated some of the resulting implementation surprises (see e.g. this example in Java [
29
]).
Second, compiler optimisations – at least in LLVM – are 'GC unaware', so optimisations such
as scalar replacement can change the point in a program when GCed values appear to be
finalizable.
struct S { value: u8 }
impl Drop for S { fn drop(&mut self) { self.value = 0; } }
fn main() {
let root = Gc::new(Box::new(S{ value: 1 }));
let inner: &u8 = &**root.value;
force_gc();
println!("{}", *inner);
Listing 4.
An example of possible premature finalization. We create a new struct
line 1
) with
a drop method that sets the wrapped integer to zero (
line 2
). In the main method, we move an
instance of the struct into a
Box
, which we then move into a
Gc
line 4
). We obtain a Rust
reference to the inner integer (
line 5
), which at run-time will be a pointer to the
Box
. At this
point, the compiler can determine that the
Gc
is no longer used and overwrite
root
's pointer
(which may be in a register). If a collection then occurs, a finalizer can run
's drop method,
causing the program to print '0' instead of the expected '1' (
line 7
).
In our context, it is natural to define premature finalization as a (dynamic) finalizer for a
Gc
value
running before the (static)
Gc
owner has gone out of scope. Similar to the high-level proposal mooted
in [
, Solution 1], we must ensure that the dynamic lifetime of a reference derived from a
Gc
matches
or exceeds the lifetime of the
Gc
itself.
Our solution relies on adjusting
Gc
's drop method to keep alive a GCed value for at least the static
lifetime of the
Gc
itself. In other words, we ensure that the conservative GC will always see a pointer
to a GCed value while the corresponding
Gc
is in-scope.
However, there is a major problem to overcome: copyable types such as
Gc
are forbidden from
having destructors. The fundamental challenge we have to solve is that each copied value will have a
destructor called on it, which has the potential for any shared underlying value to be destructed more
than once.
Alloy
explicitly allows
Gc
– but no other copyable type – to have a destructor, but to
ensure it doesn't cause surprises in the face of arbitrary numbers of copies, the destructor must be
idempotent. Our task is made easier because
Gc
naturally has no drop glue from Rust's perspective:
Gc
consists of a field with a pointer type, and such types are opaque from a destruction
perspective.
We therefore only need to make sure that
Gc
's drop method is idempotent. Fortunately, this is
sufficient for our purposes: we want the drop method to inhibit finalization but that does not
require run-time side effects. To achieve this, we use a
fence
. These come in various flavours.
What we need is a fence that prevents both: the compiler from reordering computations
around a particular syntactic point; and the CPU from reordering computations around a
particular address. We copy the platform specific code from the
BDWGC
GC_reachable_here
macro
into
Gc
's drop method, which achieves the effect we require.
6.1
Optimising Premature Finalizer Prevention
The drop method we add to
Gc
fully prevents premature finalization. It also naturally solves a
performance problem with the suggested solution for C++ in [
, Solution 1], which requires keeping
alive all pointers, no matter their type, for their full scope. By definition, our solution only
keeps alive
Gc
values: the compiler is free to optimise values of other types as it so wishes.
However, on an artificial microbenchmark we observed a noticeable overhead from our fence
insertion.
We thus implemented a simple optimisation: we only insert fences for a
Gc
if it has a finalizer.
Intuitively, it seems that we should not generate drop methods in such cases, but this is difficult to do
directly inside
rustc
. Instead, we suppress calls to the drop methods of such types: the two approaches
are functionally equivalent, though ours does put an extra burden on dead-code elimination in the
compiler tool-chain.
Alloy
adds a new pass
RemoveElidableDrops
to
rustc
's Mid-Level Intermediate Representation
(MIR) processing. MIR is best thought of as the main IR inside
rustc
: it contains the complete set of
functions in the program, where each function consists of a sequence of basic blocks. Simplifying
somewhat, function and drop method calls are represented as different kinds of
terminators
on basic
blocks. Terminators reference both a callee and a successor basic block.
The
remove_elidable_drops
pass iterates over a program's MIR, identifies drop method
terminators which reference elidable finalizers, and turns them into 'goto' terminators to the
successor basic basic block. Algorithm 4 in the Appendix presents a more formal version of this
algorithm.
Finalizer Safety Analysis
In this section we address two high-level problems: running finalizers on the same thread as a paused
mutator can cause race conditions and deadlocks; and some safe destructors are not safe finalizers.
Addressing the former problem is conceptually simple – finalizers must be run on a separate thread – but
we must ensure that doing so is sound. We therefore consider this a specific instance of the latter
problem, treating both equally in this section.
We therefore introduce Finalizer Safety Analysis (FSA), which prevents unsafe (in the sense of 'not
safe Rust') destructors being used as finalizers. As a first approximation, FSA guarantees
that finalizers are memory safe, cycle safe (i.e. do not access already finalized objects), and
thread safe. We present the three main components of FSA individually before bringing them
together.
7.1
Rust References
Gc
can store, directly or indirectly, normal Rust references (i.e.
and
&mut
), at which point it is
subject to Rust's normal borrow checker rules and cannot outlive the reference. However, finalizers
implicitly extend the lifetime of a GCed value, including any stored references: accessing a reference in a
finalizer could undermine Rust's borrow checking rules.
A simple way of avoiding this problem is to forbid
Gc
from storing, directly or indirectly,
references. This might seem to be no great loss: storing references in a
Gc
largely nullifies the
'GCness' of
Gc
. However, we found the result hard to use, as it can make simple tasks such as
gradually migrating existing code to use
Gc
painful.
A moderate, but in our experience insufficient, relaxation is to recognise that only types that
need a finalizer can possibly have problems with references, and to forbid such types from
storing references in
Gc
. For example, if there is no drop method for
struct S{x: &u8}
then its destructor is safe to use as a finalizer, since its non-drop aspects will not use the
&u8
reference.
The eventual rule we alighted upon for FSA is that a destructor for a type
can be used as a
finalizer provided the destructor's drop methods do not obtain references derived from
's
fields (including fields reachable from its attributes). Using Rust's terminology, we forbid
projections
(which include a struct's fields, indexes into a vector, and so on) in destructors from
generating references. Any non-projection references that are used in a destructor are by
definition safe to use, as they either exist only for the duration of the drop method (references to
variables on the stack) or will exist for the remainder of the program (references to global
variables).
This rule over-approximates the safe set of destructors. For example, a drop method that
creates a new value and tries to obtain a reference to a field in it (i.e. a projection) cannot be a
destructor under FSA, even though the reference cannot outlast the drop method. We found that
attempting to relax our rule further to deal with such cases rapidly complicates exposition and
implementation.
struct GcNode { value: u8, nbr: Option
>> }
impl Drop for GcNode {
fn drop(&mut self) { self.value = 0; println!("{}", self.nbr.unwrap().borrow().value); }
fn main() {
let mut gc1 = Gc::new(RefCell::new(GcNode{value: 1, nbr: None}));
let gc2 = Gc::new(RefCell::new(GcNode{value: 2, nbr: None}));
gc1.borrow_mut().nbr = Some(gc2);
gc2.borrow_mut().nbr = Some(gc1);
Listing 5.
An example of the problems that come from mixing cycles and finalization. The salient
difference from Listing
is that the drop method prints the value of a field inside
nbr
line
). Running this program on a strong memory model machine is likely to print either
2 0
or
1 0
depending on whether
gc1
or
gc2
is finalized first. The 'seemingly expected' output of
1 2
or
2 1
would never be printed: whichever GCed value is finalized first changes what the other GCed
value sees in its finalizer. As that implies, this example is unsound: whichever finalizer runs
second leads to undefined behaviour.
7.2
Cycles and Finalization
One of the main motivations for GCs is that they solve problems with cyclic data structures. However,
finalizers can be unsound if they access state shared within members of a cycle. Listing
shows an
example of undefined behaviour when two GCed values create a cycle and both their finalizers reference
the other GCed value. Whichever order the finalizers are run in, at least one of the finalizers will see the
other GCed value as partly or wholly 'finalized'.
Most languages and systems we are aware of assume that users either don't run into this problem
(finalization cycles are considered rare in GCed languages [
19
, p. 229]) or know how to deal with it when they do (e.g. refactoring the
types into parts that do and don't require finalization [
, p. 11]). There is no fully automatic
solution to this problem. Some GCs offer weak references, which allow users to detect when
finalization cycles have been broken, though they still have to deal with the consequences
manually.
We wanted to provide users with static guarantees that their destructors will not behave unexpectedly
when used as finalizers in a cycle. A first attempt at enforcing such a property might seem to
be that a
Gc
cannot have, directly or indirectly, fields of type
Gc
. This would indeed
prevent the mistakes we want to catch but also disallow shared ownership! We therefore
check only that a type's destructor does not, directly or indirectly, access a
Gc
. This allows
GCed types to express shared ownership so long as their destructor(s) do not access other GC
types.
To make this check easier to implement, we introduce an
auto trait
27
, Section. 11], a kind of marker
trait that the compiler propagates automatically. An auto trait
will be automatically implemented for a
type
unless one of the following is true: there is an explicit
negative implementation
of
for
; or
contains a field that is not itself
. Informally, we say that a negative implementation of an auto-trait
pollutes
containing types.
Our new auto trait is called
FinalizerSafe
, and we provide a single negative implementation
impl
. This naturally handles transitively reachable code, allowing FSA
itself to only check that a destructor's direct field accesses are
FinalizerSafe
7.3
Destructors Need to be Runnable on a Finalizer Thread
impl Drop for GcNode {
fn drop(&mut self) { println!("drop {}", self.value.lock().unwrap()); }
fn main() {
let counter = Rc::new(Mutex::new(0));
{ let _ = Gc::new(GcNode { value: Rc::clone(&counter), nbr: None }); }
let r1 = counter.lock().unwrap();
force_gc();
assert_eq!(*r1, 0);
Listing 6.
How destructors can cause deadlocks when used as finalizers. The mutator creates a
reference-counted mutex (
line 6
), placing a copy in a
GcNode
that immediately goes out of scope
line 7
). The mutator then acquires the lock (
line 8
) but before it can release the lock a GC cycle
occurs and the
GcNode
's finalizer run (
line 9
). If the finalizer is run on the same thread as the
mutator, then it will fail to grab the lock (
line 2
) and cause a deadlock.
Running finalizers on the same thread as a mutator can cause problems when the finalizer accesses
state shared with the mutator (see Section
4.1
for a general description and Listing
for a concrete
example). The most general solution to this problem is to run finalizers on a separate
finalizer thread
that
never runs mutator code.
We must therefore ensure that it is safe to run a type's destructor on the finalizer thread. A
conservative definition is that
Gc
is safe to use if
implements both of Rust's existing
Send
(denoting
a type that can be permanently moved from one thread to another) and
Sync
(denoting a type that can be
safely accessed simultaneously by multiple threads) auto traits. However, requiring that finalization be
restricted to types that implement both
Send
and
Sync
can be frustrating, particularly because more
types implement
Send
than
Sync
It may seem sufficient for
to implement
Send
alone so that the value can be safely sent to the finalizer
thread. However, this would not prevent a finalizer indirectly accessing state shared with a
non-GCed value via a mechanism such as
Arc
, causing the very problems we are trying to
avoid.
FSA thus ignores whether a type implements
Send
or
Sync
(or not) and instead examines the
destructor directly. To pass FSA: the destructor must not access thread locals; and any types the
destructor accesses via projections must implement both
Send
and
Sync
. Intuitively, this allows a
non-
Send
-or-
Sync
type
to have a safe finalizer provided that
's destructor only access the
Send
and
Sync
'subset' of
This rule shows clearly that FSA is a form of abstract interpretation rather than a mere extension of the type
system
After careful examination we believe this is compatible with Rust's semantics (and
rustc
and LLVM's
implementations) at the time of writing, but it is worth knowing that this rule would be unsafe in other
languages and implementations (for example our assumption would be unsafe in Java due to
synchronisation removal [
31
]). We leave it as an open question to others as to whether Rust should
deliberately permit or forbid such checks in its semantics.
The implementation of the finalization thread is fairly simple. For example, we do not need to
explicitly synchronise memory between the mutator and finalization threads because
BDWGC
's
stop-the-world collection phase already synchronises all memory between threads.
7.4
Putting it All Together
function FinalizerSafetyAnalysis(func):
for each basic_block ∈ func do
t ← basic_block.terminator;
if not IsGcConstructorCall(t) then
continue;
ty ← GetTyOfGcValue(t);
if isFinalizerUnchecked(ty) or not NeedsFinalizer(ty) then
continue;
for each drop_method ∈ GetDropGlue(ty) do
if not IsMIRAvailable(drop_method) then
EmitFinalizerUnsafeError();
CheckFunctionSafety(drop_method);
function CheckFunctionSafety(drop):
for each basic_block ∈ drop do
for each statement ∈ basic_block do
for each projection ∈ statement do
if not IsFinalizerSafe(projection.element) then
EmitFinalizerUnsafeError();
if IsFunctionCall(basic_block.terminator) then
CheckFunctionSafety(basic_block.terminator);
function IsFinalizerSafe(ty):
return Impls(ty, Send) and Impls(ty, Sync) and Impls(ty, FinalizerSafe);
Algorithm 2.
Finalizer Safety Analysis
FSA integrates the seemingly separate components presented above into one. It iterates over every
function in a Rust program analysing destructors of types that are used in
Gc
. Algorithm
shows the
essence of FSA (for example eliding details of caching which
Alloy
uses to speed up compile
times).
Because FSA is a form of abstract interpretation, we need to determine when to run FSA on a program.
In essence, whenever a previously unchecked type
is used to create a new
Gc
, FSA is run. As well as
the
Gc::new
constructor,
Gc
instances can be created with conversion traits such as
From
. We
annotated each such entry point with a new
rustc
-only attribute
rustc_fsa_entry_point
: calls to
functions with this attribute lead to FSA checks.
A naive implementation of FSA would be a notable cost, so
Alloy
uses several optimisations. As
alluded to above, FSA caches the results of various checks to avoid pointlessly repeating work. We also
extend
FinalizerSafe
with negative implementations for
&T
, and
&mut T
. If a type
implements all of
FinalizerSafe
Send
, and
Sync
, we know that there can be no unsafe projections used in a destructor,
and can bypass most FSA checks entirely (though we still need to check for thread local accesses).
Across our benchmark suite, FSA increases compilation time in release mode by a modest
0.8–1.6%.
Algorithm
also captures
Alloy
's approach to error messages. Rather than just inform a user that
'your drop method has not passed FSA',
Alloy
pinpoints which field or line in a drop method caused FSA
to fail:
EmitReferenceError
informs the user when a reference in a type is used in a way that violates
FSA (see Section
7.1
); and
EmitFinalizerUnsafeError
when a drop method contains code which is
unsafe (e.g. references a
Gc
type, an opaque function, etc.). Listing
shows an example of the errors
reported by
Alloy
: note that it pinpoints the line within a drop method that caused an FSA
error.
error: `RefCell::new(GcNode{value: 2, nbr: None})` cannot be safely finalized.
--> finalization_cycle.rs:11:21
7 | fn drop(&mut self) { self.value = 0; println!("{}", self.nbr.unwrap().borrow().value); }
^^^^^^^^
a finalizer cannot safely dereference this
because it might have already been finalized.
...
11 | let gc2 = Gc::new(RefCell::new(GcNode{value: 2, nbr: None}));
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
has a drop method which cannot be safely finalized.
Listing 7.
The compiler error produced by
Alloy
for the example in Listing
. This extends
rustc
's normal error messages: note that both the use of a type with a FSA-incompatible drop
method ('has a drop method') and the line in the drop method ('caused by the expression') are
highlighted to the user.
7.4.1
Awkward Kinds of Functions
FSA can encounter two kinds of 'awkward' functions.
First, some functions (e.g. due to use of trait objects, or FFIs) do not have a body available when FSA
runs: using such a function necessarily causes an FSA check to fail. One common class of functions
which causes this are Rust intrinsics (e.g.
min_align_of
etc.): we audited the most frequently used of
these and annotated those which are FSA-safe with a new
rustc_fsa_safe_fn
attribute. Other functions
whose bodies are unknown cause FSA to fail.
Second, in most cases, FSA runs on Rust functions whose generic types have been replaced with
concrete types (in Rust terminology, functions have been 'monomorphised'). Sometimes, however, FSA
encounters functions (e.g. intrinsics or functions with certain annotations) whose generic types have not
yet been replaced. FSA can still run on such functions, but will reject them unless all generic types imply
the
FinalizerSafe
Send
, and
Sync
traits. Note that calling a method on a generically typed value will
lead to FSA finding a method without a body: as in the first case above, this will cause FSA to
fail.
The common theme to both is that we wish FSA to be sound, at which point we forego completeness.
This can cause users frustration when FSA raises an error on code they know is FSA safe. As is common
in Rust, we therefore provide an unsafe escape hatch which allows users to silence FSA errors when they
can prove to their satisfaction that doing so does undermine correctness. We experimented
with a per-type approach, but found that unduly restrictive: we therefore provide a per-value
escape hatch with the
unsafe FinalizerUnchecked
type. Values wrapped in this type are
considered safe to use at all points in FSA. Our aim is that users should rarely need to resort to this
escape hatch, but, as is not uncommon in Rust, there are valid idioms of use where we found it
necessary.
Evaluation
Version
Description
#benchmarks
Binary Trees
Debian CLBG Rust#2
Heap allocation microbenchmark
Regex-Redux
Debian CLBG Rust#1
Regular expression matching
Alacritty
v0.15.0-dev
Terminal emulator
10
fd
v9.0.0
Unix find replacement
grmtools
v0.13.4
Lexer / parser library
Ripgrep
v14.1.1
Fast grep replacement
13
som-rs-ast
git #35b780
SOM AST VM
26
som-rs-bc
git #35b780
SOM bytecode VM
26
Table 1.
The benchmarks (top) and benchmark suites (bottom) that form our experiment. We altered them
to use different memory allocation strategies (
Alloy
Rc
, etc.).
Binary Trees
and
Regex-Redux
are classic
stand-alone GC benchmarks; the other 'benchmarks' represent benchmark suites (e.g.
Ripgrep
contains 13
benchmarks). The middle portion of the table shows a variety of 'normal' Rust programs; the bottom portion
of the program shows three implementations of the SOM programming language.
In this section we explain our methodology and our experimental results.
8.1
Methodology
8.1.1
The Benchmark Suite
There is no existing benchmark suite for GCs for Rust. Even if such a suite did exist, it may not have
been suitable for our purposes because in experiment
GCvs
we want to compare programs using
existing shared ownership approaches. We searched through roughly the 100 most popular Rust libraries
on
crates.io
(the
de facto
standard Rust package system) looking for suitable candidates. In
practise this meant we looked for crates using reference counting. In the interests of brevity,
for the rest of this section we use '
Rc
' to cover both
Rc
and its thread-safe cousin
Arc
Gc
Rc
Weak
Alacritty
107
9,450
1,970
Binary Trees
fd
421
grmtools
299
1,825
23
Regex-Redux
108
109
Ripgrep
104
249
som-rs-ast
206
35
som-rs-bc
464
39
Table 2.
How often relevant types are referenced in source code after our porting. For
Rc
, we also show
how many weak references are left in: this is a proxy both for partial porting, and also how the extent weak
references. This is a proxy for the extent of changes that cyclic data-structures impose upon source code.
Table
shows the resulting suite: note that, except for
Binary Trees
and
Regex-Redux
, the
'benchmarks' are themselves benchmark suites. Collectively, our suite contains – depending on whether
you count the SOM implementations' (identical) benchmark suites collectively or separately – 62 or 88
benchmarks. Table
shows how often relevant types are used after porting. Table
shows the
distribution of heap data at run-time. This shows that our suite contains benchmarks with a variety of
memory patterns.
Binary Trees
is allocation intensive and sufficiently simple that it can be easily and meaningfully
ported to additional shared ownership strategies:
Rust-GC
, a user library for GC for Rust [
14
]; and
Arena
, a non-GC memory arena [
10
].
Alacritty
fd
, and
Ripgrep
are well known Rust programs, all
of which have their own benchmark suites.
grmtools
is a parsing library which uses
Rc
extensively
in error recovery: we benchmarked it using 28KLoC of real Java source code, which we mutated with
syntax errors.
SOM is a small, but complete, language in the Smalltalk mould, which has a wide variety of
implementations. Our suite includes two of these:
som-rs-ast
(which represents programs as ASTs); and
som-rs-bc
(which represents programs as bytecode). Both are existing ports of a Java SOM VM into Rust
and use
Rc
. We use the same SOM
core-lib
benchmarks for both, derived from git commit
#afd5a6.
We were not able to port all parts of all programs. In particular, some programs make extensive use of
the
make_mut
and
get_mut
functions in
Rc
, which allow a programmer to mutate their contents if, at
run-time, they only have a single owner. There is not, and cannot be, equivalent functionality with a
copyable
Gc
type. In some cases we were able to successfully use alternative mechanisms. In others
we judged the usages to either be rare at run-time (i.e. not worth porting), or too difficult to port (i.e. too
much of the program is built around the resulting assumptions). In a small number of cases we
ended up introducing bugs.
Alacritty
's UTF-8 support is an example, resulting in deadlocks.
Whenever we encountered a bug in our porting, we reverted back to
Rc
for that portion of the
port.
Allocated (#)
GC owned (%)
Rc
Box
Gc
Alacritty
125
8,770
2.70
Binary Trees
3,222,201
3,222,190
100.00
fd
17,821
306,902
61
1.23
grmtools
2,283
19,859,431
4,038,605
44.19
Regex-Redux
45
3,132
78
15.39
Ripgrep
12,786
521,366
26,069
17.97
som-rs-ast
15
8,586,976
1,533,728
76.95
som-rs-bc
15
2,397,931
1,530,325
99.71
Table 3.
Run-time heap distributions. The 'Allocated (#)' columns shows the number of values of each type
that are allocated (note that most programs also allocate values of other types, but those are not shown
directly here). The 'GC Owned' columns shows the proportion of allocated values that are owned, directly
and indirectly, by
Gc
values. For example, a program consisting of a single
Gc
would have a
'GC Owned' value of 100% because the
Box
is owned by the
Gc
. As we shall see later, there can be a
number of knock-on effects when a
Gc
owns other such values.
8.1.2
What We Couldn't Include in the Benchmark Suite
We tried porting 10 other programs that are not included in our benchmark suite. To avoid readers
wondering if we have 'cherry-picked' our eventual benchmark suite, we briefly report why those
other programs have been excluded. All excluded benchmarks are shown in Table 4 in the
Appendix.
Several programs (e.g. numbat, mini-moka, and salsa), once ported, turned out to be uninteresting
from a GC benchmarking perspective. Irrespective of the number of source locations that reference
memory allocation types, the benchmarks we could run from them allocated sufficiently little memory
that there are no worthwhile differences between different allocation strategies. Put another way: these
programs are in a sense 'the same' from our evaluation perspective.
Two programs (bevy and rust-analyzer) did not run correctly after porting. Both extensively use the
make_mut
or
get_mut
functions in
Rc
and reverting those changes made the benchmarks
uninteresting.
We also ported RustPython, but were unable to adjust it to faithfully implement Python-level
destructors. In essence, in RustPython's default configuration, its representation of objects is
not compatible with FSA. This means that we can not run Python
__del__
methods in the
finalizer thread. Although technically this is still compatible with Python's semantics, we
felt this would be a misleading comparison, as our port of RustPython would be doing less
work.
8.1.3
Running the Experiment
Our experiment can be seen as a comparison of
Alloy
against 'normal' Rust. Fortunately,
Alloy
is a
strict superset of 'normal' Rust: only if users explicitly opt into GC does
Alloy
really become a 'GC
for Rust'. This allows us to use the same compiler, standard library, and so on, removing
several potential confounding factor in our results. We compile two binaries: one without
logging features compiled and one with. We only use the latter when reporting collector related
metrics.
A challenge in our experiment is that different allocation strategies can use different underlying
allocators. In particular,
Alloy
has to use
BDWGC
, but, for example,
Rc
can use a modern allocator
such as jemalloc. Much has changed in the performance of allocators since
BDWGC
's 1980s roots: in
Rc
-only benchmarks, we observe an inherent overhead from
BDWGC
of 2–26% relative to jemalloc
(see Table 6 in the Appendix), which is a significant, and variable, confounding factor. Fortunately,
BDWGC
can be used as a 'traditional' allocator that allocates and frees on demand (i.e. no
conservative GC occurs): in the main experiment, we thus use
BDWGC
as the allocator for all
benchmarks.
We want to understand the memory usage of different allocation strategies over the lifetime of a
benchmark. However, there is no single metric which captures 'memory usage', nor even an agreed set of
metrics [
]. We use two metrics to capture different facets: (1)
heap footprint
, the
amount of live heap memory recorded by by Heaptrack [
34
] at every allocation and deallocation; and (2)
Resident Set Size
(RSS), the total physical memory in RAM used by the process (including
memory-mapped files, stack, and code/text segments), sampled at 10Hz. The overhead of recording
heap footprint is much greater than RSS, but it provides a more detailed view of memory
usage.
Another pair of confounding factors are the initial and maximum sizes of the GC heap: too-small
values can lead to frequent resizing and/or 'thrashing'; large values to unrealistically few collections.
What 'small' and 'large' are varies by benchmark, and 'careful' (or thoughtless) choices can significantly
distort one's view of performance.
BDWGC
uses an adaptive strategy by default, growing the
heap size as it detects that it would benefit from doing so. To give some sense of whether a
different strategy and/or heap size would make a difference, we ran our benchmarks with
three different fixed heap sizes. Doing so either has little effect or speeds benchmarks up;
when it does so, the impact is generally under 10% and is at most 28% (the detailed results
are presented in Table 9 in the Appendix). Broadly speaking, this suggests that
BDWGC
's
default heap sizing approach, at least in our context, is not significantly distorting our view of
performance.
We ran each benchmark in our suite 30 times. We report wall-clock times as returned by the
standard Unix
time
utility. The SOM benchmarks are run using its conventional
rebench
tool; we adjusted
rebench
to use
time
for consistency with our other benchmarks. We ran
all benchmarks on an AMD EPYC 7773X 64-Core 3.5GHz CPU with 128GiB RAM, running
Debian 12 ('bookworm'). We turned off turbo boost and hyper-threading, as both can colour
results.
8.1.4
Data Presentation
Except where otherwise stated, we report means and 99% confidence intervals for all metrics. We
use the arithmetic mean for individual benchmarks and the geometric mean for benchmark
suites.
When plotting time-series (i.e. sampled) memory metrics, we face the challenge that different
configurations of the same benchmark can execute at different speeds. We thus resample each
benchmark's data to 1000 evenly spaced points using linear interpolation. We chose 1000 samples
because it is considerably above the visual resolution of our plots. After normalization, we calculate the
arithmetic mean of the memory footprint measurement at each grid point (and not the raw underlying
data) across all runs of the same benchmark. We record 99% confidence intervals at each point and show
the result as shaded regions around the mean.
Figure 1.
comparing the effects of
gc
and
rc
on wall-clock time; heap footprint (i.e. the size of the
live set); and rss. the baseline at 1 is
rc
; values less than 1 show
gc
as better than
rc
; and the blue
vertical line shows the geometric mean of ratios. the wall-clock times of
gc
and
rc
are similar; the rss
somewhat similar; and the average heap footprint often very different. broadly speaking,
gc
increases the
average heap footprint because gc, and especially finalization, causes values to live for longer. benchmarks
which allocate relatively little memory (particularly
ripgrep
as shown in table
) can exaggerate this effect.
perhaps surprisingly, the heap footprint and rss do not correlate. this is partly because the sample rate for
rss is rather low (which notably affects fast running benchmarks such as those for
fd
) and partly because
rss necessarily includes headroom, that is memory beyond that needed for the live set (and which may later
be returned to the os).
8.2
Results
The main results for
GCvs
can be seen in Fig.
. Though there is variation,
Alloy
has an overhead
on wall-clock time of 5% on our benchmark suite. The effect on memory is more variable though,
unsurprisingly,
Alloy
typically has a larger average heap footprint (i.e. allocated memory lives for
longer). This metric needs to treated with slight caution: benchmarks which allocate relatively small
amounts of memory (see Table
) can make the relative effect of average heap footprint seem much
worse than it is in absolute terms.
Binary Trees
is sufficiently simple that we also used it to compare against
Arena
and
Rust-GC
The time-series data in Fig.
is particularly illuminating (for completeness, Table 5 in the Appendix has
the raw timings).
Alloy
is around 3.5× slower than
Arena
. The time-series data for the latter shows it
going through distinct phases: a (relatively long) allocation phase, a (relatively moderate) 'work' phase,
and a (relatively short) deallocation phase. Put another way: these clear phases make
Binary Trees
perfect match for an arena. In the other approaches, the 'work' phase occupies a much greater proportion
of their execution, because it also incorporates allocator work.
Alloy
is around 1.3× faster than
Rc
but both have similar memory profiles.
Alloy
is around 3× faster than
Rust-GC
and has an
average heap footprint around 4× smaller, reflecting
Alloy
's advantage in not being a user-land
library that relies in part on
Rc
. Although we caution against over-interpreting a single
benchmark, this does give us at least some idea of the performance ceiling and floor for different
approaches.
Figure 2.
A selection of time-series data with various GC approaches, showing normalised time on the
-axis
and heap footprint (with 99% confidence intervals shaded) on the
-axis. (i.e. the amount of live memory)
over time.
Binary Trees
shows an example of
Alloy
having a comparable heap footprint to
Rc
Rust-GC
's
heap footprint is around 4× greater.
Binary Trees
is a perfect fit to
Arena
: it frees memory in one batch at
the end, and because it is 3× faster than
Alloy
, this 'wind down' period is a substantial portion of the overall
(quick!) execution.
Ripgrep
Alternates may seem to be an example of a memory leak in
Alloy
, but it is really
the result of the inevitable delay that GC imposes on noticing that values are lived, which is exacerbated by
the presence of finalizers. The frequent plateaus and dips show that memory is being freed, but at a later
point than one might initially expect. In contrast,
som-rs-bc
JSON Small shows a real memory leak due to
cyclic objects in
Rc
, where
Alloy
's heap footprint remains steady.
The time-series data in Fig.
helps explain other factors. For example, it shows that
som-rs-bc
leaks
memory on the JSON Small benchmark (we suspect it also leaks in some other benchmarks, though
rarely as visibly). This is because
Rc
keeps alive values in cycles;
Alloy
does not leak memory on
som-rs-bc
, as it naturally deals correctly with cycles.
We can see from the time-series data that
Ripgrep
has a complex heap footprint pattern.
This may suggest a memory leak, but in fact it is a consequence of the inevitable delay in
freeing memory in a GC. In general, GC notices that memory is unused later than reference
counting, but this is exacerbated further by finalizers. Surprisingly, finalizers can lengthen or
shorten an allocation's lifetime. GCed values with finalizers tend to have longer lifetimes,
because they have to wait in the finalizer queue. However, when a finalizer calls
free
on
indirectly owned values, those are immediately marked as not live, rather than having to
wait until the next collection to be discovered as such. This, albeit indirectly, explains the
seemingly random peaks and troughs in memory usage one can observe in
Ripgrep
's time-series
data.
Figure 3.
The effects of finalizer elision on various metrics. The top-left chart shows the proportion of
run-time
Gc
values that: have had their finalizers elided; cannot have their finalizers elided; have no
finalizers to elide. This chart is best read in conjunction in Table
to (a) get a sense of the quantity of
run-time memory involved (b) how much indirectly owned memory the
Gc
values have. The other plots
use 'no finalizer elision' as the normalization base (i.e. values below 1 show that finalizer elision improves a
metric). Total GC pause time is the cumulative time spent in stop-the-world collections. User time captures
the time spent in all threads, including the finalizer thread. Broadly speaking, the more finalizers are elided,
and the greater the proportion of the overall heap the memory owned by
Gc
, the better the metrics
become.
The results of
Elision
are shown in Fig.
. In general, there is a fairly clear correlation: the more
finalizers are removed, and the greater the proportion of the overall heap the memory owned by
Gc
is,
the better the metrics become. However, there are several caveats. First, when all finalizers are removed,
BDWGC
does not start a finalizer thread or invoke locking related to it, unduly flattering the time-based
metrics. Second, the quantity of finalizers is only a partial proxy for cost: some finalizers free
up large graphs of indirectly owned values, which can take some time to run. Third, some
benchmarks change the work they do:
grmtools
speeds up so much that its error recovery
algorithm has time to do more work, so while finalizer hugely benefits its GC pause time,
its wall-clock time changes much less. Finally, since finalizers can cause indirectly owned
allocations to be freed earlier than the GC itself does naturally, removing them can cause
indirectly owned values to live for longer:
Ripgrep
's average heap footprint highlights this
issue.
The results for
PremOpt
are shown in Fig.
. We created three configurations of
Alloy
None
has no
fences, and thus is unsound, but allows us to approximate (allowing for possible vagaries from running
unsound code!) the best possible outcome.
Naive
inserts all possible fences.
Optimised
inserts only
necessary fences. Once confidence intervals are taken into account, there are no statistically
significant results for this experiment. Although it is possible that benchmarking 'noise' is hiding a
meaningful result, our data suggests that any such differences are likely to be minimal. To
make up for this disappointment, the fact that there is no difference between any of these
suggests that, on non-artificial benchmarks, premature finalizer prevention is not a noticeable
cost.
Figure 4.
The effect of premature finalization optimisation, normalised to
None
(i.e. no fences). Grey bars
represent the ratio for
naive
(all possible fences) and blue bars
optimised
(obviously unnecessary fences
removed). Unfortunately, this attempted optimisation has no statistically significant effects.
Any performance judgements we make are necessarily contingent on our methodology the benchmark
suite we chose, including the proportion of benchmarks that we ported, and the way we process and
present data. For example, we did not port external libraries to use
Gc
so many benchmarks use a
variety of allocation strategies. Even had we ported everything, we would not be able to say, for example,
that finalizer elision will always improve performance by exactly the factor we see in our experiment:
there undoubtedly exist reasonable, non-pathological, programs which will see performance changes
outside the ranges that our results suggest.
Using
BDWGC
as the allocator for all benchmarks has the advantage of removing 'pure' allocator
performance as a confounding factor, but does mean that some of the performance characteristics of
benchmarks will be changed (e.g due to the portion of time we spend in the allocator; or
BDWGC
's
adaptive heap sizing strategy). A generic, modern conservative GC, using the insights of recent non-GC
allocators, would almost certainly give different – though we suspect not profoundly different
– results. To the best of our knowledge there is currently no production-quality modern,
generic conservative, GC we could use instead, though we are aware of at least one attempt to
create such an alternative: it will be interesting to rerun our experiments if and when that
arrives.
The RSS memory metric we collect is at Linux's whim: if it does not update as frequently
as we expect, we will see artificially 'smoothed' data that may miss out peaks and troughs.
Similar, our interpolation of time-series data onto a normalised grid can also smooth data. We
manually checked a large quantity of data to ensure this was not a significant effect; by running
benchmarks 30 times means it is also less more likely that peaks and troughs are caught at least
sometimes.
10
Related Work
In this paper we hope to have given sufficient background on GC and the use of destructors and finalizers
in general. In this section we mostly survey the major parts of the GC for Rust landscape more widely.
Our survey is inevitably incomplete, in part because this is a rapidly evolving field (a number of changes
have occurred since the most recent equivalent survey we are aware of [
16
]). We also cover some
relevant non-Rust GC work not mentioned elsewhere.
Early versions of Rust had 'managed pointers' (using the
@T
syntax) which were intended to
represent GC types [
16
]. The core implementation used reference counting though there
were several, sometimes short-lived, cycle detectors [
17
]. Managed pointer support was
removed
around a year before the first stable release of Rust. This was not the end of the story for
'GC as a core part of Rust', with core Rust developers exploring the problem space in more
detail [
15
21
22
]. Over time these efforts dwindled, and those interested in GC for Rust largely
moved from anticipating
rustc
support to expecting to have to do everything in user-level
libraries.
One of the earliest user-level GC for Rust libraries is
Bacon-Rajan-CC
12
]. This provides a type
Cc
which is similar in intention to
Alloy
's
Gc
. The mechanism by which objects are collected is rather
different: they have a naive reference count, which causes objects outside a cycle to have deterministic
destruction; and users can manually invoke a cycle detector, which uses trial deletion in the style of Bacon and
Rajan [
10
to identify objects in unused cycles. Cycle detection requires users manually implementing a
Trace
trait
which traverses a type's fields. Destructors are used as finalizers: to avoid the problems with Rust
references we solved in Section
7.1
Bacon-Rajan-CC
imposes a
T:'static
lifetime bound on the type
parameter passed to
Cc
. Simplifying slightly, this means that any references in such a type must be
valid for the remaining lifetime of the program, a severe restriction. Unlike our approach to the access of
already-finalized values (Section
7.2
), it can only detect such accesses at runtime, leading to a (safe) Rust
panic
Probably the best known GC for Rust is
Rust-GC
14
] (partly covered in Section
).
Rust-GC
's
Gc
provides a similar API to
Alloy
, with the notable exception that its
Gc
is not, and cannot be, copyable,
thus always requiring calls to
Gc::clone
. Although, like
Alloy
Rust-GC
allows
Gc
values to be
converted into pointers, its lack of conservative GC means that users must ensure that a
Gc
wrapper
is kept alive for the entire lifetime of pointers derived from it. Similarly to
Bacon-Rajan-CC
GCed values are reference counted, with occasional tracing sweeps to identify cycles, though
Rust-GC
performs cycle detection automatically (i.e. it doesn't require manual calls to a
function such as
collect_cycles
). Drop methods are not used as finalizers: if a finalizer is
required, a manual implementation of the
Finalize
trait must be provided; finalizer glue can be
largely, though not fully (see Section
), automatically created by the provided
Trace
macro.
Rust-GC
detects accesses to already-finalized values dynamically at run-time, panicking
if they occur. Unlike
Bacon-Rajan-CC
, these accesses are detected by recording what the
collector's state is in: if the collector is in a 'sweep' phase, any access of a
Gc
leads to a
panic. We have not yet verified whether cross-thread collection / sweeping can evade this
check.
An example of moving beyond reference counting in a GC for Rust is
Shifgrethor
]. It requires
Gc
values to be created by a
Root<'root>
: the resulting
Gc<'root, T>
is then tied to the lifetime of the
Root<'root>
. This allows roots to be precisely identified, but requires explicitly having access to a
Root<'root>
whenever a
Gc<'root, T>
is used. As with
Rust-GC
Shifgrethor
requires users to
manually implement a
Finalize
trait, though
Shifgrethor
's is more restrictive: not only can other
GCed values not be accessed (implicitly solving the same problem as Section
7.2
) but any other
type without the same
'root
lifetime as the GCed value is forbidden. This means that many
seemingly safe finalizers require implementing the unsafe
UnsafeFinalize
trait. We view
Shifgrethor
as proof that accurately tracking GC roots in normal Rust without reference
counting is possible, though it cannot deal with references being converted into pointers and
usize
s.
A different means of tackling the root-finding problem is
GcArena
32
], which uses branding in a
similar way to
GhostCell
s (see Section
). In essence, users provide a special 'root' type which is the
only place where roots can be stored. Mutating the heap can only be done in the context
of functions that are passed a branded reference to the GCed heap. Once such a function
has completed,
GcArena
is in full control of the GC heap, and knows that only the root
type needs to be scanned for roots. This leads to a precise guarantee about GC reference
lifetimes. However, if code executes in an arena for too long, the system can find itself starved of
resources, with no way of recovering, even if much of the arena is no longer used.
GcArena
was originally part of the
Piccolo
VM (which was itself previously called
Luster
), a Lua VM
written in Rust. Such VMs have a frequently executed main loop which is a natural point for a
program to relinquish references to the GCed heap, but this is not true of many other GCed
programs.
One attempt to improve upon
Rust-GC
is
Bronze
11
], though it shows how challenging it can be to meaningfully improve GC for
Rust: both of its main advances have subsequently been disabled because they are not just unsound but
actively lead to crashes. First,
Bronze
tried to solve the root-finding problem by using LLVM's
gc.root
intrinsic at function entries to generate stack-maps (a run-time mechanism for accurately tracking active
pointers). This rules out the false positives that are inevitable in conservative GC. However,
Bronze
could not track nested references: if a
Gc
was used as a field in a struct, it was not tracked
by the GC. Second,
Bronze
tried to give GC in Rust similar semantics to non-ownership
languages such as Java. It did this by allowing shared mutation, undermining Rust's borrow
checker.
Chrome's rendering engine
Blink
uses the conservative GC
Oilpan
. It has the interesting
property that it has two classes of finalizers. 'Full finalizers' are similar to finalizers in
Alloy
running on a finalizer thread at an indeterminate future point, but with the difference that they
can only reference parts of a GCed value. To mitigate this, 'pre-finalizers' are run by the
collector on the same thread as mutator as soon as an object as recognised as unused, and can
access all of a GCed value. Pre-finalizers are necessary, but not encouraged, because they
implicitly pause the stop-the-world phase of the collector. This reflects the fact that latency is a
fundamental concern for a rendering engine:
Alloy
currently makes no pretences to being low
latency.
11
Conclusions
We introduced a novel design for GC in Rust that solves a number of outstanding challenges in GC for
Rust, as well as – by taking advantage of Rust's unusual static guarantees – some classical GC finalizer
problems. By making integration with existing Rust code easier than previous GCs for Rust, we hope to
have shown a pragmatic route for partial or wholesale migration of Rust code that would benefit from
GC.
Challenges and future opportunities remain. For example,
Alloy
is an 'all or nothing' cost: if you
want to use
Gc
in a single location, you must pay the costs of the GC runtime and so
on.
Alloy
's absolute speed is, we believe, limited by
BDWGC
: it is probable that using a
semi-precise GC and/or a faster conservative GC could change our view of the absolute performance
speed
In summary, we do not claim that
Alloy
is the ultimate design for GC in Rust – reasonable people
may, for example, disagree on whether the costs of conservative GC are worth the gains –
but it does show what can be achieved if one is willing to alter the language's design and
rustc
Data Availability Statement
The accompanying artefact [
18
] contains: the source code necessary to run this paper's experiment
(including generating figures etc.) from scratch; and data from a run of the experiment that we used in
this paper.
Acknowledgments
This work was funded by an EPSRC PhD studentship and the Shopify / Royal
Academy of Engineering Research Chair in Language Engineering. We thank Steve Klabnik and Andy
Wingo for comments.
References
[1]
Mads Ager, Erik Corry, Vyacheslav Egorov, Kentaro Hara, Gustav Wibling, and Ian Zerny. 2013. Oilpan: tracing garbage collection for Blink.
. Accessed on 2024-10-15.
[2]
Saoirse Aronson. 2018. shifgrethor.
. Accessed: 2024-10-15.
[3]
David F Bacon, Perry Cheng, and VT Rajan. 2004. A unified theory of garbage collection. In
OOPSLA
. 50–68.
[4]
David F Bacon and Vadakkedathu T Rajan. 2001. Concurrent cycle collection in reference counted systems. In
ECOOP
. 207–235.
doi:10.1007/3-540-45337-7_12
[5]
Stephen M. Blackburn, Zixian Cai, Rui Chen, Xi Yang, John Zhang, and John Zigman. 2025. Rethinking Java Performance Analysis. In
ASPLOS
. 940–954.
doi:10.1145/3669940.3707217
[6]
Hans-J Boehm. 2003. Destructors, finalizers, and synchronization. In
POPL
doi:10.1145/604131.604153
[7]
Hans-J Boehm and Mike Spertus. 2007. Optimization-robust finalization.
. Accessed: 2024-08-08.
[8]
Hans-J Boehm and Mike Spertus. 2009. Garbage collection in the next C++ standard. In
ISMM
. 30–38.
doi:10.1145/1542431.1542437
[9]
Hans-Juergen Boehm and Mark Weiser. 1988. Garbage collection in an uncooperative environment.
SPE
18, 9 (Sept. 1988), 807–820.
doi:10.1002/spe.4380180902
[10]
Thom Chiovoloni. 2015. Typed Arena.
. Accessed: 2025-03-25.
[11]
Michael Coblenz, Michelle Mazurek, and Michael Hicks. 2022. Does the Bronze Garbage Collector Make Rust Easier to Use? A Controlled Experiment. In
ICSE
doi:10.1145/3510003.3510107
[12]
Nick Fitzgerald. 2015. bacon-rajan-cc: A reference counted type with cycle collection for Rust.
. Accessed: 2024-10-15.
[13]
Brian Goetz and Mikael Vidstedt. 2021. JEP 421: Deprecate Finalization for Removal.
. Accessed: 2024-10-15.
[14]
Manish Goregaokar. 2015. rust-gc.
. Accessed: 2024-10-15.
[15]
Manish Goregaokar. 2016. GC support in Rust: API design.
. Accessed: 2024-10-15.
[16]
Manish Goregaokar. 2021. A tour of safe tracing GC designs in Rust.
. Accessed: 2024-10-15.
[17]
Graydon Hoare. 2022. Reply to 'What should be included in a history of the Rust language?'.
[18]
Jacob Hughes and Laurence Tratt. 2025. Reproduction Package for Article 'Garbage Collection for Rust: The Finalizer Frontier'. Zenodo.
doi:10.5281/zenodo.17013382
[19]
Richard Jones, Antony Hosking, and Eliot Moss. 2023.
The Garbage Collection Handbook: the Art of Automatic Memory Management
(second ed.). Chapman and Hall/CRC.
doi:10.1201/9781003276142
[20]
Steve Klabnik and Carol Nichols. 2018.
The Rust Programming Language
. No Starch Press.
doi:10.5555/3271463
[21]
Felix S. Klock. 2015. GC and Rust: specifying the problem.
. Accessed: 2024-10-15.
[22]
Felix S. Klock. 2016. GC and Rust: The roots of the problem.
. Accessed: 2024-10-15.
[23]
LLVM. 2014. Garbage collection with LLVM.
. Accessed: 2024-10-15.
[24]
Niko Matsakis. 2013. Destructors and finalizers in Rust.
. Accessed: 2024-10-15.
[25]
Benjamin C Pierce. 2004.
Advanced topics in Types and Programming Languages
. MIT press.
doi:10.7551/mitpress/1104.001.0001
[26]
Filip Pizlo. 2017. Introducing Riptide: WebKit's retreating wavefront concurrent garbage collector.
. Accessed: 2024-10-15.
[27]
Rust. 2014. The Rust language reference.
. Accessed: 2024-10-01.
[28]
Rifat Shahriyar, Stephen M. Blackburn, and Kathryn S. McKinley. 2014. Fast conservative garbage collection. In
OOPSLA
. 121–139.
doi:10.1145/2660193.2660198
[29]
Alekesy Shpilëv. 2020. JVM Anatomy Quark #8: Local Variable Reachability.
. Accessed: 2024-10-08.
[30]
Bjarne Stroustrup. 1997.
The C++ Programming Language
(third ed.). Addison-Wesley.
[31]
Lei Wang and Xikun Sun. 2006. Escape analysis for synchronization removal. In
SAC
. 1419–1423.
doi:10.1145/1141277.1141607
[32]
Catherine West. 2019. gc-arena: An experimental system for Rust garbage collection.
. Accessed: 2024-10-15.
[33]
Jason Williams. 2018. Boa: an experimental JavaScript lexer, parser and interpreter written in Rust.
. Accessed: 2024-10-15.
[34]
Milian Wolff. 2014. Heaptrack - A Heap Memory Profiler for Linux.
. Accessed: 2025-03-25.
[35]
Joshua Yanovski, Hoang-Hai Dang, Ralf Jung, and Derek Dreyer. 2021. GhostCell: separating permissions from data in Rust. 5 (Aug. 2021), 1–30.
doi:10.1145/3473597
Appendix
function RemoveElidableDrops(func):
for each basic_block ∈ func do
if IsDropTerminator(basic_block.terminator.kind) then
ty ← GetTypeOfDroppedValue(block.terminator);
if IsGcType(ty) then
if not RequiresFinalizer(ty) then
ReplaceTerminator(basic_block);
function ReplaceTerminator(basic_block):
drop_func ← GetDropFunc(basic_block.terminator);
last_block ← GetLastBasicBlock(drop_func);
block.terminator ← last_block.terminator;
Algorithm 3:
Removing elidable drops
Additional Experimental Data
Benchmark
Description
Reason for exclusion
bevy
ECS game engine in Rust
Unable to port successfully (see Section
8.1.2
dyon
Scripting language in Rust
Unable to port successfully (see Section
8.1.2
jiff
A datetime library for Rust
Too few allocations to measure
mini-moka
Concurrent in-memory cache library
Too few allocations to measure
numbat
Math search engine
Too few allocations to measure
rkyv
Zero-copy deserialization framework
Insufficient
Gc
coverage in benchmarks
RustPython
Python interpreter written in Rust
Difficulty retro-fitting
__del__
semantics (see Section
8.1.2
rust-analyzer
Language server for Rust
Unable to port successfully (see Section
8.1.2
salsa
Incremental recomputation library
Too few allocations to measure
WLambda
Scripting language written in Rust
Insufficient
Gc
coverage in benchmarks
Table 4.
Rust programs excluded from our benchmark suite after attempted porting to
Alloy
Wall-clock time (s)
Gc
Rc
Gc
Rust-GC
Arena
Alacritty
0.41 [0.39, 0.45]
0.40 [0.38, 0.44]
Binary Trees
0.11 [0.11, 0.11]
0.15 [0.14, 0.15]
0.33 [0.32, 0.33]
0.03 [0.03, 0.04]
fd
0.33 [0.29, 0.38]
0.31 [0.26, 0.37]
grmtools
3.06 [3.00, 3.14]
3.24 [3.17, 3.31]
Regex-Redux
0.47 [0.47, 0.47]
0.45 [0.45, 0.46]
Ripgrep
1.61 [1.55, 1.69]
1.52 [1.45, 1.59]
som-rs-ast
0.92 [0.88, 0.95]
0.79 [0.76, 0.82]
som-rs-bc
0.28 [0.27, 0.29]
0.29 [0.28, 0.30]
Table 5.
Wall-clock times (in seconds) with 99% confidence intervals for the
GCvs
experiment comparing different memory management strategies. Strategies not supported by a given benchmark are marked with "–".
Wall-clock time (s)
Ratio
jemalloc
BDWGC
Alacritty
0.36
[0.33, 0.40]
0.40
[0.38, 0.44]
1.11
Binary Trees
0.12
[0.12, 0.12]
0.15
[0.14, 0.15]
1.26
fd
0.30
[0.25, 0.36]
0.31
[0.26, 0.37]
1.02
grmtools
3.09
[3.01, 3.17]
3.24
[3.17, 3.31]
1.05
Regex-Redux
0.45
[0.44, 0.45]
0.45
[0.45, 0.46]
1.01
Ripgrep
1.46
[1.40, 1.53]
1.52
[1.45, 1.59]
1.04
som-rs-ast
0.77
[0.74, 0.80]
0.79
[0.76, 0.82]
1.02
som-rs-bc
0.28
[0.27, 0.29]
0.29
[0.28, 0.30]
1.02
Table 6.
Wall-clock times comparing jemalloc and
BDWGC
as allocators for
Rc
-only code (i.e., no GC). The ratio column shows
BDWGC
time divided by jemalloc time.
Wall-clock time (s)
jemalloc
BDWGC
Rc
Gc
Rc
Alacritty
Cur. Motion
0.65
±0.02
0.66
±0.01
0.66
±0.01
Dense Cells
2.16
±0.03
2.16
±0.02
2.14
±0.04
Light Cells
0.37
±0.01
0.37
±0.01
0.38
±0.01
Scroll
0.25
±0.01
0.26
±0.02
0.26
±0.01
Scroll (fullscreen)
0.38
±0.01
0.39
±0.01
0.38
±0.01
Scroll Btm
0.32
±0.00
0.33
±0.00
0.32
±0.01
Scroll Btm (small)
0.32
±0.01
0.33
±0.00
0.32
±0.01
Scroll Top
0.25
±0.02
0.26
±0.02
0.24
±0.02
Scroll Top (small)
0.32
±0.01
0.32
±0.01
0.33
±0.00
Unicode
0.23
±0.02
0.27
±0.01
0.27
±0.02
fd
Cmd Exec.
1.29
±0.01
1.28
±0.01
1.29
±0.02
Cmd Exec. (large)
1.24
±0.01
1.25
±0.03
1.26
±0.01
File Extension
0.13
±0.00
0.15
±0.01
0.13
±0.00
File Type
0.12
±0.00
0.13
±0.01
0.13
±0.00
No Pattern
0.18
±0.00
0.29
±0.02
0.22
±0.01
Simple
0.27
±0.01
0.33
±0.01
0.32
±0.00
Simple (-HI)
0.12
±0.00
0.15
±0.00
0.13
±0.00
som-rs-ast
Bounce
0.62
±0.00
0.96
±0.02
0.80
±0.00
BubbleSort
0.56
±0.01
0.84
±0.01
0.71
±0.00
DeltaBlue
0.77
±0.00
1.09
±0.01
0.98
±0.00
Dispatch
0.59
±0.01
0.97
±0.00
0.77
±0.01
Fannkuch
0.65
±0.00
1.01
±0.06
0.81
±0.00
Fibonacci
0.83
±0.00
1.41
±0.01
1.08
±0.00
FieldLoop
0.94
±0.00
1.20
±0.01
1.07
±0.01
GraphSearch
0.31
±0.00
0.43
±0.01
0.38
±0.01
IntegerLoop
0.52
±0.01
0.84
±0.00
0.67
±0.01
JsonSmall
0.86
±0.00
1.22
±0.01
1.10
±0.00
List
0.38
±0.00
0.53
±0.01
0.48
±0.00
Loop
0.53
±0.01
0.86
±0.01
0.70
±0.00
Mandelbrot
0.46
±0.00
0.69
±0.01
0.57
±0.00
NBody
0.31
±0.00
0.38
±0.01
0.34
±0.00
PageRank
0.31
±0.00
0.44
±0.00
0.39
±0.00
Permute
0.67
±0.00
0.90
±0.02
0.82
±0.00
Queens
0.77
±0.01
1.18
±0.02
0.98
±0.00
QuickSort
0.98
±0.00
1.27
±0.02
1.26
±0.01
Recurse
0.60
±0.00
0.92
±0.00
0.79
±0.00
Richards
2.48
±0.02
3.72
±0.11
3.13
±0.01
Sieve
0.67
±0.00
0.94
±0.01
0.87
±0.00
Storage
0.56
±0.00
0.70
±0.01
0.69
±0.00
Sum
0.52
±0.01
0.83
±0.01
0.67
±0.00
Towers
0.31
±0.00
0.46
±0.01
0.38
±0.01
TreeSort
0.32
±0.00
0.41
±0.01
0.40
±0.00
WhileLoop
0.48
±0.00
0.72
±0.00
0.60
±0.00
grmtools
Eclipse
3.48
±0.03
3.44
±0.03
3.74
±0.04
Hadoop
2.86
±0.01
2.92
±0.01
3.09
±0.01
Jenkins
2.90
±0.00
2.99
±0.05
3.02
±0.01
Spring
2.43
±0.00
2.51
±0.01
2.63
±0.02
Ripgrep
Alternates
1.18
±0.01
1.29
±0.02
1.21
±0.01
Alternates (-i)
1.31
±0.01
1.42
±0.01
1.33
±0.01
Literal
1.12
±0.01
1.23
±0.01
1.15
±0.01
Literal (-i)
1.19
±0.00
1.27
±0.01
1.20
±0.01
Literal (default)
1.08
±0.01
1.19
±0.01
1.10
±0.01
Literal (mmap)
1.64
±0.01
1.75
±0.02
1.67
±0.01
Literal (mmap, -i)
1.67
±0.01
1.80
±0.01
1.72
±0.01
Literal (regex)
1.13
±0.01
1.25
±0.01
1.15
±0.01
UTF Greek
3.29
±0.01
3.41
±0.01
3.33
±0.01
UTF Greek (-i)
3.30
±0.01
3.42
±0.03
3.33
±0.01
UTF Word
1.09
±0.01
1.21
±0.01
1.14
±0.00
UTF Word (alt.)
1.11
±0.01
1.21
±0.01
1.13
±0.00
Word
1.12
±0.00
1.22
±0.02
1.14
±0.01
som-rs-bc
Bounce
0.25
±0.00
0.28
±0.00
0.30
±0.00
BubbleSort
0.23
±0.00
0.27
±0.00
0.27
±0.00
DeltaBlue
0.30
±0.00
0.34
±0.00
0.36
±0.00
Dispatch
0.25
±0.00
0.28
±0.00
0.29
±0.01
Fannkuch
0.25
±0.00
0.29
±0.00
0.29
±0.00
Fibonacci
0.30
±0.01
0.37
±0.00
0.36
±0.00
FieldLoop
0.49
±0.00
0.43
±0.00
0.54
±0.01
GraphSearch
0.12
±0.00
0.12
±0.00
0.15
±0.00
IntegerLoop
0.22
±0.00
0.24
±0.00
0.26
±0.00
JsonSmall
0.37
±0.00
0.39
±0.01
0.44
±0.00
List
0.18
±0.00
0.22
±0.00
0.21
±0.01
Loop
0.23
±0.00
0.26
±0.00
0.27
±0.00
Mandelbrot
0.20
±0.00
0.22
±0.01
0.23
±0.00
NBody
0.11
±0.00
0.12
±0.00
0.13
±0.00
PageRank
0.13
±0.00
0.13
±0.00
0.14
±0.00
Permute
0.24
±0.00
0.27
±0.00
0.27
±0.00
Queens
0.28
±0.01
0.33
±0.00
0.33
±0.00
QuickSort
0.36
±0.00
0.38
±0.00
0.43
±0.00
Recurse
0.25
±0.00
0.31
±0.00
0.30
±0.01
Richards
0.92
±0.00
1.20
±0.01
1.10
±0.01
Sieve
0.25
±0.00
0.26
±0.00
0.30
±0.01
Storage
0.22
±0.00
0.21
±0.00
0.25
±0.00
Sum
0.22
±0.00
0.25
±0.00
0.27
±0.00
Towers
0.13
±0.00
0.15
±0.00
0.15
±0.00
TreeSort
0.11
±0.00
0.12
±0.00
0.13
±0.00
WhileLoop
0.22
±0.00
0.23
±0.00
0.25
±0.00
Table 7.
Wall-clock execution times (in seconds) for each benchmark in the
GCvs
experiment. Values show arithmetic means over 30 runs and include 99% confidence intervals. Because
Binary Trees
and
Regex-Redux
are stand-alone benchmarks they are omitted here; their timings appear in Table
User time (s)
jemalloc
BDWGC
Rc
Gc
Rc
Alacritty
Cur. Motion
1.08
±0.05
1.09
±0.04
1.09
±0.04
Dense Cells
6.66
±0.13
6.62
±0.12
6.58
±0.15
Light Cells
0.34
±0.03
0.34
±0.03
0.34
±0.03
Scroll
0.11
±0.01
0.11
±0.01
0.12
±0.02
Scroll (fullscreen)
0.24
±0.02
0.28
±0.03
0.22
±0.02
Scroll Btm
0.14
±0.01
0.14
±0.01
0.14
±0.01
Scroll Btm (small)
0.14
±0.01
0.15
±0.01
0.14
±0.01
Scroll Top
0.12
±0.01
0.12
±0.02
0.11
±0.01
Scroll Top (small)
0.14
±0.01
0.14
±0.01
0.15
±0.01
Unicode
0.10
±0.01
0.13
±0.01
0.12
±0.01
fd
Cmd Exec.
1.05
±0.01
1.12
±0.02
1.12
±0.02
Cmd Exec. (large)
1.02
±0.01
1.10
±0.03
1.11
±0.02
File Extension
0.04
±0.00
0.04
±0.01
0.04
±0.01
File Type
0.03
±0.01
0.04
±0.01
0.04
±0.01
No Pattern
0.16
±0.01
0.29
±0.02
0.23
±0.01
Simple
0.14
±0.01
0.18
±0.01
0.16
±0.01
Simple (-HI)
0.03
±0.01
0.04
±0.01
0.04
±0.01
som-rs-ast
Bounce
0.62
±0.00
0.96
±0.02
0.79
±0.00
BubbleSort
0.56
±0.00
0.83
±0.01
0.71
±0.00
DeltaBlue
0.77
±0.00
1.07
±0.01
0.97
±0.00
Dispatch
0.59
±0.01
0.96
±0.01
0.76
±0.01
Fannkuch
0.64
±0.00
0.98
±0.05
0.80
±0.00
Fibonacci
0.82
±0.00
1.40
±0.01
1.07
±0.00
FieldLoop
0.94
±0.00
1.20
±0.01
1.07
±0.01
GraphSearch
0.30
±0.00
0.41
±0.01
0.37
±0.00
IntegerLoop
0.52
±0.01
0.83
±0.00
0.66
±0.01
JsonSmall
0.85
±0.00
1.21
±0.01
1.10
±0.00
List
0.38
±0.00
0.52
±0.01
0.47
±0.00
Loop
0.52
±0.01
0.85
±0.01
0.70
±0.00
Mandelbrot
0.45
±0.00
0.69
±0.01
0.57
±0.00
NBody
0.31
±0.00
0.38
±0.01
0.34
±0.00
PageRank
0.30
±0.00
0.43
±0.01
0.38
±0.00
Permute
0.66
±0.00
0.89
±0.02
0.82
±0.00
Queens
0.77
±0.00
1.18
±0.02
0.97
±0.00
QuickSort
0.98
±0.00
1.26
±0.02
1.26
±0.00
Recurse
0.60
±0.00
0.91
±0.00
0.79
±0.00
Richards
2.47
±0.02
3.72
±0.11
3.13
±0.01
Sieve
0.66
±0.00
0.94
±0.01
0.87
±0.00
Storage
0.55
±0.00
0.69
±0.01
0.68
±0.00
Sum
0.52
±0.00
0.82
±0.01
0.67
±0.00
Towers
0.31
±0.00
0.46
±0.01
0.38
±0.00
TreeSort
0.31
±0.00
0.40
±0.01
0.40
±0.00
WhileLoop
0.47
±0.00
0.72
±0.00
0.59
±0.00
grmtools
Eclipse
3.32
±0.03
3.28
±0.03
3.59
±0.04
Hadoop
2.77
±0.01
2.80
±0.01
2.99
±0.01
Jenkins
2.74
±0.01
2.81
±0.04
2.90
±0.01
Spring
2.35
±0.01
2.40
±0.01
2.54
±0.02
Ripgrep
Alternates
0.42
±0.02
0.51
±0.02
0.43
±0.02
Alternates (-i)
0.54
±0.02
0.65
±0.02
0.57
±0.02
Literal
0.34
±0.02
0.44
±0.01
0.36
±0.01
Literal (-i)
0.41
±0.02
0.50
±0.02
0.44
±0.02
Literal (default)
0.30
±0.02
0.40
±0.01
0.32
±0.01
Literal (mmap)
0.43
±0.02
0.52
±0.02
0.46
±0.02
Literal (mmap, -i)
0.48
±0.02
0.58
±0.02
0.50
±0.02
Literal (regex)
0.35
±0.02
0.44
±0.02
0.36
±0.01
UTF Greek
2.55
±0.02
2.64
±0.02
2.58
±0.03
UTF Greek (-i)
2.56
±0.03
2.66
±0.03
2.56
±0.02
UTF Word
0.35
±0.02
0.42
±0.02
0.37
±0.02
UTF Word (alt.)
0.33
±0.02
0.43
±0.02
0.35
±0.01
Word
0.35
±0.02
0.43
±0.02
0.36
±0.01
som-rs-bc
Bounce
0.25
±0.00
0.28
±0.00
0.29
±0.00
BubbleSort
0.22
±0.00
0.27
±0.00
0.27
±0.00
DeltaBlue
0.30
±0.00
0.34
±0.00
0.35
±0.01
Dispatch
0.24
±0.00
0.28
±0.00
0.29
±0.00
Fannkuch
0.24
±0.00
0.28
±0.00
0.28
±0.00
Fibonacci
0.30
±0.00
0.36
±0.00
0.36
±0.00
FieldLoop
0.48
±0.00
0.42
±0.01
0.53
±0.01
GraphSearch
0.12
±0.00
0.11
±0.00
0.14
±0.00
IntegerLoop
0.22
±0.00
0.24
±0.00
0.26
±0.00
JsonSmall
0.36
±0.00
0.39
±0.01
0.44
±0.00
List
0.17
±0.00
0.22
±0.00
0.21
±0.01
Loop
0.22
±0.00
0.25
±0.00
0.27
±0.00
Mandelbrot
0.19
±0.00
0.21
±0.01
0.23
±0.00
NBody
0.11
±0.00
0.12
±0.00
0.13
±0.00
PageRank
0.12
±0.00
0.12
±0.00
0.14
±0.00
Permute
0.23
±0.00
0.26
±0.00
0.27
±0.00
Queens
0.27
±0.01
0.33
±0.00
0.33
±0.00
QuickSort
0.35
±0.00
0.38
±0.00
0.42
±0.00
Recurse
0.25
±0.00
0.31
±0.00
0.30
±0.01
Richards
0.92
±0.00
1.20
±0.01
1.10
±0.01
Sieve
0.25
±0.00
0.26
±0.00
0.30
±0.01
Storage
0.22
±0.00
0.20
±0.00
0.25
±0.00
Sum
0.22
±0.00
0.24
±0.00
0.26
±0.00
Towers
0.12
±0.00
0.15
±0.00
0.14
±0.00
TreeSort
0.11
±0.00
0.11
±0.00
0.12
±0.00
WhileLoop
0.21
±0.00
0.22
±0.00
0.25
±0.00
Table 8.
User times (in seconds) for each benchmark in the
GCvs
experiment. Values show arithmetic means over 30 runs and include 99% confidence intervals.
Heap Size (MiB)
Relative wall-clock time
Benchmarks failed
Alacritty
16
0.96
[0.91, 0.99]
32
0.98
[0.95, 1.02]
64
0.94
[0.89, 0.98]
Binary Trees
0.88
[0.82, 1.02]
0.90
[0.80, 1.01]
16
0.87
[0.82, 0.94]
fd
16
0.94
[0.90, 0.99]
32
0.94
[0.88, 0.98]
64
0.94
[0.89, 1.00]
grmtools
1024
1.01
[1.00, 1.02]
2/4 (Eclipse, Jenkins)
2048
1.00
[1.00, 1.01]
4096
1.01
[1.00, 1.02]
Regex-Redux
256
0.94
[0.92, 0.95]
512
0.93
[0.90, 0.94]
1024
0.96
[0.92, 1.07]
Ripgrep
32
0.96
[0.95, 0.96]
64
0.95
[0.94, 0.95]
128
0.94
[0.93, 0.95]
som-rs-ast
64
0.72
[0.71, 0.74]
2/4 (Fannkuch, TreeSort)
96
0.74
[0.73, 0.75]
2/4 (Fannkuch, TreeSort)
128
0.75
[0.74, 0.76]
som-rs-bc
32
0.79
[0.78, 0.80]
64
0.79
[0.77, 0.80]
128
0.84
[0.83, 0.86]
Table 9.
The effects of fixing the heap size (see Section 8.1.3) on wall-clock time, reported as ratios relative to
BDWGC
's default adaptive RSS strategy. We chose per-benchmark heap sizes based on our observations of their memory usage, though some benchmarks fail at smaller values. Broadly speaking, these results show that the performance differences due to different heap sizes are relatively minor, and that
BDWGC
's adaptive strategy has not unduly coloured our results.
Fin. elided (%)
Alacritty
Cur. Motion
0.00
Dense Cells
0.00
Light Cells
0.00
Scroll
0.00
Scroll Btm
0.00
Scroll Btm (small)
0.00
Scroll (fullscreen)
0.00
Scroll Top
0.00
Scroll Top (small)
0.00
Unicode
0.00
fd
Cmd Exec.
8.93
Cmd Exec. (large)
9.09
File Extension
72.73
File Type
24.54
No Pattern
0.04
Simple
61.54
Simple (-HI)
61.54
som-rs-ast
Bounce
100.00
BubbleSort
100.00
DeltaBlue
100.00
Dispatch
100.00
Fannkuch
100.00
Fibonacci
100.00
FieldLoop
100.00
GraphSearch
99.99
IntegerLoop
100.00
JsonSmall
100.00
List
100.00
Loop
100.00
Mandelbrot
100.00
NBody
99.99
PageRank
99.99
Permute
100.00
Queens
100.00
QuickSort
100.00
Recurse
100.00
Richards
100.00
Sieve
100.00
Storage
100.00
Sum
100.00
Towers
99.99
TreeSort
99.99
WhileLoop
100.00
grmtools
Eclipse
11.79
Hadoop
32.99
Jenkins
26.74
Spring
37.25
Ripgrep
Alternates
79.04
Alternates (-i)
79.04
Literal
79.04
Literal (-i)
79.04
Literal (mmap, -i)
79.04
Literal (default)
79.04
Literal (mmap)
79.04
Literal (regex)
79.04
UTF Greek
79.04
UTF Greek (-i)
79.04
UTF Word
79.04
UTF Word (alt.)
79.04
Word
79.04
som-rs-bc
Bounce
72.50
BubbleSort
76.34
DeltaBlue
73.86
Dispatch
77.78
Fannkuch
72.74
Fibonacci
70.37
FieldLoop
83.33
GraphSearch
76.56
IntegerLoop
75.00
JsonSmall
71.85
List
79.41
Loop
74.82
Mandelbrot
70.93
NBody
86.17
PageRank
70.95
Permute
79.23
Queens
79.04
QuickSort
69.33
Recurse
82.61
Richards
71.17
Sieve
73.15
Storage
73.84
Sum
75.00
Towers
78.29
TreeSort
65.19
WhileLoop
74.98
Table 10.
Percentage of finalizers
Alloy
was able to elide for each benchmark.
Figure 5.
Wall-clock and user time performance comparison for finalizer elision on each benchmark. The bars show the relative performance of
Alloy
after applying our elision optimization, normalized against the baseline (solid black line). The vertical blue line marks the overall geometric mean (with shaded area for CIs). User time often shows greater improvement than wall-clock time, as elision reduces the CPU overhead of the finalization thread.
Wall-clock time (s)
Before elision
After elision
Alacritty
Cur. Motion
0.66
±0.01
0.66
±0.01
Dense Cells
2.17
±0.03
2.17
±0.02
Light Cells
0.39
±0.01
0.38
±0.01
Scroll
0.27
±0.01
0.27
±0.01
Scroll (fullscreen)
0.39
±0.01
0.39
±0.01
Scroll Btm
0.33
±0.00
0.33
±0.00
Scroll Btm (small)
0.33
±0.00
0.33
±0.00
Scroll Top
0.25
±0.01
0.24
±0.04
Scroll Top (small)
0.33
±0.00
0.33
±0.01
Unicode
0.26
±0.02
0.26
±0.02
fd
Cmd Exec.
1.29
±0.02
1.29
±0.02
Cmd Exec. (large)
1.25
±0.01
1.25
±0.02
File Extension
0.15
±0.00
0.14
±0.01
File Type
0.13
±0.01
0.13
±0.01
No Pattern
0.28
±0.02
0.29
±0.02
Simple
0.34
±0.01
0.35
±0.00
Simple (-HI)
0.14
±0.01
0.14
±0.01
som-rs-ast
Bounce
7.68
±0.40
1.03
±0.01
BubbleSort
5.27
±0.29
0.92
±0.01
DeltaBlue
7.84
±0.19
1.17
±0.01
Dispatch
6.52
±0.46
1.04
±0.01
Fannkuch
6.76
±0.09
1.15
±0.03
Fibonacci
22.16
±1.87
1.54
±0.01
FieldLoop
5.14
±0.16
1.25
±0.01
GraphSearch
2.82
±0.06
0.47
±0.00
IntegerLoop
5.88
±0.11
0.90
±0.01
JsonSmall
11.37
±0.77
1.33
±0.01
List
4.06
±0.06
0.58
±0.00
Loop
6.02
±0.14
0.90
±0.01
Mandelbrot
4.67
±0.11
0.74
±0.01
NBody
2.91
±0.16
0.40
±0.00
PageRank
2.44
±0.05
0.47
±0.00
Permute
6.69
±0.35
0.97
±0.02
Queens
9.07
±0.34
1.28
±0.02
QuickSort
10.86
±0.66
1.39
±0.02
Recurse
7.66
±0.24
0.98
±0.00
Richards
97.40
±10.46
3.95
±0.08
Sieve
7.06
±0.13
1.01
±0.01
Storage
5.06
±0.10
0.75
±0.01
Sum
5.91
±0.12
0.89
±0.01
Towers
3.20
±0.14
0.50
±0.01
TreeSort
4.65
±0.32
0.45
±0.00
WhileLoop
5.03
±0.21
0.76
±0.01
grmtools
Eclipse
5.09
±0.05
3.59
±0.06
Hadoop
4.49
±0.03
3.07
±0.05
Jenkins
4.45
±0.02
3.02
±0.01
Spring
4.11
±0.03
2.65
±0.01
Ripgrep
Alternates
1.44
±0.03
1.39
±0.01
Alternates (-i)
1.55
±0.02
1.50
±0.01
Literal
1.38
±0.03
1.32
±0.01
Literal (-i)
1.45
±0.02
1.38
±0.01
Literal (default)
1.38
±0.02
1.30
±0.01
Literal (mmap)
1.90
±0.02
1.81
±0.01
Literal (mmap, -i)
1.90
±0.03
1.84
±0.01
Literal (regex)
1.36
±0.03
1.33
±0.01
UTF Greek
3.53
±0.01
3.50
±0.02
UTF Greek (-i)
3.51
±0.02
3.52
±0.02
UTF Word
1.39
±0.02
1.30
±0.01
UTF Word (alt.)
1.36
±0.02
1.31
±0.02
Word
1.36
±0.02
1.31
±0.01
som-rs-bc
Bounce
3.03
±0.04
0.31
±0.00
BubbleSort
2.64
±0.04
0.29
±0.00
DeltaBlue
4.19
±0.08
0.37
±0.00
Dispatch
3.80
±0.36
0.30
±0.00
Fannkuch
2.86
±0.23
0.31
±0.00
Fibonacci
4.34
±0.20
0.40
±0.00
FieldLoop
2.45
±0.16
0.45
±0.01
GraphSearch
1.15
±0.03
0.13
±0.00
IntegerLoop
3.00
±0.16
0.26
±0.00
JsonSmall
4.96
±0.80
0.42
±0.01
List
2.69
±0.09
0.24
±0.01
Loop
3.07
±0.18
0.28
±0.00
Mandelbrot
2.21
±0.15
0.23
±0.00
NBody
1.23
±0.06
0.13
±0.00
PageRank
1.06
±0.01
0.14
±0.00
Permute
2.72
±0.09
0.29
±0.00
Queens
3.19
±0.07
0.35
±0.00
QuickSort
3.62
±0.05
0.41
±0.00
Recurse
4.27
±1.91
0.33
±0.00
Richards
15.51
±0.89
1.31
±0.01
Sieve
2.57
±0.07
0.28
±0.00
Storage
2.45
±0.05
0.22
±0.00
Sum
3.06
±0.37
0.27
±0.00
Towers
1.59
±0.05
0.16
±0.00
TreeSort
1.19
±0.02
0.13
WhileLoop
3.26
±0.08
0.24
±0.01
Table 11.
Wall-clock execution times (seconds) for each benchmark in the
Elision
experiment, shown before and after applying
Alloy
's finalizer elision optimisation. Values show arithmetic means over 30 runs, with 99% confidence intervals.
User time (s)
Before elision
After elision
Alacritty
Cur. Motion
1.08
±0.06
1.12
±0.05
Dense Cells
6.70
±0.12
6.68
±0.13
Light Cells
0.34
±0.04
0.33
±0.04
Scroll
0.12
±0.01
0.12
±0.01
Scroll (fullscreen)
0.27
±0.03
0.26
±0.03
Scroll Btm
0.15
±0.01
0.15
±0.01
Scroll Btm (small)
0.15
±0.01
0.14
±0.01
Scroll Top
0.12
±0.01
0.12
±0.06
Scroll Top (small)
0.15
±0.01
0.14
±0.01
Unicode
0.13
±0.02
0.13
±0.02
fd
Cmd Exec.
1.13
±0.01
1.13
±0.02
Cmd Exec. (large)
1.10
±0.02
1.11
±0.02
File Extension
0.05
±0.01
0.04
±0.00
File Type
0.04
±0.00
0.04
±0.01
No Pattern
0.27
±0.02
0.30
±0.02
Simple
0.19
±0.01
0.20
±0.01
Simple (-HI)
0.04
±0.01
0.04
±0.01
som-rs-ast
Bounce
11.15
±0.41
1.02
±0.01
BubbleSort
7.88
±0.40
0.90
±0.01
DeltaBlue
11.72
±0.21
1.15
±0.01
Dispatch
9.15
±0.63
1.03
±0.01
Fannkuch
9.60
±0.12
1.12
±0.03
Fibonacci
26.41
±1.82
1.53
±0.01
FieldLoop
6.42
±0.17
1.25
±0.01
GraphSearch
3.92
±0.07
0.45
±0.01
IntegerLoop
8.23
±0.13
0.90
±0.01
JsonSmall
16.08
±0.75
1.32
±0.01
List
5.79
±0.09
0.57
±0.00
Loop
8.51
±0.17
0.89
±0.01
Mandelbrot
6.62
±0.12
0.73
±0.01
NBody
3.98
±0.15
0.40
±0.00
PageRank
3.73
±0.07
0.47
±0.00
Permute
9.96
±0.34
0.96
±0.02
Queens
13.04
±0.33
1.27
±0.02
QuickSort
16.43
±0.65
1.38
±0.02
Recurse
11.18
±0.28
0.98
±0.00
Richards
109.49
±10.34
3.95
±0.08
Sieve
10.49
±0.18
1.00
±0.01
Storage
7.70
±0.12
0.74
±0.01
Sum
8.27
±0.14
0.89
±0.01
Towers
4.61
±0.14
0.49
±0.01
TreeSort
5.90
±0.35
0.44
±0.00
WhileLoop
6.74
±0.20
0.76
±0.01
grmtools
Eclipse
5.81
±0.07
3.42
±0.06
Hadoop
5.27
±0.04
2.96
±0.05
Jenkins
5.21
±0.04
2.86
±0.01
Spring
4.94
±0.05
2.54
±0.01
Ripgrep
Alternates
0.58
±0.02
0.50
±0.02
Alternates (-i)
0.72
±0.02
0.64
±0.02
Literal
0.53
±0.02
0.44
±0.02
Literal (-i)
0.59
±0.02
0.51
±0.01
Literal (default)
0.48
±0.02
0.40
±0.02
Literal (mmap)
0.63
±0.02
0.53
±0.02
Literal (mmap, -i)
0.65
±0.02
0.57
±0.02
Literal (regex)
0.52
±0.02
0.43
±0.01
UTF Greek
2.70
±0.02
2.66
±0.03
UTF Greek (-i)
2.71
±0.03
2.67
±0.02
UTF Word
0.53
±0.02
0.42
±0.02
UTF Word (alt.)
0.52
±0.02
0.43
±0.02
Word
0.51
±0.02
0.44
±0.02
som-rs-bc
Bounce
4.34
±0.04
0.30
±0.00
BubbleSort
3.76
±0.05
0.29
±0.00
DeltaBlue
5.76
±0.11
0.36
±0.00
Dispatch
4.91
±0.35
0.29
±0.00
Fannkuch
3.85
±0.29
0.30
±0.00
Fibonacci
6.15
±0.17
0.40
±0.00
FieldLoop
3.14
±0.15
0.45
±0.01
GraphSearch
1.54
±0.03
0.12
±0.00
IntegerLoop
3.94
±0.15
0.26
±0.00
JsonSmall
6.70
±0.73
0.41
±0.01
List
3.72
±0.09
0.24
±0.01
Loop
4.11
±0.17
0.28
±0.00
Mandelbrot
2.91
±0.13
0.23
±0.00
NBody
1.72
±0.06
0.12
±0.00
PageRank
1.51
±0.02
0.13
±0.00
Permute
3.84
±0.09
0.28
±0.00
Queens
4.53
±0.07
0.35
±0.00
QuickSort
5.28
±0.09
0.40
±0.00
Recurse
5.84
±1.85
0.33
±0.00
Richards
20.23
±0.81
1.30
±0.01
Sieve
3.76
±0.12
0.27
±0.01
Storage
3.23
±0.05
0.22
±0.00
Sum
4.03
±0.36
0.26
±0.00
Towers
2.20
±0.05
0.16
±0.00
TreeSort
1.71
±0.03
0.13
±0.00
WhileLoop
3.90
±0.09
0.24
±0.01
Table 12.
User times (seconds) for each benchmark in the
Elision
experiment, shown before and after applying
Alloy
's finalizer elision optimisation. Values show arithmetic means over 30 runs, with 99% confidence intervals.
Avg. heap footprint (MiB)
Before elision
After elision
Alacritty
Cur. Motion
3.76
±0.06
3.79
±0.06
Dense Cells
3.74
±0.06
3.74
±0.06
Light Cells
3.76
±0.08
3.73
±0.07
Scroll
3.88
±0.13
3.87
±0.12
Scroll (fullscreen)
6.46
±0.28
6.46
±0.34
Scroll Btm
3.72
±0.07
3.73
±0.07
Scroll Btm (small)
3.75
±0.05
3.74
±0.07
Scroll Top
3.78
±0.08
3.75
±0.10
Scroll Top (small)
3.76
±0.06
3.71
±0.08
Unicode
3.73
±0.15
3.77
±0.09
fd
execution
17.18
±0.33
17.89
±0.87
extension
13.52
±0.75
13.68
±0.84
hi
15.14
±0.36
14.99
±0.59
output
17.54
±0.65
17.47
±0.53
pattern
37.79
±5.22
38.03
±5.25
type
15.80
±0.40
15.84
±0.38
som-rs-ast
bounce
309.85
±43.90
362.44
±7.24
bubblesort
334.97
±6.55
209.99
±3.88
deltablue
305.24
±31.74
310.83
±5.32
dispatch
345.26
±18.67
329.90
±4.22
fannkuch
412.43
±5.54
363.19
±8.31
fibonacci
382.32
±22.98
587.04
±4.00
fieldloop
326.47
±8.38
300.93
±5.66
graphsearch
100.82
±4.86
61.15
±0.26
integerloop
323.50
±16.42
320.62
±0.44
jsonsmall
257.67
±34.84
316.56
±0.93
list
236.78
±5.27
156.13
±4.11
loop
332.20
±14.75
326.89
±7.76
mandelbrot
276.68
±4.14
236.49
±2.42
nbody
151.00
±2.63
137.42
±2.78
pagerank
193.25
±1.12
168.02
±0.95
permute
284.84
±43.08
371.67
±11.43
queens
278.89
±12.36
288.03
±2.63
quicksort
515.72
±20.82
424.56
±18.48
recurse
429.81
±12.37
410.97
±5.30
richards
1187.31
±98.12
1472.67
±31.17
sieve
355.11
±17.79
360.95
±6.06
storage
196.36
±9.70
193.06
±6.60
sum
320.61
±11.55
298.24
±2.27
towers
174.90
±4.23
175.82
±0.26
treesort
92.65
±4.87
98.82
±0.92
whileloop
272.66
±10.05
253.07
±3.60
grmtools
Eclipse
1993.86
±9.04
1069.95
±28.40
Hadoop
1882.88
±1.55
800.15
±48.62
Jenkins
1876.35
±14.92
962.16
±29.20
Spring
1752.72
±1.68
1036.27
±19.48
Ripgrep
Alternates
28.23
±1.59
15.80
±0.40
Alternates (-i)
29.52
±1.66
17.50
±0.66
Literal
25.14
±1.52
15.26
±0.33
Literal (-i)
26.69
±1.56
18.17
±0.41
Literal (default)
25.62
±1.59
15.25
±0.39
Literal (mmap)
26.91
±1.78
16.03
±0.39
Literal (mmap, -i)
28.72
±1.66
17.34
±0.24
Literal (regex)
25.87
±2.01
15.05
±0.46
UTF Greek
26.57
±1.40
17.43
±0.47
UTF Greek (-i)
26.85
±1.54
18.04
±0.65
UTF Word
28.03
±1.57
18.13
±0.14
UTF Word (alt.)
25.56
±1.60
15.48
±0.46
Word
29.35
±2.05
15.82
±0.51
som-rs-bc
bounce
138.59
±5.18
134.24
±0.22
bubblesort
121.69
±6.72
115.25
±0.03
deltablue
186.12
±1.32
135.10
±2.09
dispatch
168.63
±7.00
137.08
±0.01
fannkuch
139.94
±5.95
115.98
±0.02
fibonacci
205.88
±10.08
175.88
±0.03
fieldloop
98.52
±4.07
89.07
±0.01
graphsearch
45.81
±1.21
24.33
±0.74
integerloop
133.10
±6.16
117.24
±0.01
jsonsmall
164.80
±5.83
142.20
±5.29
list
106.38
±7.84
98.95
±0.04
loop
144.73
±6.09
120.89
±0.00
mandelbrot
90.31
±4.84
80.51
±0.03
nbody
58.61
±1.77
48.15
±0.03
pagerank
69.21
±0.08
51.90
±0.46
permute
126.65
±5.94
109.74
±0.03
queens
157.93
±5.89
134.31
±0.02
quicksort
183.42
±4.54
175.50
±0.04
recurse
169.34
±7.22
146.41
±0.00
richards
582.34
±41.87
487.95
±0.03
sieve
131.11
±5.83
114.14
±3.04
storage
71.95
±2.32
85.36
±3.26
sum
136.99
±5.40
117.56
±0.00
towers
71.09
±3.09
58.84
±0.03
treesort
57.12
±2.12
44.90
±1.02
whileloop
132.37
±2.45
96.28
±0.02
Table 13.
Average heap footprint (MiB) across benchmarks in the
Elision
experiment, measured before and after applying
Alloy
's finalizer elision optimisation. Values are arithmetic means over 30 runs with resampled traces from heaptrack, with 99% confidence intervals (see Section 8.1.4 for details).
Although it is outside the scope of this paper, it would be preferable for Rust to have different types for 'data width' and 'address'. Modern C, for example, captures this difference with the
size_t
and
uintptr_t
types respectively. Rust now has a provenance lint to nudge users in this general direction, but the
as
keyword still allows arbitrary conversions.
The lengthier syntax
y = Gc::clone(&v)
is available, since every copyable type is also cloneable.
These features were added to the C+11 specification but removed in C++23.
Mostly. Rust's 'temporary lifetime extension' delays destruction, but for how long is currently unspecified.
A program could pause after 'exit' while all queued finalizers are run. We are not aware of a system which does this.
This is a white lie, though the visible effect is the same.
RawTable
is contained in the separate
hashbrown
crate which is then included in Rust's standard library. We previously maintained a fork of this, but synchronising it is painful. For now, at least, we have hacked explicit knowledge of
RawTable
into the
needs_finalize
function.
Initially we wrapped this macro in a function. This led to surprising performance results: an analysis of LLVM IR suggests – but the sheer quantity of data makes it hard to confirm – that this may be due to the effect on inlining heuristics.
Our initial implementation of FSA was, in essence, an extension of Rust's type system and suffered from false positives. For example, porting
Alacritty
led to 110 errors — but our abstract interpretation approach raises just 2.
In commit
10
While the term 'trial deletion' does not appear in the original paper, it is now widely used to describe part of the algorithm.
Internal
info@soft-dev.org