Software Development Team: Garbage Colle

Software Development Team: Garbage Collection for Rust: The Finalizer Frontier
Software Development Team
About
Events
Jobs
People
Projects
Publications
Software
Talks
Cite this paper as:
Jacob Hughes, Laurence Tratt,
Garbage Collection for Rust: The Finalizer Frontier,
OOPSLA 2025, Pages 3588 - 3614, October 2025.
Also available as:
DOI
PDF (arxiv)
See also:
Experiment
DOI
).
Garbage Collection for Rust: The Finalizer Frontier
Jacob Hughes
and
Laurence Tratt
October, 2025
Abstract
Rust is a non-Garbage Collected (GCed) language, but the lack of GC
makes expressing data-structures that require shared ownership awkward,
inefficient, or both. In this paper we explore a new design for, and
implementation of, GC in Rust, called
Alloy
. Unlike previous approaches to GC in
Rust,
Alloy
allows existing Rust
destructors to be automatically used as GC finalizers: this makes
Alloy
integrate better with existing Rust code
than previous solutions but introduces surprising soundness and
performance problems.
Alloy
provides
novel solutions for the core problems:
finalizer safety
analysis
rejects unsound destructors from automatically being
reused as finalizers;
finalizer elision
optimises away
unnecessary finalizers; and
premature finalizer prevention
ensures that finalizers are only run when it is provably safe to do
so.
Introduction
Amongst the ways one can classify programming languages are whether they are Garbage Collected
(GCed) or not: GCed languages enable implicit memory management; non-GCed languages require
explicit memory management (e.g
's
malloc
free
functions). Rust's use of affine types [
25
, p. 5] and
ownership does not fit within this classification: it is not GCed but it has implicit scope-based
memory management. Most portions of Rust programs are as succinct as a GCed equivalent, but
ownership is too inflexible to express
shared ownership
for data-structures that require multiple
owners (e.g. doubly linked lists). Workarounds such as reference counting impose an extra
burden on the programmer, make mistakes more likely, and often come with a performance
penalty.
In an attempt to avoid such problems, there are now a number of GCs for Rust (e.g. [
11
14
32
33
]).
Most introduce a user-visible type
Gc
which takes a value
of type
and moves
to the 'GC
heap'. The
Gc
value itself is a wrapper around a pointer to
on the GC heap.
Gc
can be
cloned
(i.e. duplicated) and
dereferenced
to a value of type
&T
(i.e. a type-safe pointer) at
will by the user. When no
Gc
wrappers pointing to
can be found, indirectly or directly,
from the program's
roots
(e.g. variables on the stack), then the GC heap memory for
can be
reclaimed.
It has proven hard to find a satisfying design and implementation for a GC for Rust, as perhaps
suggested by the number of attempts to do so. We identify two fundamental challenges for
GC for Rust: how to give
Gc
an idiomatic and complete API; and how to make
finalizers
(i.e. the code that is run just before a value is collected by the GC) safe, performant, and
ergonomic.
struct GcNode { value: u8, nbr: Option>>}
impl Drop for GcNode { fn drop(&mut self) { println!("drop {}", self.value); } }

fn main() {
let mut gc1 = Gc::new(RefCell::new(GcNode { value: 1, nbr: None } ));
gc1.borrow_mut().nbr = Some(gc1);
let gc2 = Gc::new(RefCell::new(GcNode { value: 2, nbr: None } ));
gc2.borrow_mut().nbr = Some(gc2);
gc1 = gc2;
force_gc();
println!("{} {}", gc1.borrow().value, gc1.borrow().nbr.unwrap().borrow().value);
Listing 1.
An
Alloy
example, showing
Gc
and destructors as finalizers. We create a type
GcNode
which
models a graph: it stores an 8 bit integer value and a possibly-null
reference (via Rust's standard
Option
type) to a neighbouring node
line 1
). We add a normal Rust destructor which
Alloy
is able to use as a
finalizer when
GcNode
is used inside
Gc
line 2
).
Inside
main
we create the first GCed node in the graph (
line 5
).
We use Rust's normal
RefCell
type to allow the node to be mutated
(using the
RefCell::borrow_mut
method which dynamically checks
for mutation that would undermine Rust's static rules)
and a cycle created directly back to itself (
line 6
). We then create a second cyclic graph (
lines 7
and
), immediately assigning it to the
gc1
variable (
line 9
):
this copies, rather than moves, the
Gc
This causes the first cyclic graph
GcNode{value: 1, ..}
to no longer be reachable, so after forcing a collection (
line 10
) that node
can be collected. Its finalizer is then scheduled to be run, causing
drop 1
to be printed out at a later point; when it has completed the GC
heap memory can be reclaimed. The print statement outputs
2 2
line
11
).
In this paper we introduce
Alloy
, a new GC for Rust: an example of its use is shown in Listing
Alloy
uses
conservative
garbage collection (i.e. treating each reachable machine word as a potential
pointer), which naturally solves the API challenge. However, the finalization challenge is much
more involved: the causes of this challenge, and our solutions to it, occupy the bulk of this
paper.
Normal Rust code uses
destructors
(i.e. code which is run just before a value is reclaimed by Rust's
implicit memory management) extensively. Although finalizers and destructors may seem to be
synonyms, existing GCs for Rust cannot reuse destructors as finalizers: the latter must be manually
implemented for each type that needs it. Unfortunately, even this is trickier than it appears: it is not
possible to implement a finalizer for
Gc
if
is an external library; some parts of destructors are
automatically created by the Rust compiler, but hand-written finalizers must duplicate those parts
manually; and users can accidentally cause a type's finalizer to be run more than once. In short,
finalization in existing GCs for Rust is unpalatable.
GCs for Rust are not alone in requiring manually written finalizers. In a close cousin to our work, a
GC proposal for C++, the reuse of destructors as finalizers was ruled out due to seemingly
insoluble problems [
, p. 32], which we divide into four categories: (1) some safe destructors are
not safe finalizers; (2) finalizers can be run prematurely; (3) running finalizers on the same
thread as a paused mutator can cause race conditions and deadlocks; (4) and finalizers are
prohibitively slower than destructors. All are, at least to some degree, classical GC problems; all are
exacerbated in some way by Rust; and none, with the partial exception of #2, has existing
solutions.
We show that it is possible to reuse most Rust destructors as finalizers in a satisfying way. We
introduce novel solutions to the long-standing problems this implies by making use of some of Rust's
unusual static guarantees. We thus gain a better GC for Rust
and
solutions to open GC problems. Our
solutions, in order, are: (1)
finalizer safety analysis
extends Rust's static analyses to reject
programs whose destructors are not provably safe to be used as finalizers; (2)
premature finalizer
prevention
automatically inserts fences to prevent the GC from being 'tricked' into collecting
values before they are dead; (3) we run finalizers on a separate thread; and (4) and
finalizer
elision
statically optimises away finalizers if the underlying destructor duplicates the GC's
work.
Alloy
as an implementation is necessarily tied to Rust, though most of the novel techniques in this
paper rely on general properties of affine types and ownership. While we do not wish to claim generality
without evidence, it seems likely that many of the techniques in this paper will generalise to other
ownership-based languages, as and when such emerge.
Although
Alloy
is not production ready, its performance is already reasonable: when we control for
the (admittedly somewhat slow) conservative GC (
BDWGC
Alloy
currently uses, the performance of
Alloy
varies from 0.74× to, in the worst case, 1.17× that of reference counting.
Alloy
is also sufficiently
polished (e.g. good quality error messages) in other ways for it to: show a plausible path forwards for
those who may wish to follow it; and to allow others to evaluate whether GC for Rust is a good idea or
not.
This paper is divided into four main parts: GC and Rust background (Section
);
Alloy
's basic
design (Section
); destructor and finalizer challenges and solutions (Sections 4 to 7); and
evaluation (Section
). The first three parts have the challenge that our work straddles two
areas that can seem mutually exclusive: GC and Rust. We have tried to provide sufficient
material for readers expert in one of these areas to gain adequate familiarity with the other,
without boring either, but we encourage readers to skip material they are already comfortable
with.
Background
2.1
The Challenges of Shared Ownership in Rust
Rust uses affine types and
ownership
to statically guarantee that: a value has a single owner (e.g. a
variable); an owner can
move
(i.e. permanently transfer the ownership of) a value to another owner;
and when a value's owner goes out of scope, the value's destructor is run and its backing
memory reclaimed. An owner can pass
references
to a value to other code, subject to the
following static restrictions: there can be multiple immutable references ('
') to a value or a single
mutable reference ('
&mut
'); and references cannot outlast the owner. These rules allow many
Rust programs to be as succinct as their equivalents in GCed languages. This suggests that
the search for a good GC for Rust may be intellectually stimulating but of little practical
value.
However, there are many programs which need to express data structures which do not fit into the
restrictions of affine types and ownership. These are often described as 'cyclic data-structures', but in this
paper we use the more abstract term 'shared ownership', which includes, but is not limited to, cyclic
data-structures.
A common way of expressing shared ownership is to use the reference counting type
Rc
from
Rust's standard library. For many data-structures, this is a reasonable solution, but some forms of shared
ownership require juggling strong and weak counts. This complicates programs (see Listing
) and can
cause problems when values live for shorter or longer than intended.
struct RcNode { value: u8, nbr: Option>> }
impl Drop for RcNode { fn drop(&mut self) { println!("drop {}", self.value); } }

fn main() {
let mut rc1 = Rc::new(RefCell::new(RcNode{value: 1, nbr: None}));
rc1.borrow_mut().nbr = Some(Rc::downgrade(&rc1));
let rc2 = Rc::new(RefCell::new(RcNode{value: 2, nbr: None}));
rc2.borrow_mut().nbr = Some(Rc::downgrade(&rc2));
rc1 = Rc::clone(&rc2);
println!("{} {}", rc1.borrow().value,
rc1.borrow().nbr.as_ref().unwrap().upgrade().unwrap().borrow().value);
Listing 2.
A version of Listing
using Rust's standard reference counting type
Rc
. To avoid
memory leaks we use
weak
references between nodes (
line 1
). We again create two cyclic
graphs (
lines 5
) using
Rc::downgrade
to create weak references (
lines 6
and
). Since
Rc
is
not copyable, we must use a manual
clone
call to have both the
rc1
and
rc2
variables point to
the same cyclic graph (
line 9
). Accessing a neighbour node becomes a delicate dance requiring
upgrading the weak reference (
line 11
). The need to downgrade
Rc
to
Weak
and upgrade
(which may fail, hence the
unwrap
) back to
Rc
creates significant extra complexity relative
to Listing
: compare line 11 in Listing
to
lines 10
11
above.
A different solution is to store values in a vector and use integer indices into that vector. Such indices
are morally closer to machine pointers than normal Rust references: the indices can become stale,
dangle, or may never have been valid in the first place. The programmer must also manually
deal with issues such as detecting unused values, compaction, and so on. In other words,
the programmer ends up writing a partial GC themselves. A variant of this idea are
arenas
which gradually accumulate multiple values but free all of them in one go: values can no
longer be reclaimed too early, though arenas tend to unnecessarily increase the lifetime of
values.
A type-based approach is
GhostCell
35
], which uses
branding
to statically ensure that at any given
point only one part of a program can access a shared ownership data-structure. This necessarily excludes
common use cases where multiple owners (e.g. in different threads) need to simultaneously access
disjoint parts of a data-structure.
Although it is easily overlooked, some workarounds (e.g.
Rc
) rely on using
unsafe
Rust
(i.e. parts of the language, often involving pointers, that are not fully statically checked by the
compiler). Pragmatically, we assume that widely used code, even if technically unsafe, has
been pored over sufficiently that it is trustworthy in practise. However, 'new' solutions that a
programmer implements using unsafe Rust are unlikely to immediately reach the same level of
trustworthiness.
While we do not believe that every Rust program would be improved by GC, the variety of
workarounds already present in Rust code, and the difficultly of creating new ones, suggests that there is
a subset that would benefit from GC.
2.2
GC Terminology
GC is a venerable field and has accumulated terminology that can seem unfamiliar or unintuitive. We mostly
use the same terminology as Jones et al [
19
],
the major parts of which we define here.
A program which uses GC is split between the
mutator
(the user's program) and the
collector
(the GC
itself). At any given point in time, a thread is either running as a mutator or a collector. In our context, all
threads run as a collector at least sometimes (for reasons that will become apparent later, some threads
always run as a collector). Tracing and reclamation is performed during a
collection
phase. Our
collections always
stop-the-world
, where all threads running mutator code are paused while collection
occurs.
tracing
GC is one that scans memory looking for reachable values from a program's roots: values,
including cycles of values, that are not reachable from the roots can then be
reclaimed
. In contrast, a pure
reference counting GC does not scan memory, and thus cannot free values that form a cycle. Increasingly,
GC implementations make use of multiple techniques (see [
]) but, for simplicity's sake, we assume that
implementations wholly use one technique or another except otherwise stated. For brevity, we use 'GC'
as a short-hand for 'tracing GC'; when we deal with other kinds of GC (e.g. reference counting), we
explicitly name them.
We refer to memory which is allocated via
Gc
as being on the
GC heap
. We use the term 'GC value'
to refer both to the pointer wrapped in a
Gc
and the underlying value on the GC heap, even though
multiple pointers / wrappers can refer to a single value on the GC heap, unless doing so would lead to
ambiguity.
We use '
Alloy
' to refer to the combination of: our extension to the Rust language; our modifications to
the
rustc
compiler; and our integration of the Boehm-Demers-Weiser GC (
BDWGC
) into the runtime of
programs compiled with our modified
rustc
Alloy
: Design and Implementation
In this section we outline
Alloy
's basic design and implementation choices – the rest of the paper then
goes into detail on the more advanced aspects.
3.1
Basic Design
Alloy
provides a
Gc
type that exposes an API modelled on the
Rc
type from Rust's
standard library, because
Rc
: is conceptually similar to
Gc
; widely used in Rust code, and
its API familiar; and that API reflects long-term experience about what Rust programmers
need.
When a user calls
Gc::new(v)
, the value
is moved to the GC heap: the
Gc
value returned
to the user is a simple wrapper around a pointer to
's new address. The same underlying
GCed value may thus have multiple, partly or wholly overlapping, references active at any
point. To avoid undermining Rust's ownership system, dereferencing a
Gc
produces an
immutable (i.e. '
') reference to the underlying value. If the user wishes to mutate the underlying
value, they must use other Rust types that enable
interior mutability
(e.g.
RefCell
or
Mutex
).
One feature that
Alloy
explicitly supports is the ability in Rust to cast references to raw pointers and
back again. This can occur in two main ways.
Gc
can be dereferenced to
&T
which can
then, as with any other reference, be converted to *const T (i.e. a C-esque pointer to T).
Gc
also supports the common Rust functions (
into_raw
and
from_raw
) which wrap the
value-to-pointer conversion in a slightly higher-level API. The ability to convert references to raw
pointers is used in many places (e.g. Rust's standard C Foreign Function Interface (FFI)).
We believe that a viable GC for Rust must allow the same conversions, but doing so has a
profound impact because Rust allows raw pointers to be converted to the integer type
usize
and
back
Having acknowledged that pointers can be 'disguised' as integers, it is then inevitable that
Alloy
must
be a conservative GC: if a machine word's integer value, when considered as a pointer, falls within a
GCed block of memory, then that block itself is considered reachable (and is transitively scanned). Since a
conservative GC cannot know if a word is really a pointer, or is a random sequence of bits that happens
to be the same as a valid pointer, this over-approximates the
live set
(i.e. the blocks that the GC will not
reclaim). Typically the false detection rate is very low (see e.g. a Java study which measures it at under
0.01% of the live set [
28
]).
Conservative GC occupies a grey zone in programming language semantics: in most languages, and
most compiler's internal semantics, conservative GC is, formally speaking, unsound; and
furthermore some languages (including Rust) allow arbitrary 'bit fiddling' on pointers, temporarily
obscuring the address they are referring to. Despite this, conservative GC is widely used,
including in the two most widespread web browsers: Chrome uses it in its Blink rendering
engine [
] and Safari uses it in its JavaScript VM JavaScriptCore [
26
]. Even in 2025,
we lack good alternatives to conservative GC: there is no cross-platform API for precise GC; and while
some compilers such as LLVM provide some support for GC features [
23
], we have found them
incomplete and buggy. Despite the potential soundness worries, conservative GC thus remains a widely
used technique.
Conservative GC enables
Alloy
to make a useful ergonomic improvement over most other GCs
for Rust whose
Gc
is only
cloneable
. Such types can be duplicated, but doing so requires
executing arbitrary user code. To make the possible run-time cost of this clear, Rust has no
direct syntax for cloning: users must explicitly call
Rc::clone(&v)
to duplicate a value
. In
contrast, since
Alloy
's
Gc
is just a wrapper around a pointer it is not just cloneable but
also
copyable
: duplication only requires copying bytes (i.e. no arbitrary user code need be
executed). Copying is implied by assignment (i.e.
w = v
), reducing the need for explicit
cloning
This is not just a syntactic convenience but also reflects an underlying semantic difference: duplicating a
Gc
in
Alloy
is is a cheaper and simpler operation than most other GCs for Rust which which tend to
rely, at least in part, on reference counting.
There is one notable limitation of
Gc
's API relative to
Rc
. The latter, by definition, knows how
many references there are to the underlying data, allowing the value stored inside it to be mutably
borrowed at run-time if there is only a single reference to it (via
get_mut
and
make_mut
). In contrast,
Gc
cannot know how many references there are to the underlying data. As we shall see in Section
some Rust programs are built around the performance advantages of this API (e.g. turning 'copy on
write' into just 'write' in some important cases).
3.2
Basic Implementation
The most visible aspect of
Alloy
is its fork, and extension of, the standard Rust compiler
rustc
. We
forked
rustc
1.79.0, adding or changing approximately 5,500 Lines of Code (LoC) in the core compiler,
and adding approximately 2,250 LoC of tests.
Alloy
uses
BDWGC
] as the underlying conservative GC, because it is the most widely ported
conservative GC we know of. We use
BDWGC
's
GC_set_finalize_on_demand(1)
API, which causes
finalizers to be run on their own thread.
We had to make some minor changes to
BDWGC
to suit our situation. First, we disabled
BDWGC
's
parallel collector because it worsens
Alloy
's performance. It is unclear to us why this happens:
we observe significant lock contention within
BDWGC
during GC collections, but have not
correlated this with a cause. Second,
BDWGC
cannot scan pointers stored in thread locals
because these are platform dependent. Fortunately,
rustc
uses LLVM's thread local storage
implementation, which stores such pointers in the
PT_TLS
segment of the ELF binary: we modified
BDWGC
to scan this ELF segment during each collection. Third,
BDWGC
dynamically intercepts
thread creation calls so that it can can scan their stacks, but (for bootstrapping reasons) is
unable to do so in our context: we explicitly changed
Alloy
to register new threads with
BDWGC
Destructors and Finalizers
In many GCed languages, 'destructor' and 'finalizer' are used as synonyms, as both terms refer to code
run when a value's lifetime has ended. In existing GCs for Rust, these two terms refer to completely
different hierarchies of code (i.e. destructors and finalizers are fundamentally different). In
Alloy
, in
contrast, a reasonable first approximation is that finalizers are a strict subset of destructors. In this
section we pick apart these differences, before describing the challenges of using destructors as
finalizers.
When a value in Rust is
dropped
(i.e. at the point its owner goes out of lexical scope) its destructor is
automatically run. Rust's destructors enable a style of programming that originated in C++ called RAII
(Resource Acquisition Is Initialization) [
30
, Section 14.4]: when a value is dropped, the underlying
resources it possesses (e.g. file handles or heap memory) are released. Destructors are used frequently in
Rust code (to give a rough idea: approximately 15% of source-level types in our benchmark suite have
destructors).
Rust destructors are formed of two parts, run in the following order: a user-defined
drop method
; and
automatically inserted
drop glue
. Drop methods are optional and users can provide one for a type by
implementing the
Drop
trait's
drop
method. Drop glue recursively calls destructors of contained types
(e.g. fields in a struct). Although it is common usage to conflate 'destructor' in Rust with drop methods,
drop glue is an integral part of a Rust destructor: we therefore use 'destructor' as the umbrella term for
both drop methods and drop glue.
When considering finalizers for a GC for Rust, there are several layers of design choices. We will
shortly see that finalizers cause a number of challenges (Section
4.1
) and one choice would be to forbid
finalizers entirely. However, this would mean that one could not sensibly embed types that have
destructors in a
Gc
. While Rust does not always call destructors, the situations where this occurs are
best considered 'exceptional': not calling destructors from
Gc
would completely undermine
reasonable programmer expectations. Because of this,
Alloy
, and indeed virtually all GCs for Rust,
support finalizers in some form.
However, existing GCs force distinct notions of destructors and finalizers onto the programmer. Where
the former have the
Drop
trait, the latter typically have a
Finalize
trait. If a user type needs to be
finalized then the user must provide an implementation of the
Finalize
trait. However, doing so
introduces a number of problems: (1) external libraries are unlikely to provide finalizers, so they must be
manually implemented by each consumer; (2) Rust's
orphan rule
27
, Section 6.12] prevents one
implementing traits for types defined in external libraries (i.e. unless a library's types were designed to
support
Gc
, those types cannot be directly GCed); (3) one cannot automatically replicate drop glue
for finalizers; and (4) one cannot replicate
rustc
's refusal to allow calls to the equivalent of
Drop::drop
Programmers can work around problems #1 and #2 in various ways. For example, they can wrap
external library types in
newtypes
(zero-cost wrappers) and implement finalizers on those instead [
20
Section 19.3]. Doing so is tedious but not conceptually difficult.
Problem #3 has partial solutions: for example, [
14
] uses the
Trace
macro to generate
finalizer glue
(the
finalizer equivalent of drop glue) for struct fields. This runs into an unsolvable variant of problem #2:
types in external libraries will not implement this trait and cannot be recursively scanned for finalizer
glue.
Problem #4 is impossible to solve in Rust as-is. One cannot define a function that can never be called —
what use would such a function have? A possible partial solution might seem to be for the
finalize
method take ownership of the value, but
Drop::drop
does not do so because that would not allow drop
glue to be run afterwards.
4.1
General Challenges When Using Destructors as Finalizers
We have stated as our aim that
Alloy
should use destructors as finalizers. Above we explained some
Rust-specific challenges — but there are several non-Rust-specific challenges too! Fundamentally, finalizers
and destructors have different, and sometimes incompatible, properties. The best guide to these differences,
and the resulting problems, is Boehm [
], supplemented by later work on support for GC in the C++
specification [
An obvious difference between destructors and finalizers is when both are run. While C++ and Rust define precisely when a destructor will be
run
finalizers run at an unspecified point in time. This typically happens at some point after the
equivalent destructor would run, though a program may exit before any given finalizer is
run
There are, however, two situations which invert this. First, if a thread exits due to an error, and the
program is either not compiled with unwinding, or the thread has crossed a non-unwinding ABI
boundary, then destructors might not be run at all, where a GC will naturally run the equivalent
finalizers: we do not dwell on this, as both behaviours are reasonable in their different contexts. Second,
and more surprisingly, it is possible for finalizers in non-error situations to run
prematurely
, that is before
the equivalent destructor [
, section 3.4].
A less obvious difference relates to where destructors and finalizers are run. Destructors run in the
same thread as the last owner of a value. However, running finalizers in the same thread as the last owner
of the value can lead to race conditions [
24
] and deadlocks [
, section 3.3] if a finalizer tries to
access a resource that the mutator expects to have exclusive access too. When such problems
affect destructors in normal Rust code, it is the clear result of programmer error, since they
should have taken into account the predictable execution point of destructors. However, since
finalizers do not have a predictable execution point, there is no way to safely access shared
resources if they are run on the same thread. The only way to avoid this is to run finalizers
on a non-mutator thread — but not all Rust types / destructors are safe to run on another
thread.
There are several additional differences such as: finalizers can reference other GCed values
that are partly, or wholly, 'finalized' and may have had their backing memory reused; and
finalizers can
resurrect
values by copying the reference passed to the finalizer and storing it
somewhere.
Over time, finalizers have thus come to be viewed with increasing suspicion. Java, for example, has
deprecated, and intends eventually removing, per-type finalizers: instead it has introduced deliberately
less flexible per-object 'cleaners', whose API prevents problems such as object resurrection and per-class
finalization [
13
].
4.2
The Challenge of Finalizers for
Alloy
At this point we hope to have convinced the reader that: a viable GC for Rust needs to be able to use
existing destructors as finalizers whenever possible; but that finalizers, even in existing GCs, cause
various problems.
It is our belief that some problems with finalizers are fundamental. For example, finalizers inevitably
introduce latency between the last use of a value and its finalization.
Some problems with finalizers are best considered the accidental artefacts of older designs.
Java's cleaners, for example, can be thought of as a more restrictive version of finalizers that
allow most common use-cases but forbid by design many dangerous use cases. For example,
per-class/struct finalization can easily be replaced by per-object/value finalization; and object
resurrection can be prevented if object access requires a level of indirection.
Alloy
benefits from
our better shared understanding of such problems and the potential solutions: it trivially
addresses per-object/value finalization (
Gc::new_unfinalizable
function turns finalization off
for specific values) and, as we shall see later, via only slightly more involved means, object
resurrection.
However, that leaves many problems that are potentially in the middle: they are not obviously
fundamental, but there are not obvious fixes for them either. In our context, where we wish to use
destructors as finalizers, four problems have hitherto been thought insoluble [
, p. 32]: (1) finalizers are
prohibitively slower than destructors; (2) finalizers can run prematurely; (3) running finalizers on the
same thread as a paused mutator can cause race conditions and deadlocks; (4) some safe destructors are
not safe finalizers.
Fortunately for us, Rust's unusual static guarantees, suitably expanded by
Alloy
, allow us to
address each problem in novel, satisfying, ways. In the following section, we tackle these
problems in the order above, noting that we tackle problems #1 and #2 separately, and #3 and #4
together.
Finalizer Elision
As we shall see in Section
, there is a correlation between the number of finalizers that are run and
overhead from GC (with a worst case, albeit a definite outlier, in our experiment of 3.35× slowdown). In
this section we show how to reduce the number of finalizers that are run, which helps reduce this
overhead.
A variety of factors contribute to the finalizer performance overhead, including: a queue of finalizers
must be maintained, whereas destructors can be run immediately; finalizers run some time after the last
access of a value, making cache misses more likely; and finalizers can cause values (including values they
own) to live for longer (e.g. leading to increased memory usage and marking overhead). Most of these
factors are inherent to any GC and our experience of using and working on
BDWGC
– a mature, widely
used GC – does not suggest that it is missing optimisations which would overcome all of this
overhead.
Instead, whenever possible,
Alloy
elides
finalizers so that they do not need to be run at all. We are able
to do this because: (1)
BDWGC
is responsible for all allocations and will, if necessary GC allocations even
if they are not directly wrapped in a
Gc
; and (2) many Rust destructors only free memory which
BDWGC
would, albeit with some latency, do anyway.
Consider the standard Rust type
Box
which heap allocates space for a value; when a
Box
value
is dropped, the heap allocation will be freed. We can then make two observations. First,
Box
's drop
method solely consists of a
deallocate
call. Second, while we informally say that
Box
allocates on
the 'heap' and
Gc
allocates on the 'GC heap', all allocations in
Alloy
are made through
BDWGC
and
stored in the same heap.
When used as a finalizer,
Box
's drop method is thus unneeded, as the underlying memory will be
freed by
BDWGC
anyway. This means that there is no need to run a finalizer for a type such as
Gc>
at all, and the finalizer can be statically elided. However, we cannot elide a
finalizer for a type such as
Gc>>
because
Rc
's drop method must be run for the
reference count to be decremented. As this shows, we must consider the complete destructor, and
not just the top-level drop method, when deciding whether a corresponding finalizer can be
elided.
function NeedsFinalizer(T):
if Impls(T, Drop) and not Impls(T, DropMethodFinalizerElidable) then
return true;
for each field ∈ T do
if NeedsFinalizer(field) then
return true;
return false;
Algorithm 1.
Finalizer Elision
impl Gc {
pub fn new(value: T) -> Self {
if needs_finalizer::() { Gc::new_with_finalizer(value) }
else { Gc::new_without_finalizer(value) }
...
Listing 3.
A simplified view of how finalizers are elided inside
Alloy
. The new compiler
intrinsic
needs_finalizer
returns true if a finalizer is required for a type. The
Gc
type uses this
intrinsic to ensure that the value is registered as requiring a finalizer. With optimisations turned
on, this seemingly dynamic, branching code will be turned into static, branchless code.
5.1
Implementing Finalizer Elision
Finalizer elision statically determines which type's destructors do not require corresponding finalizers
and elides them. It does so conservatively, and deals correctly with drop glue.
As shown in Algorithm
, any type which implements the
Drop
trait requires finalization unless it also
implements the new
DropMethodFinalizerElidable
marker trait
(i.e. a trait without methods). This
trait can be used by a programmer to signify that a type's drop method need not be called if the type is
placed inside a
Gc
. The 'Drop' part of the trait name is deliberate (i.e. it is not a simplification of
'destructor') as it allows the programmer to reason about a type locally (i.e. without considering drop
glue or concrete type paramaters). If the type has a transitively reachable field whose type implements
the
Drop
trait but not the
DropMethodFinalizerElidable
trait, then then the top-level type still
requires finalization.
Even though neither normal Rust destructors or
Alloy
finalizers are guaranteed to run, a program
whose destructors or finalizers never run would probably not be usable (leaking resources such as
memory, deadlocking, and so on). We therefore make
DropMethodFinalizerElidable
an unsafe trait,
because implementing it inappropriately is likely to lead to undesired – though not incorrect! –
behaviour at run-time.
Alloy
modifies the standard Rust library to implement
DropMethodFinalizerElidable
on the following types:
Box
Vec
RawVec
VecDeque
LinkedList
BTreeMap
BTreeSet
HashMap
HashSet
RawTable
and
BinaryHeap
. Fortunately, not only are these types' drop methods compatible with
DropMethodFinalizerElidable
, but they are extensively used in real Rust code: they enable significant
numbers of finalizers to be elided.
Listing
shows the new const compiler intrinsic
needs_finalizer
we added to implement
Algorithm
. The intrinsic is evaluated at compile-time: its result can be inlined into
Gc::new
, allowing
the associated conditional to be removed too. In other words – compiler optimisations allowing – the
'does this specific type require a finalizer?' check has no run-time overhead.
Premature Finalizer Prevention
Most of us assume that finalizers are always run later than the equivalent destructor would have run, but
they can sometimes run before [
, section 3.4], undermining soundness. Such premature finalization is
also possible in
Alloy
as described thus far (see Listing
). In this section we show how to prevent
premature finalization.
There are two aspects to premature finalization. First, language specifications often do not define, or do
not precisely define, when the earliest point that a value can be finalized is. While this means
that, formally, there is no 'premature' finalization, it seems unlikely that language designers
anticipated some of the resulting implementation surprises (see e.g. this example in Java [
29
]).
Second, compiler optimisations – at least in LLVM – are 'GC unaware', so optimisations such
as scalar replacement can change the point in a program when GCed values appear to be
finalizable.
struct S { value: u8 }
impl Drop for S { fn drop(&mut self) { self.value = 0; } }

fn main() {
let root = Gc::new(Box::new(S{ value: 1 }));
let inner: &u8 = &**root.value;
force_gc();
println!("{}", *inner);
Listing 4.
An example of possible premature finalization. We create a new struct
line 1
) with
a drop method that sets the wrapped integer to zero (
line 2
). In the main method, we move an
instance of the struct into a
Box
, which we then move into a
Gc
line 4
). We obtain a Rust
reference to the inner integer (
line 5
), which at run-time will be a pointer to the
Box
. At this
point, the compiler can determine that the
Gc
is no longer used and overwrite
root
's pointer
(which may be in a register). If a collection then occurs, a finalizer can run
's drop method,
causing the program to print '0' instead of the expected '1' (
line 7
).
In our context, it is natural to define premature finalization as a (dynamic) finalizer for a
Gc
value
running before the (static)
Gc
owner has gone out of scope. Similar to the high-level proposal mooted
in [
, Solution 1], we must ensure that the dynamic lifetime of a reference derived from a
Gc
matches
or exceeds the lifetime of the
Gc
itself.
Our solution relies on adjusting
Gc
's drop method to keep alive a GCed value for at least the static
lifetime of the
Gc
itself. In other words, we ensure that the conservative GC will always see a pointer
to a GCed value while the corresponding
Gc
is in-scope.
However, there is a major problem to overcome: copyable types such as
Gc
are forbidden from
having destructors. The fundamental challenge we have to solve is that each copied value will have a
destructor called on it, which has the potential for any shared underlying value to be destructed more
than once.
Alloy
explicitly allows
Gc
– but no other copyable type – to have a destructor, but to
ensure it doesn't cause surprises in the face of arbitrary numbers of copies, the destructor must be
idempotent. Our task is made easier because
Gc
naturally has no drop glue from Rust's perspective:
Gc
consists of a field with a pointer type, and such types are opaque from a destruction
perspective.
We therefore only need to make sure that
Gc
's drop method is idempotent. Fortunately, this is
sufficient for our purposes: we want the drop method to inhibit finalization but that does not
require run-time side effects. To achieve this, we use a
fence
. These come in various flavours.
What we need is a fence that prevents both: the compiler from reordering computations
around a particular syntactic point; and the CPU from reordering computations around a
particular address. We copy the platform specific code from the
BDWGC
GC_reachable_here
macro
into
Gc
's drop method, which achieves the effect we require.
6.1
Optimising Premature Finalizer Prevention
The drop method we add to
Gc
fully prevents premature finalization. It also naturally solves a
performance problem with the suggested solution for C++ in [
, Solution 1], which requires keeping
alive all pointers, no matter their type, for their full scope. By definition, our solution only
keeps alive
Gc
values: the compiler is free to optimise values of other types as it so wishes.
However, on an artificial microbenchmark we observed a noticeable overhead from our fence
insertion.
We thus implemented a simple optimisation: we only insert fences for a
Gc
if it has a finalizer.
Intuitively, it seems that we should not generate drop methods in such cases, but this is difficult to do
directly inside
rustc
. Instead, we suppress calls to the drop methods of such types: the two approaches
are functionally equivalent, though ours does put an extra burden on dead-code elimination in the
compiler tool-chain.
Alloy
adds a new pass
RemoveElidableDrops
to
rustc
's Mid-Level Intermediate Representation
(MIR) processing. MIR is best thought of as the main IR inside
rustc
: it contains the complete set of
functions in the program, where each function consists of a sequence of basic blocks. Simplifying
somewhat, function and drop method calls are represented as different kinds of
terminators
on basic
blocks. Terminators reference both a callee and a successor basic block.
The
remove_elidable_drops
pass iterates over a program's MIR, identifies drop method
terminators which reference elidable finalizers, and turns them into 'goto' terminators to the
successor basic basic block. Algorithm 4 in the Appendix presents a more formal version of this
algorithm.
Finalizer Safety Analysis
In this section we address two high-level problems: running finalizers on the same thread as a paused
mutator can cause race conditions and deadlocks; and some safe destructors are not safe finalizers.
Addressing the former problem is conceptually simple – finalizers must be run on a separate thread – but
we must ensure that doing so is sound. We therefore consider this a specific instance of the latter
problem, treating both equally in this section.
We therefore introduce Finalizer Safety Analysis (FSA), which prevents unsafe (in the sense of 'not
safe Rust') destructors being used as finalizers. As a first approximation, FSA guarantees
that finalizers are memory safe, cycle safe (i.e. do not access already finalized objects), and
thread safe. We present the three main components of FSA individually before bringing them
together.
7.1
Rust References
Gc
can store, directly or indirectly, normal Rust references (i.e.
and
&mut
), at which point it is
subject to Rust's normal borrow checker rules and cannot outlive the reference. However, finalizers
implicitly extend the lifetime of a GCed value, including any stored references: accessing a reference in a
finalizer could undermine Rust's borrow checking rules.
A simple way of avoiding this problem is to forbid
Gc
from storing, directly or indirectly,
references. This might seem to be no great loss: storing references in a
Gc
largely nullifies the
'GCness' of
Gc
. However, we found the result hard to use, as it can make simple tasks such as
gradually migrating existing code to use
Gc
painful.
A moderate, but in our experience insufficient, relaxation is to recognise that only types that
need a finalizer can possibly have problems with references, and to forbid such types from
storing references in
Gc
. For example, if there is no drop method for
struct S{x: &u8}
then its destructor is safe to use as a finalizer, since its non-drop aspects will not use the
&u8
reference.
The eventual rule we alighted upon for FSA is that a destructor for a type
can be used as a
finalizer provided the destructor's drop methods do not obtain references derived from
's
fields (including fields reachable from its attributes). Using Rust's terminology, we forbid
projections
(which include a struct's fields, indexes into a vector, and so on) in destructors from
generating references. Any non-projection references that are used in a destructor are by
definition safe to use, as they either exist only for the duration of the drop method (references to
variables on the stack) or will exist for the remainder of the program (references to global
variables).
This rule over-approximates the safe set of destructors. For example, a drop method that
creates a new value and tries to obtain a reference to a field in it (i.e. a projection) cannot be a
destructor under FSA, even though the reference cannot outlast the drop method. We found that
attempting to relax our rule further to deal with such cases rapidly complicates exposition and
implementation.
struct GcNode { value: u8, nbr: Option
>> }
impl Drop for GcNode {
fn drop(&mut self) { self.value = 0; println!("{}", self.nbr.unwrap().borrow().value); }

fn main() {
let mut gc1 = Gc::new(RefCell::new(GcNode{value: 1, nbr: None}));
let gc2 = Gc::new(RefCell::new(GcNode{value: 2, nbr: None}));
gc1.borrow_mut().nbr = Some(gc2);
gc2.borrow_mut().nbr = Some(gc1);
Listing 5.
An example of the problems that come from mixing cycles and finalization. The salient
difference from Listing
is that the drop method prints the value of a field inside
nbr
line
). Running this program on a strong memory model machine is likely to print either
2 0
or
1 0
depending on whether
gc1
or
gc2
is finalized first. The 'seemingly expected' output of
1 2
or
2 1
would never be printed: whichever GCed value is finalized first changes what the other GCed
value sees in its finalizer. As that implies, this example is unsound: whichever finalizer runs
second leads to undefined behaviour.
7.2
Cycles and Finalization
One of the main motivations for GCs is that they solve problems with cyclic data structures. However,
finalizers can be unsound if they access state shared within members of a cycle. Listing
shows an
example of undefined behaviour when two GCed values create a cycle and both their finalizers reference
the other GCed value. Whichever order the finalizers are run in, at least one of the finalizers will see the
other GCed value as partly or wholly 'finalized'.
Most languages and systems we are aware of assume that users either don't run into this problem
(finalization cycles are considered rare in GCed languages [
19
, p. 229]) or know how to deal with it when they do (e.g. refactoring the
types into parts that do and don't require finalization [
, p. 11]). There is no fully automatic
solution to this problem. Some GCs offer weak references, which allow users to detect when
finalization cycles have been broken, though they still have to deal with the consequences
manually.
We wanted to provide users with static guarantees that their destructors will not behave unexpectedly
when used as finalizers in a cycle. A first attempt at enforcing such a property might seem to
be that a
Gc
cannot have, directly or indirectly, fields of type
Gc
. This would indeed
prevent the mistakes we want to catch but also disallow shared ownership! We therefore
check only that a type's destructor does not, directly or indirectly, access a
Gc
. This allows
GCed types to express shared ownership so long as their destructor(s) do not access other GC
types.
To make this check easier to implement, we introduce an
auto trait
27
, Section. 11], a kind of marker
trait that the compiler propagates automatically. An auto trait
will be automatically implemented for a
type
unless one of the following is true: there is an explicit
negative implementation
of
for
; or
contains a field that is not itself
. Informally, we say that a negative implementation of an auto-trait
pollutes
containing types.
Our new auto trait is called
FinalizerSafe
, and we provide a single negative implementation
impl !FinalizerSafe for Gc
. This naturally handles transitively reachable code, allowing FSA
itself to only check that a destructor's direct field accesses are
FinalizerSafe
7.3
Destructors Need to be Runnable on a Finalizer Thread
impl Drop for GcNode {
fn drop(&mut self) { println!("drop {}", self.value.lock().unwrap()); }

fn main() {
let counter = Rc::new(Mutex::new(0));
{ let _ = Gc::new(GcNode { value: Rc::clone(&counter), nbr: None }); }
let r1 = counter.lock().unwrap();
force_gc();
assert_eq!(*r1, 0);
Listing 6.
How destructors can cause deadlocks when used as finalizers. The mutator creates a
reference-counted mutex (
line 6
), placing a copy in a
GcNode
that immediately goes out of scope
line 7
). The mutator then acquires the lock (
line 8
) but before it can release the lock a GC cycle
occurs and the
GcNode
's finalizer run (
line 9
). If the finalizer is run on the same thread as the
mutator, then it will fail to grab the lock (
line 2
) and cause a deadlock.
Running finalizers on the same thread as a mutator can cause problems when the finalizer accesses
state shared with the mutator (see Section
4.1
for a general description and Listing
for a concrete
example). The most general solution to this problem is to run finalizers on a separate
finalizer thread
that
never runs mutator code.
We must therefore ensure that it is safe to run a type's destructor on the finalizer thread. A
conservative definition is that
Gc
is safe to use if
implements both of Rust's existing
Send
(denoting
a type that can be permanently moved from one thread to another) and
Sync
(denoting a type that can be
safely accessed simultaneously by multiple threads) auto traits. However, requiring that finalization be
restricted to types that implement both
Send
and
Sync
can be frustrating, particularly because more
types implement
Send
than
Sync
It may seem sufficient for
to implement
Send
alone so that the value can be safely sent to the finalizer
thread. However, this would not prevent a finalizer indirectly accessing state shared with a
non-GCed value via a mechanism such as
Arc
, causing the very problems we are trying to
avoid.
FSA thus ignores whether a type implements
Send
or
Sync
(or not) and instead examines the
destructor directly. To pass FSA: the destructor must not access thread locals; and any types the
destructor accesses via projections must implement both
Send
and
Sync
. Intuitively, this allows a
non-
Send
-or-
Sync
type
to have a safe finalizer provided that
's destructor only access the
Send
and
Sync
'subset' of
This rule shows clearly that FSA is a form of abstract interpretation rather than a mere extension of the type
system
After careful examination we believe this is compatible with Rust's semantics (and
rustc
and LLVM's
implementations) at the time of writing, but it is worth knowing that this rule would be unsafe in other
languages and implementations (for example our assumption would be unsafe in Java due to
synchronisation removal [
31
]). We leave it as an open question to others as to whether Rust should
deliberately permit or forbid such checks in its semantics.
The implementation of the finalization thread is fairly simple. For example, we do not need to
explicitly synchronise memory between the mutator and finalization threads because
BDWGC
's
stop-the-world collection phase already synchronises all memory between threads.
7.4
Putting it All Together
function FinalizerSafetyAnalysis(func):
for each basic_block ∈ func do
t ← basic_block.terminator;
if not IsGcConstructorCall(t) then
continue;
ty ← GetTyOfGcValue(t);
if isFinalizerUnchecked(ty) or not NeedsFinalizer(ty) then
continue;
for each drop_method ∈ GetDropGlue(ty) do
if not IsMIRAvailable(drop_method) then
EmitFinalizerUnsafeError();
CheckFunctionSafety(drop_method);

function CheckFunctionSafety(drop):
for each basic_block ∈ drop do
for each statement ∈ basic_block do
for each projection ∈ statement do
if not IsFinalizerSafe(projection.element) then
EmitFinalizerUnsafeError();
if IsFunctionCall(basic_block.terminator) then
CheckFunctionSafety(basic_block.terminator);

for each basic_block ∈ func do
if IsDropTerminator(basic_block.terminator.kind) then
ty ← GetTypeOfDroppedValue(block.terminator);
if IsGcType(ty) then
if not RequiresFinalizer(ty) then
ReplaceTerminator(basic_block);

function ReplaceTerminator(basic_block):

drop_func ← GetDropFunc(basic_block.terminator);
last_block ← GetLastBasicBlock(drop_func);
block.terminator ← last_block.terminator;
Algorithm 3:
Removing elidable drops
Additional Experimental Data
Benchmark
Description
Reason for exclusion
bevy
ECS game engine in Rust
Unable to port successfully (see Section
8.1.2
dyon
Scripting language in Rust
Unable to port successfully (see Section
8.1.2
jiff
A datetime library for Rust
Too few allocations to measure
mini-moka
Concurrent in-memory cache library
Too few allocations to measure
numbat
Math search engine
Too few allocations to measure
rkyv
Zero-copy deserialization framework
Insufficient
Gc
coverage in benchmarks
RustPython
Python interpreter written in Rust
Difficulty retro-fitting
__del__
semantics (see Section
8.1.2
rust-analyzer
Language server for Rust
Unable to port successfully (see Section
8.1.2
salsa
Incremental recomputation library
Too few allocations to measure
WLambda
Scripting language written in Rust
Insufficient
Gc
coverage in benchmarks
Table 4.
Rust programs excluded from our benchmark suite after attempted porting to
Alloy
Wall-clock time (s)
Gc
Rc
Gc
Rust-GC
Arena
Alacritty
0.41 [0.39, 0.45]
0.40 [0.38, 0.44]
Binary Trees
0.11 [0.11, 0.11]
0.15 [0.14, 0.15]
0.33 [0.32, 0.33]
0.03 [0.03, 0.04]
fd
0.33 [0.29, 0.38]
0.31 [0.26, 0.37]
grmtools
3.06 [3.00, 3.14]
3.24 [3.17, 3.31]
Regex-Redux
0.47 [0.47, 0.47]
0.45 [0.45, 0.46]
Ripgrep
1.61 [1.55, 1.69]
1.52 [1.45, 1.59]
som-rs-ast
0.92 [0.88, 0.95]
0.79 [0.76, 0.82]
som-rs-bc
0.28 [0.27, 0.29]
0.29 [0.28, 0.30]
Table 5.
Wall-clock times (in seconds) with 99% confidence intervals for the
GCvs
experiment comparing different memory management strategies. Strategies not supported by a given benchmark are marked with "–".
Wall-clock time (s)
Ratio
jemalloc
BDWGC
Alacritty
0.36
[0.33, 0.40]
0.40
[0.38, 0.44]
1.11
Binary Trees
0.12
[0.12, 0.12]
0.15
[0.14, 0.15]
1.26
fd
0.30
[0.25, 0.36]
0.31
[0.26, 0.37]
1.02
grmtools
3.09
[3.01, 3.17]
3.24
[3.17, 3.31]
1.05
Regex-Redux
0.45
[0.44, 0.45]
0.45
[0.45, 0.46]
1.01
Ripgrep
1.46
[1.40, 1.53]
1.52
[1.45, 1.59]
1.04
som-rs-ast
0.77
[0.74, 0.80]
0.79
[0.76, 0.82]
1.02
som-rs-bc
0.28
[0.27, 0.29]
0.29
[0.28, 0.30]
1.02
Table 6.
Wall-clock times comparing jemalloc and
BDWGC
as allocators for
Rc
-only code (i.e., no GC). The ratio column shows
BDWGC
time divided by jemalloc time.
Wall-clock time (s)
jemalloc
BDWGC
Rc
Gc
Rc
Alacritty
Cur. Motion
0.65
±0.02
0.66
±0.01
0.66
±0.01
Dense Cells
2.16
±0.03
2.16
±0.02
2.14
±0.04
Light Cells
0.37
±0.01
0.37
±0.01
0.38
±0.01
Scroll
0.25
±0.01
0.26
±0.02
0.26
±0.01
Scroll (fullscreen)
0.38
±0.01
0.39
±0.01
0.38
±0.01
Scroll Btm
0.32
±0.00
0.33
±0.00
0.32
±0.01
Scroll Btm (small)
0.32
±0.01
0.33
±0.00
0.32
±0.01
Scroll Top
0.25
±0.02
0.26
±0.02
0.24
±0.02
Scroll Top (small)
0.32
±0.01
0.32
±0.01
0.33
±0.00
Unicode
0.23
±0.02
0.27
±0.01
0.27
±0.02
fd
Cmd Exec.
1.29
±0.01
1.28
±0.01
1.29
±0.02
Cmd Exec. (large)
1.24
±0.01
1.25
±0.03
1.26
±0.01
File Extension
0.13
±0.00
0.15
±0.01
0.13
±0.00
File Type
0.12
±0.00
0.13
±0.01
0.13
±0.00
No Pattern
0.18
±0.00
0.29
±0.02
0.22
±0.01
Simple
0.27
±0.01
0.33
±0.01
0.32
±0.00
Simple (-HI)
0.12
±0.00
0.15
±0.00
0.13
±0.00
som-rs-ast
Bounce
0.62
±0.00
0.96
±0.02
0.80
±0.00
BubbleSort
0.56
±0.01
0.84
±0.01
0.71
±0.00
DeltaBlue
0.77
±0.00
1.09
±0.01
0.98
±0.00
Dispatch
0.59
±0.01
0.97
±0.00
0.77
±0.01
Fannkuch
0.65
±0.00
1.01
±0.06
0.81
±0.00
Fibonacci
0.83
±0.00
1.41
±0.01
1.08
±0.00
FieldLoop
0.94
±0.00
1.20
±0.01
1.07
±0.01
GraphSearch
0.31
±0.00
0.43
±0.01
0.38
±0.01
IntegerLoop
0.52
±0.01
0.84
±0.00
0.67
±0.01
JsonSmall
0.86
±0.00
1.22
±0.01
1.10
±0.00
List
0.38
±0.00
0.53
±0.01
0.48
±0.00
Loop
0.53
±0.01
0.86
±0.01
0.70
±0.00
Mandelbrot
0.46
±0.00
0.69
±0.01
0.57
±0.00
NBody
0.31
±0.00
0.38
±0.01
0.34
±0.00
PageRank
0.31
±0.00
0.44
±0.00
0.39
±0.00
Permute
0.67
±0.00
0.90
±0.02
0.82
±0.00
Queens
0.77
±0.01
1.18
±0.02
0.98
±0.00
QuickSort
0.98
±0.00
1.27
±0.02
1.26
±0.01
Recurse
0.60
±0.00
0.92
±0.00
0.79
±0.00
Richards
2.48
±0.02
3.72
±0.11
3.13
±0.01
Sieve
0.67
±0.00
0.94
±0.01
0.87
±0.00
Storage
0.56
±0.00
0.70
±0.01
0.69
±0.00
Sum
0.52
±0.01
0.83
±0.01
0.67
±0.00
Towers
0.31
±0.00
0.46
±0.01
0.38
±0.01
TreeSort
0.32
±0.00
0.41
±0.01
0.40
±0.00
WhileLoop
0.48
±0.00
0.72
±0.00
0.60
±0.00
grmtools
Eclipse
3.48
±0.03
3.44
±0.03
3.74
±0.04
Hadoop
2.86
±0.01
2.92
±0.01
3.09
±0.01
Jenkins
2.90
±0.00
2.99
±0.05
3.02
±0.01
Spring
2.43
±0.00
2.51
±0.01
2.63
±0.02
Ripgrep
Alternates
1.18
±0.01
1.29
±0.02
1.21
±0.01
Alternates (-i)
1.31
±0.01
1.42
±0.01
1.33
±0.01
Literal
1.12
±0.01
1.23
±0.01
1.15
±0.01
Literal (-i)
1.19
±0.00
1.27
±0.01
1.20
±0.01
Literal (default)
1.08
±0.01
1.19
±0.01
1.10
±0.01
Literal (mmap)
1.64
±0.01
1.75
±0.02
1.67
±0.01
Literal (mmap, -i)
1.67
±0.01
1.80
±0.01
1.72
±0.01
Literal (regex)
1.13
±0.01
1.25
±0.01
1.15
±0.01
UTF Greek
3.29
±0.01
3.41
±0.01
3.33
±0.01
UTF Greek (-i)
3.30
±0.01
3.42
±0.03
3.33
±0.01
UTF Word
1.09
±0.01
1.21
±0.01
1.14
±0.00
UTF Word (alt.)
1.11
±0.01
1.21
±0.01
1.13
±0.00
Word
1.12
±0.00
1.22
±0.02
1.14
±0.01
som-rs-bc
Bounce
0.25
±0.00
0.28
±0.00
0.30
±0.00
BubbleSort
0.23
±0.00
0.27
±0.00
0.27
±0.00
DeltaBlue
0.30
±0.00
0.34
±0.00
0.36
±0.00
Dispatch
0.25
±0.00
0.28
±0.00
0.29
±0.01
Fannkuch
0.25
±0.00
0.29
±0.00
0.29
±0.00
Fibonacci
0.30
±0.01
0.37
±0.00
0.36
±0.00
FieldLoop
0.49
±0.00
0.43
±0.00
0.54
±0.01
GraphSearch
0.12
±0.00
0.12
±0.00
0.15
±0.00
IntegerLoop
0.22
±0.00
0.24
±0.00
0.26
±0.00
JsonSmall
0.37
±0.00
0.39
±0.01
0.44
±0.00
List
0.18
±0.00
0.22
±0.00
0.21
±0.01
Loop
0.23
±0.00
0.26
±0.00
0.27
±0.00
Mandelbrot
0.20
±0.00
0.22
±0.01
0.23
±0.00
NBody
0.11
±0.00
0.12
±0.00
0.13
±0.00
PageRank
0.13
±0.00
0.13
±0.00
0.14
±0.00
Permute
0.24
±0.00
0.27
±0.00
0.27
±0.00
Queens
0.28
±0.01
0.33
±0.00
0.33
±0.00
QuickSort
0.36
±0.00
0.38
±0.00
0.43
±0.00
Recurse
0.25
±0.00
0.31
±0.00
0.30
±0.01
Richards
0.92
±0.00
1.20
±0.01
1.10
±0.01
Sieve
0.25
±0.00
0.26
±0.00
0.30
±0.01
Storage
0.22
±0.00
0.21
±0.00
0.25
±0.00
Sum
0.22
±0.00
0.25
±0.00
0.27
±0.00
Towers
0.13
±0.00
0.15
±0.00
0.15
±0.00
TreeSort
0.11
±0.00
0.12
±0.00
0.13
±0.00
WhileLoop
0.22
±0.00
0.23
±0.00
0.25
±0.00
Table 7.
Wall-clock execution times (in seconds) for each benchmark in the
GCvs
experiment. Values show arithmetic means over 30 runs and include 99% confidence intervals. Because
Binary Trees
and
Regex-Redux
are stand-alone benchmarks they are omitted here; their timings appear in Table
User time (s)
jemalloc
BDWGC
Rc
Gc
Rc
Alacritty
Cur. Motion
1.08
±0.05
1.09
±0.04
1.09
±0.04
Dense Cells
6.66
±0.13
6.62
±0.12
6.58
±0.15
Light Cells
0.34
±0.03
0.34
±0.03
0.34
±0.03
Scroll
0.11
±0.01
0.11
±0.01
0.12
±0.02
Scroll (fullscreen)
0.24
±0.02
0.28
±0.03
0.22
±0.02
Scroll Btm
0.14
±0.01
0.14
±0.01
0.14
±0.01
Scroll Btm (small)
0.14
±0.01
0.15
±0.01
0.14
±0.01
Scroll Top
0.12
±0.01
0.12
±0.02
0.11
±0.01
Scroll Top (small)
0.14
±0.01
0.14
±0.01
0.15
±0.01
Unicode
0.10
±0.01
0.13
±0.01
0.12
±0.01
fd
Cmd Exec.
1.05
±0.01
1.12
±0.02
1.12
±0.02
Cmd Exec. (large)
1.02
±0.01
1.10
±0.03
1.11
±0.02
File Extension
0.04
±0.00
0.04
±0.01
0.04
±0.01
File Type
0.03
±0.01
0.04
±0.01
0.04
±0.01
No Pattern
0.16
±0.01
0.29
±0.02
0.23
±0.01
Simple
0.14
±0.01
0.18
±0.01
0.16
±0.01
Simple (-HI)
0.03
±0.01
0.04
±0.01
0.04
±0.01
som-rs-ast
Bounce
0.62
±0.00
0.96
±0.02
0.79
±0.00
BubbleSort
0.56
±0.00
0.83
±0.01
0.71
±0.00
DeltaBlue
0.77
±0.00
1.07
±0.01
0.97
±0.00
Dispatch
0.59
±0.01
0.96
±0.01
0.76
±0.01
Fannkuch
0.64
±0.00
0.98
±0.05
0.80
±0.00
Fibonacci
0.82
±0.00
1.40
±0.01
1.07
±0.00
FieldLoop
0.94
±0.00
1.20
±0.01
1.07
±0.01
GraphSearch
0.30
±0.00
0.41
±0.01
0.37
±0.00
IntegerLoop
0.52
±0.01
0.83
±0.00
0.66
±0.01
JsonSmall
0.85
±0.00
1.21
±0.01
1.10
±0.00
List
0.38
±0.00
0.52
±0.01
0.47
±0.00
Loop
0.52
±0.01
0.85
±0.01
0.70
±0.00
Mandelbrot
0.45
±0.00
0.69
±0.01
0.57
±0.00
NBody
0.31
±0.00
0.38
±0.01
0.34
±0.00
PageRank
0.30
±0.00
0.43
±0.01
0.38
±0.00
Permute
0.66
±0.00
0.89
±0.02
0.82
±0.00
Queens
0.77
±0.00
1.18
±0.02
0.97
±0.00
QuickSort
0.98
±0.00
1.26
±0.02
1.26
±0.00
Recurse
0.60
±0.00
0.91
±0.00
0.79
±0.00
Richards
2.47
±0.02
3.72
±0.11
3.13
±0.01
Sieve
0.66
±0.00
0.94
±0.01
0.87
±0.00
Storage
0.55
±0.00
0.69
±0.01
0.68
±0.00
Sum
0.52
±0.00
0.82
±0.01
0.67
±0.00
Towers
0.31
±0.00
0.46
±0.01
0.38
±0.00
TreeSort
0.31
±0.00
0.40
±0.01
0.40
±0.00
WhileLoop
0.47
±0.00
0.72
±0.00
0.59
±0.00
grmtools
Eclipse
3.32
±0.03
3.28
±0.03
3.59
±0.04
Hadoop
2.77
±0.01
2.80
±0.01
2.99
±0.01
Jenkins
2.74
±0.01
2.81
±0.04
2.90
±0.01
Spring
2.35
±0.01
2.40
±0.01
2.54
±0.02
Ripgrep
Alternates
0.42
±0.02
0.51
±0.02
0.43
±0.02
Alternates (-i)
0.54
±0.02
0.65
±0.02
0.57
±0.02
Literal
0.34
±0.02
0.44
±0.01
0.36
±0.01
Literal (-i)
0.41
±0.02
0.50
±0.02
0.44
±0.02
Literal (default)
0.30
±0.02
0.40
±0.01
0.32
±0.01
Literal (mmap)
0.43
±0.02
0.52
±0.02
0.46
±0.02
Literal (mmap, -i)
0.48
±0.02
0.58
±0.02
0.50
±0.02
Literal (regex)
0.35
±0.02
0.44
±0.02
0.36
±0.01
UTF Greek
2.55
±0.02
2.64
±0.02
2.58
±0.03
UTF Greek (-i)
2.56
±0.03
2.66
±0.03
2.56
±0.02
UTF Word
0.35
±0.02
0.42
±0.02
0.37
±0.02
UTF Word (alt.)
0.33
±0.02
0.43
±0.02
0.35
±0.01
Word
0.35
±0.02
0.43
±0.02
0.36
±0.01
som-rs-bc
Bounce
0.25
±0.00
0.28
±0.00
0.29
±0.00
BubbleSort
0.22
±0.00
0.27
±0.00
0.27
±0.00
DeltaBlue
0.30
±0.00
0.34
±0.00
0.35
±0.01
Dispatch
0.24
±0.00
0.28
±0.00
0.29
±0.00
Fannkuch
0.24
±0.00
0.28
±0.00
0.28
±0.00
Fibonacci
0.30
±0.00
0.36
±0.00
0.36
±0.00
FieldLoop
0.48
±0.00
0.42
±0.01
0.53
±0.01
GraphSearch
0.12
±0.00
0.11
±0.00
0.14
±0.00
IntegerLoop
0.22
±0.00
0.24
±0.00
0.26
±0.00
JsonSmall
0.36
±0.00
0.39
±0.01
0.44
±0.00
List
0.17
±0.00
0.22
±0.00
0.21
±0.01
Loop
0.22
±0.00
0.25
±0.00
0.27
±0.00
Mandelbrot
0.19
±0.00
0.21
±0.01
0.23
±0.00
NBody
0.11
±0.00
0.12
±0.00
0.13
±0.00
PageRank
0.12
±0.00
0.12
±0.00
0.14
±0.00
Permute
0.23
±0.00
0.26
±0.00
0.27
±0.00
Queens
0.27
±0.01
0.33
±0.00
0.33
±0.00
QuickSort
0.35
±0.00
0.38
±0.00
0.42
±0.00
Recurse
0.25
±0.00
0.31
±0.00
0.30
±0.01
Richards
0.92
±0.00
1.20
±0.01
1.10
±0.01
Sieve
0.25
±0.00
0.26
±0.00
0.30
±0.01
Storage
0.22
±0.00
0.20
±0.00
0.25
±0.00
Sum
0.22
±0.00
0.24
±0.00
0.26
±0.00
Towers
0.12
±0.00
0.15
±0.00
0.14
±0.00
TreeSort
0.11
±0.00
0.11
±0.00
0.12
±0.00
WhileLoop
0.21
±0.00
0.22
±0.00
0.25
±0.00
Table 8.
User times (in seconds) for each benchmark in the
GCvs
experiment. Values show arithmetic means over 30 runs and include 99% confidence intervals.
Heap Size (MiB)
Relative wall-clock time
Benchmarks failed
Alacritty
16
0.96
[0.91, 0.99]
32
0.98
[0.95, 1.02]
64
0.94
[0.89, 0.98]
Binary Trees
0.88
[0.82, 1.02]
0.90
[0.80, 1.01]
16
0.87
[0.82, 0.94]
fd
16
0.94
[0.90, 0.99]
32
0.94
[0.88, 0.98]
64
0.94
[0.89, 1.00]
grmtools
1024
1.01
[1.00, 1.02]
2/4 (Eclipse, Jenkins)
2048
1.00
[1.00, 1.01]
4096
1.01
[1.00, 1.02]
Regex-Redux
256
0.94
[0.92, 0.95]
512
0.93
[0.90, 0.94]
1024
0.96
[0.92, 1.07]
Ripgrep
32
0.96
[0.95, 0.96]
64
0.95
[0.94, 0.95]
128
0.94
[0.93, 0.95]
som-rs-ast
64
0.72
[0.71, 0.74]
2/4 (Fannkuch, TreeSort)
96
0.74
[0.73, 0.75]
2/4 (Fannkuch, TreeSort)
128
0.75
[0.74, 0.76]
som-rs-bc
32
0.79
[0.78, 0.80]
64
0.79
[0.77, 0.80]
128
0.84
[0.83, 0.86]
Table 9.
The effects of fixing the heap size (see Section 8.1.3) on wall-clock time, reported as ratios relative to
BDWGC
's default adaptive RSS strategy. We chose per-benchmark heap sizes based on our observations of their memory usage, though some benchmarks fail at smaller values. Broadly speaking, these results show that the performance differences due to different heap sizes are relatively minor, and that
BDWGC
's adaptive strategy has not unduly coloured our results.
Fin. elided (%)
Alacritty
Cur. Motion
0.00
Dense Cells
0.00
Light Cells
0.00
Scroll
0.00
Scroll Btm
0.00
Scroll Btm (small)
0.00
Scroll (fullscreen)
0.00
Scroll Top
0.00
Scroll Top (small)
0.00
Unicode
0.00
fd
Cmd Exec.
8.93
Cmd Exec. (large)
9.09
File Extension
72.73
File Type
24.54
No Pattern
0.04
Simple
61.54
Simple (-HI)
61.54
som-rs-ast
Bounce
100.00
BubbleSort
100.00
DeltaBlue
100.00
Dispatch
100.00
Fannkuch
100.00
Fibonacci
100.00
FieldLoop
100.00
GraphSearch
99.99
IntegerLoop
100.00
JsonSmall
100.00
List
100.00
Loop
100.00
Mandelbrot
100.00
NBody
99.99
PageRank
99.99
Permute
100.00
Queens
100.00
QuickSort
100.00
Recurse
100.00
Richards
100.00
Sieve
100.00
Storage
100.00
Sum
100.00
Towers
99.99
TreeSort
99.99
WhileLoop
100.00
grmtools
Eclipse
11.79
Hadoop
32.99
Jenkins
26.74
Spring
37.25
Ripgrep
Alternates
79.04
Alternates (-i)
79.04
Literal
79.04
Literal (-i)
79.04
Literal (mmap, -i)
79.04
Literal (default)
79.04
Literal (mmap)
79.04
Literal (regex)
79.04
UTF Greek
79.04
UTF Greek (-i)
79.04
UTF Word
79.04
UTF Word (alt.)
79.04
Word
79.04
som-rs-bc
Bounce
72.50
BubbleSort
76.34
DeltaBlue
73.86
Dispatch
77.78
Fannkuch
72.74
Fibonacci
70.37
FieldLoop
83.33
GraphSearch
76.56
IntegerLoop
75.00
JsonSmall
71.85
List
79.41
Loop
74.82
Mandelbrot
70.93
NBody
86.17
PageRank
70.95
Permute
79.23
Queens
79.04
QuickSort
69.33
Recurse
82.61
Richards
71.17
Sieve
73.15
Storage
73.84
Sum
75.00
Towers
78.29
TreeSort
65.19
WhileLoop
74.98
Table 10.
Percentage of finalizers
Alloy
was able to elide for each benchmark.
Figure 5.
Wall-clock and user time performance comparison for finalizer elision on each benchmark. The bars show the relative performance of
Alloy
after applying our elision optimization, normalized against the baseline (solid black line). The vertical blue line marks the overall geometric mean (with shaded area for CIs). User time often shows greater improvement than wall-clock time, as elision reduces the CPU overhead of the finalization thread.
Wall-clock time (s)
Before elision
After elision
Alacritty
Cur. Motion
0.66
±0.01
0.66
±0.01
Dense Cells
2.17
±0.03
2.17
±0.02
Light Cells
0.39
±0.01
0.38
±0.01
Scroll
0.27
±0.01
0.27
±0.01
Scroll (fullscreen)
0.39
±0.01
0.39
±0.01
Scroll Btm
0.33
±0.00
0.33
±0.00
Scroll Btm (small)
0.33
±0.00
0.33
±0.00
Scroll Top
0.25
±0.01
0.24
±0.04
Scroll Top (small)
0.33
±0.00
0.33
±0.01
Unicode
0.26
±0.02
0.26
±0.02
fd
Cmd Exec.
1.29
±0.02
1.29
±0.02
Cmd Exec. (large)
1.25
±0.01
1.25
±0.02
File Extension
0.15
±0.00
0.14
±0.01
File Type
0.13
±0.01
0.13
±0.01
No Pattern
0.28
±0.02
0.29
±0.02
Simple
0.34
±0.01
0.35
±0.00
Simple (-HI)
0.14
±0.01
0.14
±0.01
som-rs-ast
Bounce
7.68
±0.40
1.03
±0.01
BubbleSort
5.27
±0.29
0.92
±0.01
DeltaBlue
7.84
±0.19
1.17
±0.01
Dispatch
6.52
±0.46
1.04
±0.01
Fannkuch
6.76
±0.09
1.15
±0.03
Fibonacci
22.16
±1.87
1.54
±0.01
FieldLoop
5.14
±0.16
1.25
±0.01
GraphSearch
2.82
±0.06
0.47
±0.00
IntegerLoop
5.88
±0.11
0.90
±0.01
JsonSmall
11.37
±0.77
1.33
±0.01
List
4.06
±0.06
0.58
±0.00
Loop
6.02
±0.14
0.90
±0.01
Mandelbrot
4.67
±0.11
0.74
±0.01
NBody
2.91
±0.16
0.40
±0.00
PageRank
2.44
±0.05
0.47
±0.00
Permute
6.69
±0.35
0.97
±0.02
Queens
9.07
±0.34
1.28
±0.02
QuickSort
10.86
±0.66
1.39
±0.02
Recurse
7.66
±0.24
0.98
±0.00
Richards
97.40
±10.46
3.95
±0.08
Sieve
7.06
±0.13
1.01
±0.01
Storage
5.06
±0.10
0.75
±0.01
Sum
5.91
±0.12
0.89
±0.01
Towers
3.20
±0.14
0.50
±0.01
TreeSort
4.65
±0.32
0.45
±0.00
WhileLoop
5.03
±0.21
0.76
±0.01
grmtools
Eclipse
5.09
±0.05
3.59
±0.06
Hadoop
4.49
±0.03
3.07
±0.05
Jenkins
4.45
±0.02
3.02
±0.01
Spring
4.11
±0.03
2.65
±0.01
Ripgrep
Alternates
1.44
±0.03
1.39
±0.01
Alternates (-i)
1.55
±0.02
1.50
±0.01
Literal
1.38
±0.03
1.32
±0.01
Literal (-i)
1.45
±0.02
1.38
±0.01
Literal (default)
1.38
±0.02
1.30
±0.01
Literal (mmap)
1.90
±0.02
1.81
±0.01
Literal (mmap, -i)
1.90
±0.03
1.84
±0.01
Literal (regex)
1.36
±0.03
1.33
±0.01
UTF Greek
3.53
±0.01
3.50
±0.02
UTF Greek (-i)
3.51
±0.02
3.52
±0.02
UTF Word
1.39
±0.02
1.30
±0.01
UTF Word (alt.)
1.36
±0.02
1.31
±0.02
Word
1.36
±0.02
1.31
±0.01
som-rs-bc
Bounce
3.03
±0.04
0.31
±0.00
BubbleSort
2.64
±0.04
0.29
±0.00
DeltaBlue
4.19
±0.08
0.37
±0.00
Dispatch
3.80
±0.36
0.30
±0.00
Fannkuch
2.86
±0.23
0.31
±0.00
Fibonacci
4.34
±0.20
0.40
±0.00
FieldLoop
2.45
±0.16
0.45
±0.01
GraphSearch
1.15
±0.03
0.13
±0.00
IntegerLoop
3.00
±0.16
0.26
±0.00
JsonSmall
4.96
±0.80
0.42
±0.01
List
2.69
±0.09
0.24
±0.01
Loop
3.07
±0.18
0.28
±0.00
Mandelbrot
2.21
±0.15
0.23
±0.00
NBody
1.23
±0.06
0.13
±0.00
PageRank
1.06
±0.01
0.14
±0.00
Permute
2.72
±0.09
0.29
±0.00
Queens
3.19
±0.07
0.35
±0.00
QuickSort
3.62
±0.05
0.41
±0.00
Recurse
4.27
±1.91
0.33
±0.00
Richards
15.51
±0.89
1.31
±0.01
Sieve
2.57
±0.07
0.28
±0.00
Storage
2.45
±0.05
0.22
±0.00
Sum
3.06
±0.37
0.27
±0.00
Towers
1.59
±0.05
0.16
±0.00
TreeSort
1.19
±0.02
0.13
WhileLoop
3.26
±0.08
0.24
±0.01
Table 11.
Wall-clock execution times (seconds) for each benchmark in the
Elision
experiment, shown before and after applying
Alloy
's finalizer elision optimisation. Values show arithmetic means over 30 runs, with 99% confidence intervals.
User time (s)
Before elision
After elision
Alacritty
Cur. Motion
1.08
±0.06
1.12
±0.05
Dense Cells
6.70
±0.12
6.68
±0.13
Light Cells
0.34
±0.04
0.33
±0.04
Scroll
0.12
±0.01
0.12
±0.01
Scroll (fullscreen)
0.27
±0.03
0.26
±0.03
Scroll Btm
0.15
±0.01
0.15
±0.01
Scroll Btm (small)
0.15
±0.01
0.14
±0.01
Scroll Top
0.12
±0.01
0.12
±0.06
Scroll Top (small)
0.15
±0.01
0.14
±0.01
Unicode
0.13
±0.02
0.13
±0.02
fd
Cmd Exec.
1.13
±0.01
1.13
±0.02
Cmd Exec. (large)
1.10
±0.02
1.11
±0.02
File Extension
0.05
±0.01
0.04
±0.00
File Type
0.04
±0.00
0.04
±0.01
No Pattern
0.27
±0.02
0.30
±0.02
Simple
0.19
±0.01
0.20
±0.01
Simple (-HI)
0.04
±0.01
0.04
±0.01
som-rs-ast
Bounce
11.15
±0.41
1.02
±0.01
BubbleSort
7.88
±0.40
0.90
±0.01
DeltaBlue
11.72
±0.21
1.15
±0.01
Dispatch
9.15
±0.63
1.03
±0.01
Fannkuch
9.60
±0.12
1.12
±0.03
Fibonacci
26.41
±1.82
1.53
±0.01
FieldLoop
6.42
±0.17
1.25
±0.01
GraphSearch
3.92
±0.07
0.45
±0.01
IntegerLoop
8.23
±0.13
0.90
±0.01
JsonSmall
16.08
±0.75
1.32
±0.01
List
5.79
±0.09
0.57
±0.00
Loop
8.51
±0.17
0.89
±0.01
Mandelbrot
6.62
±0.12
0.73
±0.01
NBody
3.98
±0.15
0.40
±0.00
PageRank
3.73
±0.07
0.47
±0.00
Permute
9.96
±0.34
0.96
±0.02
Queens
13.04
±0.33
1.27
±0.02
QuickSort
16.43
±0.65
1.38
±0.02
Recurse
11.18
±0.28
0.98
±0.00
Richards
109.49
±10.34
3.95
±0.08
Sieve
10.49
±0.18
1.00
±0.01
Storage
7.70
±0.12
0.74
±0.01
Sum
8.27
±0.14
0.89
±0.01
Towers
4.61
±0.14
0.49
±0.01
TreeSort
5.90
±0.35
0.44
±0.00
WhileLoop
6.74
±0.20
0.76
±0.01
grmtools
Eclipse
5.81
±0.07
3.42
±0.06
Hadoop
5.27
±0.04
2.96
±0.05
Jenkins
5.21
±0.04
2.86
±0.01
Spring
4.94
±0.05
2.54
±0.01
Ripgrep
Alternates
0.58
±0.02
0.50
±0.02
Alternates (-i)
0.72
±0.02
0.64
±0.02
Literal
0.53
±0.02
0.44
±0.02
Literal (-i)
0.59
±0.02
0.51
±0.01
Literal (default)
0.48
±0.02
0.40
±0.02
Literal (mmap)
0.63
±0.02
0.53
±0.02
Literal (mmap, -i)
0.65
±0.02
0.57
±0.02
Literal (regex)
0.52
±0.02
0.43
±0.01
UTF Greek
2.70
±0.02
2.66
±0.03
UTF Greek (-i)
2.71
±0.03
2.67
±0.02
UTF Word
0.53
±0.02
0.42
±0.02
UTF Word (alt.)
0.52
±0.02
0.43
±0.02
Word
0.51
±0.02
0.44
±0.02
som-rs-bc
Bounce
4.34
±0.04
0.30
±0.00
BubbleSort
3.76
±0.05
0.29
±0.00
DeltaBlue
5.76
±0.11
0.36
±0.00
Dispatch
4.91
±0.35
0.29
±0.00
Fannkuch
3.85
±0.29
0.30
±0.00
Fibonacci
6.15
±0.17
0.40
±0.00
FieldLoop
3.14
±0.15
0.45
±0.01
GraphSearch
1.54
±0.03
0.12
±0.00
IntegerLoop
3.94
±0.15
0.26
±0.00
JsonSmall
6.70
±0.73
0.41
±0.01
List
3.72
±0.09
0.24
±0.01
Loop
4.11
±0.17
0.28
±0.00
Mandelbrot
2.91
±0.13
0.23
±0.00
NBody
1.72
±0.06
0.12
±0.00
PageRank
1.51
±0.02
0.13
±0.00
Permute
3.84
±0.09
0.28
±0.00
Queens
4.53
±0.07
0.35
±0.00
QuickSort
5.28
±0.09
0.40
±0.00
Recurse
5.84
±1.85
0.33
±0.00
Richards
20.23
±0.81
1.30
±0.01
Sieve
3.76
±0.12
0.27
±0.01
Storage
3.23
±0.05
0.22
±0.00
Sum
4.03
±0.36
0.26
±0.00
Towers
2.20
±0.05
0.16
±0.00
TreeSort
1.71
±0.03
0.13
±0.00
WhileLoop
3.90
±0.09
0.24
±0.01
Table 12.
User times (seconds) for each benchmark in the
Elision
experiment, shown before and after applying
Alloy
's finalizer elision optimisation. Values show arithmetic means over 30 runs, with 99% confidence intervals.
Avg. heap footprint (MiB)
Before elision
After elision
Alacritty
Cur. Motion
3.76
±0.06
3.79
±0.06
Dense Cells
3.74
±0.06
3.74
±0.06
Light Cells
3.76
±0.08
3.73
±0.07
Scroll
3.88
±0.13
3.87
±0.12
Scroll (fullscreen)
6.46
±0.28
6.46
±0.34
Scroll Btm
3.72
±0.07
3.73
±0.07
Scroll Btm (small)
3.75
±0.05
3.74
±0.07
Scroll Top
3.78
±0.08
3.75
±0.10
Scroll Top (small)
3.76
±0.06
3.71
±0.08
Unicode
3.73
±0.15
3.77
±0.09
fd
execution
17.18
±0.33
17.89
±0.87
extension
13.52
±0.75
13.68
±0.84
hi
15.14
±0.36
14.99
±0.59
output
17.54
±0.65
17.47
±0.53
pattern
37.79
±5.22
38.03
±5.25
type
15.80
±0.40
15.84
±0.38
som-rs-ast
bounce
309.85
±43.90
362.44
±7.24
bubblesort
334.97
±6.55
209.99
±3.88
deltablue
305.24
±31.74
310.83
±5.32
dispatch
345.26
±18.67
329.90
±4.22
fannkuch
412.43
±5.54
363.19
±8.31
fibonacci
382.32
±22.98
587.04
±4.00
fieldloop
326.47
±8.38
300.93
±5.66
graphsearch
100.82
±4.86
61.15
±0.26
integerloop
323.50
±16.42
320.62
±0.44
jsonsmall
257.67
±34.84
316.56
±0.93
list
236.78
±5.27
156.13
±4.11
loop
332.20
±14.75
326.89
±7.76
mandelbrot
276.68
±4.14
236.49
±2.42
nbody
151.00
±2.63
137.42
±2.78
pagerank
193.25
±1.12
168.02
±0.95
permute
284.84
±43.08
371.67
±11.43
queens
278.89
±12.36
288.03
±2.63
quicksort
515.72
±20.82
424.56
±18.48
recurse
429.81
±12.37
410.97
±5.30
richards
1187.31
±98.12
1472.67
±31.17
sieve
355.11
±17.79
360.95
±6.06
storage
196.36
±9.70
193.06
±6.60
sum
320.61
±11.55
298.24
±2.27
towers
174.90
±4.23
175.82
±0.26
treesort
92.65
±4.87
98.82
±0.92
whileloop
272.66
±10.05
253.07
±3.60
grmtools
Eclipse
1993.86
±9.04
1069.95
±28.40
Hadoop
1882.88
±1.55
800.15
±48.62
Jenkins
1876.35
±14.92
962.16
±29.20
Spring
1752.72
±1.68
1036.27
±19.48
Ripgrep
Alternates
28.23
±1.59
15.80
±0.40
Alternates (-i)
29.52
±1.66
17.50
±0.66
Literal
25.14
±1.52
15.26
±0.33
Literal (-i)
26.69
±1.56
18.17
±0.41
Literal (default)
25.62
±1.59
15.25
±0.39
Literal (mmap)
26.91
±1.78
16.03
±0.39
Literal (mmap, -i)
28.72
±1.66
17.34
±0.24
Literal (regex)
25.87
±2.01
15.05
±0.46
UTF Greek
26.57
±1.40
17.43
±0.47
UTF Greek (-i)
26.85
±1.54
18.04
±0.65
UTF Word
28.03
±1.57
18.13
±0.14
UTF Word (alt.)
25.56
±1.60
15.48
±0.46
Word
29.35
±2.05
15.82
±0.51
som-rs-bc
bounce
138.59
±5.18
134.24
±0.22
bubblesort
121.69
±6.72
115.25
±0.03
deltablue
186.12
±1.32
135.10
±2.09
dispatch
168.63
±7.00
137.08
±0.01
fannkuch
139.94
±5.95
115.98
±0.02
fibonacci
205.88
±10.08
175.88
±0.03
fieldloop
98.52
±4.07
89.07
±0.01
graphsearch
45.81
±1.21
24.33
±0.74
integerloop
133.10
±6.16
117.24
±0.01
jsonsmall
164.80
±5.83
142.20
±5.29
list
106.38
±7.84
98.95
±0.04
loop
144.73
±6.09
120.89
±0.00
mandelbrot
90.31
±4.84
80.51
±0.03
nbody
58.61
±1.77
48.15
±0.03
pagerank
69.21
±0.08
51.90
±0.46
permute
126.65
±5.94
109.74
±0.03
queens
157.93
±5.89
134.31
±0.02
quicksort
183.42
±4.54
175.50
±0.04
recurse
169.34
±7.22
146.41
±0.00
richards
582.34
±41.87
487.95
±0.03
sieve
131.11
±5.83
114.14
±3.04
storage
71.95
±2.32
85.36
±3.26
sum
136.99
±5.40
117.56
±0.00
towers
71.09
±3.09
58.84
±0.03
treesort
57.12
±2.12
44.90
±1.02
whileloop
132.37
±2.45
96.28
±0.02
Table 13.
Average heap footprint (MiB) across benchmarks in the
Elision
experiment, measured before and after applying
Alloy
's finalizer elision optimisation. Values are arithmetic means over 30 runs with resampled traces from heaptrack, with 99% confidence intervals (see Section 8.1.4 for details).
Although it is outside the scope of this paper, it would be preferable for Rust to have different types for 'data width' and 'address'. Modern C, for example, captures this difference with the
size_t
and
uintptr_t
types respectively. Rust now has a provenance lint to nudge users in this general direction, but the
as
keyword still allows arbitrary conversions.
The lengthier syntax
y = Gc::clone(&v)
is available, since every copyable type is also cloneable.
These features were added to the C+11 specification but removed in C++23.
Mostly. Rust's 'temporary lifetime extension' delays destruction, but for how long is currently unspecified.
A program could pause after 'exit' while all queued finalizers are run. We are not aware of a system which does this.
This is a white lie, though the visible effect is the same.
RawTable
is contained in the separate
hashbrown
crate which is then included in Rust's standard library. We previously maintained a fork of this, but synchronising it is painful. For now, at least, we have hacked explicit knowledge of
RawTable
into the
needs_finalize
function.
Initially we wrapped this macro in a function. This led to surprising performance results: an analysis of LLVM IR suggests – but the sheer quantity of data makes it hard to confirm – that this may be due to the effect on inlining heuristics.
Our initial implementation of FSA was, in essence, an extension of Rust's type system and suffered from false positives. For example, porting
Alacritty
led to 110 errors — but our abstract interpretation approach raises just 2.
In commit
10
While the term 'trial deletion' does not appear in the original paper, it is now widely used to describe part of the algorithm.
Internal
info@soft-dev.org