Features

STMX is an actively maintained, high-performance concurrency library providing Software, Hardware and Hybrid Transactional Memory for Common Lisp.

Transactional Memory is a paradigm for concurrent programming – for a general description of what it means, how it works, and how it compares to other concurrent programming paradigms, see introduction.md and also Wikipedia

Its main features are:

  • Extremely intuitive to use and to write correct, thread-safe concurrent code.
  • Brings database-style transactions to Common Lisp by introducing transactional memory.
  • High performance implementation, benchmarked to reach up to 6 millions transactions per second per CPU core on commodity PC hardware.
  • Removes the need for traditional locks, mutexes and conditions – writing correct concurrent code with them is well known to be hard.
  • Transactional code is intrinsically deadlock-free: if two transactions conflict one of them will be re-executed.
  • Automatic commit and rollback: if a transaction completes normally it will be committed, if it exits with a non-local control transfer (signals an error, throws, or calls (go …) to exit an atomic block) it will be rolled back.
  • Transactions are composable: they can be executed in a larger transaction, either in sequence (all-or-nothing) or as alternatives (try them in order until one succeeds).
  • Guarantees a consistent view of memory during transactions: concurrent updates from other threads are not visible – if the consistency cannot be guaranteed, the transaction will be automatically rolled back and re-executed from scratch.
  • Offers freedom of choice between blocking and non-blocking transactional functions: given either behaviour, it is trivial to transform it into the other.
  • Features transactional versions of popular data structures: hash tables, red-black trees, stack, fifo, etc.
  • Includes transactional data structure for multicast publish/subscribe
  • Creating new transactional data structures is easy.
  • Extensive test suite.
  • Tested on SBCL, ABCL, CCL, CLISP and CMUCL.
  • Very simple to install with Quicklisp.

A quick-start guide and installation instructions are provided in the file README.md.

License: LLGPL

Supported systems

STMX is currently tested on the following Common Lisp implementations:

  • SBCL version 2.0.9 (x86_64) on Debian GNU/Linux bullseye (x86_64)
  • ABCL version 1.7.1 with OpenJDK 11.0.8 (x86_64) on Debian GNU/Linux bullseye (x86_64)
  • CCL version 1.12 (x86_64) on Debian GNU/Linux bullseye (x86_64)
  • CCL version 1.12 (x86) on Debian GNU/Linux bullseye (x86_64)
  • CLISP version 2.49.92 (x86_64) on Debian GNU/Linux bullseye (x86_64)

CLISP needs some workarounds to load STMX, see doc/supported-systems.md.

Partially supported systems

There are known problems running STMX on the following implementations, see (doc/supported-systems.md) for details:

  • ECL version 13.5.1, on both x86 and x86_64
  • CMUCL >= 20f no longer supports the command line option “-fpu x87” needed to run STMX reliably, and its ASDF must be upgraded to version >= 3.1 to load STMX.

Untested systems

STMX will probably work on several other Common Lisp implementations as long as they support log4cl, closer-mop, bordeaux-threads and trivial-garbage, but the author gives no guarantees.

Hardware transactions

STMX versions 1.9.0 or later can take advantage of hardware transactions on Intel CPUs that support Transactional Synchronization Extensions (TSX).

To actually use hardware transactions from STMX, there are two more requirements:

  • 64-bit version of SBCL 1.1.19 or later
  • a 64-bit unix-like operating system – at the moment only Linux x86_64 is tested

Also, hardware transactions only work in compiled code – SBCL sometimes interprets very short functions and simple code executed at REPL instead of compiling them, which may cause hardware transactions to fail.

How to tell if hardware transactions are supported

There are several ways. The easiest are:

  • From outside transactions, run the macro (HW-TRANSACTION-SUPPORTED?). It internally calls the CPUID assembler instruction and returns T if hardware transactions are supported, or NIL if they are not.
  • Try to use them, for example by executing (ATOMIC (HW-TRANSACTION-SUPPORTED-AND-RUNNING?))

How to use hardware transactions

STMX automatically uses hardware transactions if they are supported. There is no need for special commands, just execute the usual (ATOMIC ...) or (RUN-ATOMIC ...) forms.

Hardware transactions have several limitations, and STMX will seamlessly switch to (slower) software transactions in the following cases:

  • hardware limits are exceeded, for example read-set or write-set are larger than CPU L1 cache
  • executing a function or macro not supported by hardware transactions. The list is subject to change, it currently includes:
    • STMX functions and macros: RETRY, ORELSE, RUN-ORELSE, BEFORE-COMMIT, AFTER-COMMIT, CALL-BEFORE-COMMIT, CALL-AFTER-COMMIT
    • any Common Lisp function or macro that signals an error, or allocates non-trivial amounts of memory, or performs any kind of system calls, including input/output, sleeping and context switching.
  • executing a CPU instruction not allowed inside hardware transaction. In particular, Intel TSX guarantees that CPU instructions
    • CPUID, PAUSE, XABORT

    will always abort a hardware transaction, but many other CPU instructions typically have the same effect, including possibly:

    • Calls to the operating system and returns from it: SYSENTER, SYSCALL, SYSEXIT, SYSRET.
    • Interrupts: INT n, INTO.
    • Input/Output: IN, INS, REP INS, OUT, OUTS, REP OUTS and their variants.
    • All X87 and MMX instructions. On the opposite, XMM and YMM registers and the MXCSR register can be used inside a hardware transaction.
    • CLI, STI, POPFD, POPFQ, CLTS.
    • Instructions that update segment registers, debug registers and/or control registers such as DF (CLD and STD instructions), DS/ES/FS/GS/SS and CR0/CR2/CR3/CR4/CR8.
    • TLB and Cacheability control: CLFLUSH, INVD, WBINVD, INVLPG, INVPCID, and memory instructions with a non-temporal hint (MOVNTDQA, MOVNTDQ, MOVNTI, MOVNTPD, MOVNTPS, and MOVNTQ).
    • Processor state save: XSAVE, XSAVEOPT, and XRSTOR.
    • VMX instructions: VMPTRLD, VMPTRST, VMCLEAR, VMREAD, VMWRITE, VMCALL, VMLAUNCH, VMRESUME, VMXOFF, VMXON, INVEPT, and INVVPID.
    • SMX instructions: GETSEC.
    • Miscellaneous: UD2, RSM, RDMSR, WRMSR, HLT, MONITOR, MWAIT, XSETBV, VZEROUPPER, MASKMOVQ, and V/MASKMOVDQU.

    For details and up-to-date information, see Intel Instruction Set Programming Reference, Chapter “Transactional Synchronization Extensions”.

Performance

STMX automatically discovers and takes advantage of many optional, non-standard features of the underlying Common Lisp compiler. It also performs graceful degradation, i.e. if the fastest version of a feature is not available it automatically switches to a slower, available alternative.

Depending on the available features, STMX performance can vary up to a factor 100 or more (!).

To reach its peak performance, several requirements need to be satisfied by the hardware and by the Lisp compiler being used. They are listed here in order of importance:

Hardware requirements:

  • support hardware transactions (Intel TSX). Without them, STMX is at least 4-5 times slower. Or, if you prefer since Intel TSX is currently very rare, with it STMX is at least 4-5 times faster. As of August 2013, STMX can use hardware transactions only on 64-bit SBCL.

Lisp compiler requirements:

  1. it must have good multi-threading support. Without it, what would you need a concurrency library as STMX for?
  2. it must expose atomic compare-and-swap operations, to implement fast mutexes. A much slower alternative, but still better than nothing, is to expose a function that returns which thread has acquired a bordeaux-threads lock.
  3. it must produce fast, highly optimized code.
  4. it must be 64-bit. 32-bit is much slower because transactional memory version counters are then BIGNUMs instead of FIXNUMs.
  5. it must expose memory barrier operations. This is less important on x86 and x86-64, and more important on unordered architectures (almost all others).

Among the non-commercial Lisp compilers, SBCL is the only one known to STMX author that satisfies all the compiler requirements, and (guess why) the only one where STMX author has implemented support for hardware transactions.

Actually, all the other tested free Lisp compilers (ABCL, CCL, CMUCL, ECL) are at least somewhat lacking in the area “fast, highly optimized code”, and none of them offers atomic compare-and-swap or memory barrier operations at all. One – CMUCL – produces relatively fast code, but does not support native threads. STMX is not tested on any commercial Lisp compiler, so performance on them is simply unknown.

For these reasons, STMX will reach the highest known performance on SBCL by a large margin – possibly by a factor from 10 to 100 or more with respect to other tested systems.

For more performance considerations and a lot of raw numbers produced by running micro-benchmarks, see the included files doc/benchmark.md, doc/benchmark-abcl.md, doc/benchmark-ccl64.md and doc/benchmark-cmucl.md.

The short version is: as of August 2013, on a fast consumer PC (Core i7 4770 @ 3.5GHz or better) with SBCL 1.1.9 or better, STMX can execute more than 39 millions hardware transactions per second per CPU core, and more than 7 millions software transactions per second per CPU core. The second platform in terms of performance is CCL (x86_64), that reaches 1.1 millions software transactions per second per CPU core using two threads, but STMX performance quickly decreases with more threads (reason still needs to be investigated).

A small example with very short transactions is the dining philosophers, with 5 reads and 5 writes to transactional memory per atomic block, where each CPU core runs approximately 4.4 millions software transactions per second – hyperthreading has very limited effects.

Obviously, performance in other usage scenarios will depend on the complexity of the code inside transactions, on the availability of hardware transactions, on the number of reads and writes to transactional memory, and the rate of conflicts and rollbacks.

Note

These result are not absolute performance considerations of the tested Lisp systems. They are simply the outcome of running micro-benchmarks of a particular library optimized for SBCL (see the hardware transactions, atomic compare-and-swap and memory barriers considerations) on several other Lisp systems. Do not try to construct these results as STMX author’s opinions on the mentioned Lisp systems.

Lee-STMX

For a less artificial and hopefully more realistic benchmark, the author has ported Lee-TM, a non-trivial benchmark suite for transactional memory developed in 2007 by the University of Manchester (UK). The result is Lee-STMX.


Leave a comment