HN Debrief

Writing Portable ARM64 Assembly (2023)

  • Programming
  • Hardware
  • Developer Tools

The post is a practical guide to writing ARM64 assembly that can assemble and run on both Apple Silicon and non-Darwin systems. It focuses on the overlap between Mach-O and ELF toolchains, the calling convention details that line up closely enough to share source, and the Apple-specific assembler syntax you need to avoid. The useful claim is narrow but real: if you are writing self-contained routines like math kernels, compression code, or other hot paths that live behind a C interface, you can usually keep one ARM64 source file for macOS and Linux.

If you ship hand-written ARM64, treat portability as a scoped goal: keep compute-heavy routines in a platform-neutral core, then isolate ABI, object format, and syscall glue per target. Do not assume Apple plus Linux covers the whole market, and avoid registers like x18 unless you have checked each platform ABI.
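
One way to picture the split between a shared core and per-target glue is a small generator that wraps the same instruction body in the directives each object format expects. The sketch below is illustrative only: `wrap_routine`, `SHARED_BODY`, and the `add_u64` symbol are hypothetical names, not something from the post or the thread, and Windows (PE/COFF) is deliberately left unsupported.

```python
# Hypothetical sketch: one shared AArch64 body, two thin per-format wrappers.
# Assumes GNU-style assembler syntax as accepted by clang and GNU as.

# Shared core: adds the first two integer arguments (x0, x1) and returns in x0.
SHARED_BODY = """\
    add x0, x0, x1
    ret
"""


def wrap_routine(name: str, body: str, target: str) -> str:
    """Return assembler source exporting `body` as the C-callable symbol `name`."""
    if target == "macho":
        # Mach-O toolchains expect C symbols to carry a leading underscore.
        symbol = "_" + name
        header = [".text", f".globl {symbol}", ".p2align 2"]
        footer = []
    elif target == "elf":
        symbol = name
        # .type and .size are ELF symbol metadata, not Mach-O directives.
        header = [".text", f".globl {symbol}", ".p2align 2",
                  f".type {symbol}, %function"]
        footer = [f".size {symbol}, .-{symbol}"]
    else:
        # Other formats (for example PE/COFF) need their own glue.
        raise ValueError(f"unsupported target: {target}")

    lines = header + [f"{symbol}:", body.rstrip("\n")] + footer
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for target in ("macho", "elf"):
        print(f"// ---- {target} ----")
        print(wrap_routine("add_u64", SHARED_BODY, target))
```

The instruction body is identical in both outputs; only the symbol name and the surrounding directives change, which is the part worth isolating per target.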

Discussion mood

Mostly positive about the article's practical value, with persistent nitpicking about the word "portable" and the omission of Windows on ARM. Experienced engineers agreed that the trick works for isolated compute routines while insisting that ABI and OS boundaries still matter a lot.

Key insights

  1. 01

    x18 is not yours to use

    On AArch64, x18 is commonly treated as a platform register rather than a free general-purpose register. The comment sharpens the article's ABI warning: Darwin reserves it, Android reserves it, and Linux distributions generally follow the same AArch64 ABI conventions, even if the post's wording around Alpine was sloppy. Code that grabs x18 as an extra scratch register may therefore seem fine in testing and then fail on another target or runtime.

    Audit any hand-written ARM64 for hidden x18 use before calling it portable. Keep your register allocation inside the conservative cross-platform subset unless you are writing for one OS on purpose.

      Attribution:
    • t-3 #1
  2. 02

    Apple needs its own assembler path

    Real-world toolchain work often ends up littered with __APPLE__ branches because Apple's assembler, object format, and alignment rules differ enough to force separate handling. The comment is useful because it moves the discussion from abstract ABI compatibility to maintenance cost. Even when the instruction sequence is portable, the build plumbing often is not.

    Budget for platform-specific assembly generation and CI even if the core routine is shared. If you do not have native Apple Silicon machines, expect slower debugging and more fragile fixes.

      Attribution:
    • rurban #1
  3. 03

    Portable assembly has always meant a thin HAL

    Several commenters rescued the term from the all-or-nothing argument by pointing to older systems like CP/M and DOS. Assembly could be portable when it stayed above a stable interface and pushed machine-specific I/O and hardware access into a small wrapper layer. That framing makes the article easier to judge. It is describing the same pattern on ARM64, not claiming that raw assembly suddenly escapes platform boundaries.

    Structure low-level code the same way you would structure portable C with platform shims. Put the hot algorithm in shared assembly and quarantine syscalls, startup, and hardware-facing code behind tiny per-platform stubs.

      Attribution:
    • whobre #1 #2
    • pjmlp #1
    • MomsAVoxell #1 #2
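
The x18 audit suggested in the first insight can be roughed out as a text scan over assembly sources. This is a minimal sketch, not a real assembler-aware linter: `find_x18_uses` and `X18_PATTERN` are hypothetical names, and it assumes `//` line comments and that x18 is only ever spelled `x18` or `w18` (it will miss register aliases and macro-generated uses).

```python
import re

# Matches x18 or w18 as a standalone register token, so x180 or a label
# such as max18 is not flagged. Hypothetical pattern for illustration.
X18_PATTERN = re.compile(r"(?<![A-Za-z0-9_.$])[xw]18(?![A-Za-z0-9_])",
                         re.IGNORECASE)


def strip_comment(line: str) -> str:
    """Drop a trailing // comment (assumed comment style)."""
    return line.split("//", 1)[0]


def find_x18_uses(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line that touches x18/w18."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if X18_PATTERN.search(strip_comment(line)):
            hits.append((lineno, line.strip()))
    return hits


SAMPLE = """\
    mov x18, x0        // clobbers the platform register
    add x0, x0, x1
    ldr w18, [sp]
    mov x1, x180       // not a register name, should not match
    ret                // x18 mentioned only in a comment
"""

if __name__ == "__main__":
    for lineno, text in find_x18_uses(SAMPLE):
        print(f"line {lineno}: {text}")
```

A scan like this is cheap enough to run in CI on every target, which fits the advice to keep register allocation inside the conservative cross-platform subset.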

Against the grain

  1. 01

    Windows on ARM is a real omission

    Calling the technique portable while ignoring PE and Windows on ARM leaves out a major ABI family. The point is not that Windows dominates ARM laptops today. It is that a portability claim should at least acknowledge the third mainstream target, especially when object format, calling convention details, and platform tooling differ again.

    If your product may ever target Windows on ARM, do not adopt a "portable ARM64" codebase based only on Apple and ELF testing. Add Windows to the support matrix early or narrow the claim in your own docs.

      Attribution:
    • steve1977 #1 #2 #3
  2. 02

    Assembly still stops at the CPU boundary

    The skeptical view is that portability ends the moment code depends on anything beyond the instruction set. System calls, exception handling, binary formats, and hardware access all diverge by OS, so talking about portable assembly risks overstating what is shared. That pushback is useful because it guards against turning a local optimization trick into a platform strategy.

    Use hand-written assembly for isolated kernels, not as the foundation of a broad cross-platform codebase. The more surface area your assembly owns, the faster the portability story falls apart.

In plain English

aarch64
Arm's own name for its 64-bit architecture, and the name Linux toolchains use for it. It is the same architecture as ARM64.
ABI
Application Binary Interface, the low-level rules that let compiled programs interact correctly with an operating system or other binaries.
ARM64
A 64-bit processor architecture from Arm, also called AArch64, used in Apple Silicon and many servers, phones, and embedded systems.
CP/M
Control Program for Microcomputers, an early operating system for 8-bit and some 16-bit computers that provided a common software interface across many machines.
Darwin
The core operating system layer underlying macOS, iOS, and other Apple operating systems.
ELF
Executable and Linkable Format, the executable and object file format commonly used on Linux and other Unix-like systems.
Mach-O
The executable and object file format used by Apple operating systems.
PE
Portable Executable, the executable file format used by Windows, including Windows on ARM.
unwind info
Metadata used by debuggers, profilers, and exception handlers to walk back through a program's call stack.
x18
One of the general-purpose registers in AArch64 that some operating systems reserve for platform-specific use instead of letting normal code use it freely.