--- layout: post title: "Codegolf - Reverse NATO phonetic alphabet" date: 2025-06-14 01:52:56 +0100 --- A long time ago, I used to visit Codegolf Stack Exchange regularly, and took part in a few contests. Due to certain reasons, I have decided not to contribute to the Stack Exchange network anymore, but I recently came across a codegolf contest that looked interesting, so I thought I'd post my solution in my blog instead. The post is linked [here][cg], but to summarise, the task is to convert a string spelled out in NATO phonetic alphabet, without any delimiter, back to a string spelled normally, so for example, from "ECHOXRAYALFAMIKEPAPALIMAECHO" to "EXAMPLE". I decided to attempt this in the RISC-V ISA, for no particular reason other than that I have been using it at work. My current score is **52 bytes**. A few details before getting to the code: - Instead of a full program handling I/O, I will only be implementing a function, using the standard RISC-V ABI. - I am using the 32-bit ISA with the Zca and Zcb extensions. - I will assume C strings (i.e. NUL-terminated array of chars) in ASCII. - I will assume the caller has allocated a sufficiently large output buffer. My solution relies on the fact that all phonetic alphabets start with the letter it represents, for example, ALFA starts with A, BRAVO starts with B. So all it needs to do is know how long each phonetic alphabet is, and just copy the first letter and skip over the rest. Here's the assembly: ``` .globl revnato revnato: la a3, revnato_data - 65 revnato_loop: lbu a2, 0(a0) sb a2, 0(a1) beqz a2, revnato_ret add a2, a2, a3 lbu a2, 0(a2) add a0, a0, a2 addi a1, a1, 1 j revnato_loop revnato_ret: ret revnato_data: .byte 4,5,7,5,4,7,4,5,5,7,4,4,4,8,5,4,6,5,6,5,7,6,7,4,6,4 ``` If you're not familiar with the syntax, all operations are in the form `opcode destination operand` or `opcode destination operand1 operand2`. Here, `a0` is expected to contain the input string, and `a1` the output buffer. We first **l**oad the **a**ddress (`la`) of a lookup table to `a3`. This table is the string length of each phonetic alphabet. The `- 65` will be explained later. In the loop, we first **l**oad one **u**nsigned **b**yte (`lbu`) at `a0[0]` into `a2`, and **s**tore that **b**yte (`sb`) into `a1[0]`, i.e. copy the first character. We exit from the loop (and **ret**urn from the function) if the byte is NUL (`beqz`: **b**ranch if **eq**ual to **z**ero). If non-zero, we compute the index for the lookup table. The unoptimised way would be to subtract `'A'` (ASCII 65) from the byte, and then add it to the lookup table address. To optimise this, I have moved the "subtract 65" to the address itself, so that we can just **add** `a3` to `a2`. We then dereference the address by **l**oading the **u**nsigned **b**yte.\* We then increment the input pointer by the lookup table entry, and increment the output pointer by 1, and loop. \*: Note that another way to optimise it would be to leave the `a3` address as is, and do `lbu a2, -65(a2)` instead. This may be more readable, but actually produces bigger code, because `-65` is outside the range of the 16-bit `lbu` immediate opcode range (see RISC-V Zcb), so a 32-bit `lbu` has to be used instead. This can be compiled to an object file with: ``` YOUR-TOOLCHAIN-gcc -mabi=ilp32e -march=rv32emzca_zcb -c -o revnato.o revnato.s ``` We can look at the disassembly to check the binary size:† ``` YOUR-TOOLCHAIN-objdump -S revnato.o > revnato.dump ``` ``` 00000000 : 0: 00000697 auipc a3,0x0 4: 00068693 mv a3,a3 00000008 : 8: 8110 lbu a2,0(a0) a: 8990 sb a2,0(a1) c: c611 beqz a2,18 e: 9636 add a2,a2,a3 10: 8210 lbu a2,0(a2) 12: 9532 add a0,a0,a2 14: 0585 addi a1,a1,1 16: bfcd j 8 00000018 : 18: 8082 ret 0000001a : 1a: 05070504 .word 0x05070504 1e: 05040704 .word 0x05040704 22: 04040705 .word 0x04040705 26: 04050804 .word 0x04050804 2a: 05060506 .word 0x05060506 2e: 04070607 .word 0x04070607 32: 0406 .short 0x0406 ``` Note that the `mv a3,a3`, which looks like a NOP, is actually space reserved for a relocation. This is because we are only creating an object file without linking, so the address of `revnato_data` is not yet known. You can prove this by generating a linked ELF with `YOUR-TOOLCHAIN-gcc -mabi=ilp32e -march=rv32emzca_zcb -o revnato.elf revnato.s -nostdlib` and observing in that disassembly that the instruction has now changed, for example, to `addi a3,a3,-39`. †: Alternatively, we can just run `readelf -S revnato.o` and look at the size field of .text. Finally, just for completeness, an example C program to call this function: ```c #include void revnato(const char *, char *); int main(void) { const char *in = "ECHOXRAYALFAMIKEPAPALIMAECHO"; char out[100]; revnato(in, out); puts(out); return 0; } ``` [cg]: https://codegolf.stackexchange.com/a/282211