Codegolf - Reverse NATO phonetic alphabet
A long time ago, I used to visit Codegolf Stack Exchange regularly, and took part in a few contests. Due to certain reasons, I have decided not to contribute to the Stack Exchange network anymore, but I recently came across a codegolf contest that looked interesting, so I thought I’d post my solution in my blog instead.
The post is linked here, but to summarise, the task is to convert a string spelled out in NATO phonetic alphabet, without any delimiter, back to a string spelled normally, so for example, from “ECHOXRAYALFAMIKEPAPALIMAECHO” to “EXAMPLE”.
I decided to attempt this in the RISC-V ISA, for no particular reason other than that I have been using it at work. My current score is 52 bytes.
A few details before getting to the code:
- Instead of a full program handling I/O, I will only be implementing a function, using the standard RISC-V ABI.
- I am using the 32-bit ISA with the Zca and Zcb extensions.
- I will assume C strings (i.e. NUL-terminated array of chars) in ASCII.
- I will assume the caller has allocated a sufficiently large output buffer.
My solution relies on the fact that all phonetic alphabets start with the letter it represents, for example, ALFA starts with A, BRAVO starts with B. So all it needs to do is know how long each phonetic alphabet is, and just copy the first letter and skip over the rest.
Here’s the assembly:
.globl revnato
revnato:
la a3, revnato_data - 65
revnato_loop:
lbu a2, 0(a0)
sb a2, 0(a1)
beqz a2, revnato_ret
add a2, a2, a3
lbu a2, 0(a2)
add a0, a0, a2
addi a1, a1, 1
j revnato_loop
revnato_ret:
ret
revnato_data:
.byte 4,5,7,5,4,7,4,5,5,7,4,4,4,8,5,4,6,5,6,5,7,6,7,4,6,4
If you’re not familiar with the syntax, all operations are in the form
opcode destination operand
or opcode destination operand1 operand2
.
Here, a0
is expected to contain the input string, and a1
the output buffer.
We first load the address (la
) of a lookup table to a3
. This table is
the string length of each phonetic alphabet. The - 65
will be explained later.
In the loop, we first load one unsigned byte (lbu
) at a0[0]
into a2
, and store that byte (sb
) into a1[0]
, i.e. copy the
first character.
We exit from the loop (and return from the function) if the byte is NUL
(beqz
: branch if equal to zero).
If non-zero, we compute the index for the lookup table.
The unoptimised way would be to subtract 'A'
(ASCII 65)
from the byte, and then add it to the lookup table address. To optimise
this, I have moved the “subtract 65” to the address itself, so that we can
just add a3
to a2
. We then dereference the address by loading
the unsigned byte.*
We then increment the input pointer by the lookup table entry, and increment the output pointer by 1, and loop.
*: Note that another way to optimise it would be to leave the a3
address
as is, and do lbu a2, -65(a2)
instead. This may be more readable, but
actually produces bigger code, because -65
is outside the range of the
16-bit lbu
immediate opcode range (see RISC-V Zcb), so a 32-bit lbu
has to be used instead.
This can be compiled to an object file with:
YOUR-TOOLCHAIN-gcc -mabi=ilp32e -march=rv32emzca_zcb -c -o revnato.o revnato.s
We can look at the disassembly to check the binary size:†
YOUR-TOOLCHAIN-objdump -S revnato.o > revnato.dump
00000000 <revnato>:
0: 00000697 auipc a3,0x0
4: 00068693 mv a3,a3
00000008 <revnato_loop>:
8: 8110 lbu a2,0(a0)
a: 8990 sb a2,0(a1)
c: c611 beqz a2,18 <revnato_ret>
e: 9636 add a2,a2,a3
10: 8210 lbu a2,0(a2)
12: 9532 add a0,a0,a2
14: 0585 addi a1,a1,1
16: bfcd j 8 <revnato_loop>
00000018 <revnato_ret>:
18: 8082 ret
0000001a <revnato_data>:
1a: 05070504 .word 0x05070504
1e: 05040704 .word 0x05040704
22: 04040705 .word 0x04040705
26: 04050804 .word 0x04050804
2a: 05060506 .word 0x05060506
2e: 04070607 .word 0x04070607
32: 0406 .short 0x0406
Note that the mv a3,a3
, which looks like a NOP, is actually space
reserved for a relocation. This is because we are only creating an
object file without linking, so the address of revnato_data
is not
yet known. You can prove this by generating a linked ELF with
YOUR-TOOLCHAIN-gcc -mabi=ilp32e -march=rv32emzca_zcb -o revnato.elf revnato.s -nostdlib
and observing in that disassembly that the instruction has now changed,
for example, to addi a3,a3,-39
.
†: Alternatively, we can just run readelf -S revnato.o
and look at the
size field of .text.
Finally, just for completeness, an example C program to call this function:
#include <stdio.h>
void revnato(const char *, char *);
int main(void)
{
const char *in = "ECHOXRAYALFAMIKEPAPALIMAECHO";
char out[100];
revnato(in, out);
puts(out);
return 0;
}