Hack Assembler

1. Description

The Hack Assembler is a software tool that translates Hack assembly code (.asm) into binary machine code (.hack) that can be executed by the Hack computer platform.

This project marks the final step in Nand2Tetris Part I, combining everything learned about binary computation, instruction formats, and symbolic parsing.

This assembler performs:

Two-pass translation

Symbol resolution (labels and variables)

Instruction encoding (A- and C-instructions)

2. High-Level Architecture

Input:  Assembly (.asm)
          ↓
  [Pass 1] → Resolve Labels → Update symbol table
          ↓
  [Pass 2] → Translate Instructions → Output binary
          ↓
Output: Binary (.hack)

3. Key Concepts

A-Instructions (@value)

Represent memory addresses

@21 becomes 0000000000010101

@LOOP resolves to the address of a label
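As a minimal sketch of the numeric case above, an A-instruction is a leading 0 followed by the value in 15 bits. The function name `encode_a_instruction` is hypothetical and is not part of assembler.py.

```python
def encode_a_instruction(value: int) -> str:
    """Encode a non-negative constant as a 16-bit Hack A-instruction."""
    # Only 15 bits are available for the value, so the maximum is 32767.
    if not 0 <= value <= 32767:
        raise ValueError(f"A-instruction value out of range: {value}")
    # The leading "0" marks an A-instruction; the rest is the value in binary.
    return "0" + format(value, "015b")


print(encode_a_instruction(21))  # 0000000000010101
```

A symbolic operand such as @LOOP goes through the same encoding once the symbol has been replaced by its numeric address.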

C-Instructions (dest=comp;jump)

Represent computations and control flow

D=A+1 → comp 0110111 (comp_table) and dest 010 (dest_table), giving 1110110111010000

D;JGT → comp 0001100 and jump 001 (jump_table), giving 1110001100000001
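The two examples above can be sketched as follows. The tables here are trimmed copies holding only the entries these two instructions need, and `encode_c_instruction` is a hypothetical helper used for illustration only.

```python
# Trimmed lookup tables: just enough entries for the two examples.
COMP = {"A+1": "0110111", "D": "0001100"}
DEST = {"": "000", "D": "010"}
JUMP = {"": "000", "JGT": "001"}


def encode_c_instruction(text: str) -> str:
    """Split dest=comp;jump into its fields and concatenate their bit codes."""
    # dest is optional: without "=", the whole text is comp[;jump].
    dest, _, rest = text.partition("=") if "=" in text else ("", "", text)
    # jump is optional: partition leaves it empty when there is no ";".
    comp, _, jump = rest.partition(";")
    # Layout of a C-instruction: 111 + 7 comp bits + 3 dest bits + 3 jump bits.
    return "111" + COMP[comp] + DEST[dest] + JUMP[jump]


print(encode_c_instruction("D=A+1"))  # 1110110111010000
print(encode_c_instruction("D;JGT"))  # 1110001100000001
```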

Labels and Symbols

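(LOOP) labels are resolved in the first pass; each label maps to the ROM address of the instruction that follows it

New variables like @i are allocated in the second pass, starting from RAM address 16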
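To illustrate how labels and variables end up with different kinds of addresses, here is a small sketch over a made-up program. The program, the `symbols` dictionary, and the loop structure are illustrative assumptions; the predefined symbols are left out to keep the output short.

```python
# A made-up program, already stripped of comments and whitespace.
program = ["@i", "M=1", "(LOOP)", "@i", "D=M", "@LOOP", "D;JGT", "@sum", "M=D"]

symbols = {}

# Pass 1: a label takes the ROM address of the next real instruction.
rom_address = 0
for line in program:
    if line.startswith("("):
        symbols[line[1:-1]] = rom_address
    else:
        rom_address += 1

# Pass 2: any symbol still unknown is a variable, allocated from RAM 16 up.
next_variable = 16
for line in program:
    if line.startswith("@") and not line[1:].isdigit():
        name = line[1:]
        if name not in symbols:
            symbols[name] = next_variable
            next_variable += 1

print(symbols)  # {'LOOP': 2, 'i': 16, 'sum': 17}
```

Labels must be collected before variables are allocated; otherwise a forward reference such as @LOOP before (LOOP) would be mistaken for a new variable.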

4. Tables Used

comp_table

comp Binary
0 0101010
1 0111111
-1 0111010
D 0001100
A 0110000
M 1110000
!D 0001101
!A 0110001
!M 1110001
-D 0001111
-A 0110011
-M 1110011
D+1 0011111
A+1 0110111
M+1 1110111
D-1 0001110
A-1 0110010
M-1 1110010
D+A 0000010
D+M 1000010
D-A 0010011
D-M 1010011
A-D 0000111
M-D 1000111
D&A 0000000
D&M 1000000
D|A 0010101
D|M 1010101
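One pattern worth noticing in this table: the first of the seven bits (the a-bit) selects between the A register (0) and memory M (1), while the remaining six bits stay the same. The sketch below derives the M rows from the A rows to show this; `A_ROWS` and `m_variant` are hypothetical names, not part of assembler.py.

```python
# Rows of the comp table that mention the A register (a-bit = 0).
A_ROWS = {
    "A": "0110000", "!A": "0110001", "-A": "0110011",
    "A+1": "0110111", "A-1": "0110010", "D+A": "0000010",
    "D-A": "0010011", "A-D": "0000111", "D&A": "0000000",
    "D|A": "0010101",
}


def m_variant(mnemonic: str, bits: str) -> tuple:
    """Swap A for M in the mnemonic and set the a-bit to 1."""
    return mnemonic.replace("A", "M"), "1" + bits[1:]


M_ROWS = dict(m_variant(m, b) for m, b in A_ROWS.items())

print(M_ROWS["D|M"])  # 1010101
print(M_ROWS["M-D"])  # 1000111
```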

dest_table

dest Binary
(none) 000
M 001
D 010
MD 011
A 100
AM 101
AD 110
AMD 111
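The dest bits can also be read as three independent flags, one per register, in the order A, D, M. A minimal sketch of that reading, using a hypothetical `dest_bits` function in place of the table lookup:

```python
def dest_bits(dest: str) -> str:
    """Build the 3-bit dest field: one flag each for A, D and M, in that order."""
    return "".join("1" if register in dest else "0" for register in "ADM")


for mnemonic in ["", "M", "D", "MD", "A", "AM", "AD", "AMD"]:
    print(repr(mnemonic), dest_bits(mnemonic))
```

Unlike the table, this version also accepts the registers in any order (for example DM), which may or may not be desirable depending on how strict the assembler should be.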

jump_table

jump Binary
(none) 000
JGT 001
JEQ 010
JGE 011
JLT 100
JNE 101
JLE 110
JMP 111
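The three jump bits each enable one condition on the result of the computation: negative, zero, and positive, from left to right. The sketch below models that rule to show why, for example, JLE is 110; `jump_taken` is a hypothetical helper for illustration and is not something the assembler itself needs.

```python
def jump_taken(jump_bits: str, value: int) -> bool:
    """Return True if a jump with these bits fires for the computed value."""
    # Left to right: jump if value < 0, if value == 0, if value > 0.
    j_lt, j_eq, j_gt = (bit == "1" for bit in jump_bits)
    return (j_lt and value < 0) or (j_eq and value == 0) or (j_gt and value > 0)


print(jump_taken("001", 5))   # JGT with a positive value: True
print(jump_taken("110", 1))   # JLE with a positive value: False
print(jump_taken("111", 0))   # JMP always jumps: True
```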

5. Symbol Table (Built-In)

Symbol Address
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
R0–R15 0–15
SCREEN 16384
KBD 24576
Variables 16+

6. Code Structure (Python)

Main Phases:

    first_pass(lines)
    Resolves labels like (LOOP) into ROM addresses.

    second_pass(lines)
    Translates each instruction into a 16-bit binary string. Handles:

        A-instructions: direct or symbolic

        C-instructions: parsed into dest=comp;jump

    assemble(asm_path, hack_path=None)
    Calls both passes and writes the output to a .hack file (by default, next to the input)

CLI Usage:

python assembler.py Mult.asm

7. Example Translation

Assembly Input

@R0
D=M
@R1
D=D+M
@R2
M=D

Binary Output

0000000000000000
1111110000010000
0000000000000001
1111000010010000
0000000000000010
1110001100001000
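One way to check a translation like this by hand is to split each 16-bit word back into its fields and compare them with the tables in section 4. The following is a sketch under that assumption; `split_word` is a hypothetical helper, not part of assembler.py.

```python
def split_word(word: str) -> dict:
    """Split one 16-bit Hack machine word into its named fields."""
    if len(word) != 16 or set(word) - {"0", "1"}:
        raise ValueError(f"Not a 16-bit binary word: {word!r}")
    if word[0] == "0":
        # A-instruction: the remaining 15 bits are the address or constant.
        return {"type": "A", "value": int(word[1:], 2)}
    # C-instruction: 111, then 7 comp bits, 3 dest bits, 3 jump bits.
    return {"type": "C", "comp": word[3:10], "dest": word[10:13], "jump": word[13:16]}


print(split_word("0000000000000010"))  # {'type': 'A', 'value': 2}
print(split_word("1111110000010000"))
# {'type': 'C', 'comp': '1110000', 'dest': '010', 'jump': '000'}
```

The second word decodes to comp M, dest D, and no jump, which matches D=M in the assembly input.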

8. Project Files

File Name Purpose
assembler.py Main assembler script
Add.asm Example input program
Add.hack Output from running assembler
SymbolTable.md Reference table (optional)

9. Test Strategy

Run .asm files from the projects/06 folder (like Max.asm, Rect.asm, Mult.asm)

Load .hack output into CPU Emulator

Compare against .cmp files or visually verify behavior
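The comparison step can be automated with a short script. This is a sketch only: it assumes a known-good reference .hack file is available for each program (for example, one produced by a trusted assembler), and the names `first_mismatch` and `compare_files` are hypothetical.

```python
from itertools import zip_longest
from pathlib import Path


def first_mismatch(actual_text: str, expected_text: str):
    """Return (line_number, got, want) for the first difference, or None."""
    # split() ignores blank lines and trailing newlines in either file.
    actual = actual_text.split()
    expected = expected_text.split()
    # zip_longest pads the shorter file with None, so missing lines also count.
    for number, (got, want) in enumerate(zip_longest(actual, expected), start=1):
        if got != want:
            return number, got, want
    return None


def compare_files(actual_path: str, expected_path: str):
    """Compare two .hack files on disk using first_mismatch."""
    return first_mismatch(Path(actual_path).read_text(),
                          Path(expected_path).read_text())


print(first_mismatch("0000000000000000\n1111110000010000\n",
                     "0000000000000000\n1111110000010000"))  # None
```

A result of None means the two files contain the same instructions; otherwise the tuple points at the first line to inspect.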

10. Code (Python)

    import sys
    import re

    # Predefined symbols
    symbol_table = {
        "SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
        "SCREEN": 16384, "KBD": 24576,
        **{f"R{i}": i for i in range(16)}
    }

    # Binary lookup tables
    comp_table = {
        # a = 0
        "0":   "0101010", "1":   "0111111", "-1":  "0111010",
        "D":   "0001100", "A":   "0110000", "!D":  "0001101",
        "!A":  "0110001", "-D":  "0001111", "-A":  "0110011",
        "D+1": "0011111", "A+1": "0110111", "D-1": "0001110",
        "A-1": "0110010", "D+A": "0000010", "D-A": "0010011",
        "A-D": "0000111", "D&A": "0000000", "D|A": "0010101",
        # a = 1
        "M":   "1110000", "!M":  "1110001", "-M":  "1110011",
        "M+1": "1110111", "M-1": "1110010", "D+M": "1000010",
        "D-M": "1010011", "M-D": "1000111", "D&M": "1000000",
        "D|M": "1010101"
    }

    dest_table = {
        "":    "000", "M":   "001", "D":   "010", "MD":  "011",
        "A":   "100", "AM":  "101", "AD":  "110", "AMD": "111"
    }

    jump_table = {
        "":    "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"
    }

    # Clean a line of comments and whitespace
    def clean(line):
        return re.sub("//.*", "", line).strip()

    # First pass: resolve labels
    def first_pass(lines):
        rom = 0
        for line in lines:
            line = clean(line)
            if not line:
                continue
            if line.startswith("("):
                label = line[1:-1]
                symbol_table[label] = rom
            else:
                rom += 1

    # Second pass: generate binary code
    def second_pass(lines):
        ram_address = 16
        binary_lines = []
        for line in lines:
            line = clean(line)
            if not line or line.startswith("("):
                continue

            if line.startswith("@"):  # A-instruction
                symbol = line[1:]
                if symbol.isdigit():
                    address = int(symbol)
                else:
                    if symbol not in symbol_table:
                        symbol_table[symbol] = ram_address
                        ram_address += 1
                    address = symbol_table[symbol]
                binary_lines.append(f"{address:016b}")

            else:  # C-instruction
                if "=" in line:
                    dest, comp_jump = line.split("=")
                else:
                    dest, comp_jump = "", line
                if ";" in comp_jump:
                    comp, jump = comp_jump.split(";")
                else:
                    comp, jump = comp_jump, ""

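                # Fail loudly on unknown mnemonics instead of silently emitting zeros
                try:
                    comp_bits = comp_table[comp.strip()]
                    dest_bits = dest_table[dest.strip()]
                    jump_bits = jump_table[jump.strip()]
                except KeyError as err:
                    raise ValueError(f"Unknown mnemonic {err} in line: {line}") from None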

                binary_lines.append("111" + comp_bits + dest_bits + jump_bits)

        return binary_lines

    # Main assembler function
    def assemble(asm_path, hack_path=None):
        with open(asm_path, 'r') as file:
            lines = file.readlines()

        first_pass(lines)
        binary = second_pass(lines)

        # Swap only a trailing .asm extension so the input file is never overwritten
        out_path = hack_path or re.sub(r"\.asm$", "", asm_path) + ".hack"
        with open(out_path, 'w') as file:
            file.write("\n".join(binary))

        print(f"Assembly complete: {out_path}")

    # If running from CLI
    if __name__ == "__main__":
        if len(sys.argv) < 2:
            # Print usage to stderr and exit with a non-zero status
            sys.exit("Usage: python assembler.py <file.asm>")
        else:
            assemble(sys.argv[1])