Hack Assembler

1. Description

The Hack Assembler is a software tool that translates Hack assembly code (.asm) into binary machine code (.hack) that can be executed by the Hack computer platform.

This project marks the final step in Nand2Tetris Part I, combining everything learned about binary computation, instruction formats, and symbolic parsing.

This assembler performs:

Two-pass translation

Symbol resolution (labels and variables)

Instruction encoding (A- and C-instructions)

2. High-Level Architecture

Input:  Assembly (.asm)
          ↓
  [Pass 1] → Resolve Labels → Update symbol table
          ↓
  [Pass 2] → Translate Instructions → Output binary
          ↓
Output: Binary (.hack)

3. Key Concepts

A-Instructions (@value)

Load a 15-bit value (a constant or a memory address) into the A register

@21 becomes 0000000000010101

@LOOP resolves to the address of a label
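The numeric case can be sketched in a few lines. This is a minimal illustration, not the project's code: the function name `encode_a_instruction` and the explicit range check are assumptions added here; the format string is the same one the assembler uses.

```python
def encode_a_instruction(value):
    """Encode a non-negative 15-bit value as a 16-bit A-instruction string."""
    # The leading bit of an A-instruction is always 0, so only 15 bits carry data.
    if not 0 <= value <= 0x7FFF:
        raise ValueError(f"A-instruction value out of range: {value}")
    return f"{value:016b}"


print(encode_a_instruction(21))  # 0000000000010101
```

A symbolic operand such as @LOOP goes through the same encoding once the symbol has been replaced by its address.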

C-Instructions (dest=comp;jump)

Represent computations and control flow

D=A+1 → comp bits looked up in comp_table, dest bits in dest_table

D;JGT → comp bits looked up in comp_table, jump bits in jump_table
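Splitting a C-instruction into its three fields might look like the sketch below. The helper name `split_c_instruction` is hypothetical, and `str.partition` is used here as one possible approach; missing dest or jump fields come back as empty strings, which matches the empty-string keys in the lookup tables.

```python
def split_c_instruction(instruction):
    """Split 'dest=comp;jump' into (dest, comp, jump); dest and jump may be empty."""
    dest, equals, rest = instruction.partition("=")
    if not equals:
        # No '=' present: the whole text is comp (optionally followed by ;jump).
        dest, rest = "", instruction
    comp, _, jump = rest.partition(";")
    return dest.strip(), comp.strip(), jump.strip()


print(split_c_instruction("D=A+1"))  # ('D', 'A+1', '')
print(split_c_instruction("D;JGT"))  # ('', 'D', 'JGT')
```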

Labels and Symbols

(LOOP) labels are resolved in the first pass

New variables like @i are allocated RAM addresses starting at 16, in order of first appearance
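To make the two rules concrete, here is a small sketch that resolves the symbols of a toy program. The function `resolve_symbols` and the sample program are illustrative assumptions; predefined symbols such as R0 or SCREEN are passed in through the hypothetical `predefined` parameter and left empty here to keep the example short.

```python
def resolve_symbols(lines, predefined=None):
    """Return a symbol table with label and variable addresses filled in."""
    symbols = dict(predefined or {})

    # Pass 1: a label takes the ROM address of the next real instruction.
    rom = 0
    for raw in lines:
        line = raw.split("//")[0].strip()
        if not line:
            continue
        if line.startswith("(") and line.endswith(")"):
            symbols[line[1:-1]] = rom
        else:
            rom += 1

    # Pass 2: any remaining symbolic operand is a variable, allocated from 16 up.
    next_ram = 16
    for raw in lines:
        line = raw.split("//")[0].strip()
        if line.startswith("@"):
            name = line[1:]
            if not name.isdigit() and name not in symbols:
                symbols[name] = next_ram
                next_ram += 1
    return symbols


program = [
    "@i      // ROM 0",
    "M=1     // ROM 1",
    "(LOOP)  // label, takes no ROM slot",
    "@i      // ROM 2",
    "D=M     // ROM 3",
    "@LOOP   // ROM 4",
    "D;JGT   // ROM 5",
]
print(resolve_symbols(program))  # {'LOOP': 2, 'i': 16}
```

Labels must be collected before variables are allocated; otherwise a forward reference such as @LOOP appearing before (LOOP) would be mistaken for a new variable.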

4. Tables Used

`comp_table` (Selected Examples)

comp	Binary
D+1	0011111
A-1	0110010
D|M	1010101

`comp_table` (Full Table)

comp	Binary
0	0101010
1	0111111
-1	0111010
D	0001100
A	0110000
M	1110000
!D	0001101
!A	0110001
!M	1110001
-D	0001111
-A	0110011
-M	1110011
D+1	0011111
A+1	0110111
M+1	1110111
D-1	0001110
A-1	0110010
M-1	1110010
D+A	0000010
D+M	1000010
D-A	0010011
D-M	1010011
A-D	0000111
M-D	1000111
D&A	0000000
D&M	1000000
D|A	0010101
D|M	1010101
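One pattern in this table is worth noting: the leading bit (the a-bit) selects between the A register (0) and memory M (1), and the remaining six bits are identical for each A/M pair. The sketch below checks that relationship using the A-register rows from the table; the names `A_FORMS` and `memory_variant` are hypothetical.

```python
# comp codes for the A-register forms (a = 0), copied from the table above
A_FORMS = {
    "A": "0110000", "!A": "0110001", "-A": "0110011",
    "A+1": "0110111", "A-1": "0110010", "D+A": "0000010",
    "D-A": "0010011", "A-D": "0000111", "D&A": "0000000",
    "D|A": "0010101",
}


def memory_variant(mnemonic, bits):
    """Derive the M-register entry from an A-register entry by setting the a-bit."""
    return mnemonic.replace("A", "M"), "1" + bits[1:]


M_FORMS = dict(memory_variant(m, b) for m, b in A_FORMS.items())
print(M_FORMS["D+M"])  # 1000010
```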

`dest_table`

dest	Binary
(none)	000
M	001
D	010
MD	011
A	100
AM	101
AD	110
AMD	111

`jump_table`

jump	Binary
(none)	000
JGT	001
JEQ	010
JGE	011
JLT	100
JNE	101
JLE	110
JMP	111
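With the three tables in hand, a C-instruction is the fixed prefix 111 followed by the 7 comp bits, 3 dest bits, and 3 jump bits. The sketch below uses deliberately abbreviated copies of the tables (the names `COMP`, `DEST`, `JUMP`, and `encode_c_instruction` are illustrative, not the project's identifiers).

```python
# Abbreviated lookup tables; the full versions are listed above.
COMP = {"0": "0101010", "D": "0001100", "M": "1110000", "D+M": "1000010"}
DEST = {"": "000", "M": "001", "D": "010"}
JUMP = {"": "000", "JGT": "001", "JMP": "111"}


def encode_c_instruction(dest, comp, jump):
    """Concatenate the fields as 111 + comp (7 bits) + dest (3) + jump (3)."""
    return "111" + COMP[comp] + DEST[dest] + JUMP[jump]


print(encode_c_instruction("D", "D+M", ""))  # 1111000010010000
print(encode_c_instruction("", "0", "JMP"))  # 1110101010000111
```

Indexing the dictionaries directly, rather than falling back to a default, means a misspelled mnemonic raises a KeyError instead of silently producing a valid-looking but wrong instruction.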

5. Symbol Table (Built-In)

Symbol	Address
SP	0
LCL	1
ARG	2
THIS	3
THAT	4
R0–R15	0–15
SCREEN	16384
KBD	24576
Variables	16+

6. Code Structure (Python)

Main Phases:

    first_pass(lines)
    Resolves labels like (LOOP) into ROM addresses.

    second_pass(lines)
    Translates each instruction into a 16-bit binary string. Handles:

        A-instructions: direct or symbolic

        C-instructions: parsed into dest=comp;jump

    assemble(asm_path, hack_path=None)
    Calls both passes and writes the output to hack_path, or to a .hack file named after the input when hack_path is omitted

CLI Usage:

python assembler.py Mult.asm

7. Example Translation

Assembly Input

@R0
D=M
@R1
D=D+M
@R2
M=D

Binary Output

0000000000000000
1111110000010000
0000000000000001
1111000010010000
0000000000000010
1110001100001000

8. Project Files

File Name	Purpose
`assembler.py`	Main assembler script
`Add.asm`	Example input program
`Add.hack`	Output from running assembler
`SymbolTable.md`	Reference table (optional)

9. Test Strategy

Run .asm files from the projects/06 folder (such as Add.asm, Max.asm, Rect.asm, and Pong.asm)

Load .hack output into CPU Emulator

Compare the output line by line against the .hack file produced by the course's supplied assembler, or visually verify behavior in the emulator
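The comparison step can be automated with a short script. This is a sketch under assumptions: the helper names `first_mismatch` and `compare_hack_files` are hypothetical, and the reference file is assumed to be a known-good .hack file for the same program.

```python
from itertools import zip_longest


def first_mismatch(actual, expected):
    """Return (line_number, actual, expected) for the first difference, or None."""
    pairs = zip_longest(actual, expected)  # pads the shorter list with None
    for number, (got, want) in enumerate(pairs, start=1):
        if got != want:
            return number, got, want
    return None


def compare_hack_files(actual_path, expected_path):
    """Compare two .hack files, ignoring blank lines and trailing whitespace."""
    with open(actual_path) as actual_file, open(expected_path) as expected_file:
        actual = actual_file.read().split()
        expected = expected_file.read().split()
    return first_mismatch(actual, expected)


print(first_mismatch(["0000000000000000", "1111110000010000"],
                     ["0000000000000000", "1111110000010000"]))  # None
```

A result of None means the files match; otherwise the tuple points at the first line to inspect, with None standing in for a line that is missing from the shorter file.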

10. Code (Python)

    import sys
    import re

    # Predefined symbols
    symbol_table = {
        "SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
        "SCREEN": 16384, "KBD": 24576,
        **{f"R{i}": i for i in range(16)}
    }

    # Binary lookup tables
    comp_table = {
        # a = 0
        "0":   "0101010", "1":   "0111111", "-1":  "0111010",
        "D":   "0001100", "A":   "0110000", "!D":  "0001101",
        "!A":  "0110001", "-D":  "0001111", "-A":  "0110011",
        "D+1": "0011111", "A+1": "0110111", "D-1": "0001110",
        "A-1": "0110010", "D+A": "0000010", "D-A": "0010011",
        "A-D": "0000111", "D&A": "0000000", "D|A": "0010101",
        # a = 1
        "M":   "1110000", "!M":  "1110001", "-M":  "1110011",
        "M+1": "1110111", "M-1": "1110010", "D+M": "1000010",
        "D-M": "1010011", "M-D": "1000111", "D&M": "1000000",
        "D|M": "1010101"
    }

    dest_table = {
        "":    "000", "M":   "001", "D":   "010", "MD":  "011",
        "A":   "100", "AM":  "101", "AD":  "110", "AMD": "111"
    }

    jump_table = {
        "":    "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"
    }

    # Clean a line of comments and whitespace
    def clean(line):
        return re.sub("//.*", "", line).strip()

    # First pass: resolve labels
    def first_pass(lines):
        rom = 0
        for line in lines:
            line = clean(line)
            if not line:
                continue
            if line.startswith("("):
                label = line[1:-1]
                symbol_table[label] = rom
            else:
                rom += 1

    # Second pass: generate binary code
    def second_pass(lines):
        ram_address = 16
        binary_lines = []
        for line in lines:
            line = clean(line)
            if not line or line.startswith("("):
                continue

            if line.startswith("@"):  # A-instruction
                symbol = line[1:]
                if symbol.isdigit():
                    address = int(symbol)
                else:
                    if symbol not in symbol_table:
                        symbol_table[symbol] = ram_address
                        ram_address += 1
                    address = symbol_table[symbol]
                binary_lines.append(f"{address:016b}")

            else:  # C-instruction: dest=comp;jump (dest and jump are optional)
                if "=" in line:
                    dest, comp_jump = line.split("=", 1)
                else:
                    dest, comp_jump = "", line
                if ";" in comp_jump:
                    comp, jump = comp_jump.split(";", 1)
                else:
                    comp, jump = comp_jump, ""

                # Fail loudly on unknown mnemonics instead of emitting wrong bits
                try:
                    comp_bits = comp_table[comp.strip()]
                    dest_bits = dest_table[dest.strip()]
                    jump_bits = jump_table[jump.strip()]
                except KeyError as err:
                    raise ValueError(f"Unknown mnemonic {err} in: {line}") from None

                binary_lines.append("111" + comp_bits + dest_bits + jump_bits)

        return binary_lines

    # Main assembler function
    def assemble(asm_path, hack_path=None):
        with open(asm_path, 'r') as file:
            lines = file.readlines()

        first_pass(lines)
        binary = second_pass(lines)

        # Swap only a trailing ".asm" so the input file is never overwritten
        out_path = hack_path or re.sub(r"\.asm$", "", asm_path) + ".hack"
        with open(out_path, 'w') as file:
            file.write("\n".join(binary) + "\n")

        print(f"Assembly complete: {out_path}")

    # If running from CLI
    if __name__ == "__main__":
        if len(sys.argv) < 2:
            print("Usage: python assembler.py <file.asm>")
        else:
            assemble(sys.argv[1])