da65 Users Guide

Ullrich von Bassewitz,
Greg King


da65 is a 6502/65C02 disassembler that is able to read user-supplied information about its input data, for better results. The output is ready for feeding into ca65, the macro assembler supplied with the cc65 C compiler.

1. Overview

2. Usage

3. Detailed workings

4. Info File Format

5. Helper scripts

6. Copyright


1. Overview

da65 is a disassembler for 6502/65C02 code. It is supplied as a utility with the cc65 C compiler and generates output that is suitable for the ca65 macro assembler.

Besides generating output for ca65, one of the design goals was to let the user feed additional information about the code into the disassembler to improve the results. This information may include the location and size of tables, and their format.

One nice advantage of this concept is that disassemblies of copyrighted binaries can be shared without problems: one only needs to pass on the info file, and everyone with a legal copy of the binary can generate a nicely formatted disassembly with readable labels and other information.

2. Usage

2.1 Command line option overview

The disassembler accepts the following options:

---------------------------------------------------------------------------
Usage: da65 [options] [inputfile]
Short options:
  -g                    Add debug info to object file
  -h                    Help (this text)
  -i name               Specify an info file
  -o name               Name the output file
  -v                    Increase verbosity
  -F                    Add formfeeds to the output
  -s                    Accept line markers in the info file
  -S addr               Set the start/load address
  -V                    Print the disassembler version

Long options:
  --argument-column n   Specify argument start column
  --comment-column n    Specify comment start column
  --comments n          Set the comment level for the output
  --cpu type            Set cpu type
  --debug-info          Add debug info to object file
  --formfeeds           Add formfeeds to the output
  --help                Help (this text)
  --hexoffs             Use hexadecimal label offsets
  --info name           Specify an info file
  --label-break n       Add newline if label exceeds length n
  --mnemonic-column n   Specify mnemonic start column
  --pagelength n        Set the page length for the listing
  --start-addr addr     Set the start/load address
  --sync-lines          Accept line markers in the info file
  --text-column n       Specify text start column
  --verbose             Increase verbosity
  --version             Print the disassembler version
---------------------------------------------------------------------------

2.2 Command line options in detail

Here is a description of all the command line options:

--argument-column n

Specifies the column where the argument for a mnemonic or pseudo instruction starts.

--comment-column n

Specifies the column where the comment for an instruction starts.

--comments n

Set the comment level for the output. Valid arguments are 0..4. Greater values increase the amount of additional information written to the output file in the form of comments.

--cpu type

Set the CPU type. The option takes a parameter, which may be one of 6502, 6502x, 6502dtv, 65sc02, 65c02, 65816, huc6280, or 4510.

6502x is for the NMOS 6502 with unofficial opcodes. 6502dtv is for the emulated CPU of the C64DTV device. huc6280 is the CPU of the PC Engine. 4510 is the CPU of the Commodore C65. 65816 is the CPU of the SNES.

-F, --formfeeds

Add formfeeds to the generated output. This feature is useful together with the --pagelength option. If --formfeeds is given, a formfeed is added to the output after each page.

-g, --debug-info

This option adds the .DEBUGINFO command to the output file, so the assembler will generate debug information when re-assembling the generated output.

-h, --help

Print the short option summary shown above.

--hexoffs

Output label offsets in hexadecimal instead of decimal notation.

-i name, --info name

Specify an info file. The info file contains global options that may override or replace command line options, plus information about the code that has to be disassembled. See the separate section Info File Format.

-o name

Specify a name for an output file. The default is to use stdout, so without this switch or the corresponding global option OUTPUTNAME, the output will go to the terminal.

--label-break n

Adds a newline if the length of a label exceeds the given length. Note: If the label would run into the mnemonic column, a linefeed is always inserted regardless of this setting.

A LABELBREAK statement in the GLOBAL section of the info file overrides this option.

--mnemonic-column n

Specifies the column where a mnemonic or pseudo instruction is output.

--pagelength n

Sets the length of a listing page in lines. After this number of lines, a new page header is generated. If --formfeeds is also given, a formfeed is inserted before generating the page header.

A value of zero for the page length will disable paging of the output.

-S addr, --start-addr addr

Specify the start/load address of the binary code that is going to be disassembled. The given address is interpreted as an octal value if preceded with a '0' digit, as a hexadecimal value if preceded with '0x', '0X', or '$', and as a decimal value in all other cases. If no start address is specified, $10000 minus the size of the input file is used.

-s, --sync-lines

Accept line markers in the info file in the following syntax:

        #line <lineno> ["<filename>"]
        # <lineno> "<filename>" [<flag>] ...

This option is intended for info files that are preprocessed with "cpp" or "m4".

--text-column n

Specifies the column where additional text is output. This additional text shows the bytes encoded in the line as text.

-v, --verbose

Increase the disassembler verbosity. Usually only needed for debugging purposes. You may use this option more than once for even more verbose output.

-V, --version

Print the version number of the disassembler. If you send any suggestions or bugfixes, please include the version number.
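The number syntax accepted by -S/--start-addr, and the default used when no start address is given, can be restated as a small Python sketch. The names parse_addr and default_start_addr are hypothetical and not part of da65; the code only encodes the rules described above (leading '0' for octal; '0x', '0X' or '$' for hexadecimal; decimal otherwise; default of $10000 minus the input size) and assumes the input fits into 64 KiB.

```python
def parse_addr(text: str) -> int:
    """Interpret a start address using the rules described for -S."""
    if text.startswith(("0x", "0X")):
        return int(text[2:], 16)         # hexadecimal, C style
    if text.startswith("$"):
        return int(text[1:], 16)         # hexadecimal, 6502 assembler style
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)          # leading zero: octal
    return int(text, 10)                 # everything else: decimal


def default_start_addr(input_size: int) -> int:
    """Default load address when no start address is given.

    Places the input so that it ends at $FFFF, as for a ROM image that
    contains the hardware vectors. Assumes the input fits into 64 KiB.
    """
    if not 0 < input_size <= 0x10000:
        raise ValueError("input size must be between 1 and 65536 bytes")
    return 0x10000 - input_size


# Usage: an 8 KiB kernal image disassembled without -S starts at $E000.
print(hex(parse_addr("$E000")), hex(default_start_addr(8192)))
```

With this default, the last byte of the input always lands at $FFFF, which is where a ROM's hardware vectors live.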

3. Detailed workings

3.1 Supported CPUs

The default (no CPU given on the command line or in the GLOBAL section of the info file) is the 6502 CPU. The disassembler knows all "official" opcodes for this CPU. Invalid opcodes are translated into .byte commands.

With the command line option --cpu, the disassembler may be told to recognize other CPUs, for example the 65SC02 or the 65C02. The latter understands the same opcodes as the former, plus the additional bit manipulation (RMB, SMB) and bit test-and-branch (BBR, BBS) instructions. With 6502x as the CPU type, the illegal opcodes of the NMOS 6502 are detected and displayed. The 6502dtv setting recognizes the emulated CPU instructions of the C64DTV device.

When disassembling 4510 code, da65 can produce output that cannot be reassembled if one or more of the 16-bit wide branches point outside the disassembled memory. This can happen when text or binary data is disassembled as code.

The 65816 support requires annotating ranges with the M and X flag states (see the ADDRMODE attribute of the RANGE directive). These states can be recorded, for example, with an emulator that supports Code and Data Logging. Disassemble one bank at a time.

3.2 Attribute map

The disassembler works by creating an attribute map for the whole address space ($0000 - $FFFF). Initially, all attributes are cleared. Then, an external info file (if given) is read. Disassembly is done in several passes. In all passes, with the exception of the last one, information about the disassembled code is gathered and added to the symbol and attribute maps. The last pass generates output using the information from the maps.
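The attribute-map idea can be illustrated with the following Python sketch: one entry per address in $0000-$FFFF, all cleared at first, ranges from the info file applied next, then an information-gathering pass whose results a final output pass would consult. The Attr flags, the function names, and the toy pass that only recognizes absolute JMP ($4C) are assumptions made for illustration; da65's internal data structures differ, and a real pass decodes every opcode.

```python
from enum import IntFlag

ADDR_SPACE = 0x10000  # the whole address space, $0000-$FFFF


class Attr(IntFlag):
    """Hypothetical per-address attribute bits (not da65's real set)."""
    NONE = 0
    CODE = 1
    DATA = 2
    LABEL = 4


def new_map():
    # Initially, all attributes are cleared.
    return [Attr.NONE] * ADDR_SPACE


def apply_ranges(attrs, ranges):
    # Info-file ranges as (start, inclusive end, attribute) tuples.
    for start, end, attr in ranges:
        for addr in range(start, end + 1):
            attrs[addr] |= attr


def gather_pass(attrs, mem):
    # Toy information-gathering pass: in code ranges, mark the target of
    # every absolute JMP ($4C lo hi) as needing a label. Unlike a real pass,
    # it does not decode instruction lengths, so operand bytes are scanned too.
    for addr in range(ADDR_SPACE - 2):
        if attrs[addr] & Attr.CODE and mem.get(addr) == 0x4C:
            target = mem.get(addr + 1, 0) | (mem.get(addr + 2, 0) << 8)
            attrs[target] |= Attr.LABEL


def label_addresses(attrs):
    # The final output pass would consult the map to decide where labels go.
    return [addr for addr in range(ADDR_SPACE) if attrs[addr] & Attr.LABEL]


attrs = new_map()
apply_ranges(attrs, [(0xE000, 0xE002, Attr.CODE)])
gather_pass(attrs, {0xE000: 0x4C, 0xE001: 0x34, 0xE002: 0x12})  # JMP $1234
print([hex(a) for a in label_addresses(attrs)])
```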

3.3 Labels

Some instructions may generate labels in the first pass, while most other instructions do not generate labels, but use them if they are available. Among others, the branch and jump instructions will generate labels for the target of the branch in the first pass. External labels (taken from the info file) have precedence over internally generated ones. They must be valid identifiers as specified for the ca65 assembler. Internal labels (generated by the disassembler) have the form Labcd, where abcd is the hexadecimal address of the label in upper case letters. You should probably avoid using such label names for external labels.
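As a sketch of how a branch produces a label, the Python below computes the target of a 6502 relative branch and names it. The Labcd naming (four upper-case hex digits) and the precedence of external labels come from the text above; the function names are hypothetical, and the arithmetic is the standard 6502 rule that the signed 8-bit offset is relative to the address of the following instruction.

```python
def internal_label(addr: int) -> str:
    """Generated label name: 'L' followed by the address in upper-case hex."""
    return f"L{addr:04X}"


def branch_target(branch_addr: int, offset_byte: int) -> int:
    """Target of a two-byte relative branch located at branch_addr."""
    # The operand is a signed byte: $80-$FF are negative offsets.
    offset = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    # The offset is relative to the next instruction (branch_addr + 2).
    return (branch_addr + 2 + offset) & 0xFFFF


def label_for(addr: int, external: dict) -> str:
    """External labels from the info file take precedence over generated ones."""
    return external.get(addr, internal_label(addr))


# BNE at $E000 with operand $10 branches to $E012, labeled LE012.
print(label_for(branch_target(0xE000, 0x10), {0xFFFA: "hanmi"}))
```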

3.4 Info File

The info file is used to pass additional information about the input code to the disassembler. This includes label names, data areas or tables, and global options like input and output file names. See the next section for more information.

4. Info File Format

The info file contains lists of specifications grouped together. Each group directive has an identifying token and an attribute list enclosed in curly braces. Attributes have a name followed by a value. The syntax of the value depends on the type of the attribute. String attributes are placed in double quotes, numeric attributes may be specified as decimal numbers or hexadecimal with a leading dollar sign. There are also attributes where the attribute value is a keyword; in this case, the keyword is given as-is (without quotes or anything). Each attribute is terminated by a semicolon.

        group-name { attribute1 attribute-value; attribute2 attribute-value; }
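To make the group/attribute syntax concrete, here is a minimal Python tokenizer and parser for it. It is a sketch under assumptions, not da65's parser: it handles quoted strings, decimal and $-prefixed hexadecimal numbers, bare keywords, braces and semicolons, and accepts the optional semicolon after a closing brace that the examples in this guide use. It does not handle comments, character constants or "+n" offsets. tokenize and parse are hypothetical names.

```python
import re

# One token: a quoted string, a $hex or decimal number, an identifier or
# keyword, or one of the punctuation characters { } ;
TOKEN_RE = re.compile(
    r'\s*(?:(?P<str>"[^"]*")|(?P<hex>\$[0-9A-Fa-f]+)|(?P<dec>[0-9]+)'
    r'|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)|(?P<punct>[{};]))'
)


def tokenize(text):
    tokens, pos = [], 0
    while text[pos:].strip():
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        kind, raw = m.lastgroup, m.group(m.lastgroup)
        if kind == "str":
            tokens.append(("str", raw[1:-1]))       # drop the quotes
        elif kind == "hex":
            tokens.append(("num", int(raw[1:], 16)))
        elif kind == "dec":
            tokens.append(("num", int(raw)))
        else:
            tokens.append((kind, raw))              # identifier or punctuation
        pos = m.end()
    return tokens


def parse(text):
    """Return [(group_name, [(attribute, value), ...]), ...]."""
    toks, i, groups = tokenize(text), 0, []

    def expect(kind, value=None):
        nonlocal i
        if i >= len(toks) or toks[i][0] != kind or (
            value is not None and toks[i][1] != value
        ):
            raise ValueError(f"expected {value or kind} at token {i}")
        i += 1
        return toks[i - 1][1]

    while i < len(toks):
        name = expect("ident")
        expect("punct", "{")
        attrs = []
        while i < len(toks) and toks[i] != ("punct", "}"):
            attr = expect("ident")
            if i >= len(toks):
                raise ValueError(f"missing value for {attr}")
            value = toks[i][1]        # string, number or bare keyword
            i += 1
            expect("punct", ";")      # each attribute ends with a semicolon
            attrs.append((attr, value))
        expect("punct", "}")
        if i < len(toks) and toks[i] == ("punct", ";"):
            i += 1                    # optional ';' after the closing brace
        groups.append((name, attrs))
    return groups


print(parse('RANGE { START $E612; END $E631; TYPE Code; };'))
```

Note that the sketch keeps keyword values (such as Code) as plain strings, so it does not distinguish them from quoted string values.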

4.1 Comments

Comments start with a hash mark (#) or a double slash (//) and extend from the position of the mark to the end of the current line. Hash marks or double slashes inside of strings will not start a comment, of course.
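The comment rule can be sketched as a per-line filter in Python. strip_comment is a hypothetical helper; it assumes that strings contain no escaped quote characters, and it does not give the line markers accepted with -s/--sync-lines any special treatment.

```python
def strip_comment(line: str) -> str:
    """Remove a '#' or '//' comment from one info-file line.

    Markers inside double-quoted strings do not start a comment.
    """
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string      # entering or leaving a string
        elif not in_string and (ch == "#" or line.startswith("//", i)):
            return line[:i].rstrip()       # cut at the comment marker
    return line


print(strip_comment('OUTPUTNAME "a#b.s";   # output file'))
```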

4.2 Specifying global options

Global options may be specified in a group with the name GLOBAL. The following attributes are recognized:

ARGUMENTCOLUMN

This attribute specifies the column in the output where the argument for an opcode or pseudo instruction starts. The corresponding command line option is --argument-column.

COMMENTCOLUMN

This attribute specifies the column in the output where the comment starts in a line. It is only used for in-line comments. The corresponding command line option is --comment-column.

COMMENTS

This attribute may be used instead of the --comments option on the command line. It takes a numerical parameter between 0 and 4. Higher values increase the amount of information written to the output file in the form of comments.

CPU

This attribute may be used instead of the --cpu option on the command line. See that option for the possible values. The value is a string and must be enclosed in quotes.

HEXOFFS

This attribute is followed by a boolean value. If true, offsets to labels are output in hex, otherwise they're output in decimal notation. The default is false. The attribute may be changed on the command line using the --hexoffs option.

INPUTNAME

This attribute is followed by a string value, which gives the name of the input file to read. If it is present, the disassembler does not accept an input file name on the command line.

INPUTOFFS

This attribute is followed by a numerical value that gives an offset into the input file which is skipped before reading data. The attribute may be used to skip headers or unwanted code sections in the input file.

INPUTSIZE

INPUTSIZE is followed by a numerical value that gives the amount of data to read from the input file. Data beyond INPUTOFFS + INPUTSIZE is ignored.

LABELBREAK

LABELBREAK is followed by a numerical value that specifies the label length that will force a newline. To have all labels on their own lines, you may set this value to zero.

See also the --label-break command line option. A LABELBREAK statement in the info file will override any value given on the command line.

MNEMONICCOLUMN

This attribute specifies the column in the output where the mnemonic or pseudo instruction is placed. The corresponding command line option is --mnemonic-column.

NEWLINEAFTERJMP

This attribute is followed by a boolean value. When true, a newline is inserted after each JMP instruction. The default is false.

NEWLINEAFTERRTS

This attribute is followed by a boolean value. When true, a newline is inserted after each RTS instruction. The default is false.

OUTPUTNAME

This attribute is followed by a string value, which gives the name of the output file to write. If it is present, specification of an output file on the command line using the -o option is not allowed.

The default is to use stdout for output, so without this attribute or the corresponding command line option -o the output will go to the terminal.

PAGELENGTH

This attribute may be used instead of the --pagelength option on the command line. It takes a numerical parameter. Using zero as page length (which is the default) means that no pages are generated.

STARTADDR

This attribute may be used instead of the --start-addr option on the command line. It takes a numerical parameter. The default for the start address is $10000 minus the size of the input file. (This assumes that the input file is a ROM that contains the reset and IRQ vectors.)

TEXTCOLUMN

This attribute specifies the column where the data bytes are output as ASCII text. It is only used if COMMENTS is set to at least 4. The corresponding command line option is --text-column.
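INPUTOFFS and INPUTSIZE together select a slice of the input file. The Python sketch below (select_input is a hypothetical name) shows that arithmetic; treating an offset beyond the end of the file as an error is an assumption of the sketch, not documented da65 behavior.

```python
def select_input(data: bytes, input_offs: int = 0, input_size=None) -> bytes:
    """Bytes selected by INPUTOFFS and INPUTSIZE (all remaining if no size)."""
    if input_offs > len(data):
        raise ValueError("INPUTOFFS is beyond the end of the input file")
    # Data beyond INPUTOFFS + INPUTSIZE is ignored.
    end = len(data) if input_size is None else input_offs + input_size
    return data[input_offs:end]


# Skip a 2-byte header, then keep the next 3 bytes.
print(select_input(bytes(range(10)), 2, 3))
```

For example, a Commodore .prg file begins with a two-byte little-endian load address, so INPUTOFFS 2; skips that header, and the skipped address is a natural choice for STARTADDR.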

4.3 Specifying Ranges

The RANGE directive is used to give information about address ranges. The following attributes are recognized:

COMMENT

This attribute is only allowed if a label is also given. It takes a string as argument. See the description of the LABEL directive for an explanation.

END

This gives the end address of the range. The end address is inclusive, that is, it is part of the range, and it may not be smaller than the start address. Optionally, the end may be given as a decimal offset such as "+3" instead of an absolute address, to specify the range by its size.

NAME

This is a convenience attribute. It takes a string argument and will cause the disassembler to define a label for the start of the range with the given name so a separate LABEL directive is not needed.

START

This gives the start address of the range.

TYPE

This attribute specifies the type of data within the range. The attribute value is one of the following keywords:

ADDRTABLE

The range consists of data and is disassembled as a table of words (16 bit values). The difference from the WORDTABLE type is that a label is defined for each entry in the table.

BYTETABLE

The range consists of data and is disassembled as a byte table.

CODE

The range consists of code.

DBYTETABLE

The range consists of data and is disassembled as a table of dbytes (double byte values: 16 bit values stored with the most significant byte first, that is, at the lower address).

DWORDTABLE

The range consists of data and is disassembled as a table of double words (32 bit values).

RTSTABLE

The range consists of data and is disassembled as a table of words (16 bit values). The values are interpreted as addresses that are pushed onto the stack and then jumped to via RTS. Because RTS continues at the pulled address plus one, each entry contains the address of a function minus one, and the disassembler defines a label for that function.

SKIP

The range is simply ignored when generating the output file. Please note that this means that reassembling the output file will not generate the original file, not only because of the missing piece in between, but also because the following code will be located at the wrong addresses. Output generated with SKIP ranges will need manual rework.

TEXTTABLE

The range consists of readable text.

WORDTABLE

The range consists of data and is disassembled as a table of words (16 bit values).

UNIT

Split the table into sections of this size. For example, if you have a ByteTable of size 48, but it has logical groups of size 16, specifying 16 for UNIT adds newlines after every 16 bytes. UNIT is always in bytes.

ADDRMODE

When disassembling 65816 code, this specifies the M and X flag states for this range. It's a string argument of the form "mx". Capital letters mean the flag is enabled.
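Several RANGE details reduce to small calculations: resolving an END given as "+n", reading 16-bit table entries (little-endian for word tables, most significant byte first for DBYTETABLE), adding one to each RTSTABLE entry to get the function address, and splitting a table into UNIT-sized groups. The helpers below are hypothetical Python sketches of those rules. In particular, reading "+n" as a size of n bytes (so that the inclusive end is START + n - 1) is an interpretation of the wording "by its size", not confirmed da65 behavior.

```python
def resolve_end(start: int, end) -> int:
    """END as an absolute address (int) or as a '+n' size string."""
    if isinstance(end, str) and end.startswith("+"):
        # Assumption: '+n' means n bytes, so the inclusive end is start + n - 1.
        end = start + int(end[1:], 10) - 1
    if end < start or end > 0xFFFF:
        raise ValueError("END is smaller than START or outside $0000-$FFFF")
    return end


def words(data: bytes, big_endian: bool = False) -> list:
    """16-bit entries: little-endian (WORDTABLE etc.) or big-endian (DBYTETABLE)."""
    order = "big" if big_endian else "little"
    return [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data) - 1, 2)]


def rts_targets(data: bytes) -> list:
    """RTS resumes at the pulled address + 1, so each entry is target - 1."""
    return [(w + 1) & 0xFFFF for w in words(data)]


def units(data: bytes, unit: int) -> list:
    """Split a table into groups of UNIT bytes, one output group each."""
    return [data[i:i + unit] for i in range(0, len(data), unit)]


print(hex(resolve_end(0xE000, "+3")), [hex(t) for t in rts_targets(b"\xFF\xDF")])
```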

4.4 Specifying Labels

The LABEL directive is used to give names for labels in the disassembled code. The following attributes are recognized:

ADDR

Followed by a numerical value. Specifies the value of the label.

COMMENT

The attribute argument is a string. The comment will show up on a separate line before the label if the label is within a code or data range, or on the same line after the label if it is outside.

Example output:

        foo     := $0001        ; Comment for label named "foo"

        ; Comment for label named "bar"
        bar:

NAME

The attribute is followed by a string value which gives the name of the label. Empty names are allowed; in this case the disassembler will create an unnamed label. (See the assembler docs for more information about unnamed labels.)

SIZE

This attribute is optional and may be used to specify the size of the data that follows. If a size greater than 1 is specified, the disassembler will create labels in the form label+offs for all bytes within the given range, where label is the label name given with the NAME attribute, and offs is the offset within the data.

PARAMSIZE

This optional attribute is followed by a numerical value. It tells the disassembler that subroutine calls to this label are followed by "inline parameters" with the given number of bytes, like this:

        JSR     LabelWithParamSize2
        .byte   $00, $10
        (return here)
        code...
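The SIZE and PARAMSIZE rules can be sketched in Python as follows. sized_labels and resume_address are hypothetical names. The first builds the label+offs names with decimal offsets (--hexoffs would switch the offsets to hexadecimal notation); the second computes where code continues after a JSR to a label with inline parameters, relying on the fact that JSR absolute is three bytes long.

```python
def sized_labels(name: str, addr: int, size: int) -> dict:
    """Map every address covered by a LABEL with SIZE to its label text."""
    labels = {addr: name}
    for offs in range(1, size):
        labels[addr + offs] = f"{name}+{offs}"   # decimal offsets (no --hexoffs)
    return labels


JSR_ABS_LEN = 3  # opcode $20 plus a two-byte address


def resume_address(jsr_addr: int, param_size: int) -> int:
    """Address where code continues after JSR to a label with PARAMSIZE."""
    return (jsr_addr + JSR_ABS_LEN + param_size) & 0xFFFF


# LABEL { NAME "cinv"; ADDR $300; SIZE 2; } covers cinv and cinv+1.
print(sized_labels("cinv", 0x300, 2), hex(resume_address(0xE000, 2)))
```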

4.5 Specifying Segments

The SEGMENT directive is used to specify a segment within the disassembled code. The following attributes are recognized:

START

This attribute is followed by a numerical value which specifies the start address of the segment.

END

This attribute is followed by a numerical value which specifies the end address of the segment. The end address is the last address that is a part of the segment.

NAME

This attribute is followed by a string value which gives the name of the segment.

All attributes are mandatory. Segments must not overlap. The disassembler will change back to the (default) .code segment after the end of each defined segment. That might not be what you want. As a rule of thumb, if you're using segments, you should define segments for all disassembled code.
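The segment rules (inclusive ends, no overlaps, fallback to .code outside defined segments) can be checked with a Python sketch like this one. check_segments and segment_at are hypothetical helpers, and naming the default segment "CODE" is an assumption based on ca65's .code directive.

```python
def check_segments(segments):
    """segments: (name, start, inclusive end) tuples; returns them sorted."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    for name, start, end in ordered:
        if end < start:
            raise ValueError(f"segment {name!r} ends before it starts")
    # After sorting by start, only neighbors can overlap.
    for (n1, _, e1), (n2, s2, _) in zip(ordered, ordered[1:]):
        if s2 <= e1:
            raise ValueError(f"segments {n1!r} and {n2!r} overlap")
    return ordered


def segment_at(addr, segments):
    """Name of the segment containing addr; outside all of them, .code applies."""
    for name, start, end in segments:
        if start <= addr <= end:
            return name
    return "CODE"


segs = check_segments([("kernal", 0xE000, 0xFFFF), ("basic", 0xA000, 0xBFFF)])
print(segment_at(0xC000, segs), segment_at(0xE123, segs))
```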

4.6 Specifying Assembler Includes

The ASMINC directive is used to give the names of input files containing symbol assignments in assembler syntax:

        Name = value
        Name := value

The usual conventions apply for symbol names. Values may be specified as hex (leading $), binary (leading %) or decimal. The values may optionally be signed.

NOTE: The include file parser is very simple. Expressions are not allowed, and anything but symbol assignments is flagged as an error (but see the IGNOREUNKNOWN directive below).

The following attributes are recognized:

FILE

This attribute is followed by a string value. It specifies the name of the file to read.

COMMENTSTART

This optional attribute is followed by a character constant. It specifies the character that starts a comment. The default value is a semicolon. This value is ignored if IGNOREUNKNOWN is true.

IGNOREUNKNOWN

This attribute is optional and is followed by a boolean value. It makes the disassembler ignore input lines that don't have a valid syntax, so that assembler include files containing more than just symbol assignments can be read. Note: When this attribute is used, the disassembler will ignore any errors in the given include file. This may have undesired side effects.
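The following Python sketch reads an ASMINC-style file under the rules above: Name = value or Name := value; values in hex ($), binary (%) or decimal with an optional sign; comments introduced by COMMENTSTART; and invalid lines either reported or skipped depending on IGNOREUNKNOWN. read_asminc and parse_value are hypothetical names, the identifier pattern is simpler than ca65's, and the sketch does not model how COMMENTSTART is ignored when IGNOREUNKNOWN is true.

```python
import re

# Name, '=' or ':=', a possibly signed value, then the rest of the line.
ASSIGN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*:?=\s*([+-]?[$%]?[0-9A-Za-z]+)(.*)$")


def parse_value(text: str) -> int:
    sign = 1
    if text[0] in "+-":
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if text.startswith("$"):
        return sign * int(text[1:], 16)    # hexadecimal
    if text.startswith("%"):
        return sign * int(text[1:], 2)     # binary
    return sign * int(text, 10)            # decimal


def read_asminc(lines, comment_start=";", ignore_unknown=False):
    """Collect symbol assignments into a dict; other lines are errors or skipped."""
    symbols = {}
    for lineno, line in enumerate(lines, 1):
        if not line.strip() or line.lstrip().startswith(comment_start):
            continue                       # blank or comment-only line
        try:
            m = ASSIGN_RE.match(line)
            if m is None:
                raise ValueError("not a symbol assignment")
            value = parse_value(m.group(2))  # ValueError on bad digits
            rest = m.group(3).strip()
            if rest and not rest.startswith(comment_start):
                raise ValueError("unexpected text after value")
            symbols[m.group(1)] = value
        except ValueError as exc:
            if not ignore_unknown:
                raise ValueError(f"line {lineno}: {exc}") from None
    return symbols


print(read_asminc(["CHROUT = $FFD2", "ST := 144", "MASK = %1010 ; bits"]))
```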

4.7 An Info File Example

The following is a short example of an info file that contains most of the directives explained above:

        # This is a comment. It extends to the end of the line
        GLOBAL {
            OUTPUTNAME      "kernal.s";
            INPUTNAME       "kernal.bin";
            STARTADDR       $E000;
            PAGELENGTH      0;                  # No paging
            CPU             "6502";
        };

        # One segment for the whole stuff
        SEGMENT { START $E000;  END   $FFFF; NAME "kernal"; };

        RANGE { START $E612;    END   $E631; TYPE Code;      };
        RANGE { START $E632;    END   $E640; TYPE ByteTable; };
        RANGE { START $EA51;    END   $EA84; TYPE RtsTable;  };
        RANGE { START $EC6C;    END   $ECAB; TYPE RtsTable;  };
        RANGE { START $ED08;    END   $ED11; TYPE AddrTable; };

        # Zero-page variables
        LABEL { NAME "fnadr";   ADDR  $90;   SIZE 3;    };
        LABEL { NAME "sal";     ADDR  $93;   };
        LABEL { NAME "sah";     ADDR  $94;   };
        LABEL { NAME "sas";     ADDR  $95;   };

        # Stack
        LABEL { NAME "stack";   ADDR  $100;  SIZE 255;  };

        # Indirect vectors
        LABEL { NAME "cinv";    ADDR  $300;  SIZE 2;    };      # IRQ
        LABEL { NAME "cbinv";   ADDR  $302;  SIZE 2;    };      # BRK
        LABEL { NAME "nminv";   ADDR  $304;  SIZE 2;    };      # NMI

        # Jump table at end of kernal ROM
        LABEL { NAME "kscrorg"; ADDR  $FFED; };
        LABEL { NAME "kplot";   ADDR  $FFF0; };
        LABEL { NAME "kiobase"; ADDR  $FFF3; };
        LABEL { NAME "kgbye";   ADDR  $FFF6; };

        # Hardware vectors
        LABEL { NAME "hanmi";   ADDR  $FFFA; };
        LABEL { NAME "hares";   ADDR  $FFFC; };
        LABEL { NAME "hairq";   ADDR  $FFFE; };

5. Helper scripts

util/parse-bsnes-log.awk is a script supplied for 65816 disassembly. It parses bsnes-plus Code-Data Log files and outputs the RANGE sections for your info file. For typical usage, you'd check the S-CPU log and trace log mask boxes in the bsnes-plus debugger, play through the game, then grep for the bank you're disassembling and pass the result to this script:

grep ^83 my-game-log | parse-bsnes-log.awk

6. Copyright

da65 (and all cc65 binutils) is (C) Copyright 1998-2011, Ullrich von Bassewitz. For usage of the binaries and/or sources, the following conditions do apply:

This software is provided 'as-is', without any expressed or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.