Re: [cc65] Re: Print heart character on C64?

From: Ullrich von Bassewitz <uz@musoftware.de> Date: 2004-09-21 21:44:50

On Tue, Sep 21, 2004 at 01:01:52AM +0100, Jonathan Graham Harston wrote:
> The C standard says that C uses ASCII, regardless of what character set
> encoding the underlying system uses.

I'm sorry, but this is plain wrong. If it were true, C compilers for EBCDIC
systems (IBM mainframes, for example) would not be possible.
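
To make this concrete, here is a minimal sketch of a check that compares a
few character constants against their ASCII codes. The function name
charset_looks_like_ascii is made up for this example and is not part of any
library. On an ASCII system 'A' has the value 0x41, on an EBCDIC system it
has the value 0xC1, so the check would fail there.

```c
/* Hypothetical helper (name invented for this example): returns 1 if
   a few sample characters of the execution character set have their
   ASCII values, 0 otherwise. Passing this check does not prove that
   the whole character set is ASCII. */
int charset_looks_like_ascii (void)
{
    return 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30 && ' ' == 0x20;
}
```

A program that really depends on ASCII values could call such a function
once at startup and refuse to run if it returns 0.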

Here are the relevant paragraphs from the ISO C standard, ISO/IEC 9899:1999
(E). Please note that the standard nowhere requires ASCII.

-------------------------------------------------------------------------------
5.2 Environmental considerations

5.2.1 Character sets

1 Two sets of characters and their associated collating sequences shall be
  defined: the set in which source files are written (the source character
  set), and the set interpreted in the execution environment (the execution
  character set). Each set is further divided into a basic character set,
  whose contents are given by this subclause, and a set of zero or more
  locale-specific members (which are not members of the basic character set)
  called extended characters. The combined set is also called the extended
  character set. The values of the members of the execution character set are
  implementation-defined.

2 In a character constant or string literal, members of the execution
  character set shall be represented by corresponding members of the source
  character set or by escape sequences consisting of the backslash \ followed
  by one or more characters. A byte with all bits set to 0, called the null
  character, shall exist in the basic execution character set; it is used to
  terminate a character string.

3 Both the basic source and basic execution character sets shall have the
  following members: the 26 uppercase letters of the Latin alphabet

        A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

  the 26 lowercase letters of the Latin alphabet

        a b c d e f g h i j k l m n o p q r s t u v w x y z

  the 10 decimal digits

        0 1 2 3 4 5 6 7 8 9

  the following 29 graphic characters

        ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~

  the space character, and control characters representing horizontal tab,
  vertical tab, and form feed. The representation of each member of the source
  and execution basic character sets shall fit in a byte. In both the source
  and execution basic character sets, the value of each character after 0 in
  the above list of decimal digits shall be one greater than the value of the
  previous. In source files, there shall be some way of indicating the end of
  each line of text; this International Standard treats such an end-of-line
  indicator as if it were a single new-line character. In the basic execution
  character set, there shall be control characters representing alert,
  backspace, carriage return, and new line. If any other characters are
  encountered in a source file (except in an identifier, a character constant,
  a string literal, a header name, a comment, or a preprocessing token that is
  never converted to a token), the behavior is undefined.
-------------------------------------------------------------------------------

As you can see from the above, the standard does not even require that the
letters A-Z or a-z are contiguous or in ascending order; this is guaranteed
only for the digits 0-9. So, strictly speaking, the often-used hex conversion

        unsigned hexval (char c)
        {
            if (c >= 'a' && c <= 'f') {
                return c - 'a' + 10;
            } else if (c >= 'A' && c <= 'F') {
                return c - 'A' + 10;
            } else {
                return c - '0';
            }
        }

is non-portable.
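
One possible way around this, as a sketch only: look the character up in a
string of hex digits instead of doing arithmetic on letter values. The name
hexval_portable and the -1 return value for invalid input are choices made
for this example; the function above does not check for invalid input at
all.

```c
#include <string.h>

/* Sketch of a hex conversion that relies only on the position of the
   character in a lookup string, not on the values of the letters.
   Returns 0..15 for a hex digit and -1 for anything else. */
int hexval_portable (char c)
{
    static const char lower[] = "0123456789abcdef";
    static const char upper[] = "ABCDEF";
    const char *p;

    /* strchr also matches the terminating null character, so it has
       to be rejected explicitly. */
    if (c == '\0') {
        return -1;
    }
    p = strchr (lower, c);
    if (p != NULL) {
        return (int) (p - lower);
    }
    p = strchr (upper, c);
    if (p != NULL) {
        return (int) (p - upper) + 10;
    }
    return -1;
}
```

The lookup does not depend on how the implementation encodes the letters,
so it gives the same results whatever the execution character set is.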

Regards

        Uz

-- 
Ullrich von Bassewitz                                  uz@musoftware.de