On Tue, Sep 21, 2004 at 01:01:52AM +0100, Jonathan Graham Harston wrote:
> The C standard says that C uses ASCII, regardless of what character set
> encoding the underlying system uses.

I'm sorry, but this is plain wrong. If it were true, C compilers for EBCDIC
systems (IBM mainframes, for example) would not be possible.

Here are the relevant paragraphs from the ISO C standard ISO/IEC 9899:1999 (E).
Please note that the standard nowhere talks about ASCII.

-------------------------------------------------------------------------------
5.2 Environmental considerations

5.2.1 Character sets

1   Two sets of characters and their associated collating sequences shall be
    defined: the set in which source files are written (the source character
    set), and the set interpreted in the execution environment (the execution
    character set). Each set is further divided into a basic character set,
    whose contents are given by this subclause, and a set of zero or more
    locale-specific members (which are not members of the basic character
    set) called extended characters. The combined set is also called the
    extended character set. The values of the members of the execution
    character set are implementation-defined.

2   In a character constant or string literal, members of the execution
    character set shall be represented by corresponding members of the source
    character set or by escape sequences consisting of the backslash \
    followed by one or more characters. A byte with all bits set to 0, called
    the null character, shall exist in the basic execution character set; it
    is used to terminate a character string.

3   Both the basic source and basic execution character sets shall have the
    following members: the 26 uppercase letters of the Latin alphabet

        A B C D E F G H I J K L M
        N O P Q R S T U V W X Y Z

    the 26 lowercase letters of the Latin alphabet

        a b c d e f g h i j k l m
        n o p q r s t u v w x y z

    the 10 decimal digits

        0 1 2 3 4 5 6 7 8 9

    the following 29 graphic characters

        ! " # % & ' ( ) * + , - . / :
        ; < = > ? [ \ ] ^ _ { | } ~

    the space character, and control characters representing horizontal tab,
    vertical tab, and form feed. The representation of each member of the
    source and execution basic character sets shall fit in a byte. In both
    the source and execution basic character sets, the value of each
    character after 0 in the above list of decimal digits shall be one
    greater than the value of the previous. In source files, there shall be
    some way of indicating the end of each line of text; this International
    Standard treats such an end-of-line indicator as if it were a single
    new-line character. In the basic execution character set, there shall be
    control characters representing alert, backspace, carriage return, and
    new line. If any other characters are encountered in a source file
    (except in an identifier, a character constant, a string literal, a
    header name, a comment, or a preprocessing token that is never converted
    to a token), the behavior is undefined.
-------------------------------------------------------------------------------

As you can see from the above, the standard does not even require that the
letters A-Z or a-z are in contiguous ascending order; this is only required
for the digits. So, strictly speaking, the often-used hex conversion

    unsigned hexval (char c)
    {
        if (c >= 'a' && c <= 'f') {
            return c - 'a' + 10;
        } else if (c >= 'A' && c <= 'F') {
            return c - 'A' + 10;
        } else {
            return c - '0';
        }
    }

is non-portable.

Regards


        Uz


-- 
Ullrich von Bassewitz                                  uz@musoftware.de
----------------------------------------------------------------------
To unsubscribe from the list send mail to majordomo@musoftware.de with
the string "unsubscribe cc65" in the body(!) of the mail.

Received on Tue Sep 21 21:44:49 2004
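For comparison, here is a minimal sketch of a hex conversion that does not
depend on the letters being contiguous. It is not from the original mail: the
name hexval_portable, the int return type and the -1 error value are
assumptions made for illustration. It relies only on the position of a
character within a string literal, which is the same in any execution
character set.

```c
#include <string.h>

/* Hypothetical portable variant of hexval (name and error convention are
 * assumptions). Returns 0..15 for a hex digit, -1 for anything else. */
int hexval_portable(char c)
{
    static const char lower[] = "0123456789abcdef";
    static const char upper[] = "0123456789ABCDEF";
    const char *p;

    /* strchr also matches the terminating null character, so reject it. */
    if (c == '\0') {
        return -1;
    }
    /* The index within the table is the digit value, whatever the
     * underlying character codes happen to be. */
    if ((p = strchr(lower, c)) != NULL) {
        return (int)(p - lower);
    }
    if ((p = strchr(upper, c)) != NULL) {
        return (int)(p - upper);
    }
    return -1;
}
```

With this sketch, hexval_portable('a') and hexval_portable('A') both yield 10,
and a non-hex character such as 'g' yields -1 instead of a meaningless value.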