Besides references, arrays, and variants, which are described later, QDL has four classes of built-in data types:
All QDL built-in types, even the ones not listed in this section, belong to the root (or Null) namespace. Namespaces are discussed here.
All variables, except those in function headers, are declared using the var-statement. Its syntax is
var-statement: [Final | Static | Automatic | attribute-set]* variable-list;
The syntax of a variable-list is shown here, but basically it's one or more variable names, separated by commas, followed by a colon, followed by the data type, optionally followed by constructor arguments (to initialize the variables) or an array initialization list. Here are some simple examples:
X: Integer;
X, Y: Double;
Hello, Hello2: String ("Hello");
The text before the variable names makes up the modifiers. Static and Automatic are mutually exclusive, so at most one of them may appear. The var-statement is differentiated from other statements in the language by the following properties:
A statement with these properties can be treated as a var-statement by the compiler.
The Static keyword means one of two things, depending on the context. Inside a Class statement, Static indicates that the variable(s) are placed in the static scope, so there is just one copy of each variable for the class. At function scope, Static means that the variable(s) have a scope local to the function but are given a permanent storage location, so that their value(s) persist between calls to the function. At global scope, Static cannot be used. If you want the name-hiding functionality provided by C with its static keyword, you should use a private namespace or inner class instead.
The Automatic keyword has one of two meanings. If used on a variable inside a function, it indicates that the variable should be allocated temporary storage on the stack. This is the default, so using Automatic in this fashion is redundant. At global scope, or within an Implement or Class block, this keyword declares stack variables that should be automatically allocated to every function that follows it within the scope in which it is declared. For example:
Automatic J, K, N, Count: Integer;
. . .
Function ShowTimesTable() Do
For J = 1; J <= 10; J++ Do {
For K = 1; K <= 10; K++ Do
StdOut.Print J * K + " ";
StdOut.Print "\n";
}
End;
Normally, the function would have to declare J and K within the function to use them. However, the Automatic statement allows any function afterward to use the variables without declaring them.
Variables declared with Automatic can have constructor arguments, but those arguments must not refer to other Automatic variables, or to variables outside the scope of the Automatic var-statement. The initializer for an automatic variable is resolved in the context where the Automatic var-statement is compiled. Consider the following code:
X: Integer;
Automatic Y: Integer (X);
Class Z:
X: Integer;
Function F Do
StdOut.Print Y;
End;
End Class;
As a result, the value of the global X, not Z::X, is printed.
Storage space is not allocated for a variable until its name is used in a function. In cases where another variable or function has the same name as an Automatic variable, the Automatic variable is not allocated. For example:
Automatic A, B, C: Integer(6); // Line 1
A: Integer(3); // Line 2
Function TheFunc() Do // Line 3
B: Integer(12); // Line 4
Function SubFunc() // Line 5
{ // Line 6
StdOut.Print B + " "; // Line 7
C = 100; // Line 8
} // Line 9
StdOut.Print A + " "; // Line 10
SubFunc; // Line 11
StdOut.Print C + " "; // Line 12
} // Line 13
Output: 3 12 6
Notice that A and B are explicitly declared in a scope accessible to TheFunc and SubFunc, so the automatic variables are not used.
When the compiler automatically creates a variable, it is constructed in one of two places: at the very beginning of the function, or immediately after all the subfunctions. Normally it is constructed after the subfunctions, but if the variable is used in the construction of a variable located before a subfunction, then it must be constructed before that variable, hence at the very beginning of the function. In this example, C is used in both TheFunc and SubFunc (lines 12 and 8), but these functions get different copies of C. Consequently the assignment of C = 100 in SubFunc has no effect on the output on line 12. The rationale for this compiler behavior is that, if variable sharing between outer and inner functions is desired, the programmer should explicitly say so. Since most automatic variables will probably be used for loop indexing, it is natural to expect the index variable to be local to a given function, not shared.
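The "separate copies unless you say otherwise" rationale has a rough analogue in Python, where assigning to a name inside a nested function creates a new local unless the programmer writes nonlocal. The sketch below is only an analogy in a different language, not QDL's Automatic mechanism; the function names are made up for illustration.

```python
def without_sharing():
    c = 6                  # the outer function's own copy

    def sub_func():
        c = 100            # binds a new local; the outer c is untouched
        return c

    inner = sub_func()
    return inner, c


def with_sharing():
    c = 6

    def sub_func():
        nonlocal c         # sharing has to be requested explicitly
        c = 100
        return c

    inner = sub_func()
    return inner, c
```

Calling without_sharing() leaves the outer value at 6, which mirrors the way line 12 still prints 6 in the example above.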
In files included with Uses, Automatic variables in the included file do not affect the main file.
The Final keyword can be used only with a variable in a Class, and is always redundant because all variables are implicitly Final. It means that a variable with the same name cannot be created in a derived class.
Integers, for those of you who haven't got to high school yet, are numbers that do not contain a fractional component. Integers generally use less memory than floating point numbers, and are faster to process. Floating-point numbers are numbers that can have a fractional component, and are so named because the position of the decimal point is variable.
When C first came out, a non-size-specific int type was a better idea than it is now, because various machines had word sizes unheard of today, like, say, 9, 12 or 20 bits. But today's platforms are so similar that programmers often make assumptions like, "a char is one byte, a short is two bytes, a long is four bytes, and chars are signed by default." Programmers will always think like this and there's nothing that can be done about it; on the other hand, the language can be changed to accommodate programmers.
QDL has a non-size-specific Integer type, guaranteed to be at least 24 bits (it's 32 bits in my implementation), but it also has size-specific types, which you are encouraged to use when size matters: Int8, Int16, Int32, and Int64. All of these are signed; the unsigned versions are prefixed with a "U": UInteger, UInt8, UInt16, UInt32, and UInt64. The Integer type should be of a size optimal for processing on the code's target platform, so when size doesn't matter, use the Integer type.
Because C doesn't have a byte type, programmers the world over have been using unsigned char to mean 8-bit number. Bad! Now, I usually stuck a "typedef unsigned char byte;" somewhere at the top of my code. With QDL you don't need to because you get a UInt8. QDL does have a Char type, though, and it is treated as a number as in C, or as a length-one string, depending on the context. It's designed with the great nations of China, Russia, Greece et al in mind, though: Char, by default, is a 16-bit type, so it can hold Unicode characters. The types Char8 and Char16 are also predefined in QDL.
Char, Integer and UInteger are called the "non-numbered" types; all other integral types are "numbered".
In QDL, integral types with different names but the same size and signed/unsigned status are considered to be different types (although implicit conversions can be made from one type to another). One example of the effect of this is that the following functions are not ambiguous:
Function Hello (N: UInt16)
Function Hello (N: Char)
Function Hello (N: Char16)
The compiler should issue a warning when implicitly converting to a smaller integer type.
When two different integral types are used in an expression, one of the values must be converted to the type of the other. This is done according to the promotion rules.
Character literals have the following syntax:
char-constant: ('literal-char'|'literal-charliteral-char')
Where a literal-char represents a character. In the version of char-constant that uses two literal-chars, the separator between them does not represent anything; it is simply there to separate the italicized words. This version constructs a UNICODE character from two ANSI characters; if either of the characters is UNICODE, a compile-time error is generated. The first character is the low character, and the second character is the high character.
A literal-char is either
Rule: The compiler should give a character constant the smaller character type if its code is below 256, and prefer to give it the Char type over the equivalent numbered type. If Chars are 16-bit, constants with a code below 256 are Char8s; all other constants are Chars.
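The two-character form and the typing rule can be modelled outside QDL. This is a minimal Python sketch, assuming that Char is 16 bits wide and that an "ANSI character" means a code below 256; the function names are hypothetical and nothing here is the compiler's actual implementation.

```python
def combine_chars(low: str, high: str) -> int:
    """Build a 16-bit character code from two 8-bit characters.

    The first character supplies the low byte and the second the high byte.
    Codes above 255 are rejected, standing in for the compile-time error.
    """
    lo, hi = ord(low), ord(high)
    if lo > 0xFF or hi > 0xFF:
        raise ValueError("both characters must fit in 8 bits")
    return lo | (hi << 8)


def char_constant_type(code: int) -> str:
    """Pick the type of a character constant (assumes 16-bit Chars)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code does not fit in a 16-bit Char")
    # Codes below 256 get the smaller type; everything else is a Char,
    # which is preferred over the equivalent numbered type Char16.
    return "Char8" if code < 256 else "Char"
```

For instance, combine_chars("A", "\x04") yields 0x0441, which char_constant_type then classifies as Char rather than Char8.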
Rule: In any calculation that will result in a constant, such as 'A' * 'A', the compiler will prevent overflow from occurring by promoting both constants as necessary. If the result cannot be represented in Char16 because it is too large or is signed, the compiler should issue a warning and promote the values to an appropriate integral type.
For the purposes of the above rule, a Const variable is not a constant, even if its value is known at compile-time.
Integer constants have the following syntax:
integral-constant: (0xh | d)[u]
Where d is a series of one or more digits (0 to 9), and h is a series of one or more hexadecimal digits (0 to 9, A to F). The presence of the u at the end indicates that the constant is unsigned. Any non-digit character, including whitespace, is considered to delimit the integer. A - or + in front of the integer is considered a separate unary operator. An integer constant must be below 2^63, unless the u is present, in which case it must be below 2^64.
The compiler should issue a warning if an integral constant expression is implicitly coerced to a lower-precision type.
Rule: The compiler should give an integer constant the lowest-precision type that can hold the number according to the precision list shown in the promotion rules, using a signed type if u is not present and an unsigned type if it is. The use of a non-numbered type over a numbered type of the same size is preferred. The compiler cannot assign a character type to an integer constant.
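To make the rule concrete, here is a small Python sketch that parses a constant written in the syntax above and picks a type for it. The precision ladder is an assumption, since the real list is in the promotion rules: Integer and UInteger are taken to be 32 bits (as in the author's implementation) and are chosen in place of Int32 and UInt32 because non-numbered types are preferred. The function name is hypothetical.

```python
import re

# Assumed precision ladders, smallest first (name, width in bits).
SIGNED = [("Int8", 8), ("Int16", 16), ("Integer", 32), ("Int64", 64)]
UNSIGNED = [("UInt8", 8), ("UInt16", 16), ("UInteger", 32), ("UInt64", 64)]

# integral-constant: (0xh | d)[u], with hex digits 0-9 and A-F as in the text.
CONSTANT = re.compile(r"(?:0x([0-9A-F]+)|([0-9]+))(u?)")


def integer_constant_type(text: str) -> tuple:
    """Return (value, type name) for an integer constant."""
    match = CONSTANT.fullmatch(text)
    if match is None:
        raise ValueError(f"not an integer constant: {text!r}")
    hex_digits, dec_digits, suffix = match.groups()
    value = int(hex_digits, 16) if hex_digits else int(dec_digits)
    if suffix:
        # Unsigned: a type of width n holds values below 2^n.
        for name, bits in UNSIGNED:
            if value < 2 ** bits:
                return value, name
    else:
        # Signed: a non-negative value must be below 2^(n-1).
        for name, bits in SIGNED:
            if value < 2 ** (bits - 1):
                return value, name
    raise ValueError(f"constant out of range: {text!r}")
```

Under these assumptions "128" no longer fits Int8 and becomes an Int16, while "200u" still fits UInt8.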
In any calculation involving two constants, where:
- both have an integral type, or
- one has an integral type and the other has a character type,
the values are both promoted to a type that will prevent overflow when the calculation is performed. If this cannot be accomplished, the compiler should use the Int64 or UInt64 type and issue a warning. In a binary calculation, the type should be signed only if both input values are signed.
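One way to picture this promotion is to compute the exact result first and then look for the narrowest type that holds both operands and the result. The Python sketch below does that under several assumptions that are not in the text: only +, - and * are handled, only numbered type names are used to keep it short, and a result that does not fit in 64 bits is wrapped around. All names are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
WIDTHS = (8, 16, 32, 64)


def fits(value: int, bits: int, signed: bool) -> bool:
    """True if value is representable in an integer of the given width."""
    if signed:
        return -(2 ** (bits - 1)) <= value < 2 ** (bits - 1)
    return 0 <= value < 2 ** bits


def fold_constants(a: int, a_signed: bool, op: str, b: int, b_signed: bool):
    """Return (result, type name, warning issued) for a constant expression."""
    signed = a_signed and b_signed      # signed only if both inputs are signed
    prefix = "Int" if signed else "UInt"
    exact = OPS[op](a, b)
    for bits in WIDTHS:
        if all(fits(v, bits, signed) for v in (a, b, exact)):
            return exact, f"{prefix}{bits}", False
    # Nothing fits: fall back to the 64-bit type and warn.
    # Wrapping the value is an assumption; the text does not specify it.
    wrapped = exact % 2 ** 64
    if signed and wrapped >= 2 ** 63:
        wrapped -= 2 ** 64
    return wrapped, f"{prefix}64", True
```

For example, 100 * 100 with two signed operands no longer fits in 8 bits, so both are promoted and the result is an Int16 with no warning.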
For the purposes of the above rule, a Const variable is not a constant, even if its value is known at compile-time.
There are two floating-point types, Single and Double, indicating the size and degree of precision of the numbers stored therein. Single is typically four bytes and Double is typically eight bytes.
Floating-point constants, which are identified by the presence of a decimal point (.) or exponent specification (beginning with the letter e), are always considered to be of type Double, although they may be implicitly coerced, without a warning, to Single if required to compile the expression in which they are located.
To get technical, the syntax of floating-point constants is as follows:
exponent-spec: (e | E)[(+|-)]d
double-constant: d(exponent-spec | .d[exponent-spec])
Where d is a series of one or more digits. Notice that a digit is required both before and after the decimal point. The number after the e specifies the exponent, which is a power of ten by which to multiply the first number.
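The grammar above translates almost directly into a regular expression. The following Python sketch is just a checker for that grammar, not part of any QDL tool; the function name is made up.

```python
import re

# exponent-spec: (e | E)[(+|-)]d
EXPONENT = r"[eE][+-]?[0-9]+"

# double-constant: d(exponent-spec | .d[exponent-spec])
DOUBLE_CONSTANT = re.compile(rf"[0-9]+(?:{EXPONENT}|\.[0-9]+(?:{EXPONENT})?)")


def is_double_constant(text: str) -> bool:
    """True if text is a floating-point constant under the grammar above."""
    return DOUBLE_CONSTANT.fullmatch(text) is not None
```

Under this grammar "3e8" and "2.3e-12" are accepted, while "5." and ".5" are rejected because a digit is required on both sides of the decimal point, and "42" is rejected because it has neither a decimal point nor an exponent.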
The Boolean type is the simplest data type in QDL, and uses the smallest addressable piece of memory available on the target architecture, typically one byte. It can only hold two values, identified by the keywords True and False. When converting from Boolean to another built-in type, True has the value 1, 1.0 or "True", while False has the value 0, 0.0, or "False". When converting from an integral or floating-point type to Boolean, 0 is translated to False, and any other value is translated to True. When converting from a String to Boolean, True is used if:
Otherwise, False is used.
If the piece of memory containing the Boolean is larger than one bit, the compiler may optionally ignore the other bits, or not, for optimization reasons. When setting the value of a Boolean, the compiler should set the unused bits to zero.
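The numeric and string conversions described for Boolean are simple enough to write down as a Python model. The function names are hypothetical, and the String-to-Boolean direction is left out because its conditions are not reproduced here.

```python
def number_to_boolean(value) -> bool:
    """0 (or 0.0) becomes False; any other number becomes True."""
    return value != 0


def boolean_to_integer(flag: bool) -> int:
    return 1 if flag else 0


def boolean_to_real(flag: bool) -> float:
    return 1.0 if flag else 0.0


def boolean_to_string(flag: bool) -> str:
    return "True" if flag else "False"
```

Note that any non-zero number, including a negative one, maps to True, so converting to Boolean and back does not recover the original number.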
It would have been possible to implement the string type as a QDL Class. In fact, I've made up a class declaration that describes it. However, I made it a built-in type to allow better optimization of string access, and so that string literals could have the String data type. Strings can be accessed like an array of Chars; the first character in the string has an index of zero.
Here is a list of the functions in the string type.
Examples of valid integers, assuming a magnitude char of ',': "0", " 2 ", "-1,234,567,890,123,456,789", "+1,2,3,4", "50000001", "0x89AB", "-0"
Examples of invalid integers: "", "2.1", "$1000", "1234 abcd", "--1", "-", "hello", "True", "1 2 3"
Examples of valid floating-point numbers, assuming a magnitude char of ',': "12,345.67890", "3e8", "+2.3e-12", "0.0", ".5", "5.", "5.e2", "0xA.B" (by the way, this represents the number 0xAB/0x10, or 10.6875.)
Examples of invalid floating-point numbers: ".", "-.", "e5", "1.2.3", "3.14159 Bite Me", "12e1.5", "1e2e3", "1 . 0 e 3"
Both IsInteger and IsReal recognize numbers that are formatted in accordance with the settings specified in the Regional Settings built-in class.
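The hexadecimal example "0xA.B" in the list of valid floating-point numbers can be checked by hand: the digits AB are read as one hexadecimal integer and divided by 16 once for each digit after the point. A short Python sketch of that arithmetic follows; the function name is hypothetical, and it handles only the 0x prefix, digits and a single point.

```python
def hex_real_value(text: str) -> float:
    """Evaluate a string such as "0xA.B" as a hexadecimal number with a fraction."""
    if not text.startswith("0x"):
        raise ValueError("expected a 0x prefix")
    whole, _, fraction = text[2:].partition(".")
    digits = whole + fraction
    if not digits:
        raise ValueError("no hexadecimal digits found")
    # 0xA.B is 0xAB / 0x10: one division by 16 per fractional digit.
    return int(digits, 16) / 16 ** len(fraction)
```

Here 0xAB is 171, and 171 / 16 gives the 10.6875 quoted in the example.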
In this discussion, Char, Char8 and Char16 are not considered to be numeric types.
Integral types, floating-point types, Booleans, and characters are implicitly converted to strings when used in expressions involving another string.
When converting implicitly from a numeric type to a string, the conversion follows the settings in the built-in Regional Settings class, except that the magnitude separator character is not used. When explicitly converting to a string, you can specify additional formats as an argument to the type-cast function. The next section uses type casts, which are described here.
If N is a numeric variable,
Please note that, when converting a floating-point number to a string, a decimal point is not guaranteed to be present.
When converting from a string to an integral type, the string must be in one of the following formats:
where w represents zero or more space (ASCII 32) characters, and D represents one or more digits, in the midst of which there may be magnitude separator characters. h represents one or more hexadecimal digits, each of which can be a number from 0 to 9, or letter from A through F, in upper or lower case. If there is an error parsing the string, only the portion of the left side of the string that can be identified as a valid number is parsed. If none of the left side can be parsed into a number, 0 is the result.
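The "parse as much of the left side as possible" behaviour can be sketched in Python. The accepted shapes are an assumption inferred from the symbols just described and from the valid-integer examples earlier: optional spaces, an optional sign, then either decimal digits with magnitude separators between them or 0x followed by hexadecimal digits. The function name and the default separator are hypothetical.

```python
import re


def string_to_integer(text: str, separator: str = ",") -> int:
    """Convert the longest valid numeric prefix of text; 0 if there is none."""
    sep = re.escape(separator)
    pattern = re.compile(
        rf" *([+-]?)(?:0x([0-9A-Fa-f]+)|([0-9](?:{sep}?[0-9])*))"
    )
    match = pattern.match(text)         # anchored at the left side only
    if match is None:
        return 0
    sign, hex_digits, dec_digits = match.groups()
    if hex_digits is not None:
        magnitude = int(hex_digits, 16)
    else:
        magnitude = int(dec_digits.replace(separator, ""))
    return -magnitude if sign == "-" else magnitude
```

With this model "1234 abcd" converts to 1234 and "2.1" to 2, because parsing stops at the first character that cannot belong to the number, while "hello" and "--1" have no valid prefix and give 0.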
When converting from a string to a floating-point type, the string must be in one of the following formats:
In addition to the D and h used in integers, floating-point numbers can contain:
As with integer parsing, when there is an error parsing the string, as much of the string as can be parsed successfully is used in the conversion to a number. Examples:
When converting from a Char, Char8 or Char16 to String, the result is simply the character in a length-one string.
When converting from a String to a Char, Char8 or Char16, the first character of the string is used as the character. If the String is zero-length, 0 is used for the character. If the character is of type Char8, the high 8 bits are discarded in the conversion.
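Both directions of the character conversion fit in a few lines of Python. This is an illustrative model with hypothetical names; it assumes character codes are plain integers and that the default Char is 16 bits wide.

```python
def char_to_string(code: int) -> str:
    """A character converts to a string of length one."""
    return chr(code)


def string_to_char(text: str, bits: int = 16) -> int:
    """Return the code of the first character, or 0 for an empty string."""
    if bits not in (8, 16):
        raise ValueError("bits must be 8 or 16")
    if not text:
        return 0
    code = ord(text[0])
    # For Char8 the high 8 bits are discarded; masking to 16 bits for Char
    # is an assumption made so the model stays within a 16-bit type.
    return code & (0xFF if bits == 8 else 0xFFFF)
```

Converting a string that starts with U+0416 to Char8 keeps only 0x16, which shows how easily the narrowing conversion loses information.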
String literals have the following syntax:
string-constant: "literal-char*"
String literals may not contain comments; the characters that start comments would be considered part of the string. String literals may not span multiple lines; if a line ends without ending the string, a compile-time error is generated. However, as in C, two or more consecutive string literals are concatenated at compile-time and treated as one. For example, "He" "llo" has the same meaning to the compiler as "Hello". This provides a way to put a string on multiple lines.
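Python happens to follow the same C convention for adjacent string literals, so the effect can be demonstrated there. This is Python, not QDL, and the variable names are only for illustration.

```python
# Adjacent literals are joined into one string before the program runs.
greeting = "He" "llo"

# The same mechanism lets a long string be written across several lines.
message = (
    "This string is written on "
    "two source lines."
)
```

Here greeting is exactly "Hello", and message contains no line break, because the literals are joined without inserting anything between them.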
There is a definite reason the built-in String type must be built into the language: In QDL, string literals are acted upon as if they were Const String variables, and operations can be evaluated at compile time. Thus, it is possible to do weird things like:
String1 = " Hello ".Trim;
@ConstStringRef = @"Hmmm";
String2 = "Con" + "catenation";
The language has no way of allowing user-defined constant types, and certainly could not perform user-defined operations at compile-time.
You can even do this sort of thing in preprocessor statements.
Most operators don't work on strings. Only the following operators have meaning with strings:
Strings may have a maximum length (implementation-dependent), but it should be high enough that you can hold any reasonable-size string (my implementation supports strings up to 64K-1 characters in length).
* This section will need clarification after the garbage collection/memory management thing is established *
Any implementation of the String data type must be able to tolerate relocation without notification. In other words, it should be possible to move instances of the String class in memory (e.g. in a linear array class), without introducing the possibility of data corruption.
If non-relocatable Strings can be implemented more efficiently, the compiler may provide a switch to allow the programmer to use them. In this case, any symbols the compiler may use in object files to represent run-time string management routines should be different for relocatable than for non-relocatable strings, so that object files compiled with a different switch setting do not conflict directly.