On DefinitionOfTypeTag and elsewhere, a "tag model" is presented as a way to categorise popular imperative programming languages and explain aspects of their TypeSystem behaviour. To do this, the notion of a "tag" is introduced and "type" is avoided.
This page also categorises imperative programming languages and explains aspects of their TypeSystem behaviour, but without introduction of any new notions such as "tags". Explanations rely on types, variables, values, and the relationships between them. The term "operators" is used for functions, procedures, or symbolic operators like '+', and reference may be made to their arguments and parameters (aka operands).
Reference to StaticTyping vs DynamicTyping, or compilation vs interpretation, is deliberately avoided to keep the explanations simple and focus on essential characteristics. Unless otherwise specified, assume that the descriptions below refer to scalar values. Again, this is to keep the explanations simple.
Note that the following categories are not intended to be a definitive list, and individual languages may belong to more than one category depending on particular language features.
== Language Category S ==
Every variable is directly associated with, or has, a type. In other words, every variable has a "type" property.
The type of a variable can be determined by explicit declaration of its type (e.g., in C, C++, C#, Java):
int x;
Or by type inference (e.g., in C#):
var x = 3;
Every value is associated with, or has, a type. E.g, integer, float, string, or some user-defined type.
The type of a value may be inferred or explicitly specified.
The following, in most languages, will be inferred to be an integer:
The following, in most C-like languages, explicitly specifies that the value is of type "double":
(double)23498
Actions
Illustration
Diagrammatically, the conceptual structure of variables and values can be shown as:
Variable is [ Value | Type ]
Value is [ Representation | Type ]
The Representation is usually a string of machine bits, bytes, or words. For example, a Representation consisting of the bit string 110110 with a Type of 'integer' is the integer value 54. The same bit string with a Type of 'character' is the ASCII character '6'. Thus, a Value is the combination of a Representation and a Type.
Illustrated via an XML notation:
Illustrated via an XML notation, showing compatible types:
Illustration of assignment to a variable:
splat = 3423;
splat = 3;
Examples
This category includes C, C++, and Java (up to at least 1.7).
C# is mainly in this category, but it has a "dynamic" type which emulates "Variable is [ Value ]" via "Variable is [ Value | dynamic ]". C#'s "dynamic" type allows assignment of any value of any type at any time, equivalent to Category D1 below. See http://msdn.microsoft.com/en-us/library/dd264736.aspx
== Language Category D1 ==
Variables are not directly associated with a type. In other words, they do not have a "type" property.
Every value has a type, such as integer, float, string, or some user-defined type.
The type of a value may be inferred or explicitly specified.
Actions
Illustration
Diagrammatically, the conceptual structure of variables and values can be shown as:
Variable is [ Value ]
Value is [ Representation | Type ]
Illustrated via an XML notation:
Illustration of assignment to a variable:
splat = 3423;
splat = "fish";
Examples
PHP is a language in this category.
== Language Category D2 ==
Variables are not directly associated with a type. In other words, they do not have a "type" property.
Every value is represented by a string of characters.
Actions
Illustration
Diagrammatically, the conceptual structure of variables and values can be shown as:
Variable is [ Value ]
Value is [ Representation ]
Note: This does not mean values have no type -- i.e., that they are uncategorised bit streams of unknown length -- but that they are always strings of characters.
Or, alternatively:
Variable is [ Value ]
Value is [ Representation | string ]
Illustrated via an XML notation:
Illustration of assignment to a variable:
splat = 3423;
splat = "fish";
Examples
ColdFusion and Perl are languages in this category.
== Notes on the Above ==
== Operator Invocation ==
In most languages, a given operator name can have multiple operator definitions -- one operator definition for each combination of operand types. Each operator definition has its own operator signature, which consists of its name, operand types, and possibly its return type. For example:
+(integer, integer) returns integer
+(string, integer) returns string
+(integer, string) returns string
+(string, string) returns string
The interpreter (or compiler) uses the operand types to determine which definition to invoke. For example, an expression like 34 + "34" can be written as +(34, "34"). Replacing the operand values with their types, we get +(integer, string). That matches the third signature above, and so we invoke its corresponding operator definition.
The definitions determine the language behaviour. For example, in the above, only the first definition performs addition. The other three perform string concatenation. Thus, the language would return 68 given 34 + 34 and "3434" given 34 + "34" or "34" + 34.
Given the following:
+(integer, integer) returns integer
+(string, integer) returns integer
+(integer, string) returns integer
+(string, string) returns string
The first three definitions attempt addition, the second and third parsing strings to determine if they represent numeric values. The last definition performs string concatenation. Thus, the language would return 68 given 34 + 34, 68 given 34 + "34", "3434" given "34" + "34", and an error given 34 + "splat" because "splat" doesn't represent an integer.
Given the following:
+(integer, integer) returns integer
+(string, string) returns string
The first definition performs addition, the second performs concatenation. Thus, the language would return 68 given 34 + 34 and "3434" given "34" + "34"; 34 + "34" is an error because there's no definition for +(integer, string).
Some languages do not distinguish operand types outside of operators and treat all values as strings, so the only signature (for "+") is effectively:
+(string, string) returns string
In such languages, when "+" is invoked it internally attempts to convert its operands to numeric values. If successful, the operator performs addition and returns a string containing only digits. If the conversion to numeric values is unsuccessful, the operator performs string concatenation on the operands and returns the result. See SignaturesAndSoftPolymorphism for further discussion.
Other combinations are possible. For example, given the following:
+(integer, integer) returns integer
+(string, integer) returns integer
+(string, string) returns string
The first two definitions attempt addition; the second parsing the string to determine if it represents a numeric value. Thus, the language would return 68 given 34 + 34, 68 given "34" + 34, "3434" given "34" + "34", an error given "splat" + 34 because parsing the string fails, and an error given 34 + "splat" because there is no operator definition for +(integer, string).
Note that the return type of operators can be used to infer expression types prior to execution. For example, given operator signatures:
+(integer, integer) returns integer
+(string, integer) returns string
+(integer, string) returns string
+(string, string) returns string
And given an expression like:
It can be represented as operator invocations:
+(+(+(34, 38), "splat"), 55)
Now replacing the operand values with their types:
+(+(+(integer, integer), string), integer)
And by replacing each operator invocation with its corresponding return value until we can't simplify further:
+(+(+(integer, integer), string), integer) =
+(+(integer, string), integer) =
+(string, integer) =
string
We can see that the expression is of type "string".
Note that how expressions are evaluated can also vary. Since those are language-specific, we'd need to consider those on a language-by-language basis. If there are de-facto standards for evaluating expressions, that can help simplify our examples, but perhaps should not be considered "universal" rules. -t
What sort of variation are you referring to? There are numerous issues not considered here, such as MultipleDispatch vs SingleDispatch and so on.
== More Formal Descriptions ==
Language Category S
Language Category D1
An example of #9 is is_numeric() in PHP.
Language Category D2
An example of #9 is cfArgument in ColdFusion.
Where would a statically type-safe FORTH-like language be categorized? No explicit types (totally inferred); no variables (just an environment)...
I'm not attempting to categorise type systems and languages in general, only show that Top's "tag model" is unnecessary. Everything he tries to explain with "tags" can be better accounted for with types, variables, and values.
See TypeSystemCategoriesInImperativeLanguagesDiscussion
See DefinitionOfTypeTag, TypeTagDifferenceDiscussion, TypeSystemCategoriesInImperativeLanguagesTwo, ThirtyFourThirtyFour, SignaturesAndSoftPolymorphism