Type System Categories In Imperative Languages

On DefinitionOfTypeTag and elsewhere, a "tag model" is presented as a way to categorise popular imperative programming languages and explain aspects of their TypeSystem behaviour. To do this, the notion of a "tag" is introduced and "type" is avoided.

This page also categorises imperative programming languages and explains aspects of their TypeSystem behaviour, but without introduction of any new notions such as "tags". Explanations rely on types, variables, values, and the relationships between them. The term "operators" is used for functions, procedures, or symbolic operators like '+', and reference may be made to their arguments and parameters (aka operands).

Reference to StaticTyping vs DynamicTyping, or compilation vs interpretation, is deliberately avoided to keep the explanations simple and focus on essential characteristics. Unless otherwise specified, assume that the descriptions below refer to scalar values. Again, this is to keep the explanations simple.

Note that the following categories are not intended to be a definitive list, and individual languages may belong to more than one category depending on particular language features.

== Language Category S ==

Every variable is directly associated with, or has, a type. In other words, every variable has a "type" property.

The type of a variable can be determined by explicit declaration of its type (e.g., in C, C++, C#, Java):


        int x;

Or by type inference (e.g., in C#):


        var x = 3;

Every value is associated with, or has, a type. E.g, integer, float, string, or some user-defined type.

The type of a value may be inferred or explicitly specified.

The following, in most languages, will be inferred to be an integer:

The following, in most C-like languages, explicitly specifies that the value is of type "double":


        (double)23498

Actions

Assignment to a variable: The variable's type is used to ensure that it can only be assigned values whose type is compatible. A language implementation does this by throwing an exception or generating an error if an attempt is made to assign a value whose type is not compatible with the variable's type. In some languages, "compatible" means the value's type must strictly be the same as the variable's type. In other languages, "compatible" means the value's type must be the same as or a subtype of the variable's type. Depending on the language, other definitions of "compatible" are possible. Compatible assignments of values replace the variable's current value.

Invocation of operators: Value types are used by the language to select the compatible specific operator when operators are polymorphic, and to ensure that argument types are compatible with operators' parameter types. See the "Operator Invocation" subsection below for more detail.

Illustration

Diagrammatically, the conceptual structure of variables and values can be shown as:


        Variable is [ Value | Type ]

        Value	is [ Representation | Type ]

The Representation is usually a string of machine bits, bytes, or words. For example, a Representation consisting of the bit string 110110 with a Type of 'integer' is the integer value 54. The same bit string with a Type of 'character' is the ASCII character '6'. Thus, a Value is the combination of a Representation and a Type.

Illustrated via an XML notation:

Illustrated via an XML notation, showing compatible types:

Illustration of assignment to a variable:


        splat = 3423;


        splat = 3;

Examples

This category includes C, C++, and Java (up to at least 1.7).

C# is mainly in this category, but it has a "dynamic" type which emulates "Variable is [ Value ]" via "Variable is [ Value | dynamic ]". C#'s "dynamic" type allows assignment of any value of any type at any time, equivalent to Category D1 below. See http://msdn.microsoft.com/en-us/library/dd264736.aspx

== Language Category D1 ==

Variables are not directly associated with a type. In other words, they do not have a "type" property.

Every value has a type, such as integer, float, string, or some user-defined type.

The type of a value may be inferred or explicitly specified.

Actions

Assignment to a variable: Variables may be assigned any value of any type at any time.

Invocation of operators: Values' types are used by the language to ensure that the compatible specific operator -- if from a set of polymorphic operators -- is selected when invoked, and to ensure that values are compatible with operator parameters. Notably, many Category D1 languages do not expose this functionality to the language user, but it is used internally to (for example) dispatch the correct built-in "+" operator: the one implementing numeric addition if the operands are numeric, or the one implementing string concatenation if the operands are not numeric. See the "Operator Invocation" subsection below for more detail.

Illustration

Diagrammatically, the conceptual structure of variables and values can be shown as:


        Variable is [ Value ]

        Value	is [ Representation | Type ]

Illustrated via an XML notation:

Illustration of assignment to a variable:


        splat = 3423;


        splat = "fish";

Examples

PHP is a language in this category.

== Language Category D2 ==

Variables are not directly associated with a type. In other words, they do not have a "type" property.

Every value is represented by a string of characters.

Actions

Assignment to a variable: Variables may be assigned any value at any time.

Invocation of operators: Operators perform parsing as needed to determine whether each argument value (which is a string of characters) represents an integer, number, date, etc. See the "Operator Invocation" subsection below for more detail.

Illustration

Diagrammatically, the conceptual structure of variables and values can be shown as:


        Variable is [ Value ]

        Value	is [ Representation ]

Note: This does not mean values have no type -- i.e., that they are uncategorised bit streams of unknown length -- but that they are always strings of characters.

Or, alternatively:


        Variable is [ Value ]

        Value	is [ Representation | string ]

Illustrated via an XML notation:

Illustration of assignment to a variable:


        splat = 3423;


        splat = "fish";

Examples

ColdFusion and Perl are languages in this category.

== Notes on the Above ==

In all three language categories, the only observable run-time state change is via variable assignment. Given a , variable assignment replaces only . There is no other apparent state change.

In all three language categories, operators may interpret their operands as they see fit, including recognising values of various types that may be encoded within their operand values. For example, in PHP, the "is_numeric()" operator may be used to test whether or not its operand is numeric, which can include both operands of numeric type and numeric strings. (E.g., 123 is of numeric type, "123" is a numeric string.) In ColdFusionLanguage, can be used inside a function definition to reject invocation if the argument (which is always a character string) does not match the type named in the 'type' attribute.

== Operator Invocation ==

In most languages, a given operator name can have multiple operator definitions -- one operator definition for each combination of operand types. Each operator definition has its own operator signature, which consists of its name, operand types, and possibly its return type. For example:


        +(integer, integer) returns integer

        +(string, integer) returns string

        +(integer, string) returns string

        +(string, string) returns string

The interpreter (or compiler) uses the operand types to determine which definition to invoke. For example, an expression like 34 + "34" can be written as +(34, "34"). Replacing the operand values with their types, we get +(integer, string). That matches the third signature above, and so we invoke its corresponding operator definition.

The definitions determine the language behaviour. For example, in the above, only the first definition performs addition. The other three perform string concatenation. Thus, the language would return 68 given 34 + 34 and "3434" given 34 + "34" or "34" + 34.

Given the following:


        +(integer, integer) returns integer

        +(string, integer) returns integer

        +(integer, string) returns integer

        +(string, string) returns string

The first three definitions attempt addition, the second and third parsing strings to determine if they represent numeric values. The last definition performs string concatenation. Thus, the language would return 68 given 34 + 34, 68 given 34 + "34", "3434" given "34" + "34", and an error given 34 + "splat" because "splat" doesn't represent an integer.

Given the following:


        +(integer, integer) returns integer

        +(string, string) returns string

The first definition performs addition, the second performs concatenation. Thus, the language would return 68 given 34 + 34 and "3434" given "34" + "34"; 34 + "34" is an error because there's no definition for +(integer, string).

Some languages do not distinguish operand types outside of operators and treat all values as strings, so the only signature (for "+") is effectively:


        +(string, string) returns string

In such languages, when "+" is invoked it internally attempts to convert its operands to numeric values. If successful, the operator performs addition and returns a string containing only digits. If the conversion to numeric values is unsuccessful, the operator performs string concatenation on the operands and returns the result. See SignaturesAndSoftPolymorphism for further discussion.

Other combinations are possible. For example, given the following:


        +(integer, integer) returns integer

        +(string, integer) returns integer

        +(string, string) returns string

The first two definitions attempt addition; the second parsing the string to determine if it represents a numeric value. Thus, the language would return 68 given 34 + 34, 68 given "34" + 34, "3434" given "34" + "34", an error given "splat" + 34 because parsing the string fails, and an error given 34 + "splat" because there is no operator definition for +(integer, string).

Note that the return type of operators can be used to infer expression types prior to execution. For example, given operator signatures:


        +(integer, integer) returns integer

        +(string, integer) returns string

        +(integer, string) returns string

        +(string, string) returns string

And given an expression like:

+ 38 + "splat" + 55

It can be represented as operator invocations:


        +(+(+(34, 38), "splat"), 55)

Now replacing the operand values with their types:


        +(+(+(integer, integer), string), integer)

And by replacing each operator invocation with its corresponding return value until we can't simplify further:


        +(+(+(integer, integer), string), integer) =

        +(+(integer, string), integer) =

        +(string, integer) =

        string

We can see that the expression is of type "string".

Note that how expressions are evaluated can also vary. Since those are language-specific, we'd need to consider those on a language-by-language basis. If there are de-facto standards for evaluating expressions, that can help simplify our examples, but perhaps should not be considered "universal" rules. -t

What sort of variation are you referring to? There are numerous issues not considered here, such as MultipleDispatch vs SingleDispatch and so on.

== More Formal Descriptions ==

Language Category S

Let S be a character sequence, which is an ordered collection of characters from a character set

Let T be a type, which is defined here to be a set of character sequences T = {S1 .. Sn}

Axiom: For every S, there exists at least one type T such that S ∈ T, or S is considered invalid, i.e., every valid S is an element of at least one T

Let V be a value, defined as V = (S, T) such that S ∈ T

Let R be a variable, defined as R = (V, T). Assignment of a new value V' to R is only allowed if V'(T) is compatible with R(T)

Let C be source code for a program, which contains character sequences S1 .. Sn

Let M be the computer memory in which a program C runs

Then, for every Si in C, the interpreter finds a T where Si ∈ T and converts Si to a value Vi in M such that Vi(T) = T and Vi(S) = Si

Then, given a value V(S, T) in M and a type T, program C can determine if V(T) = T or V(S) ∈ T

Language Category D1

Let S be a character sequence, which is an ordered collection of characters from a character set

Let T be a type, which is defined here to be a set of character sequences T = {S1 .. Sn}

Axiom: For every S, there exists at least one type T such that S ∈ T, or S is considered invalid, i.e., every valid S is an element of at least one T

Let V be a value, defined as V = (S, T) such that S ∈ T

Let R be a variable, defined as R = (V). Assignment of a new value V' to R can be done at any time with any V'

Let C be source code for a program, which contains character sequences S1 .. Sn

Let M be the computer memory in which a program C runs

Then, for every Si in C, the interpreter finds a T where Si ∈ T and converts Si to a value Vi in M such that Vi(T) = T and Vi(S) = Si

Then, given a value V(S, T) in M and a type T, program C can determine if V(T) = T or V(S) ∈ T

An example of #9 is is_numeric() in PHP.

Language Category D2

Let S be a character sequence, which is an ordered collection of characters from a character set

Let T be a type, which is defined here to be a set of character sequences T = {S1 .. Sn}

Axiom: For every S, there exists at least one type T such that S ∈ T, or S is considered invalid, i.e., every valid S is an element of at least one T

Let V be a value, defined as V = (S)

Let R be a variable, defined as R = (V). Assignment of a new value V' to R can be done at any time with any V'

Let C be source code for a program, which contains character sequences S1 .. Sn

Let M be the computer memory in which a program C runs

Then, for every Si in C, the interpreter converts Si to a value Vi in M such that Vi(S) = Si

Then, given a value V(S) in M and a type T, program C can determine if V(S) ∈ T

An example of #9 is cfArgument in ColdFusion.

Where would a statically type-safe FORTH-like language be categorized? No explicit types (totally inferred); no variables (just an environment)...

I'm not attempting to categorise type systems and languages in general, only show that Top's "tag model" is unnecessary. Everything he tries to explain with "tags" can be better accounted for with types, variables, and values.

See TypeSystemCategoriesInImperativeLanguagesDiscussion

See DefinitionOfTypeTag, TypeTagDifferenceDiscussion, TypeSystemCategoriesInImperativeLanguagesTwo, ThirtyFourThirtyFour, SignaturesAndSoftPolymorphism

AugustThirteen