C syntax
A C program consists of functions and variables. C functions are like the subroutines and functions of Fortran or the procedures and functions of Pascal. The function main() is special: it is the first function invoked when a C program begins execution. This means that every C program must have a main() function.
The main() function will usually call other functions to help perform its job. Functions may be written by the programmer or provided by existing libraries; library functions are declared in headers, which are included with the #include preprocessing directive. Certain library functions, such as printf(), are defined by the C standard and declared in "standard headers"; these are referred to as the standard library functions. (An implementation of C providing all of the standard library functions is called a "hosted implementation"; some implementations are not hosted, usually because they are not intended to be used with an operating system. Such implementations are called "freestanding" in the C standard.)
A function may return a value to the environment which called it. This is usually another C function. The main() function's calling environment is the operating system. Typically, the return value zero of main() signifies successful completion when the program terminates. (The printf function mentioned above returns how many characters were printed, but this value is ignored by most programmers.)
A C function consists of a return type (void if no value is returned), a unique name, a list of parameters in parentheses (void if there are none) and a function body delimited by braces. The syntax of the function body is equivalent to that of a compound statement.
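The parts of a function listed above can be seen in a minimal sketch. The names add and greet are hypothetical and chosen only for illustration; a complete program would also define main() and call them from there.

```c
#include <stdio.h>

/* Return type int, name add, two int parameters, body in braces. */
int add(int a, int b)
{
    return a + b;
}

/* No value returned and no parameters: void is written in both places. */
void greet(void)
{
    printf("hello\n");
}
```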
Control structures
C is a free-form language: line breaks and indentation between tokens generally have no effect on the meaning of the code.
Note: bracing style varies from programmer to programmer and can be the subject of great debate ("religious wars"). See Indent style for more details.
Compound statements
Compound statements in C have the form
{ <optional-declaration-list> <optional-statement-list> }
and are used as the body of a function or anywhere that a single statement is expected.
Expression statements
A statement of the form
<optional-expression> ;
is an expression statement. If the expression is missing, the statement is called a null statement.
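A null statement is typically used where the syntax requires a statement but all the work is already done elsewhere, for example in a loop condition. The following sketch assumes a hypothetical helper, my_strlen, that measures a zero-terminated string this way.

```c
#include <stddef.h>

/* Hypothetical helper: length of a zero-terminated string. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    /* The loop test does all the work; the body is a null statement. */
    while (*p++ != '\0')
        ;

    /* p now points one past the terminator. */
    return (size_t)(p - s - 1);
}
```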
Selection statements
C has three types of selection statements: two kinds of if and the switch statement.
The two kinds of if statement are
if (<expression>)
<statement>
and
if (<expression>)
<statement>
else
<statement>
In an if statement, if the expression in parentheses is nonzero (true), control passes to the statement following the condition. If an else clause is present and the expression is zero (false), control passes to the statement following else instead. When if statements are nested, the ambiguity is resolved by matching each else with the nearest preceding unmatched if in the same block. Braces may be used to override this pairing or simply for clarity.
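The pairing rule for else can be illustrated with two hypothetical functions that differ only in their braces. In the first, the indentation is deliberately honest about what the compiler does; some compilers warn about this form.

```c
/* The else pairs with the inner if, whatever the indentation suggests. */
int classify(int a, int b)
{
    int result = 0;

    if (a > 0)
        if (b > 0)
            result = 1;
        else            /* belongs to "if (b > 0)" */
            result = 2;

    return result;
}

/* Braces attach the else to the outer if instead. */
int classify_braced(int a, int b)
{
    int result = 0;

    if (a > 0) {
        if (b > 0)
            result = 1;
    } else {            /* belongs to "if (a > 0)" */
        result = 2;
    }

    return result;
}
```

With a positive and b negative, classify returns 2 while classify_braced returns 0.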
The switch statement causes control to be transferred to one of several statements depending on the value of an expression, which must have integral type. The substatement controlled by a switch is typically compound. Any statement within the substatement may be labeled with one or more case labels, which consist of the keyword case followed by a constant expression and then a colon (:).
No two of the case constants associated with the same switch may have the same value. There may be at most one default label associated with a switch; control passes to the default label if none of the case labels are equal to the expression in the parentheses following switch.
Switches may be nested; a case or default label is associated with the smallest switch that contains it. Switch statements can "fall through": when the statements for one case have finished executing, execution continues into the statements below until a break statement is encountered. Fall-through is useful in some circumstances, although some newer programming languages forbid it.
In the example below, if <label2> is reached, the statements <statements 2> are executed and nothing more inside the braces. However, if <label1> is reached, both <statements 1> and <statements 2> are executed, since there is no break to separate the two case sections.
switch (<expression>) {
case <label1>:
    <statements 1>
case <label2>:
    <statements 2>
    break;
default:
    <statements>
}
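A concrete version of the same pattern might look like the following sketch. The function name steps_for_level and its numbering scheme are assumptions made for illustration only.

```c
/* Counts how many case sections run when entering at a given level. */
int steps_for_level(int level)
{
    int steps = 0;

    switch (level) {
    case 3:
        steps++;        /* falls through */
    case 2:
        steps++;        /* falls through */
    case 1:
        steps++;
        break;          /* stops execution before the default section */
    default:
        steps = -1;     /* no case label matched */
    }
    return steps;
}
```

Entering at case 3 runs all three increments, while entering at case 1 runs only the last one.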
Iteration statements
C has three forms of iteration statement:
do
    <statement>
while (<expression>);

while (<expression>)
    <statement>

for (<expression> ; <expression> ; <expression>)
    <statement>
In the while and do statements, the substatement is executed repeatedly so long as the value of the expression remains nonzero or true. With while, the test, including all side effects from the expression, occurs before each execution of the statement; with do, the test follows each iteration.
If all three expressions are present in a for, the statement
for (e1; e2; e3)
s;
is equivalent to
e1;
while (e2) {
s;
e3;
}
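The equivalence holds except when s contains a continue statement, which in the for loop still executes e3 before the next test. Any of the three expressions in the for loop may be omitted. A missing second expression is treated as a nonzero constant, so the loop runs forever unless it is left by other means, such as a break or return statement.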
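The correspondence between for and while can be checked on a small case. In this sketch the two hypothetical functions compute the same sum, with comments marking which part plays the role of e1, e2, e3, and s.

```c
/* Sum of 1..n written with a for loop. */
int sum_for(int n)
{
    int i, total = 0;

    for (i = 1; i <= n; i++)
        total += i;
    return total;
}

/* The same loop rewritten with while. */
int sum_while(int n)
{
    int i, total = 0;

    i = 1;                  /* e1 */
    while (i <= n) {        /* e2 */
        total += i;         /* s  */
        i++;                /* e3 */
    }
    return total;
}
```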
Jump statements
Jump statements transfer control unconditionally. There are four types of jump statements in C: goto, continue, break, and return.
The goto statement looks like this:
goto <identifier>;
The identifier must be a label located in the current function. Control transfers to the labeled statement.
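One use of goto that is often considered acceptable is leaving several nested loops at once, which break alone cannot do. The following is a sketch only; find_in_grid and its fixed 3-by-4 grid are hypothetical.

```c
/* Searches a 3x4 grid; returns 1 and stores the position if found, else 0. */
int find_in_grid(int grid[3][4], int target, int *row, int *col)
{
    int r, c;

    for (r = 0; r < 3; r++)
        for (c = 0; c < 4; c++)
            if (grid[r][c] == target)
                goto found;     /* leaves both loops in one jump */
    return 0;

found:                          /* label in the same function */
    *row = r;
    *col = c;
    return 1;
}
```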
A continue statement may appear only within an iteration statement and causes control to pass to the loop-continuation portion of the smallest enclosing such statement. That is, within each of the statements
while (expression) {
    /* ... */
    cont: ;
}

do {
    /* ... */
    cont: ;
} while (expression);

for (optional-expr; optexp2; optexp3) {
    /* ... */
    cont: ;
}
a continue not contained within a nested iteration statement is the same as goto cont.
The break statement is used to get out of a for loop, while loop, do loop, or switch statement. Control passes to the statement following the terminated statement.
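The difference between continue and break is easiest to see when both appear in one loop. In this sketch, the hypothetical function sum_positive_until_zero skips negative values with continue and stops at the first zero with break.

```c
/* Sums the positive values in an array, stopping at the first zero. */
int sum_positive_until_zero(const int *values, int count)
{
    int i, total = 0;

    for (i = 0; i < count; i++) {
        if (values[i] == 0)
            break;          /* leave the loop entirely */
        if (values[i] < 0)
            continue;       /* skip to i++ and the next test */
        total += values[i];
    }
    return total;
}
```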
A function returns to its caller by means of the return statement. When return is followed by an expression, the value of that expression is returned to the caller. Flowing off the end of the function is equivalent to a return with no expression; in both of those cases the function returns no meaningful value, and the behavior is undefined if the caller tries to use one.
Operator precedence in C89
() [] -> . ++ -- postfix operators
++ -- * & ~ ! + - sizeof (cast) unary operators
* / % multiplicative operators
+ - additive operators
<< >> shift operators
< <= > >= relational operators
== != equality operators
& bitwise and
^ bitwise exclusive or
| bitwise inclusive or
&& logical and
|| logical or
?: conditional operator
= += -= *= /= %= <<= >>= &= |= ^= assignment operators
, comma operator
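One practical consequence of this ordering is that the equality operators bind more tightly than the bitwise operators, which surprises many programmers. The sketch below uses two hypothetical functions to show the difference; compilers commonly warn about the first form.

```c
/* Parsed as flags & (mask == 0), because == ranks above &. */
int mask_clear_unparenthesized(int flags, int mask)
{
    return flags & mask == 0;
}

/* Parentheses force the bitwise and to happen first. */
int mask_clear(int flags, int mask)
{
    return (flags & mask) == 0;
}

/* Multiplicative operators rank above additive ones: 2 + (3 * 4). */
int arithmetic_example(void)
{
    return 2 + 3 * 4;
}
```

With flags 4 and mask 1, the first function yields 0 and the second yields 1.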
Data declaration
Elementary data types
The macros defined in the <limits.h> and <float.h> headers give the actual ranges of the fundamental data types on a given implementation; the table below lists the minimum ranges that the standard requires. The ranges of the float, double, and long double types are typically those mentioned in the IEEE 754 Standard.
| name | minimum range |
|---|---|
| char | -127..127 or 0..255 |
| unsigned char | 0..255 |
| signed char | -127..127 |
| int | -32767..32767 |
| short int | -32767..32767 |
| long int | -2147483647..2147483647 |
| float | 1e-37..1e+37 (positive range) |
| double | 1e-37..1e+37 (positive range) |
| long double | 1e-37..1e+37 (positive range) |
| long long int (C99 only) | -9223372036854775807..9223372036854775807 |
| _Bool (boolean, C99 only) | false, true (0, 1) |
| complex (C99 only) | |
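The actual limits on a particular implementation can be inspected through the macros in <limits.h> and <float.h>. The following sketch assumes two hypothetical helpers: one prints a few of the limits, and the other checks them against the minimum ranges in the table.

```c
#include <float.h>
#include <limits.h>
#include <stdio.h>

/* Prints the ranges this implementation actually provides. */
void print_ranges(void)
{
    printf("char:   %d..%d\n", CHAR_MIN, CHAR_MAX);
    printf("int:    %d..%d\n", INT_MIN, INT_MAX);
    printf("long:   %ld..%ld\n", LONG_MIN, LONG_MAX);
    printf("float:  %g..%g\n", FLT_MIN, FLT_MAX);
    printf("double: %g..%g\n", DBL_MIN, DBL_MAX);
}

/* Returns 1 if the implementation meets the minimum ranges listed above. */
int ranges_meet_minimums(void)
{
    return INT_MAX >= 32767
        && LONG_MAX >= 2147483647L
        && FLT_MAX >= 1e37F
        && DBL_MAX >= 1e37;
}
```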
Arrays
If the declared name is followed by a number in square brackets ([]), the declaration is an array declaration. Strings are just character arrays terminated by a character with value zero (written '\0' in C, the null character). Array bounds are not checked: writing to a memory location beyond the end of an array is undefined behavior, which may silently corrupt other data or cause a crash such as a segmentation fault.
Examples:
int myvector [100];
char mystring [80];
float mymatrix [3][2] = {2.0, 10.0, 20.0, 123.0, 1.0, 1.0};
char lexicon [10000][300]; /* 10000 entries with max 300 chars each. */
int a[3][4];
The last example above creates an array of arrays, but it can be thought of as a multidimensional array for most purposes. The 12 int values created can be accessed as follows:
| a[0][0] | a[0][1] | a[0][2] | a[0][3] |
| a[1][0] | a[1][1] | a[1][2] | a[1][3] |
| a[2][0] | a[2][1] | a[2][2] | a[2][3] |
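Such an array is usually traversed with one nested loop per dimension. As a sketch, the hypothetical function below fills a 3-by-4 array so that each element encodes its own indices, then returns the sum of all 12 values.

```c
/* Stores 10*i + j in a[i][j] and returns the sum of all elements. */
int fill_and_sum(int a[3][4])
{
    int i, j, sum = 0;

    for (i = 0; i < 3; i++) {           /* rows */
        for (j = 0; j < 4; j++) {       /* columns within a row */
            a[i][j] = 10 * i + j;
            sum += a[i][j];
        }
    }
    return sum;
}
```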
Pointers
If a variable has an asterisk (*) in its declaration it is said to be a pointer.
Examples:
int *pi; /* pointer to int */
int *api[3]; /* array of 3 pointers to int */
char **argv; /* pointer to pointer to char */
The value at the address stored in a pointer variable can then be accessed in the program with an asterisk. For example, given the first example declaration above, *pi is an int. This is called "dereferencing" a pointer.
Another operator, & (ampersand), called the address-of operator, returns the address of a variable, array, or function. Thus, given the following
int i, *pi; /* int and pointer to int */
pi = &i;
i and *pi could be used interchangeably (at least until pi is set to something else).
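A short sketch makes the interchangeability concrete. The function names pointer_demo and swap_ints are hypothetical; the second shows the common use of pointers to let a function modify its caller's variables.

```c
/* Writes to i through pi, then reads i directly. */
int pointer_demo(void)
{
    int i = 1, *pi;

    pi = &i;        /* pi now holds the address of i */
    *pi = 42;       /* dereference: this assigns to i */
    return i;
}

/* Swaps two ints in the caller by dereferencing their addresses. */
void swap_ints(int *x, int *y)
{
    int tmp = *x;

    *x = *y;
    *y = tmp;
}
```

A caller would write swap_ints(&a, &b) to exchange the values of two int variables a and b.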
Strings
Strings may be manipulated without using the standard library. However, the library contains many useful functions for working with both zero-terminated strings and unterminated arrays of char.
The most commonly used string functions are:
- strcat(dest, source) - appends the string source to the end of string dest
- strchr(s, c) - finds the first instance of character c in string s and returns a pointer to it or a null pointer if c is not found
- strcmp(a, b) - compares strings a and b (lexical ordering); returns negative if a is less than b, 0 if equal, positive if greater
- strcpy(dest, source) - copies the string source to the string dest
- strlen(st) - returns the length of string st
- strncat(dest, source, n) - appends a maximum of n characters from the string source to the end of string dest; characters after the null terminator are not copied
- strncmp(a, b, n) - compares a maximum of n characters from strings a and b (lexical ordering); returns negative if a is less than b, 0 if equal, positive if greater
- strncpy(dest, source, n) - copies a maximum of n characters from the string source to the string dest
- strrchr(s, c) - finds the last instance of character c in string s and returns a pointer to it or a null pointer if c is not found
The less important string functions are:
- strcoll(s1, s2) - compare two strings according to a locale-specific collating sequence
- strcspn(s1, s2) - returns the index of the first character in s1 that matches any character in s2
- strerror(err) - returns a string with an error message corresponding to the code in err
- strpbrk(s1, s2) - returns a pointer to the first character in s1 that matches any character in s2 or a null pointer if not found
- strspn(s1, s2) - returns the index of the first character in s1 that matches no character in s2
- strstr(st, subst) - returns a pointer to the first occurrence of the string subst in st or a null pointer if no such substring exists
- strtok(s1, s2) - returns a pointer to a token within s1 delimited by the characters in s2
- strxfrm(s1, s2, n) - transforms s2 into s1 using locale-specific rules
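Several of these functions are often combined. Because strcpy and strcat do not check the size of the destination, the caller has to do so. The following sketch assumes a hypothetical helper, join_names, which uses strlen to check the capacity before copying.

```c
#include <string.h>

/* Builds "first last" in dest, which holds dest_size chars.
   Returns 0 on success, or -1 if the result would not fit. */
int join_names(char *dest, size_t dest_size,
               const char *first, const char *last)
{
    /* Room is needed for both strings, one space, and the terminator. */
    if (strlen(first) + 1 + strlen(last) + 1 > dest_size)
        return -1;

    strcpy(dest, first);    /* copy, including the terminator */
    strcat(dest, " ");      /* append at the current end of dest */
    strcat(dest, last);
    return 0;
}
```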
File Input / Output
In C, input and output are performed via a group of functions in the standard library. In ANSI/ISO C, those functions are declared in the <stdio.h> header.
- fopen
- fclose
Standard I/O
Three standard I/O streams are predefined:
- stdin - standard input
- stdout - standard output
- stderr - standard error
These streams are automatically opened and closed by the runtime environment; they need not and should not be opened explicitly.
The following example demonstrates how a filter program is typically structured:
#include <stdio.h>

int main(void)
{
    int c;
    int anErrorOccurs = 0;  /* placeholder: set nonzero when processing fails */

    while ((c = getchar()) != EOF) {
        /* do various things
           to the characters */
        if (anErrorOccurs) {
            fputs("an error occurred\n", stderr);
            break;
        }
        /* ... */
        putchar(c);
        /* ... */
    }
    return 0;
}
Passing command line arguments
The parameters given on a command line are passed to a C program through the two parameters of main(): the count of the command line arguments in argc, and the individual arguments as strings in the pointer array argv.
So the command
myFilt p1 p2 p3
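results in argc being 4 and argv holding something like the following: argv[0] points to "myFilt" (the name used to invoke the program), argv[1] to "p1", argv[2] to "p2", and argv[3] to "p3", while argv[4] is a null pointer.
(Note: there is no guarantee that the individual strings are contiguous.)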
The individual values of the parameters may be accessed with argv[1], argv[2], and argv[3], as shown in the following program:
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;

    printf("argc\t= %i\n", argc);
    for (i = 0; i < argc; i++)
        printf("argv[%i]\t= %s\n", i, argv[i]);
    return 0;
}
Evaluation order
A conforming C compiler can evaluate expressions in any order between sequence points. Sequence points are defined by:
- Statement ends at semicolons.
- The sequencing operator: a comma.
- The short-circuit operators: logical and (&&) and logical or (||).
- The conditional operator (?:): This operator evaluates its first sub-expression first, and then its second or third (never both of them) based on the value of the first.
Expressions before a sequence point are always evaluated before those after a sequence point. In the case of short-circuit evaluation, the second expression may not be evaluated depending on the result of the first expression. For example, in the expression (a() || b()), if the first argument evaluates to true, the result of the entire expression will also be true, so b() is not evaluated.
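Short-circuit evaluation can be observed by counting calls. In this sketch, a() and b() stand in for the functions named in the text, and the counter b_calls is a hypothetical addition used only to record whether b() ran.

```c
static int b_calls = 0;     /* how many times b() has been called */

int a(void) { return 1; }               /* always true */
int b(void) { b_calls++; return 1; }    /* records that it ran */

/* a() is true, so || already has its answer and b() is skipped. */
int calls_with_or(void)
{
    b_calls = 0;
    (void)(a() || b());
    return b_calls;
}

/* a() is true, so && still needs b() to decide the result. */
int calls_with_and(void)
{
    b_calls = 0;
    (void)(a() && b());
    return b_calls;
}
```

Here calls_with_or returns 0 and calls_with_and returns 1.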
Undefined behavior
An interesting (though certainly not unique) aspect of the C standards is that the behavior of certain code is said to be "undefined". In practice, this means that the program produced from this code can do anything, from working as intended, to crashing every time it is run.
For example, the following code produces undefined behavior, because the variable b is both read and modified, with no intervening sequence point, in the expression a = b + b++;:
#include <stdio.h>

int main(void)
{
    int a, b = 1;

    a = b + b++;
    printf("%d\n", a);
    return 0;
}
Because there is no sequence point between the two accesses of b in b + b++, it might seem that the compiler could simply increment b either before or after the addition, resulting in either 2 or 3. However, to allow the compiler to make certain optimizations, the standard is even more pessimistic than this: the behavior is undefined, so any result is permitted. In general, modifying a value and separately accessing it between two sequence points invokes undefined behavior.
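The usual remedy is to split the expression so that the modification sits in its own statement, which makes the intended order explicit. The two hypothetical functions below are a sketch of the two readings a programmer might have had in mind; both are well defined.

```c
/* Add first, then increment: the sum uses the old value of *b twice. */
int add_then_increment(int *b)
{
    int a = *b + *b;    /* two reads and no modification */

    (*b)++;             /* the modification is a separate statement */
    return a;
}

/* Increment first: the sum uses the old value once and the new value once. */
int increment_then_add(int *b)
{
    int old = *b;

    (*b)++;
    return old + *b;
}
```

Starting from b equal to 1, the first function returns 2 and the second returns 3, and both leave b equal to 2.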
