For this week, here's the plan, Fran…
REAL Language Representations
What is a real
programming language? Many people seem to feel that the real-ness
of a language is based on what facilities it may have [closures, for-each loops, records, etc.], while
others seem to think that basically ANY language that you can use to program
a computation engine
[SQL? BASIC? VBA? HP calculator language?] is a real language. The moniker is open for debate,
certainly. Here is a list of
one person's ideas of different programming languages and what they are useful for.
Warning: This page will have lots of code examples. These were all tested on a Windows 10 Alienware PC with gcc version 2.95.2, which is really, really, REALLY, REALLY old, but still works for our purposes. I have no idea if it's C90, C99, or what…
First of all, in our case, there are several things to consider, not the least of which is, when we are designing our software, from what perspective do we start? Should we start at a high level and work our way deeper, drilling down into the details? Or should we begin at the bottom with the hardware functions and work our way upwards, combining the functions as we go to get to the full application?
The first of these is known as a top-down approach, and the second as a bottom-up approach. Actually, there is a need for both of these.

Top down is good when you have a high-level idea of what the software should do, and you are working to design the parts that will create and support that required functionality. In this case you would want to use a high-level language, one which is the best fit for the task at hand. For example, you wouldn't write a web page using C#, and you wouldn't write a machine learning algorithm in HTML.

Bottom up is good for when you know what much [or most] of the underlying functionality will be, but you don't know yet how that will be implemented at a higher level. An example of this would be a scientific application in which you know that you will need several high-order functions to process very large data sets, but you're not sure just what the data outcomes will be, so you don't know for certain how to present that output to the user.

We are going to investigate this from both directions in this class, using the
C is a very popular systems programming language, and by some measures might be the most popular programming language, period [Python and JavaScript notwithstanding]. Every developer needs to know C, whether or not she uses it; it's a kind of lingua franca, and as most high-level languages' runtime systems are written in C, knowing it helps you deal with the problem of leaky abstractions.
There are five official versions of C:
Name | Date | Description/Notes/Comments |
---|---|---|
K&R C | 1972 | Never officially standardized. Pretty much dead now. |
C90/ANSI C/C89 | 1989 | Practically a different language from K&R; still in use. |
NA1 | 1995 | Actually not considered a "real" version; just C90 with minor clarifications and new library modules. |
C99 | 1999 | A pretty big revision from C90; still widely used. |
C11 | 2011 | A pretty small step up from C99. [Newer revisions, C17 and C23, have since been published.] |
Note: C99 is not upwardly compatible with C90, but a lot of people still use C90. David Tribble has a great page describing the differences between C90, C99 and C++.
There is an awesome C
portal at
http://www.lysator.liu.se/c/.
Perhaps the most important references you'll find there are the IOCCC [don't miss the
Winners Page],
Rob Pike's notes on C
programming, and
Steve Summit's C programming course notes.
Also, you have to have the Kernighan and Ritchie book, even though it only covers C90.
Please check out this talk by Dennis Ritchie, one of the creators of the language. It is very enlightening about C, and it shows that even the creator of the language gets things wrong when writing programs. [approx. 1 hour]
A C
program is spread over one or more files. Each file is a sequence of declarations. A
declaration is either:
In the best Dr. Toal
style, we'll start out with some examples, then get into the nitty-gritty
details. That way you get more cluck for your buck.
Traditionally, as you no doubt remember, the first program that you always write in any language you are
learning is the Hello World
program. Here it is in C
:
    #include <stdio.h>

    int main() {
        printf( "Hello, World!\n" );
        return 0;
    }
Not to belabor the obvious, since you all know Java, JavaScript, Python, and probably a couple of other languages, but this is how the code breaks down:
main which is the entry point to the program, just like Java; this is the way the operating system can figure out where to start the program, much like the JVM is told where to start with public static void main( String [] args )
C function from the stdio library to print to the screen
OK, now we've written the source code, so we have to get it ready to run. Since C
is truly
a compiled
language, we have to run the compiler on our source code. The compiler, as you may
remember, is a translator program that turns our human-readable source code
into the object code that we need to execute. Object code is the machine
code that will be used, but it isn't a complete program yet, so it can't run on its own. We still need
another step to make a complete runnable program.
To compile the program we need to run gcc [which is the short name for the GNU Compiler Collection]. Assuming that your source code is saved in the file named hello.c, compilation is done using the following line:
gcc hello.c
In this case, because the gcc
program is smart, it will automatically compile and then will
perform the next step for us, linking our object code with the object code from the C
standard
libraries to make a complete runnable program. The result is a program file called a.exe
on Windows and a.out
just about everywhere else.
To run the program on Windows, you simply type a
and press the Enter
key.
To run the program on a Mac or on UNIX/Linux, you type ./a.out
.
The gcc
program has several options that you can use on the command line to affect its
operation. Here are a few:
Notes on the a.exe
and a.out
thing: If you don't tell gcc what to name the executable
output file, it defaults to a
with whatever the appropriate extension is for the operating
system on which it is creating the application. There needs to be SOME name for the output file, so
the convention of using the letter a
has traditionally been used. You can re-name the output
file to whatever you want, as we'll see here.
gcc -xc hello.c | [to tell gcc to compile C code, not C++; this automatically happens if the file extension is .c] |
gcc -std=c99 hello.c | [to use C99] |
gcc -c hello.c | [to compile, but not link, the program] |
gcc -S hello.c | [to translate to assembly language only] |
gcc -o hello hello.c | [to name the output file something other than the default name] |
gcc -o hello.exe hello.c | [if you are using a Windows computer] |
Don't forget to try these out and see what happens!
These look just like Java, but with a couple of twists you'll see that you MAY not be used to. Here is a sample:
    /*
     * file: triple.c
     */
    #include <stdio.h>

    int main() {
        int a, b, c;                            // declare three integers

        printf("     A     B     C\n");         // header line 1
        printf("------------------\n");         // header line 2
        for (c = 1; c <= 100; c++) {
            for (b = 1; b <= 100; b++) {
                for (a = 1; a <= 100; a++) {
                    if (a * a + b * b == c * c) {
                        printf("%6d%6d%6d\n", a, b, c);
                    } else {
                        // nothing here, just to show 'else'
                    }
                }
            }
        }
        return 0;
    }
This works in C90. In C99, you can declare a, b, and c
directly in the for
statements themselves, instead of separately.
Notice the comment style… but remember you can't nest them!
Investigation: Can you figure out what this code does? Why does it work the way it does?
Investigation: In a throwback to your Data Structures class, can you figure out what the algorithm run time of this program would be?
Investigation: Notice there are no initial values assigned for the three integers when they are declared. Are they automatically assigned a value?
Investigation: What is the return
statement doing at the end of the program? If this is a main
program, where is it returning?
Note: we'll get to the %6d%6d%6d
stuff a little later…
The C language is statically typed, just like Java. C also requires that you declare everything before you can use it. Java, as you know, allows you to define variables all over the place, and the Java compiler can [usually] figure out your intent. Not so with C. You must declare variables, functions, constants, in short, everything before it gets used. This is one reason why most of the declarations that appear in a source file are at or near the top of the file.
The following example should make this clear:
    /*
     * A program that makes an approximation to pi by generating
     * a million random points in the unit square and computing
     * the ratio of those inside the unit circle to the total
     * number in the square.  That value should be pretty close
     * to Pi/4.  The program displays the approximation as well
     * as the actual value to 10 digits.
     *
     * BTW, this should look REALLY FAMILIAR from CMSI 1010...
     */
    #include <stdio.h>
    #include <math.h>
    #include <stdlib.h>
    #include <time.h>

    // Not the best; should be a command line argument!
    #define NUMBER_OF_DARTS 1000000

    // Returns the 'c-squared' value of 'x' and 'y'
    double squareOfDistanceToOrigin( double x, double y ) {
        return x * x + y * y;
    }

    // Returns a random value in [-1..1]
    double randomValue() {
        return 2.0 * rand() / RAND_MAX - 1.0;
    }

    // The main program 'entry point'
    int main() {
        int i;
        int inside = 0;

        srand( time(0) );
        for (i = 0; i < NUMBER_OF_DARTS; i++) {
            double x = randomValue();
            double y = randomValue();
            if (squareOfDistanceToOrigin(x, y) < 1.0) {
                inside++;
            }
        }
        printf( "Pi [est.]: %12.10f\n", 4.0 * ((double)inside / NUMBER_OF_DARTS) );
        // M_PI comes from <math.h> on most systems, but it is not standard C
        printf( "[actual to 10 digits is %12.10f]\n", M_PI );
        return 0;
    }
#include and #define statements
NUMBER_OF_DARTS is defined before it is used
unit square we don't need the square root to check if the distance from the origin is less than one
x and y inside the loop are LOCAL TO THE LOOP'S SCOPE
return 0 returns control to the operating system at the end of the program; the 0 value is a convention that indicates successful completion that dates back to the earliest days of UNIX programming.
srand( time(0) ) function calls seed the random number generator with the current time; this ensures a different set of random numbers for each program run
srand( time(0) ) function calls ALSO show that functions are treated in C as expressions that return a value, just like Java
\] at the end of the line
printf()
inside in the calculation
C arrays are extraordinarily primitive. They do not know how big they are, so you can read and write beyond the array bounds and the compiler will not barf, but your program might. For example:
    int f() {
        int x;
        int a[4];
        int y;
        int z = 23;
        a[5] = 100;
        return y;
    }
This code might return 100, or it might return some other random value, since the write to a[5] lands outside the array and might overwrite y in memory [and y is never initialized otherwise]. And a[-1] is probably the same variable as x.
You can also write things like a[x]
, x[a]
, a[3]
and
3[a]
. Trying to read or write a[234523132]
will probably crash your program,
though. Remember:
remember how big you created your array
Here's a program with arrays:
    /*
     * A program that displays all the prime numbers up to and
     * including 1000, using the famous algorithm of Eratosthenes.
     * This is a C99 program, not a C90 program.
     *
     * The purpose of the program is only to illustrate arrays
     * assuming one has not yet seen pointers or command line
     * arguments, so it isn't very good.
     */
    #include <stdio.h>
    #include <stdbool.h>

    // To get primes up to and including 1000, the sieve has
    // to have a slot at index 1000.  But indices must start
    // at 0, so there have to be 1001 slots in the array.
    #define SIZE 1001

    // Fills the first n slots of array s with the given value.
    void fillArray( bool s[], bool value, int n ) {
        for( int i = 0; i < n; i++ ) {
            s[i] = value;
        }
    }

    // This function writes false in each slot of the array
    // corresponding to a nonprime number.  First, we know 0 and
    // 1 are not prime.  Then for each value starting with 2, if
    // the value is still thought to be prime, we write false in
    // each slot corresponding to its multiples.
    void checkOffComposites( bool s[], int n ) {
        s[0] = false;
        s[1] = false;
        for( int i = 2; i * i < n; i++ ) {
            if( s[i] ) {
                for( int j = i + i; j < n; j += i ) {
                    s[j] = false;
                }
            }
        }
    }

    // This function writes out all the values which correspond to
    // positions in a vector containing the value "true".  Each
    // value is written to the standard output in a field of
    // eight characters.
    void displayTrueIndices( bool s[], int n ) {
        for( int i = 0; i < n; i++ ) {
            if( s[i] ) {
                printf( "%8d", i );
            }
        }
        printf( "\n" );
    }

    // main() just calls the worker functions.
    int main() {
        bool sieve[SIZE];
        fillArray( sieve, true, SIZE );
        checkOffComposites( sieve, SIZE );
        displayTrueIndices( sieve, SIZE );
        return 0;
    }
In Java [and other object-oriented languages] we have references which are identifiers
that refer to some object in memory. [In C
identifiers are also called
tokens.] In C
we have the same thing, but in this case they are known
as pointers.
A pointer is basically an object through which you reference another object, just like in Java. You saw pointers back in the in-class assignment week 04, so I won't belabor the point here.
Here is some sample code that shows a little more about how this works, though:
    // [this fragment needs <stdio.h> and <stdlib.h>]
    int x = 5;          // a normal variable x
    int* p = &x;        // a pointer to where x is stored
    int* q = NULL;      // a pointer to nothing at this point

    // this is how we allocate space for something
    int* r = malloc( sizeof(int) );

    // this is how we allocate space for 100 somethings
    int* s = malloc( 100 * sizeof(int) );

    // *r and s[20] were never given values, so they print garbage
    printf( "%d %d %d", *p, *r, s[20] );
    printf( "%d", *q );     // this will CRASH because it's NULL

    free(r);            // we have to "free" things
    free(s);            // when we allocate space
                        // But we do not free p or q
Here's more specific information:
Pointers and arrays are closely related. The value of an array variable is treated as
a pointer to its first element, e.g. a == &a[0]
, and
e1[e2]
is the same as *(e1 + e2)
.
For definitions, pointers and arrays are different:
    int *x;       /* is totally different from: */    int x[100];
    int *a[n];    /* is totally different from: */    int a[n][100];
But for declarations, at least in parameter declarations, you can blur the distinction:
    void f(int* a) { ... }
    void g(int b[]) { ... }
An array of ints, or pointer to an int, can be passed to either.
A string is an array of characters, just like in many other languages, that ends with the null character ['\0', not to be confused with the NULL pointer]. Here are some definition points:
wchar_t
] terminated by and including the first null wide character
So, we can make a string with an array of characters with a zero at the end, or we can use string literals, which are sequences of:
\a, \b, \f, \n, \r, \t, \v, \', \", \?, \\, \one-to-three-octaldigits, \xhexdigits,
\ufour-hex-digits, and \Ueight-hex-digits
]
Let's just do examples:
    /*
     * A program that illustrates strings in C.
     * Designed for C99, but should run fine in C90
     * [with a lot of warnings]
     */
    #include <stdio.h>
    #include <string.h>
    #include <wchar.h>

    // Simple strings from the basic character set
    char s1[] = {'d', 'o', 'g', (char)0};
    char s2[] = {'d', 'o', 'g', '\0'};
    char* s3 = "dog";
    wchar_t* s4 = L"dog";

    // String with some non-ascii, but still "8-bit" characters
    char* s5 = "c\xe9ili";
    char* s6 = "c\u00e9ili";
    char* s7 = "c\U000000e9ili";
    wchar_t* s8 = L"c\U000000e9ili";

    // Strings with characters with codepoints > 0xFF
    char* s9 = "k\u014dpa`a";
    wchar_t* s10 = L"k\u014dpa`a";

    // function to output information about a string
    void inspectString( char* s ) {
        int i, n;
        // strlen() returns a size_t, so cast it to match %d
        printf( "[%s] length=%d codepoints=[ ", s, (int)strlen(s) );
        for( i = 0, n = (int)strlen(s)+1; i < n; i++ ) {
            printf( "%02x ", (unsigned char)s[i] );
        }
        printf( "]\n" );
    }

    // function to output information about a 'wide' string
    void inspectWideString( wchar_t* s ) {
        int i, n;
        printf( "[%ls] length=%d codepoints=[ ", s, (int)wcslen(s) );
        for( i = 0, n = (int)((wcslen(s)+1)*sizeof(wchar_t)); i < n; i++ ) {
            printf( "%02x ", ((unsigned char*)s)[i] );
        }
        printf( "]\n" );
    }

    int main() {
        inspectString(s1);
        inspectString(s2);
        inspectString(s3);
        inspectString((char*)s4);
        inspectWideString(s4);
        inspectString(s5);
        inspectString(s6);
        inspectString(s7);
        inspectString((char*)s8);
        inspectWideString(s8);
        inspectString(s9);
        inspectString((char*)s10);
        inspectWideString(s10);
        return 0;
    }
Open a browser and navigate to this site: https://kahoot.it/.
Enter the game PIN number in the box and click |
Because C demands that things be declared before they are used, it falls on the programmer to make sure that happens. One way this is done is to define the entire function before it is used. Another, somewhat easier way to do this is to write a function prototype, which is a declaration of the function without the innards, much like we do in Java when we declare an Interface or an Abstract Class.
A prototype for the inspectString() function that we've already seen in the code above would look like this:
    void inspectString( char* s );
[*OR*]
    void inspectWideString( wchar_t* s );
Being able to define the prototypes like this means you can include them in another file which is then
included as part of your code using a pound include statement in your code. This will
tell the compiler, go look in this other file and find the definitions you need ~ some of them will
be in there!
The compiler needs to know where things are in order to compile them properly. Much of the time your
code will all reside in the same directory, so it's easy for the compiler to locate things. However,
your code will also make use of LOTS of libraries in order to keep you from having to
rewrite code that is used over and over. Just as Java has the import statement, C has the #include statement. This, along with several others such as #define, #ifdef, and #ifndef, is what is known as a pre-processor directive. The compiler runs the preprocessor as one of the first steps of the compilation process. This is the step that reads in files from the libraries, whenever the #include directive is encountered. These include files are also known as header files or often just headers.
The gcc
installation should put the libraries into a standard location so that the compiler
can find the files, but if you write your own header files, you have the option of leaving them in your
own directory. In that case, the include statement looks a bit different. Here's an example:
    // header files that the compiler knows about
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <string.h>

    // header files you have written and must tell
    // the compiler about
    #include "myHeader.h"
    #include "../different/otherHeader.h"
    #include "lowerDir/thirdHeader.h"
The ability to write header files that specify all kinds of things to include
as part of your
programs is another great strength of this language.
Modules in a Program
And as long as we're talking about it, this is a good way to divide your program up into pieces, so that
you can re-use the pieces, making your OWN libraries! Modularization is a good practice when writing
your code, as we've seen in other classes. In Object-Oriented programming, you try to separate out the
data and the operations that belong together, in a philosophy that is known
as separation of concerns, which you've probably also heard me calling
division of labor. The idea is to group data and operations that belong
together into a single entity. In C
you can do this by putting the related things into separate
files. This modularization is a good practice:
DRY]
One of the oldest build
tools on the planet is the make utility. This program
runs based on a special instruction file called a make file that has the definitions
of what make needs to know to build your program. This is a bit advanced
for now, but it's good to know it's out there, and it is an easy way of maintaining programs. Once you
set up your make file in the directory of your program, all you need to do when you make a change is
type the word make on the command line! Make will compile your code, link things with other
libraries, remake your own libraries, and has directives that allow conditional compilation.
Here is the Wikipedia page that describes makefiles which is interesting reading to see the power and flexibility of this tool.
C Input/Output Specifications
You have probably noted that in the printf() function calls above, there are numerous references to things like %f, %s, %d, and %c. These are the format specifiers that control what the output looks like. Remember in Java we were able to just use the plus sign to get values into an output string. In that case, if we wanted a specific number of digits, or precision, we had to use another class, the DecimalFormat class. In C, we specify that information as part of the output string and then use a comma-delimited list of variables/values to fill those specifications.
Here is a list of the format specifiers for the commonly used C
data types:
Data Type | Specifier |
---|---|
char | %c |
signed char | %c |
unsigned char | %c |
short int | %hd |
int | %d |
long int | %li |
long long int | %lli |
unsigned int | %u |
unsigned long int | %lu |
unsigned long long int | %llu |
float | %f |
double | %lf |
long double | %Lf |
Interesting that there are unsigned and signed characters, huh?
We've seen examples in the code snippets and programs above, but it's interesting to note that you can
make C
output different versions of the same data type! For example, the code:
    int value = 97;
    printf( "value: %c as char,\n" \
            "value: %d as int,\n" \
            "value: %u as unsigned int,\n" \
            "value: %f as float,\n" \
            "value: %f as cast to float\n",
            value, value, value, value, (float)value );
…will output:
    value: a as char,
    value: 97 as int,
    value: 97 as unsigned int,
    value: 0.000000 as float,
    value: 0.000000 as cast to float
Also note in this code how the continuation of lines is done using a backslash.
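Note that the compiler won't complain that I'm using an integer as a float [newer compilers will warn if you turn on -Wall], but it gives a zero instead of coercing the value. Strictly speaking, passing an int where %f expects a double is undefined behavior, so the zeros above are just what my machine happened to print; on another system even the properly cast last value may come out differently, or correctly as 97.000000. BE CAREFUL of this, you may not be getting expected outputs, but the values could be correct behind the scenes!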
In your homework groups, use what you now know about the C language to implement the mouse and cheese game as specified by the following items:
E's in the word, one guess of E will display all three
mouseandcheese.c
NANDGATE and the letters N and A have been correctly guessed, and the letters S, T, and I, have been incorrectly guessed, after five guesses you should display:
main() method for running the program is contained in the mouseandcheese.c source file
I know this isn't due for a while, but I wanted to give you a heads-up on the homework assignments that you will be doing this semester. They are all available from the syllabus page, but just to make sure …
That's probably enough for this week. Be sure to check out the links to the related materials that are listed on the class links page.