CMSI 2210: Welcome to Week 07

This Week's Agenda

For this week, here's the plan, Fran…

Announcements
REAL Language Representations
Top-down or Bottom-up?
Functions and Pointers and Strings ~ OH MY!
Compiling and Linking
Header files and prototypes
Multiple File Modules in a Program
C Input/Output Specifications
Wednesday/Thursday: In-class Exercise

REAL Language Representations

What is a real programming language? Many people seem to feel that the real-ness of a language is based on what facilities it may have [closures, for-each loops, records, etc.], while others seem to think that basically ANY language that you can use to program a computation engine [SQL? BASIC? VBA? HP calculator language?] is a real language. The moniker is open for debate, certainly. Here is a list of one person's ideas of different programming languages and what they are useful for.

Warning: This page will have lots of code examples. These were all tested on a Windows 10 Alienware PC with gcc version 2.95.2, which is really, really, REALLY, REALLY old, but still works for our purposes. I have no idea if it's C90, C99, or what…

Top-down or Bottom-up?

First of all, in our case, there are several things to consider, not the least of which is, when we are designing our software, from what perspective do we start? Should we start at a high level and work our way deeper, drilling down into the details? Or should we begin at the bottom with the hardware functions and work our way upwards, combining the functions as we go to get to the full application?

The first of these is known as a top down development philosophy, and the second is known as a bottom up methodology.

Actually, there is need for both of these. Top down is good when you have a high-level idea of what the software should do, and you are working to design the parts that will create and support that required functionality. In this case you would want to use a high level language, one which is the best fit for the task at hand. For example, you wouldn't write a web page using C#, and you wouldn't write a machine learning algorithm in HTML.

Bottom up is good for when you know what much [or most] of the underlying functionality will be, but you don't know yet how that will be implemented at a higher level. An example of this would be a scientific application in which you know that you will need several high-order functions to process very large data sets, but you're not sure just what the data outcomes will be so you don't know for certain how to make the presentation of that output appear to the user.

We are going to investigate this from both directions in this class, using the C language for our top down approach, and nasm for our bottom up methods. First, let's learn more about C.

Some Preliminary Information and Background

C is a very popular systems programming language, and by some measures might be the most popular programming language, period [Python and JavaScript notwithstanding]. Every developer needs to know C, whether or not she uses it – it's a kind of lingua franca, and as most high-level languages' runtime systems are written in C, knowing it helps you deal with the problem of leaky abstractions.

There are five commonly cited versions of C:

Name	Date	Description/Notes/Comments
K&R C	1972	Never officially standardized. Pretty much dead now.
C90/ANSI C/C89	1989	Practically a different language from K&R; still in use.
NA1	1995	Actually not considered a "real" version; just C90 with minor clarifications and new library modules.
C99	1999	A pretty big revision from C90; still widely used.
C11	2011	The current standard; a pretty small step up from C99.

Note: C99 is not upwardly compatible with C90, but a lot of people still use C90. David Tribble has a great page describing the differences between C90, C99 and C++.

Some Notes And Links for You to Check Out…

There is an awesome C portal at http://www.lysator.liu.se/c/. Perhaps the most important references you'll find there are the IOCCC [don't miss the Winners Page], Rob Pike's notes on C programming, and Steve Summit's C programming course notes. Also, you have to have the Kernighan and Ritchie book, even though it only covers C90.

Please check out this talk by Dennis Ritchie, one of the creators of the language. Very enlightening about C, and that even the creator of the language gets things wrong when writing programs. [approx. 1 hour]

A C program is spread over one or more files. Each file is a sequence of declarations. A declaration is either:

an object declaration
a type declaration
a function declaration
a directive

Each file is compiled then the compiled units are linked together to form an executable program.

Some Examples

In the best Dr. Toal style, we'll start out with some examples, then get into the nitty-gritty details. That way you get more cluck for your buck.

Traditionally, as you no doubt remember, the first program that you always write in any language you are learning is the Hello World program. Here it is in C:

1      #include <stdio.h>
2
3      int main() {
4         printf( "Hello, World!\n" );
5         return 0;
6      }

Not to belabor the obvious, since you all know Java, JavaScript, Python, and probably a couple of other languages, but this is how the code breaks down:

line one includes a standard library of functions that facilitate input and output
line three declares the main which is the entry point to the program, just like Java; this is the way the operating system can figure out where to start the program, much like the JVM is told where to start with public static void main( String [] args )
line four calls a C function from the stdio library to print to the screen
line five tells the program to return to the operating system

Compiling and Linking

OK, now we've written the source code, so we have to get it ready to run. Since C is truly a compiled language, we have to run the compiler on our source code. The compiler, as you may remember, is a translator program that turns our human-readable source code into the object code that we need to execute. Object code is the machine code that will be used, but it isn't a complete program yet, so it can't run on its own. We still need another step to make a complete runnable program.

To compile the program we need to run gcc [which is the short name for the Gnu Compiler Collection]. Assuming that your source code is saved in the file named hello.c, compilation is done using the following line:

      gcc hello.c

In this case, because the gcc program is smart, it will automatically compile and then will perform the next step for us, linking our object code with the object code from the C standard libraries to make a complete runnable program. The result is a program file called a.exe on Windows and a.out just about everywhere else.

To run the program on Windows, you simply type a and press the Enter key. To run the program on a Mac or on UNIX/Linux, you type ./a.out.

Notes on the a.exe and a.out thing: If you don't tell gcc what to name the executable output file, it defaults to a with whatever the appropriate extension is for the operating system on which it is creating the application. There needs to be SOME name for the output file, so the letter a has traditionally been used. You can rename the output file to whatever you want, as we'll see here.

The gcc program has several options that you can use on the command line to affect its operation. Here are a few:

gcc -xc hello.c	[to tell gcc to compile C code, not C++ – this automatically happens if the file extension is .c]
gcc -std=c99 hello.c	[to use C99]
gcc -c hello.c	[to compile, but not link, the program]
gcc -S hello.c	[to translate to assembly language only]
gcc -o hello hello.c	[to name the output file something other than the default name]
gcc -o hello.exe hello.c	[if you are using a Windows computer]

Don't forget to try these out and see what happens!

Variables, For Loops, Decision [if] Blocks

These look just like Java, but with a couple of twists you'll see that you MAY not be used to. Here is a sample:

  /*
   *  file: triple.c
   */

   #include <stdio.h>

   int main() {
       int a, b, c;              // declare three integers

       printf("     A     B     C\n");    // header line 1
       printf("------------------\n");    // header line 2
       for (c = 1; c <= 100; c++) {
           for (b = 1; b <= 100; b++) {
               for (a = 1; a <= 100; a++) {
                   if (a * a + b * b == c * c) {
                       printf("%6d%6d%6d\n", a, b, c);
                   } else {
                      // nothing here, just to show 'else'
                   }
               }
           }
       }
       return 0;
   }

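This works in C90, apart from the // comments, which officially arrived in C99 [though many C90 compilers, gcc included, accept them as an extension]. In C99, you can also declare a, b, and c directly in the for statements themselves, instead of separately.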

Notice the two comment styles, /* ... */ and //, but remember you can't nest the /* ... */ kind!

Investigation: Can you figure out what this code does? Why does it work the way it does?

Investigation: In a throwback to your Data Structures class, can you figure out what the algorithm run time of this program would be?

Investigation: Notice there are no initial values assigned for the three integers when they are declared. Are they automatically assigned a value?

Investigation: What is the return statement doing at the end of the program? If this is a main program, where is it returning?

Note: we'll get to the %6d%6d%6d stuff a little later…

Functions and Pointers and Strings ~ OH MY!

The C language is statically typed, just like Java. It also requires that you declare everything before you can use it. Java, as you know, allows you to define variables all over the place, and the Java compiler can [usually] figure out your intent. Not so with C. You must define variables, functions, constants, in short, everything before it gets used. This is one reason why most of the declarations that appear in a source file are at or near the top of the file.

The following example should make this clear:

  /*
   * A program that makes an approximation to pi by generating
   * a million random points in the unit square and computing
   * the ratio of those inside the unit circle to the total
   * number in the square. That value should be pretty close
   * to Pi/4.  The program displays the approximation as well
   * as the actual value to 10 digits.
   *
   * BTW, this should look REALLY FAMILIAR from CMSI 1010...
   */

   #include <stdio.h>
   #include <math.h>
   #include <stdlib.h>
   #include <time.h>

  // Not the best; should be a command line argument!
   #define NUMBER_OF_DARTS 1000000

  // Returns the 'c-squared' value of 'x' and 'y'
   double squareOfDistanceToOrigin( double x, double y ) {
      return x * x + y * y;
   }

  // Returns a random value in [-1..1]
   double randomValue() {
      return 2.0 * rand() / RAND_MAX - 1.0;
   }

  // The main program 'entry point'
   int main() {
      int i;
      int inside = 0;
      srand( time(0) );

      for (i = 0; i < NUMBER_OF_DARTS; i++) {
         double x = randomValue();
         double y = randomValue();
         if (squareOfDistanceToOrigin(x, y) < 1.0) {
            inside++;
         }
      }
      printf( "Pi [est.]: %12.10f\n",
              4.0 * ((double)inside / NUMBER_OF_DARTS) );
      printf( "[actual to 10 digits is %12.10f]\n", M_PI );

      return 0;
   }

Several things to note about this code:

There are no semicolons after #include and #define statements
The functions are defined before they are called in the code
The constant NUMBER_OF_DARTS is defined before it is used
Algorithmically, because we are comparing against a radius of one, we don't need the square root: the distance from the origin is less than one exactly when its square is less than one
The variables x and y inside the loop are LOCAL TO THE LOOP'S SCOPE
The return 0 returns control to the operating system at the end of the program; the 0 value is a convention that indicates successful completion that dates back to the earliest days of UNIX programming.
The srand( time(0) ) function calls seed the random number generator with the current time; this ensures a different set of random numbers for each program run
The srand( time(0) ) function calls ALSO show that function calls in C are expressions that return a value, just like Java, so one call can be the argument of another
You can break long lines of code into shorter chunks anywhere between tokens; a backslash [\] at the end of the line is only needed inside preprocessor directives such as #define, or to split a string literal across lines
Note the calculation that is passed as an argument to printf()
Note the cast to double of the integer value inside in the calculation

Using Arrays

C arrays are extraordinarily primitive. They do not know how big they are, so you can read and write beyond the array bounds and the compiler will not barf, but your program might. For example:

   int f() {
      int x;
      int a[4];
      int y;
      int z = 23;
      a[5] = 100;
      return y;
   }

This code might return 100, or it might return some other random value, since y is never initialized and the write to a[5] lands outside the array, possibly right on top of y. And a[-1] is probably the same variable as x. You can also write things like a[x], x[a], a[3] and 3[a]. Trying to read or write a[234523132] will probably crash your program, though. Remember:

There are no array-index-out-of-bounds-exceptions in C because C does not remember how big you created your array
C does not have exceptions to protect you from yourself

Here's a program with arrays

  /*
   * A program that displays all the prime numbers up to and
   * including 1000, using the famous algorithm of Eratosthenes.
   * This is a C99 program, not a C90 program.
   *
   * The purpose of the program is only to illustrate arrays
   * assuming one has not yet seen pointers or command line
   * arguments, so it isn't very good.
   */

   #include <stdio.h>
   #include <stdbool.h>

  // To get primes up to and including 1000, the sieve has
  // to have a slot at index 1000.  But indices must start
  // at 0, so there have to be 1001 slots in the array.

   #define SIZE 1001

  // Fills the first n slots of array s with the given value.
   void fillArray( bool s[], bool value, int n ) {
      for( int i = 0; i < n; i++ ) {
         s[i] = value;
      }
   }

  // This function writes false in each slot of the array
  // corresponding to a nonprime number.  First, we know 0 and
  // 1 are not prime.  Then for each value starting with 2, if
  // the value is still thought to be prime, we write false in
  // each slot corresponding to its multiples.
   void checkOffComposites( bool s[], int n ) {
      s[0] = false;
      s[1] = false;
      for( int i = 2; i * i < n; i++ ) {
         if( s[i] ) {
            for( int j = i + i; j < n; j += i ) {
               s[j] = false;
            }
         }
      }
   }

  // This function writes out all the values which correspond to
  // positions in a vector containing the value "true".  Each
  // value is written to the standard output in a field of
  // eight characters.
   void displayTrueIndices( bool s[], int n ) {
      for( int i = 0; i < n; i++ ) {
         if( s[i] ) {
            printf( "%8d", i );
         }
      }
      printf( "\n" );
   }

  // main() just calls the worker functions.
   int main() {
      bool sieve[SIZE];
      fillArray( sieve, true, SIZE );
      checkOffComposites( sieve, SIZE );
      displayTrueIndices( sieve, SIZE );
      return 0;
   }

Pointers, values, and addresses

In Java [and other object-oriented languages] we have references, which are identifiers that refer to some object in memory. [In C, identifiers are one kind of token.] In C we have the same thing, but in this case they are known as pointers.

A pointer is basically an object through which you reference another object, just like in Java. You saw pointers back in the in-class assignment week 04, so I won't belabor the point here.

Here is some sample code that shows a little more about how this works, though:

      int x = 5;           // a normal variable x
      int* p = &x;         // a pointer to where x is stored
      int* q = NULL;       // a pointer to nothing at this point

     // this is how we allocate space for something
      int* r = malloc( sizeof(int) );

     // this is how we allocate space for 100 somethings
      int* s = malloc( 100 * sizeof(int) );

     // *r and s[20] were never assigned, so expect garbage values
      printf( "%d %d %d", *p, *r, s[20] );
      printf( "%d", *q );  // this will [most likely] CRASH because q is NULL

      free(r);             // we have to "free" things
      free(s);             //    when we allocate space
     // But we do not free p or q

Here's more specific information:

If the pointer is called	Then the referent is called	And the field is called
p	*p	(*p).x or p->x

If the referent is called	Then the pointer is called	And the field is called
p	&p	p.x

Pointers are usually used for dynamic, linked data structures.
Unlike in many languages, C lets you make pointers to local objects, which is convenient but error-prone.
Pointers are typed in C.
In C, when you allocate memory you are given a pointer to that memory, and IT IS THE PROGRAMMER'S RESPONSIBILITY TO GIVE IT BACK. There is no implicit garbage collection! You need to know about memory leaks and dangling pointers.

Pointers and Arrays

Pointers and arrays are closely related. The value of an array variable is treated as a pointer to its first element, e.g. a == &a[0], and e1[e2] is the same as *(e1 + e2).

For definitions, pointers and arrays are different:

      int *x;           /* is totally different from: */
      int x[100];

      int *a[n];        /* is totally different from: */
      int a[n][100];

But for declarations, at least in parameter declarations, you can blur the distinction:

      void f(int* a) { ... }
      void g(int b[]) { ... }

An array of ints, or pointer to an int, can be passed to either.

Strings and Arrays

A string is an array of characters, just like in many other languages, but in C it ends with the null character [not to be confused with the NULL pointer]. Here are some definition points:

A byte with all bits set to 0 is called the null character
A string (a.k.a. multibyte string) is a contiguous sequence of characters [of type char] terminated by and including the first null character
The length of a string is the number of bytes preceding the null character, and the value of a string is the sequence of the values of the contained characters, in order
A null wide character is a wide character with code value zero
A wide string is a contiguous sequence of wide characters [of type wchar_t] terminated by and including the first null wide character
The length of a wide string is the number of wide characters preceding the null wide character and the value of a wide string is the sequence of code values of the contained wide characters, in order

So, we can make a string with an array of characters with a zero at the end, or we can use string literals, which are sequences of:

characters except \, ", and newlines, and
escapes [\a, \b, \f, \n, \r, \t, \v, \', \", \?, \\, \one-to-three-octaldigits, \xhexdigits, \ufour-hex-digits, and \Ueight-hex-digits]

Let's just do examples:

  /*
   * A program that illustrates strings in C.
   * Designed for C99, but should run fine in C90
   *  [with a lot of warnings]
   */

   #include <stdio.h>
   #include <string.h>
   #include <wchar.h>

  // Simple strings from the basic character set
   char s1[] = {'d', 'o', 'g', (char)0};
   char s2[] = {'d', 'o', 'g', '\0'};
   char* s3 = "dog";
   wchar_t* s4 = L"dog";

   // String with some non-ascii, but still "8-bit" characters
   char* s5 = "c\xe9ili";
   char* s6 = "c\u00e9ili";
   char* s7 = "c\U000000e9ili";
   wchar_t* s8 = L"c\U000000e9ili";

   // Strings with characters with codepoints > 0xFF
   char* s9 = "k\u014dpa`a";
   wchar_t* s10 = L"k\u014dpa`a";

  // function to output information about a string
   void inspectString( char* s ) {
      int i, n;
      printf( "[%s] length=%d codepoints=[ ", s, (int)strlen(s) );
      for( i = 0, n = strlen(s)+1; i < n; i++ ) {
         printf( "%02x ", (unsigned char)s[i] );
      }
      printf( "]\n" );
   }

  // function to output information about a 'wide' string
   void inspectWideString( wchar_t* s ) {
      int i, n;
      printf( "[%ls] length=%d codepoints=[ ", s, (int)wcslen(s) );
      for( i = 0, n = (wcslen(s)+1)*sizeof(wchar_t); i < n; i++ ) {
         printf( "%02x ", ((unsigned char*)s)[i] );
      }
      printf( "]\n" );
   }

   int main() {
      inspectString(s1);
      inspectString(s2);
      inspectString(s3);
      inspectString((char*)s4);
      inspectWideString(s4);
      inspectString(s5);
      inspectString(s6);
      inspectString(s7);
      inspectString((char*)s8);
      inspectWideString(s8);
      inspectString(s9);
      inspectString((char*)s10);
      inspectWideString(s10);
      return 0;
   }

A Quick Quiz to Test Your Knowledge

Open a browser and navigate to this site: https://kahoot.it/. Enter the game PIN in the box and click enter. Then enter your desired nickname for the session and click enter again. The classroom screen will display the questions, one at a time, and you can make your choice on YOUR computer of the four colored answers. You will have 20 seconds to answer. Once you have answered you will find out immediately whether your answer is correct. At the end, the highest three scores are displayed. You can use either your laptop or your phone for this game.

Header files and prototypes

Because C demands that things be declared before they are used, it falls on the programmer to make sure that happens. One way this is done is to define the entire function before it is used. Another, somewhat easier way is to write a function prototype, which is a declaration of the function without the innards, much like we do in Java when we declare an Interface or an Abstract Class.

Prototypes for the inspectString() and inspectWideString() functions we've already seen in the code above would look like this:

      void inspectString( char* s );
      void inspectWideString( wchar_t* s );

Being able to write the prototypes like this means you can put them in another file, which is then included as part of your code using a #include [pound include] directive. This tells the compiler, go look in this other file and find the declarations you need ~ some of them will be in there!

The compiler needs to know where things are in order to compile them properly. Much of the time your code will all reside in the same directory, so it's easy for the compiler to locate things. However, your code will also make use of LOTS of libraries in order to keep you from having to rewrite code that is used over and over. Just like Java has the import statement, C has the #include directive. This, along with others such as #define, #ifdef, and #ifndef, is what is known as a preprocessor directive. The compiler runs the preprocessor as one of the first steps of the compilation process. This is the step that reads in the named files whenever an #include directive is encountered. These include files are also known as header files or often just headers.

The gcc installation should put the libraries into a standard location so that the compiler can find the files, but if you write your own header files, you have the option of leaving them in your own directory. In that case, the include statement looks a bit different. Here's an example:

     // header files that the compiler knows about
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>
      #include <string.h>

     // header files you have written and must tell
     //   the compiler about
      #include "myHeader.h"
      #include "../different/otherHeader.h"
      #include "lowerDir/thirdHeader.h"

The ability to write header files that specify all kinds of things to include as part of your programs is another great strength of this language.

Multiple File Modules in a Program

And as long as we're talking about it, this is a good way to divide your program up into pieces, so that you can re-use the pieces, making your OWN libraries! Modularization is a good practice when writing your code, as we've seen in other classes. In Object-Oriented programming, you try to separate out the data and the operations that belong together, in a philosophy that is known as separation of concerns, which you've probably also heard me calling division of labor. The idea is to group data and operations that belong together into a single entity. In C you can do this by putting the related things into separate files. This modularization is a good practice:

It helps you organize your code
It helps with debugging since you know where to find things
It helps with reusing code [DRY]
It helps with sharing your code [open source]
You can also create libraries of functions and definitions to use and share
You can reuse OTHER people's code, and even adopt and modify it for your own purposes

One of the oldest build tools on the planet is the make utility. This program runs based on a special instruction file called a makefile that has the definitions of what make needs to know to build your program. This is a bit advanced for now, but it's good to know it's out there, and it is an easy way of maintaining programs. Once you set up your makefile in the directory of your program, all you need to do when you make a change is type the word make on the command line! Make will compile your code, link things with other libraries, and remake your own libraries, and it has directives that allow conditional compilation.

Here is the Wikipedia page that describes makefiles which is interesting reading to see the power and flexibility of this tool.

C Input/Output Specifications

You have probably noted that in the printf() function calls above, there are numerous references to things like %s, %d, %c, and %f. These are the format specifiers that control what the output looks like. Remember in Java we were able to just use the plus sign to get values into an output string. In that case, if we wanted a specific number of digits, or precision, we had to use another class, the DecimalFormat class. In C, we specify that information as part of the output string and then use a comma-delimited list of variables/values to fill those specifications.

Here is a list of the format specifiers for the commonly used C data types:

Data Type	Specifier
char	%c
signed char	%c
unsigned char	%c
short int	%hd
int	%d
long int	%li
long long int	%lli
unsigned int	%u
unsigned long int	%lu
unsigned long long int	%llu
float	%f
double	%lf
long double	%Lf

Interesting that there are unsigned and signed characters, huh?

We've seen examples in the code snippets and programs above, but it's interesting to note that you can make C output different versions of the same data type! For example, the code:

      int value = 97;
      printf( "value: %c as char,\n" \
              "value: %d as int,\n" \
              "value: %u as unsigned int,\n" \
              "value: %f as float,\n" \
              "value: %f as cast to float\n",
               value, value, value, value, (float)value );

…produced this output on my test machine:

      value: a as char,
      value: 97 as int,
      value: 97 as unsigned int,
      value: 0.000000 as float,
      value: 0.000000 as cast to float

Also note in this code how the continuation of lines is done using a backslash [which is optional here, since adjacent string literals are joined automatically].

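Note that the [old] compiler used here won't complain that I'm using an integer as a float, but the program prints a zero instead of coercing the value. Strictly speaking, handing printf() an int where %f expects a double is undefined behavior, so the result depends on the machine and compiler, and the mismatch can even throw off the arguments that follow it [most likely why the properly cast value also shows up as zero here]. BE CAREFUL of this: you may not be getting expected outputs, even though the values are correct behind the scenes! Compiling with gcc -Wall asks the compiler to warn you about mismatched format specifiers.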

In-class Exercise #7

In your homework groups, use what you now know about the C language to implement the mouse and cheese game as specified by the following items:

Rules

The goal of the game is to prevent the mouse from reaching the cheese
You are given a word that you must guess, one letter at a time
You have ten guesses total, meaning the mouse has ten steps to reach the cheese
For each letter you guess right, the mouse stays put
For each letter you guess wrong, the mouse moves one step closer to the cheese
For each correct guess, the letter is displayed in the word
If the mouse reaches the cheese before you guess the word, you lose the game
If you guess the word before the mouse reaches the cheese, you win the game
For multiple occurrences of a letter in a word, one correct guess will display all of them; for example, if there are three E's in the word, one guess of E will display all three

Game Mechanics

The main game must be in a file called mouseandcheese.c
The word is selected randomly from an array of words which are strings
Initially all the letters are indicated by underscores, dashes, or some other non-letter you pick
You must keep track of which letters have been used
You must keep track of the number of incorrect guesses
You must display the number of steps remaining before the mouse reaches the cheese after each guess
You must display the current state of the word after each guess; for example, if the word is NANDGATE and the letters N and A have been correctly guessed, and the letters S, R, and I have been incorrectly guessed, after five guesses you should display:

N A N _ _ A _ _ with seven steps left
This is a virtual abstraction game; you are NOT required to make a drawing of a mouse, the cheese, the steps, or any other part of the game except the letters and steps as shown above
You can have other files if you like, including header files or other modules as you deem necessary, as long as the main() function for running the program is contained in the mouseandcheese.c source file
Don't forget to COMMIT EARLY AND COMMIT OFTEN

Homework Assignment #5

I know this isn't due for a while, but I wanted to give you a heads-up on the homework assignments that you will be doing this semester. They are all available from the syllabus page, but just to make sure …

Week Seven Wrap-up

That's probably enough for this week. Be sure to check out the links to the related materials that are listed on the class links page.