CMSI 2210: Welcome to Week 07

This Week's Agenda

For this week, here's the plan, Fran…

  • Announcements
  • REAL Language Representations
  • Top-down or Bottom-up?
  • Functions and Pointers and Strings ~ OH MY!
  • Compiling and Linking
  • Header files and prototypes
  • Multiple File Modules in a Program
  • C Input/Output Specifications
  • Wednesday/Thursday: In-class Exercise
just something interesting

REAL Language Representations

What is a real programming language? Many people seem to feel that the real-ness of a language is based on what facilities it may have [closures, for-each loops, records, etc.], while others seem to think that basically ANY language that you can use to program a computation engine [SQL? BASIC? VBA? HP calculator language?] is a real language. The moniker is open for debate, certainly. Here is a list of one person's ideas of different programming languages and what they are useful for.

Warning: This page will have lots of code examples. These were all tested on a Windows 10 Alienware PC with gcc version 2.95.2, which is really, really, REALLY, REALLY old, but still works for our purposes. I have no idea if it's C90, C99, or what…

Top-down or Bottom-up?

First of all, in our case, there are several things to consider, not the least of which is, when we are designing our software, from what perspective do we start? Should we start at a high level and work our way deeper, drilling down into the details? Or should we begin at the bottom with the hardware functions and work our way upwards, combining the functions as we go to get to the full application?


The first of these is known as a top down development philosophy, and the second is known as a bottom up methodology.

Actually, there is need for both of these. Top down is good when you have a high-level idea of what the software should do, and you are working to design the parts that will create and support that required functionality. In this case you would want to use a high level language, one which is the best fit for the task at hand. For example, you wouldn't write a web page using C#, and you wouldn't write a machine learning algorithm in HTML.

Bottom up is good for when you know what much [or most] of the underlying functionality will be, but you don't know yet how that will be implemented at a higher level. An example of this would be a scientific application in which you know that you will need several high-order functions to process very large data sets, but you're not sure just what the data outcomes will be so you don't know for certain how to make the presentation of that output appear to the user.

We are going to investigate this from both directions in this class, using the C language for our top down approach, and nasm for our bottom up methods. First, let's learn more about C.


Some Preliminary Information and Background

C is a very popular systems programming language, and by some measures might be the most popular programming language, period [Python and JavaScript notwithstanding]. Every developer needs to know C, whether or not she uses it; it's a kind of lingua franca, and as most high-level languages' runtime systems are written in C, knowing it helps you deal with the problem of leaky abstractions.

There are five official versions of C:

Name              Date   Description/Notes/Comments
K&R C             1972   Never officially standardized. Pretty much dead now.
C90/ANSI C/C89    1989   Practically a different language from K&R; still in use.
NA1               1995   Actually not considered a "real" version; just C90 with minor clarifications and new library modules.
C99               1999   A pretty big revision from C90; still widely used.
C11               2011   The current standard; a pretty small step up from C99.

Note: C99 is not upwardly compatible with C90, but a lot of people still use C90. David Tribble has a great page describing the differences between C90, C99 and C++.

Some Notes And Links for You to Check Out…

There is an awesome C portal at http://www.lysator.liu.se/c/. Perhaps the most important references you'll find there are the IOCCC [don't miss the Winners Page], Rob Pike's notes on C programming, and Steve Summit's C programming course notes. Also, you have to have the Kernighan and Ritchie book, even though it only covers C90.

Please check out this talk by Dennis Ritchie, one of the creators of the language. It is very enlightening about C, and it shows that even the creator of the language gets things wrong when writing programs. [approx. 1 hour]

A C program is spread over one or more files. Each file is a sequence of declarations. A declaration is either:

  1. an object declaration
  2. a type declaration
  3. a function declaration
  4. a directive
Each file is compiled, then the compiled units are linked together to form an executable program.

Some Examples

In the best Dr. Toal style, we'll start out with some examples, then get into the nitty-gritty details. That way you get more cluck for your buck.

Traditionally, as you no doubt remember, the first program that you always write in any language you are learning is the Hello World program. Here it is in C:

1      #include <stdio.h>
2
3      int main() {
4         printf( "Hello, World!\n" );
5         return 0;
6      }
            

Not to belabor the obvious, since you all know Java, JavaScript, Python, and probably a couple of other languages, but this is how the code breaks down:

  1. line one includes a standard library of functions that facilitate input and output
  2. line three declares the main function, which is the entry point to the program, just like Java; this is the way the operating system can figure out where to start the program, much like the JVM is told where to start with public static void main( String [] args )
  3. line four calls a C function from the stdio library to print to the screen
  4. line five tells the program to return to the operating system

Compiling and Linking

OK, now we've written the source code, so we have to get it ready to run. Since C is truly a compiled language, we have to run the compiler on our source code. The compiler, as you may remember, is a translator program that turns our human-readable source code into the object code that we need to execute. Object code is the machine code that will be used, but it isn't a complete program yet, so it can't run on its own. We still need another step to make a complete runnable program.

To compile the program we need to run gcc [which is the short name for the GNU Compiler Collection]. Assuming that your source code is saved in a file named hello.c, compilation is done using the following line:

      gcc hello.c
            

In this case, because the gcc program is smart, it will automatically compile and then will perform the next step for us, linking our object code with the object code from the C standard libraries to make a complete runnable program. The result is a program file called a.exe on Windows and a.out just about everywhere else.

To run the program on Windows, you simply type a and press the Enter key. To run the program on a Mac or on UNIX/Linux, you type ./a.out.

The gcc program has several options that you can use on the command line to affect its operation. Here are a few:

Notes on the a.exe and a.out thing: If you don't tell gcc what to name the executable output file, it defaults to a with whatever the appropriate extension is for the operating system on which it is creating the application. There needs to be SOME name for the output file, so the convention of using the letter a has traditionally been used. You can re-name the output file to whatever you want, as we'll see here.

gcc -xc hello.c [to tell gcc to compile C code, not C++; this happens automatically if the file extension is .c]
gcc -std=c99 hello.c [to use C99]
gcc -c hello.c [to compile, but not link, the program]
gcc -S hello.c [to translate to assembly language only]
gcc -o hello hello.c [to name the output file something other than the default name]
gcc -o hello.exe hello.c [if you are using a Windows computer]

Don't forget to try these out and see what happens!

Variables, For Loops, Decision [if] Blocks

These look just like Java, but with a couple of twists you'll see that you MAY not be used to. Here is a sample:

  /*
   *  file: triple.c
   */

   #include <stdio.h>

   int main() {
       int a, b, c;              // declare three integers

       printf("     A     B     C\n");    // header line 1
       printf("------------------\n");    // header line 2
       for (c = 1; c <= 100; c++) {
           for (b = 1; b <= 100; b++) {
               for (a = 1; a <= 100; a++) {
                   if (a * a + b * b == c * c) {
                       printf("%6d%6d%6d\n", a, b, c);
                   } else {
                      // nothing here, just to show 'else'
                   }
               }
           }
       }
       return 0;
   }
            

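This works in C90 [strictly speaking, the // comments are a C99 feature, but gcc accepts them by default as an extension]. In C99, you can declare a, b, and c directly in the for statements themselves, instead of separately.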

Notice the comment style… but remember you can't nest them!

Investigation: Can you figure out what this code does? Why does it work the way it does?

Investigation: In a throwback to your Data Structures class, can you figure out what the algorithm run time of this program would be?

Investigation: Notice there are no initial values assigned for the three integers when they are declared. Are they automatically assigned a value?

Investigation: What is the return statement doing at the end of the program? If this is a main program, where is it returning?

Note: we'll get to the %6d%6d%6d stuff a little later…

Functions and Pointers and Strings ~ OH MY!

The C language is statically typed, just like Java. This means you have to declare everything before you can use it. Java, as you know, allows you to define variables all over the place, and the Java compiler can [usually] figure out your intent. Not so with C. You must declare variables, functions, constants, in short, everything before it gets used. This is one reason why most of the declarations that appear in a source file are at or near the top of the file.

The following example should make this clear:

  /*
   * A program that makes an approximation to pi by generating
   * a million random points in the unit square and computing
   * the ratio of those inside the unit circle to the total
   * number in the square. That value should be pretty close
   * to Pi/4.  The program displays the approximation as well
   * as the actual value to 10 digits.
   *
   * BTW, this should look REALLY FAMILIAR from CMSI 1010...
   */

   #include <stdio.h>
   #include <math.h>
   #include <stdlib.h>
   #include <time.h>

  // Not the best; should be a command line argument!
   #define NUMBER_OF_DARTS 1000000

  // Returns the 'c-squared' value of 'x' and 'y'
   double squareOfDistanceToOrigin( double x, double y ) {
      return x * x + y * y;
   }

  // Returns a random value in [-1..1]
   double randomValue() {
      return 2.0 * rand() / RAND_MAX - 1.0;
   }

  // The main program 'entry point'
   int main() {
      int i;
      int inside = 0;
      srand( time(0) );

      for (i = 0; i < NUMBER_OF_DARTS; i++) {
         double x = randomValue();
         double y = randomValue();
         if (squareOfDistanceToOrigin(x, y) < 1.0) {
            inside++;
         }
      }
      printf( "Pi [est.]: %12.10f\n",
              4.0 * ((double)inside / NUMBER_OF_DARTS) );
      printf( "[actual to 10 digits is %12.10f]\n", M_PI );

      return 0;
   }
            

Several things to note about this code:

Using Arrays

C arrays are extraordinarily primitive. They do not know how big they are, so you can read and write beyond the array bounds and the compiler will not barf, but your program might. For example:

   int f() {
      int x;
      int a[4];
      int y;
      int z = 23;
      a[5] = 100;
      return y;
   }
            

This code might return 100, or it might return some other random value, since the out-of-bounds write to a[5] might land on top of y in memory [officially, the behavior is undefined]. And a[-1] is probably the same variable as x. You can also write things like a[x], x[a], a[3] and 3[a]. Trying to read or write a[234523132] will probably crash your program, though. Remember:


Here's a program with arrays

  /*
   * A program that displays all the prime numbers up to and
   * including 1000, using the famous algorithm of Eratosthenes.
   * This is a C99 program, not a C90 program.
   *
   * The purpose of the program is only to illustrate arrays
   * assuming one has not yet seen pointers or command line
   * arguments, so it isn't very good.
   */

   #include <stdio.h>
   #include <stdbool.h>

  // To get primes up to and including 1000, the sieve has
  // to have a slot at index 1000.  But indices must start
  // at 0, so there have to be 1001 slots in the array.

   #define SIZE 1001

  // Fills the first n slots of array s with the given value.
   void fillArray( bool s[], bool value, int n ) {
      for( int i = 0; i < n; i++ ) {
         s[i] = value;
      }
   }

  // This function writes false in each slot of the array
  // corresponding to a nonprime number.  First, we know 0 and
  // 1 are not prime.  Then for each value starting with 2, if
  // the value is still thought to be prime, we write false in
  // each slot corresponding to its multiples.
   void checkOffComposites( bool s[], int n ) {
      s[0] = false;
      s[1] = false;
      for( int i = 2; i * i < n; i++ ) {
         if( s[i] ) {
            for( int j = i + i; j < n; j += i ) {
               s[j] = false;
            }
         }
      }
   }

  // This function writes out all the values which correspond to
  // positions in a vector containing the value "true".  Each
  // value is written to the standard output in a field of
  // eight characters.
   void displayTrueIndices( bool s[], int n ) {
      for( int i = 0; i < n; i++ ) {
         if( s[i] ) {
            printf( "%8d", i );
         }
      }
      printf( "\n" );
   }

  // main() just calls the worker functions.
   int main() {
      bool sieve[SIZE];
      fillArray( sieve, true, SIZE );
      checkOffComposites( sieve, SIZE );
      displayTrueIndices( sieve, SIZE );
      return 0;
   }
            

Pointers, values, and addresses

In Java [and other object-oriented languages] we have references, which are identifiers that refer to some object in memory. [In C, identifiers are one kind of token.] In C we have the same thing, but in this case they are known as pointers.

A pointer is basically an object through which you reference another object, just like in Java. You saw pointers back in the in-class assignment week 04, so I won't belabor the point here.

Here is some sample code that shows a little more about how this works, though:

      int x = 5;           // a normal variable x
      int* p = &x;         // a pointer to where x is stored
      int* q = NULL;       // a pointer to nothing at this point

     // this is how we allocate space for something
     //   [malloc() and free() are declared in <stdlib.h>]
      int* r = malloc( sizeof(int) );

     // this is how we allocate space for 100 somethings
      int* s = malloc( 100 * sizeof(int) );

     // malloc() does not initialize the memory it hands back,
     //   so store something there before reading it
      *r = 7;
      s[20] = 42;

      printf( "%d %d %d\n", *p, *r, s[20] );
      printf( "%d", *q );  // this will [almost certainly] CRASH because it's NULL

      free(r);             // we have to "free" things
      free(s);             //    when we allocate space
     // But we do not free p or q
            

Here's more specific information:

If the pointer is called    Then the referent is called    And the field is called
p                           *p                             (*p).x or p->x

If the referent is called   Then the pointer is called     And the field is called
p                           &p                             p.x

  • Pointers are usually used for dynamic, linked data structures.
  • Unlike in many languages, C lets you make pointers to local objects, which is convenient but error-prone.
  • Pointers are typed in C.
  • In C, when you allocate memory you are given a pointer to that memory, and IT IS THE PROGRAMMER'S RESPONSIBILITY TO GIVE IT BACK. There is no implicit garbage collection! You need to know about memory leaks and dangling pointers.

Pointers and Arrays

Pointers and arrays are closely related. The value of an array variable is treated as a pointer to its first element, e.g. a == &a[0], and e1[e2] is the same as *(e1 + e2).

For definitions, pointers and arrays are different:

      int *x;           /* is totally different from: */
      int x[100];

      int *a[n];        /* is totally different from: */
      int a[n][100];
            

But for declarations, at least in parameter declarations, you can blur the distinction:

      void f(int* a) { ... }
      void g(int b[]) { ... }
            

An array of ints, or pointer to an int, can be passed to either.


Strings and Arrays

A string is an array of characters, just like in many other languages, that ends with the NUL character ['\0']. Here are some definition points:

So, we can make a string with an array of characters with a zero at the end, or we can use string literals, which are sequences of:

  1. characters except \, ", and newlines, and
  2. escapes [\a, \b, \f, \n, \r, \t, \v, \', \", \?, \\, \one-to-three-octaldigits, \xhexdigits, \ufour-hex-digits, and \Ueight-hex-digits]

Let's just do examples:

  /*
   * A program that illustrates strings in C.
   * Designed for C99, but should run fine in C90
   *  [with a lot of warnings]
   */

   #include <stdio.h>
   #include <string.h>
   #include <wchar.h>

  // Simple strings from the basic character set
   char s1[] = {'d', 'o', 'g', (char)0};
   char s2[] = {'d', 'o', 'g', '\0'};
   char* s3 = "dog";
   wchar_t* s4 = L"dog";

   // String with some non-ascii, but still "8-bit" characters
   char* s5 = "c\xe9ili";
   char* s6 = "c\u00e9ili";
   char* s7 = "c\U000000e9ili";
   wchar_t* s8 = L"c\U000000e9ili";

   // Strings with characters with codepoints > 0xFF
   char* s9 = "k\u014dpa`a";
   wchar_t* s10 = L"k\u014dpa`a";

  // function to output information about a string
   void inspectString( char* s ) {
      int i, n;
      printf( "[%s] length=%d codepoints=[ ", s, (int)strlen(s) );
      for( i = 0, n = strlen(s)+1; i < n; i++ ) {
         printf( "%02x ", (unsigned char)s[i] );
      }
      printf( "]\n" );
   }

  // function to output information about a 'wide' string
   void inspectWideString( wchar_t* s ) {
      int i, n;
      printf( "[%ls] length=%d codepoints=[ ", s, (int)wcslen(s) );
      for( i = 0, n = (wcslen(s)+1)*sizeof(wchar_t); i < n; i++ ) {
         printf( "%02x ", ((unsigned char*)s)[i] );
      }
      printf( "]\n" );
   }

   int main() {
      inspectString(s1);
      inspectString(s2);
      inspectString(s3);
      inspectString((char*)s4);
      inspectWideString(s4);
      inspectString(s5);
      inspectString(s6);
      inspectString(s7);
      inspectString((char*)s8);
      inspectWideString(s8);
      inspectString(s9);
      inspectString((char*)s10);
      inspectWideString(s10);
      return 0;
   }
            

Header files and prototypes

Because C demands that things be declared before they are used, it falls on the programmer to make sure that happens. One way this is done is to define the entire function before it is used. Another, somewhat easier way is to declare a function prototype, which is a declaration of the function without the innards, much like we do in Java when we declare an Interface or an Abstract Class.

A prototype for the function we've seen already in the above code for the inspectString() function would look like this:

      void inspectString( char* s );

            [*OR*]

      void inspectWideString( wchar_t* s );
            

Being able to define the prototypes like this means you can include them in another file which is then included as part of your code using a pound include statement in your code. This will tell the compiler, go look in this other file and find the definitions you need ~ some of them will be in there!

The compiler needs to know where things are in order to compile them properly. Much of the time your code will all reside in the same directory, so it's easy for the compiler to locate things. However, your code will also make use of LOTS of libraries in order to keep you from having to rewrite code that is used over and over. Just like in Java with the import statement, we have in C the #include statement. This, along with others such as #define, #ifdef, and #ifndef, is what is known as a pre-processor directive. The compiler runs the preprocessor as one of the first steps of the compilation process. This is the step that reads in files from the libraries whenever the #include directive is encountered. These include files are also known as header files or often just headers.

The gcc installation should put the libraries into a standard location so that the compiler can find the files, but if you write your own header files, you have the option of leaving them in your own directory. In that case, the include statement looks a bit different. Here's an example:

     // header files that the compiler knows about
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>
      #include <string.h>

     // header files you have written and must tell
     //   the compiler about
      #include "myHeader.h"
      #include "../different/otherHeader.h"
      #include "lowerDir/thirdHeader.h"
            

The ability to write header files that specify all kinds of things to include as part of your programs is another great strength of this language.

Multiple File Modules in a Program

And as long as we're talking about it, this is a good way to divide your program up into pieces, so that you can re-use the pieces, making your OWN libraries! Modularization is a good practice when writing your code, as we've seen in other classes. In Object-Oriented programming, you try to separate out the data and the operations that belong together, in a philosophy that is known as separation of concerns, which you've probably also heard me calling division of labor. The idea is to group data and operations that belong together into a single entity. In C you can do this by putting the related things into separate files. This modularization is a good practice:

One of the oldest build tools on the planet is the make utility. This program runs based on a special instruction file called a makefile that has the definitions of what make needs to know to build your program. This is a bit advanced for now, but it's good to know it's out there, and it is an easy way of maintaining programs. Once you set up your makefile in the directory of your program, all you need to do when you make a change is type the word make on the command line! Make will compile your code, link things with other libraries, and remake your own libraries, and it has directives that allow conditional compilation.

Here is the Wikipedia page that describes makefiles which is interesting reading to see the power and flexibility of this tool.

C Input/Output Specifications

You have probably noted that in the printf() function calls above, there are numerous references to things like %s, %d, %c, and %f. These are the format specifiers that control what the output looks like. Remember in Java we were able to just use the plus sign to get values into an output string. In that case, if we wanted a specific number of digits, or precision, we had to use another class, the DecimalFormat class. In C, we specify that information as part of the output string and then use a comma-delimited list of variables/values to fill those specifications.

Here is a list of the format specifiers for the commonly used C data types:

Data Type                 Specifier
char                      %c
signed char               %c
unsigned char             %c
short int                 %hd
int                       %d
long int                  %li
long long int             %lli
unsigned int              %u
unsigned long int         %lu
unsigned long long int    %llu
float                     %f
double                    %lf
long double               %Lf

Interesting that there are unsigned and signed characters, huh?

We've seen examples in the code snippets and programs above, but it's interesting to note that you can make C output different versions of the same data type! For example, the code:

      int value = 97;
      printf( "value: %c as char,\n" \
              "value: %d as int,\n" \
              "value: %u as unsigned int,\n" \
              "value: %f as float,\n" \
              "value: %f as cast to float\n",
               value, value, value, value, (float)value );
            

…will output:

      value: a as char,
      value: 97 as int,
      value: 97 as unsigned int,
      value: 0.000000 as float,
      value: 0.000000 as cast to float
            

Also note in this code how the continuation of lines is done using a backslash.

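Note that the compiler won't necessarily complain that I'm passing an integer where %f expects a floating-point value [newer versions of gcc will warn about it if you compile with -Wall], and printf() does not coerce the value for you. Technically the result is undefined; on my test machine it printed zero, and the mismatch even threw off the properly cast argument that follows it. BE CAREFUL of this: you may not be getting the expected output, even though the values are correct behind the scenes!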

In-class Exercise #7

In your homework groups, use what you now know about the C language to implement the mouse and cheese game as specified by the following items:

Rules
Game Mechanics

Homework Assignment #5

I know this isn't due for a while, but I wanted to give you a heads-up on the homework assignments that you will be doing this semester. They are all available from the syllabus page, but just to make sure …

Week Seven Wrap-up

That's probably enough for this week. Be sure to check out the links to the related materials that are listed on the class links page.