CMSI 2120: Welcome to Week 07

This Week's Agenda

For this week, here's the plan, Jan…

Test Coverage

I'll provide the answers to any questions you have about the test. If you want to work through the entire test, we can do that, but it will put us behind in covering the topics we need to get through during the semester, so it might be better to spend class time on questions that more than one person shares. We'll play ♫ it by 👂 ear 𝄫 though…

…In Which We Get Below Deque

We will revisit the Deque data structure briefly, to clarify some of the questions people have asked over the last several days about homework #2. Go check out the new additional description here.

Sorting Redux

As a review, here is the code from last week. I've filled in the missing parts so you can see how it actually looks for the Bubble Sort:

  /**
   * First, the swap function; note that we are ONLY dealing with int
   *  also, remember this is intended to be in the IntList class from week 03
   */
   public void swapIntsAt( int index1, int index2 ) {
      int temp = theList[index1];
      theList[index1] = theList[index2];
      theList[index2] = temp;
   }

   // let's define a "toString()" method while we're here,
   //  so we have a way of printing the entire list
   public String toString() {
      String output = "";
      for( int i = 0; i < size; i++ ) {
         output += theList[i] + " ";   // space-separated so the values don't run together
      }
      return output.trim();
   }
            

One change I made in the IntList class was to make the list array a public variable. That makes it easier to access from the IntSorter class. There are other [better?] ways to do that which don't break encapsulation, but they are a bit involved and will distract us from the issue at hand, so we won't go into that at this juncture.

OK, now we have a way to swap two values in the list, so now we can make a sorting class that uses the swapping method as part of a comparison. We'll call it [because we are just SOOOOOO imaginative] the IntSorter class. Here is the code to implement it:

   public class IntSorter {

      // Our first pass at sorting, a simple bubble sort; we'll add more later
      public static void bubbleSort( IntList listToSort ) {

         for( int i = 0; i < listToSort.getSize(); i++ ) {
            // Track if a swap has been made
            boolean swapped = false;

            // Iterate through all elements of the list starting at the end element
            //  and continuing up to the i + 1 element
            for( int j = listToSort.getSize() - 1; j > i; j-- ) {

                // Swap if the two currently adjacent items in the 2nd loop iteration
                //  are out of order
                if( listToSort.theList[j] < listToSort.theList[j-1] ) {
                    listToSort.swapIntsAt( j, j-1 );
                    swapped = true;       // Mark that you have swapped
                }
            }

            // It is time to return if you made no swaps
            if( !swapped ) {
               return;
            }
         }
      }

      // OK, now test it out!  Add 51, 23, 17, 43, and 31 to the list,
      //  print it out, then sort it and print it again
      public static void main( String[] args ) {
         IntList orgList = new IntList();
         orgList.append(51);
         orgList.append(23);
         orgList.append(17);
         orgList.append(43);
         orgList.append(31);
         System.out.println("Pre-sort:  " + orgList.toString() );
         bubbleSort( orgList );
         System.out.println("Post-sort: " + orgList.toString() );
      }
   }
            

Notice a couple of things about this algorithm and test code. First, the inner loop works from the big end towards the small end of the array; in other words, it works backwards. Each pass therefore bubbles the smallest remaining value to the left end of the array. We could easily do it the other way round, pushing the LARGEST values to the right, by reversing the direction of the inner loop and the comparison in the swap condition.

Second, notice that the swapping function takes in indices of the values, NOT the values themselves. The swapping is done using a third variable as a temporary holding space for the value, and the result is the values are swapped in place within the array.

Third, note that the algorithm requires that we keep track of whether a swap was made during each pass through the array; we're done as soon as a complete pass makes no swaps. We thus must reset the value of the swapped variable to false at the start of every outer loop iteration.

Finally, note that in the toString() method Java will automatically convert the values of the list from the int to the String data type when including them in the output string. Nice, eh?
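As a quick demonstration of that automatic conversion [the class and variable names here are just made up for the example], note that concatenation proceeds left to right, which occasionally surprises people:

```java
public class ConcatDemo {
   public static void main( String[] args ) {
      int value = 42;
      String out = "" + value;            // the int is converted to "42" automatically
      System.out.println( out );          // prints 42
      System.out.println( "x" + 1 + 2 );  // prints x12 -- concatenation goes left to right
      System.out.println( 1 + 2 + "x" );  // prints 3x  -- the addition happens first here
   }
}
```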

Insertion Sort

Next along the line is another common sorting method that is simple to implement and works well with the IntList class, so we can add that to our intSorter class. This is called the insertion sort and it works much the same way as people do alphabetical sorting. You select one item from the group, and it starts a pile. You select another one, and put it on the top or bottom of the first. Then you select another, and insert that one in its proper location: on top, on the bottom, or between the other two items. You keep going through the items, inserting them in their proper places, until all the items have been inserted in the list.

The advantages of an insertion sort are that it's simple, it's efficient for small data sets, it does not change ordering of elements with keys that are the same, and it can be done in place so that it doesn't take a lot more memory space to perform.

What does this look like when we're using an array of values and we're not getting them dealt to us one at a time? Well, the simple way is to put them into the IntList in the order we get them and then do an insertion sort on them, basically doing the same thing we'd do with a hand of cards. Insertion sort does the following steps:

  1. Start at the left end index plus one
  2. Compare the current value with the previous value [meaning the one to the left of it]
  3. Swap them if the current value is smaller than the previous one, and keep comparing
     and swapping leftward until the value is in its proper place
  4. Move to the next item to the right and treat it as the new current value
  5. Repeat steps 2 through 4 until all the values have been processed
  6. The list will be sorted

Here is some code to implement that, added into the code for our IntSorter class:

   public class IntSorter {

      // Our first pass at sorting, a simple bubble sort; we'll add more later
      public static void bubbleSort( IntList listToSort ) {

         for( int i = 0; i < listToSort.getSize(); i++ ) {
            // Track if a swap has been made
            boolean swapped = false;

            // Iterate through all elements of the list starting at the end element
            //  and continuing up to the i + 1 element
            for( int j = listToSort.getSize() - 1; j > i; j-- ) {

               // Swap if the two currently adjacent items in the 2nd loop iteration
               //  are out of order
               if( listToSort.theList[j] < listToSort.theList[j-1] ) {
                  listToSort.swapIntsAt( j, j-1 );
                  swapped = true;       // Mark that you have swapped
               }
            }

            // It is time to return if you made no swaps
            if( !swapped ) {
               return;
            }
         }
      }

      // The second pass at sorting, the insertion sort
      public static void insertionSort( IntList listToSort ) {

         for( int i = 1; i < listToSort.getSize(); i++ ) {
            // Walk the current value leftward until it reaches its proper place
            for( int j = i; j > 0; j-- ) {
               if( listToSort.theList[j] < listToSort.theList[j-1] ) {
                  listToSort.swapIntsAt( j, j-1 );
               } else {
                  break;    // everything to the left is already sorted, so stop early
               }
            }
         }
      }

      // OK, now test it out!  Add 51, 23, 17, 43, and 31 to the list,
      //  print it out, then sort it and print it again
      public static void main( String[] args ) {
         IntList orgList = new IntList();
         orgList.append(51);
         orgList.append(23);
         orgList.append(17);
         orgList.append(43);
         orgList.append(31);
         System.out.println("Pre-sort:  " + orgList.toString() );
         bubbleSort( orgList );
         System.out.println("Post-sort: " + orgList.toString() );
         orgList.append(251);
         orgList.append(223);
         orgList.append(217);
         orgList.append(243);
         orgList.append(231);
         System.out.println("Pre-sort:  " + orgList.toString() );
         insertionSort( orgList );
         System.out.println("Post-sort: " + orgList.toString() );
      }
   }
            

Here's another link to the sorting page that has all those different sorting demonstrations that you can play to compare.

Notice that when you are running the Insertion Sort vs. the Bubble sort, the number of items in the list has a definite effect on the time duration, which becomes more dramatic with the size of the list. For example, here is a comparison table:

   Items     Insertion Time     Bubble Time
    20        6.14 seconds       9.0 seconds
    50       32.72 seconds      58.02 seconds

Admittedly this is not very scientific, since I just timed it with the stopwatch on my cell phone, but you can see that there is quite a difference. With 20 items, the Bubble sort takes about 1.5 times as long. With 50 items, the Bubble Sort takes 1.77 times as long, so it's not consistently the same amount longer!
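If you'd like something more repeatable than a cell-phone stopwatch, here is a sketch of timing the two sorts with System.nanoTime(). To keep it self-contained it uses plain int arrays instead of our IntList class, and the class name SortTimer, the random seed, and the array size are all just made up for the example:

```java
import java.util.Arrays;
import java.util.Random;

public class SortTimer {

   // plain-array stand-ins for the IntSorter methods
   static void bubbleSort( int[] a ) {
      for( int i = 0; i < a.length; i++ ) {
         boolean swapped = false;
         for( int j = a.length - 1; j > i; j-- ) {
            if( a[j] < a[j-1] ) {
               int temp = a[j]; a[j] = a[j-1]; a[j-1] = temp;
               swapped = true;
            }
         }
         if( !swapped ) return;   // no swaps on this pass means we're done
      }
   }

   static void insertionSort( int[] a ) {
      for( int i = 1; i < a.length; i++ ) {
         // walk the current value leftward to its proper place
         for( int j = i; j > 0 && a[j] < a[j-1]; j-- ) {
            int temp = a[j]; a[j] = a[j-1]; a[j-1] = temp;
         }
      }
   }

   public static void main( String[] args ) {
      int[] data = new Random( 42 ).ints( 5000, 0, 100000 ).toArray();
      int[] copy = Arrays.copyOf( data, data.length );

      long start = System.nanoTime();
      insertionSort( data );
      long mid = System.nanoTime();
      bubbleSort( copy );
      long end = System.nanoTime();

      System.out.println( "Insertion: " + (mid - start) / 1_000_000 + " ms" );
      System.out.println( "Bubble:    " + (end - mid)  / 1_000_000 + " ms" );
   }
}
```

The timings will still vary from run to run and machine to machine, which is exactly the point we'll come back to below.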

Data Structures Comparisons

So, you might ask, what difference does all this make? Why do we need different sorting methods, and different data structures? Well, think about it:

Would a stack be a good data structure to use for printing when multiple people need to print? Why or why not?

Would a queue be good for passing parameters when calling functions? Why or why not?

OK, so when you think about it, different data structures serve different purposes. Likewise, a different algorithm might serve a better purpose for the operation that you are trying to perform on a certain type or certain set of data.

Algorithm Performance Analysis

Consider the following two algorithms for testing our sorting methods:


      private static final int TEST_SIZE = 200000;

      public void testArrayListPrepend() {
         ArrayList<String> myRA = new ArrayList<String>();
         for( int i = 0; i < TEST_SIZE; i++ ) {
            myRA.add( 0, "" + i );
         }
      }

      public void testLinkedListPrepend() {
         LinkedList<String> myList = new LinkedList<String>();
         for( int i = 0; i < TEST_SIZE; i++ ) {
            myList.add( 0, "" + i );
         }
      }
            

Code this up and try it for yourself. You'll find there is a significant difference in the time it takes to execute the two different methods. [WHY IS THAT?]

This should also prove to you that there are good reasons for differences in efficiency, which means there are specific reasons why you should choose one or the other for a specific application.
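Here is a minimal sketch of a driver for those two methods, assuming they live in a class I'm calling ListTimer [a made-up name], again using System.nanoTime() to do the measuring:

```java
import java.util.ArrayList;
import java.util.LinkedList;

public class ListTimer {

   private static final int TEST_SIZE = 200000;

   public void testArrayListPrepend() {
      ArrayList<String> myRA = new ArrayList<String>();
      for( int i = 0; i < TEST_SIZE; i++ ) {
         myRA.add( 0, "" + i );   // insert at index 0, shifting everything right
      }
   }

   public void testLinkedListPrepend() {
      LinkedList<String> myList = new LinkedList<String>();
      for( int i = 0; i < TEST_SIZE; i++ ) {
         myList.add( 0, "" + i );   // insert at the head node, no shifting needed
      }
   }

   public static void main( String[] args ) {
      ListTimer timer = new ListTimer();

      long start = System.nanoTime();
      timer.testArrayListPrepend();
      long mid = System.nanoTime();
      timer.testLinkedListPrepend();
      long end = System.nanoTime();

      System.out.println( "ArrayList prepend:  " + (mid - start) / 1_000_000 + " ms" );
      System.out.println( "LinkedList prepend: " + (end - mid)  / 1_000_000 + " ms" );
   }
}
```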

   Data Structure      What it's good for:                    What it's NOT good for:
   Sequential list     if you know the size beforehand        arbitrary insertion / removal
   [Array-based]       quick access of items                  doesn't easily shrink once expanded
                       insertion / removal at end of list
   Linked list         unknown size until runtime             fast querying/sorting of elements
   [Node-based]        heterogeneous collections
                       arbitrary insertion / removal

Algorithm Running Times

Now we have a justification for being able to analyze the running time of two algorithms to see if one is more effective and/or efficient than the other! YAY! Always remember, though, that you must take into account the data on which the algorithm will operate!

OK, so now let's start doing some interesting analysis of the algorithms. We saw that the performance of the two sorting algorithms we implemented above is vastly different, even for a relatively small set of data of a type the computer is actually designed to handle efficiently -- integers. When we do that sort of analysis, we can say that one algorithm is computationally more expensive than the other, meaning that one requires more processing than the other.

This statement is the beginning of the idea of characterizing algorithms formally. In the case of our two sorts, the stopwatch comparison is NOT really an objective metric, because we can't easily measure the exact time that either took; there are too many potentially contributing factors which can skew the results: two computers may have different clock speeds or operating systems, the initial data may be more or less sorted, or some process may happen to start in the background while the algorithm is running. So, we need a more objective way of doing these characterizations.

To help us out, we have a couple of different ways to do this activity that have already been invented and passed down to us from the exalted mountaintop:

  1. Run-time or Performance Analysis which provides a classification of running time of the algorithm. This is performed using analysis of the number of primitive or atomic operations that the operation requires as a function of its input size.
  2. Problem Complexity Analysis that revolves around the classification of computational problems based on their inherent difficulty, then comparing these classes to each other.

For our purposes in this class, we will focus on the first type, and leave the second type to your CMSI 385 Introduction to Theory of Computation class.

Input Size and Growth Rate

Input size is a fairly non-specific term, but what it means is usually based on the number of things that the algorithm has to process in a particular data structure. Algorithms also need to account for the growth rate in the number of steps that are required as the size of the input gets larger. For example, an algorithm may be fine for small data sets, but may become unusable for large data sets. Remember that computers are so fast that frequently any pair of algorithms can process a small data set without any appreciable difference in the running time, so that it is only when the data set gets to be large that the difference becomes apparent. Further, it would be nice to know how the algorithm might perform at scale, meaning with large or even distributed inputs.

Two assumptions that are made about the computing hardware in relation to this analysis are:

  1. Single Processor assumes that the there is a single processor executing the instructions in a linear fashion. The single processor assumption means that there is no parallel processing done; there is assumed to be a single core and a single processor, no network between processors, no parallel/distributed processing taking place, ONLY one set of instructions being performed linearly.
  2. Uniform Cost assumes that every atomic machine instruction takes the same amount of time to execute; i.e., addition, moving from RAM, moving between registers, storing to hard disk, division, &c. all take the same number of processor cycles. We know this isn't true in reality; multiplication operations take MUCH longer than a move from the EAX to the EBX register inside the processor, but we idealize this processing in much the same way that we use an abstract model when we design our software.

What are Primitive Operations?

The following are considered primitive operations in the way we will use them for analysis of the algorithms that we'll look at:

  1. Assigning a value to a variable
  2. Performing a single arithmetic operation, like an addition or an increment
  3. Comparing two values
  4. Indexing into an array
  5. Calling a method or returning from a method

An Example to the Rescue!

So, how do we do this? Those of you from my CMSI 185 class may remember a bit about this. We did it by simply counting how many times each instruction in our script was executed. For example, here is some code that we can use to start:

   public int findMaxInArray( int[] myRA ) {
      int currentMax = 0;
      for( int i = 0; i < myRA.length; i++ ) {
         if( currentMax < myRA[i] ) {
            currentMax = myRA[i];
         }
      }
      return currentMax;
   }
            

Now, let's count 'em up:
  1. int currentMax = 0; -- 1 operation [one assignment], performed once
  2. int i = 0 -- 1 operation [one assignment], performed once
  3. i < myRA.length -- 2 operations [one array length access, one comparison], performed n + 1 times
  4. i++ -- 2 operations [one addition, one assignment], performed n times
  5. currentMax < myRA[i] -- 2 operations [one array access, one comparison], performed n times
  6. currentMax = myRA[i] -- 2 operations [one array access, one assignment], performed between 1 and n times
  7. return currentMax -- 1 operation, performed once

One Way To Be Certain

A fairly easy way to determine this kind of algorithm analysis is to pick a small value for n and walk your way through the algorithm. Make a small table of how many times each operation occurs, in terms of primitives, then add up the totals to find out the actual equation.

For example, let's take the case in which myRA contains 5 elements, so that n is five. Here is a table that you can use to verify the operation list above:

   For the loop with n = 5
   iteration   i value   comparison   execute   increment
       1          0          1           1          1
       2          1          2           2          2
       3          2          3           3          3
       4          3          4           4          4
       5          4          5           5          5
       6          5          6           —          —

Looking at the bottom row, you can easily see that the number of comparisons performed is n + 1, while the number of increments performed is n. The number of times the loop body is executed is also n. The dashes at the bottom of the execute and increment columns indicate that those operations are NOT performed on the final pass, when the comparison fails and the loop exits.
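If you'd rather let the computer fill in that table for you, here is a sketch that instruments the loop with counters. The counter names and the class name OpCounter are just made up for the example, and the for loop is unrolled into a while loop so each piece of the loop header can be counted separately; the logic is the same as findMaxInArray above:

```java
public class OpCounter {

   static int comparisons = 0;   // times the loop condition is checked
   static int executions  = 0;   // times the loop body runs
   static int increments  = 0;   // times i++ happens

   static int findMaxInArray( int[] myRA ) {
      int currentMax = 0;
      int i = 0;
      while( true ) {
         comparisons++;                   // the  i < myRA.length  test
         if( !(i < myRA.length) ) {
            break;                        // the final comparison fails and the loop exits
         }
         executions++;                    // the loop body executes
         if( currentMax < myRA[i] ) {
            currentMax = myRA[i];
         }
         increments++;                    // the  i++  step
         i++;
      }
      return currentMax;
   }

   public static void main( String[] args ) {
      int max = findMaxInArray( new int[] { 51, 23, 17, 43, 31 } );   // n = 5
      System.out.println( "max:         " + max );          // 51
      System.out.println( "comparisons: " + comparisons );  // n + 1 = 6
      System.out.println( "executions:  " + executions );   // n = 5
      System.out.println( "increments:  " + increments );   // n = 5
   }
}
```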

So, to Sum It All Up…

To sum it all up, then, this loop executes generally at least:

1 + {1 + [2(n + 1)] + 2n} + 2n + 2 + 1 = 6n + 7

…and at most:

1 + {1 + [2(n + 1)] + 2n} + 2n + 2n + 1 = 8n + 5

Note that the best case happens when the value of myRA[0] is the biggest value in the array, so that the reassignment happens only once [when currentMax first picks up myRA[0]] and never again. In contrast, the worst case for this algorithm occurs when the array is sorted in increasing order, so that every iteration needs to perform the reassignment. Further, because we have to step through the entire array from beginning to end in sequence, this is known as a linear algorithm. This can also be seen from the fact that there are no exponents in either of the equations. A couple of other points to remember are:

Week Seven In Class Exercise

To give you some practice, here is a sheet of some examples on which you can try out your new-found expertise in calculating algorithmic complexity!


Week Seven Wrap-up

That's probably enough for this week. Next week, we'll continue with analysis of algorithms and have some fun with quick verbal quizzes, or perhaps getting people to write on the whiteboard in class.