CMSI 2120: Welcome to Week 07

This Week's Agenda

For this week, here's the plan, Jan…

Test Coverage

I'll provide the answers to any questions you have about the test. If you want to work through the entire test, we can do that, but it will put us behind in covering the topics we need to get through during the semester, so it might be better to spend class time on questions that more than one person shares. We'll play ♫ it by 👂 ear 𝄫 though…

…In Which We Get Below Deque

We will revisit the Deque data structure briefly, to clarify some of the questions people have asked over the last several days about homework #2. Go check out the new additional description here.

Sorting Redux

As a review, here is the code from last week. I've filled in the missing parts so you can see how it actually looks for the Bubble Sort:

  /**
   * First, the swap function; note that we are ONLY dealing with int
   *  also, remember this is intended to be in the IntList class from week 03
   */
   public void swapIntsAt( int index1, int index2 ) {
      int temp = theList[index1];
      theList[index1] = theList[index2];
      theList[index2] = temp;
   }

   // let's define a "toString()" method while we're here,
   //  so we have a way of printing the entire list
   public String toString() {
      String output = "";
      for( int i = 0; i < size; i++ ) {
         output += theList[i] + " ";   // space-separated so the values don't run together
      }
      return output.trim();
   }
            

One change I made in the IntList class was to make the list array a public variable. That makes it easier to access from the IntSorter class. There are other [better?] ways to do that which don't break encapsulation, but they are a bit involved and will distract us from the issue at hand, so we won't go into that at this juncture.

OK, now we have a way to swap two values in the list, so now we can make a sorting class that uses the swapping method as part of a comparison. We'll call it [because we are just SOOOOOO imaginative] the IntSorter class. Here is the code to implement it:

   public class IntSorter {

      // Our first pass at sorting, a simple bubble sort; we'll add more later
      public static void bubbleSort( IntList listToSort ) {

         for( int i = 0; i < listToSort.getSize(); i++ ) {
            // Track if a swap has been made
            boolean swapped = false;

            // Iterate through all elements of the list starting at the end element
            //  and continuing up to the i + 1 element
            for( int j = listToSort.getSize() - 1; j > i; j-- ) {

                // Swap if the two currently adjacent items in the 2nd loop iteration
                //  are out of order
                if( listToSort.theList[j] < listToSort.theList[j-1] ) {
                    listToSort.swapIntsAt( j, j-1 );
                    swapped = true;       // Mark that you have swapped
                }
            }

            // It is time to return if you made no swaps
            if( !swapped ) {
               return;
            }
         }
      }

      // OK, now test it out!  Add 51, 23, 17, 43, and 31 to the list,
      //  print it out, then sort it and print it again
      public static void main( String[] args ) {
         IntList orgList = new IntList();
         orgList.append(51);
         orgList.append(23);
         orgList.append(17);
         orgList.append(43);
         orgList.append(31);
         System.out.println("Pre-sort:  " + orgList.toString() );
         bubbleSort( orgList );
         System.out.println("Post-sort: " + orgList.toString() );
      }
   }
            

Notice a couple of things about this algorithm and test code. First, the inner loop works from the big end towards the small end of the array; in other words, it works backwards. Each pass therefore bubbles the smallest remaining value to the left end of the array. We could easily do it the other way round, pushing the LARGEST values to the right, by reversing the direction of the inner loop and the comparison in the swap condition.

Second, notice that the swapping function takes in indices of the values, NOT the values themselves. The swapping is done using a third variable as a temporary holding space for the value, and the result is the values are swapped in place within the array.

Third, note that the algorithm requires that we keep track of whether a swap was made during each pass through the array; we're done as soon as a complete pass makes no swaps. We thus must reset the value of the swapped variable to false at the start of every outer loop iteration.

Finally, note that in the toString() method Java will automatically convert the values of the list from the int to the String data type when including them in the output string. Nice, eh?
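As a quick demonstration of that automatic conversion [the class and variable names here are just made up for the example], note that concatenation proceeds left to right, which occasionally surprises people:

```java
public class ConcatDemo {
   public static void main( String[] args ) {
      int value = 42;
      String out = "" + value;            // the int is converted to "42" automatically
      System.out.println( out );          // prints 42
      System.out.println( "x" + 1 + 2 );  // prints x12 -- concatenation goes left to right
      System.out.println( 1 + 2 + "x" );  // prints 3x  -- the addition happens first here
   }
}
```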

Insertion Sort

Next along the line is another common sorting method that is simple to implement and works well with the IntList class, so we can add that to our intSorter class. This is called the insertion sort and it works much the same way as people do alphabetical sorting. You select one item from the group, and it starts a pile. You select another one, and put it on the top or bottom of the first. Then you select another, and insert that one in its proper location: on top, on the bottom, or between the other two items. You keep going through the items, inserting them in their proper places, until all the items have been inserted in the list.

The advantages of an insertion sort are that it's simple, it's efficient for small data sets, it does not change ordering of elements with keys that are the same, and it can be done in place so that it doesn't take a lot more memory space to perform.

What does this look like when we're using an array of values and we're not getting them dealt to us one at a time? Well, the simple way is to put them into the IntList in the order we get them and then do an insertion sort on them, basically doing the same thing we'd do with a hand of cards. Insertion sort does the following steps:

  1. Start at the left end index plus one
  2. Compare the current value with the previous value [meaning the one to the left of it]
  3. Swap them if the current value is smaller than the previous one, and keep comparing
     and swapping leftward until the value is in its proper place
  4. Move to the next item to the right and treat it as the new current value
  5. Repeat steps 2 through 4 until all the values have been processed
  6. The list will be sorted

Here is some code to implement that, added into the code for our IntSorter class:

   public class IntSorter {

      // Our first pass at sorting, a simple bubble sort; we'll add more later
      public static void bubbleSort( IntList listToSort ) {

         for( int i = 0; i < listToSort.getSize(); i++ ) {
            // Track if a swap has been made
            boolean swapped = false;

            // Iterate through all elements of the list starting at the end element
            //  and continuing up to the i + 1 element
            for( int j = listToSort.getSize() - 1; j > i; j-- ) {

               // Swap if the two currently adjacent items in the 2nd loop iteration
               //  are out of order
               if( listToSort.theList[j] < listToSort.theList[j-1] ) {
                  listToSort.swapIntsAt( j, j-1 );
                  swapped = true;       // Mark that you have swapped
               }
            }

            // It is time to return if you made no swaps
            if( !swapped ) {
               return;
            }
         }
      }

      // The second pass at sorting, the insertion sort
      public static void insertionSort( IntList listToSort ) {

         for( int i = 1; i < listToSort.getSize(); i++ ) {
            // Walk the current value leftward until it reaches its proper place
            for( int j = i; j > 0; j-- ) {
               if( listToSort.theList[j] < listToSort.theList[j-1] ) {
                  listToSort.swapIntsAt( j, j-1 );
               } else {
                  break;    // everything to the left is already sorted, so stop early
               }
            }
         }
      }

      // OK, now test it out!  Add 51, 23, 17, 43, and 31 to the list,
      //  print it out, then sort it and print it again
      public static void main( String[] args ) {
         IntList orgList = new IntList();
         orgList.append(51);
         orgList.append(23);
         orgList.append(17);
         orgList.append(43);
         orgList.append(31);
         System.out.println("Pre-sort:  " + orgList.toString() );
         bubbleSort( orgList );
         System.out.println("Post-sort: " + orgList.toString() );
         orgList.append(251);
         orgList.append(223);
         orgList.append(217);
         orgList.append(243);
         orgList.append(231);
         System.out.println("Pre-sort:  " + orgList.toString() );
         insertionSort( orgList );
         System.out.println("Post-sort: " + orgList.toString() );
      }
   }
            

Here's another link to the sorting page that has all those different sorting demonstrations that you can play to compare.

Notice that when you are running the Insertion Sort vs. the Bubble sort, the number of items in the list has a definite effect on the time duration, which becomes more dramatic with the size of the list. For example, here is a comparison table:

   Items     Insertion Time     Bubble Time
    20        6.14 seconds       9.0 seconds
    50       32.72 seconds      58.02 seconds

Admittedly this is not very scientific, since I just timed it with the stopwatch on my cell phone, but you can see that there is quite a difference. With 20 items, the Bubble sort takes about 1.5 times as long. With 50 items, the Bubble Sort takes 1.77 times as long, so it's not consistently the same amount longer!
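If you'd like something more repeatable than a cell-phone stopwatch, here is a sketch of timing the two sorts with System.nanoTime(). To keep it self-contained it uses plain int arrays instead of our IntList class, and the class name SortTimer, the random seed, and the array size are all just made up for the example:

```java
import java.util.Arrays;
import java.util.Random;

public class SortTimer {

   // plain-array stand-ins for the IntSorter methods
   static void bubbleSort( int[] a ) {
      for( int i = 0; i < a.length; i++ ) {
         boolean swapped = false;
         for( int j = a.length - 1; j > i; j-- ) {
            if( a[j] < a[j-1] ) {
               int temp = a[j]; a[j] = a[j-1]; a[j-1] = temp;
               swapped = true;
            }
         }
         if( !swapped ) return;   // no swaps on this pass means we're done
      }
   }

   static void insertionSort( int[] a ) {
      for( int i = 1; i < a.length; i++ ) {
         // walk the current value leftward to its proper place
         for( int j = i; j > 0 && a[j] < a[j-1]; j-- ) {
            int temp = a[j]; a[j] = a[j-1]; a[j-1] = temp;
         }
      }
   }

   public static void main( String[] args ) {
      int[] data = new Random( 42 ).ints( 5000, 0, 100000 ).toArray();
      int[] copy = Arrays.copyOf( data, data.length );

      long start = System.nanoTime();
      insertionSort( data );
      long mid = System.nanoTime();
      bubbleSort( copy );
      long end = System.nanoTime();

      System.out.println( "Insertion: " + (mid - start) / 1_000_000 + " ms" );
      System.out.println( "Bubble:    " + (end - mid)  / 1_000_000 + " ms" );
   }
}
```

The timings will still vary from run to run and machine to machine, which is exactly the point we'll come back to below.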

Data Structures Comparisons

So, you might ask, what difference does all this make? Why do we need different sorting methods, and different data structures? Well, think about it:

Would a stack be a good data structure to use for printing when multiple people need to print? Why or why not?

Would a queue be good for passing parameters when calling functions? Why or why not?

OK, so when you think about it, different data structures serve different purposes. Likewise, a different algorithm might serve a better purpose for the operation that you are trying to perform on a certain type or certain set of data.

Algorithm Performance Analysis

Consider the following two algorithms for testing our sorting methods:


      private static final int TEST_SIZE = 200000;

      public void testArrayListPrepend() {
         ArrayList<String> myRA = new ArrayList<String>();
         for( int i = 0; i < TEST_SIZE; i++ ) {
            myRA.add( 0, "" + i );
         }
      }

      public void testLinkedListPrepend() {
         LinkedList<String> myList = new LinkedList<String>();
         for( int i = 0; i < TEST_SIZE; i++ ) {
            myList.add( 0, "" + i );
         }
      }
            

Code this up and try it for yourself. You'll find there is a significant difference in the time it takes to execute the two different methods. [WHY IS THAT?]

This should also prove to you that there are good reasons for differences in efficiency, which means there are specific reasons why you should choose one or the other for a specific application.
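Here is a minimal sketch of a driver for those two methods, assuming they live in a class I'm calling ListTimer [a made-up name], again using System.nanoTime() to do the measuring:

```java
import java.util.ArrayList;
import java.util.LinkedList;

public class ListTimer {

   private static final int TEST_SIZE = 200000;

   public void testArrayListPrepend() {
      ArrayList<String> myRA = new ArrayList<String>();
      for( int i = 0; i < TEST_SIZE; i++ ) {
         myRA.add( 0, "" + i );   // insert at index 0, shifting everything right
      }
   }

   public void testLinkedListPrepend() {
      LinkedList<String> myList = new LinkedList<String>();
      for( int i = 0; i < TEST_SIZE; i++ ) {
         myList.add( 0, "" + i );   // insert at the head node, no shifting needed
      }
   }

   public static void main( String[] args ) {
      ListTimer timer = new ListTimer();

      long start = System.nanoTime();
      timer.testArrayListPrepend();
      long mid = System.nanoTime();
      timer.testLinkedListPrepend();
      long end = System.nanoTime();

      System.out.println( "ArrayList prepend:  " + (mid - start) / 1_000_000 + " ms" );
      System.out.println( "LinkedList prepend: " + (end - mid)  / 1_000_000 + " ms" );
   }
}
```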

   Data Structure      What it's good for:                    What it's NOT good for:
   Sequential list     if you know the size beforehand        arbitrary insertion / removal
   [Array-based]       quick access of items                  doesn't easily shrink once expanded
                       insertion / removal at end of list
   Linked list         unknown size until runtime             fast querying/sorting of elements
   [Node-based]        heterogeneous collections
                       arbitrary insertion / removal

Algorithm Running Times

Now we have a justification for being able to analyze the running time of two algorithms to see if one is more effective and/or efficient than the other! YAY! Always remember, though, that you must take into account the data on which the algorithm will operate!

OK, so now let's start doing some interesting analysis of the algorithms. We saw that the performance of the two sorting algorithms we implemented above is vastly different, even for a relatively small set of data of a type the computer is actually designed to handle efficiently -- integers. When we do that sort of analysis, we can say that one algorithm is computationally more expensive than the other, meaning that one requires more processing than the other.

This statement is the beginning of the idea of characterizing algorithms formally. In the case of our two sorts, the stopwatch comparison is NOT really an objective metric, because we can't easily measure the exact time that either took; there are too many potentially contributing factors which can skew the results: two computers may have different clock speeds or operating systems, the initial data may be more or less sorted, or some process may happen to start in the background while the algorithm is running. So, we need a more objective way of doing these characterizations.

To help us out, we have a couple of different ways to do this activity that have already been invented and passed down to us from the exalted mountaintop:

  1. Run-time or Performance Analysis which provides a classification of running time of the algorithm. This is performed using analysis of the number of primitive or atomic operations that the operation requires as a function of its input size.
  2. Problem Complexity Analysis that revolves around the classification of computational problems based on their inherent difficulty, then comparing these classes to each other.

For our purposes in this class, we will focus on the first type, and leave the second type to your CMSI 385 Introduction to Theory of Computation class.

Input Size and Growth Rate

Input size is a fairly non-specific term, but what it means is usually based on the number of things that the algorithm has to process in a particular data structure. Algorithms also need to account for the growth rate in the number of steps that are required as the size of the input gets larger. For example, an algorithm may be fine for small data sets, but may become unusable for large data sets. Remember that computers are so fast that frequently any pair of algorithms can process a small data set without any appreciable difference in the running time, so that it is only when the data set gets to be large that the difference becomes apparent. Further, it would be nice to know how the algorithm might perform at scale, meaning with large or even distributed inputs.

Two assumptions that are made about the computing hardware in relation to this analysis are:

  1. Single Processor assumes that the there is a single processor executing the instructions in a linear fashion. The single processor assumption means that there is no parallel processing done; there is assumed to be a single core and a single processor, no network between processors, no parallel/distributed processing taking place, ONLY one set of instructions being performed linearly.
  2. Uniform Cost assumes that every atomic machine instruction takes the same amount of time to execute; i.e., addition, moving from RAM, moving between registers, storing to hard disk, division, &c. all take the same number of processor cycles. We know this isn't true in reality; multiplication operations take MUCH longer than a move from the EAX to the EBX register inside the processor, but we idealize this processing in much the same way that we use an abstract model when we design our software.

What are Primitive Operations?

The following are considered primitive operations in the way we will use them for analysis of the algorithms that we'll look at:

  1. Assigning a value to a variable
  2. Performing a single arithmetic operation, like an addition or an increment
  3. Comparing two values
  4. Indexing into an array
  5. Calling a method or returning from a method

An Example to the Rescue!

So, how do we do this? Those of you from my CMSI 185 class may remember a bit about this. We did it by simply counting how many times each instruction in our script was executed. For example, here is some code that we can use to start:

   public int findMaxInArray( int[] myRA ) {
      int currentMax = 0;
      for( int i = 0; i < myRA.length; i++ ) {
         if( currentMax < myRA[i] ) {
            currentMax = myRA[i];
         }
      }
      return currentMax;
   }
            

Now, let's count 'em up:
  1. int currentMax = 0; -- 1 operation [one assignment], performed once
  2. int i = 0 -- 1 operation [one assignment], performed once
  3. i < myRA.length -- 2 operations [one array length access, one comparison], performed n + 1 times
  4. i++ -- 2 operations [one addition, one assignment], performed n times
  5. currentMax < myRA[i] -- 2 operations [one array access, one comparison], performed n times
  6. currentMax = myRA[i] -- 2 operations [one array access, one assignment], performed between 1 and n times
  7. return currentMax -- 1 operation, performed once

One Way To Be Certain

A fairly easy way to determine this kind of algorithm analysis is to pick a small value for n and walk your way through the algorithm. Make a small table of how many times each operation occurs, in terms of primitives, then add up the totals to find out the actual equation.

For example, let's take the case in which myRA contains 5 elements, so that n is five. Here is a table that you can use to verify the operation list above:

   For the loop with n = 5
   iteration   i value   comparison   execute   increment
       1          0          1           1          1
       2          1          2           2          2
       3          2          3           3          3
       4          3          4           4          4
       5          4          5           5          5
       6          5          6           —          —

Looking at the bottom row, you can easily see that the number of comparisons performed is n + 1, while the number of increments performed is n. The number of times the loop body is executed is also n. The dashes at the bottom of the execute and increment columns indicate that those operations are NOT performed on the final pass, when the comparison fails and the loop exits.
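If you'd rather let the computer fill in that table for you, here is a sketch that instruments the loop with counters. The counter names and the class name OpCounter are just made up for the example, and the for loop is unrolled into a while loop so each piece of the loop header can be counted separately; the logic is the same as findMaxInArray above:

```java
public class OpCounter {

   static int comparisons = 0;   // times the loop condition is checked
   static int executions  = 0;   // times the loop body runs
   static int increments  = 0;   // times i++ happens

   static int findMaxInArray( int[] myRA ) {
      int currentMax = 0;
      int i = 0;
      while( true ) {
         comparisons++;                   // the  i < myRA.length  test
         if( !(i < myRA.length) ) {
            break;                        // the final comparison fails and the loop exits
         }
         executions++;                    // the loop body executes
         if( currentMax < myRA[i] ) {
            currentMax = myRA[i];
         }
         increments++;                    // the  i++  step
         i++;
      }
      return currentMax;
   }

   public static void main( String[] args ) {
      int max = findMaxInArray( new int[] { 51, 23, 17, 43, 31 } );   // n = 5
      System.out.println( "max:         " + max );          // 51
      System.out.println( "comparisons: " + comparisons );  // n + 1 = 6
      System.out.println( "executions:  " + executions );   // n = 5
      System.out.println( "increments:  " + increments );   // n = 5
   }
}
```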

So, to Sum It All Up…

To sum it all up, then, this loop executes generally at least:

1 + {1 + [2(n + 1)] + 2n} + 2n + 2 + 1 = 6n + 7

…and at most:

1 + {1 + [2(n + 1)] + 2n} + 2n + 2n + 1 = 8n + 5

Note that the best case happens when the value of myRA[0] is the biggest value in the array, so that the reassignment happens only once [when currentMax first picks up myRA[0]] and never again. In contrast, the worst case for this algorithm occurs when the array is sorted in increasing order, so that every iteration needs to perform the reassignment. Further, because we have to step through the entire array from beginning to end in sequence, this is known as a linear algorithm. This can also be seen from the fact that there are no exponents in either of the equations. A couple of other points to remember are:

Week Seven In Class Exercise

To give you some practice, here is a sheet of some examples on which you can try out your new-found expertise in calculating algorithmic complexity!


Week Seven Wrap-up

That's probably enough for this week. Next week, we'll continue with analysis of algorithms and have some fun with quick verbal quizzes, or perhaps getting people to write on the whiteboard in class.