CS115 Lab: Searching and Sorting


Highlights of this lab:

In this lab, we will discuss binary search in C++.  Suppose you have an array of elements (integer, char, string, etc) and you need to know whether one particular element is in the array, and its index in the array, you need to use some kind of searching algorithm to perform this task.  Binary search is one of these algorithms.  Binary search requires that the array to be searched is sorted, preferably in ascending order. The main idea of binary search is to divide the array into two equal parts and narrow down the range of the array to be searched either to the low part of the array or high part of the array.  Keep doing this until the value is located, or until we realize that the value searched for is not in the array.

An intuitive approach of searching an element in an array is to do a "sequential search", which means we traverse the array from index 0 to ArraySize - 1, where ArraySize is the number of elements in the array.  While sequential search does not require that the array to be searched is sorted, it is much less efficient when the array is sorted.  We will  start up by examining a program using sequential search and then move on to binary search.  A detailed list of topics for this lab is listed below: 

Lab Exercise:


Click the little computer above for a detailed description.
NOTE: Your lab instructor will tell you what will be marked for this lab.

"New" Unix/Linux Commands

Here are some more Unix/Linux commands that you need to learn. Remember that you can enter man command to get a complete description of any Unix/Linux command and its options.

Command Description
cal [month #] year Prints a calendar of the specified year. e.g. cal 2010
If a month number is specified, prints only that month. e.g. cal 3 2010 (for March 2010)
cat file1 [file2 ...] Concatenate (join together) specified files and direct the output to the standard output device - the screen.
This command is commonly used to display the contents of one file on the screen. (It's simpler than getting in and out of an editor.)
date Print the current time and date.
who Lists who is logged into a machine. It provides information such as the user's login name and the time when the user logged on.
w Lists who is logged into a machine. Provides information such as the user's login name and the time when the user logged on. It also provides information about what the user is curently doing.
sort Sorts the input stream or the contents of files. To sort the contents of a file, use sort filename.
wc Displays the number of lines, words and characters in a file. To display only the number of lines, you can use wc -l.
file file Perform tests on a file to determine its type. Useful if you want to make sure a file is not an executable before you try to edit it.
cmp file1 file2 Compare two files to see if they are the same. Reports just the first difference unless you specify -l
diff file1 file2 Displays the differences between file1 and file2. This lists the changes necessary to convert file1 to file2.
find path option Search down directories for a file. e.g. find   ./   -name gold.cpp would search in the current directory and in all subdirectories for the file called gold.cpp
grep [option] string [file(s)] Search for a string pattern in a file. There are several options. e.g. grep   namespace *.cpp would search the current directory for the string "namespace" in all .cpp files and show the lines in each file where the string occurs. e.g. grep   -n   namespace *.cpp would perform the same search but also give the line numbers in which the string was found.
ps Lists the processes that are running for a terminal. To see all the processes that are running for you, use ps -fu yourusername. This command is often used with kill.
kill [option] processid Kill the process specified. e.g. kill -9 1455 would perform a "sure kill" (option 9) on process id "1455". This is a handy command if you change your mind after sending a job to the printer and want to delete it from the queue. See the lpq command to see how you can query the print queue for process ids.
lpq -P[printername] Query the specified printer to see active jobs. Reports process ids of jobs. e.g. lpq -Pcl122
quota -v Show how much disk space you are using ("usage") on a multi-user Unix system and what your limit is ("quota"). The numbers given refer to kilobytes of space.

Notes:

Summary of Redirection/Pipe Commands

In a previous lab, you learned about piping and redirection. The following table is meant as a review.

Symbol Description
> file Redirects standard output to file.
>! file Redirects standard output to file. This will overwrite the contents of a file.
>> file Redirects standard output to file. This will append to the end of a file.
< file Redirects the contents of file to standard input. If a program accepts keyboard input and you use this redirection, all of the input will come instead from file.
command1 | command2 The standard output of command1 is fed as standard input into command2.

Building Executables with "make"

You can use the Unix utility called make to help you create an executable file from several C++ files. There are many other uses for make, but this lab focuses on how to use it to help simplify working with C++ files. Here is a list of references for using make: When you type make at the Unix/Linux command prompt, it looks for a file called Makefile or makefile which you would have created with a text editor. Inside the Makefile you have to put in the names of the C++ and object files that you need to create the executable file for the C++ project you are working with.

Let's look at an example of how you would create a C++ project without a Makefile. We can then see how to make life simpler by using a Makefile to create that same C++ project.

We will work with files that you may have seen before. If you don't have them, you can copy them now:

cp /net/data/ftp/pub/class/170/ftp/cpp/SeparateCompile/main.cpp main.cpp
cp /net/data/ftp/pub/class/170/ftp/cpp/SeparateCompile/myFunction.cpp myFunction.cpp
cp /net/data/ftp/pub/class/170/ftp/cpp/SeparateCompile/myFunction.h myFunction.h
To obtain an executable, you must type: Now every time you make a change to either main or myFunction, you have to recompile the file that you modified and then perform the link again. And if you make a change to the .h file that all of those .cpp files depended upon, you have to recompile everything and do the link again. Of course you can just do this: but that is time-consuming and not at all necessary. It is much better to only compile the files you need to. This is where a Makefile can help you.

You have to be observe a little syntax when you create your Makefile with a text editor. Here are a few very simple rules:

This is a simple introduction to Makefiles. As you become comfortable with them you will find that you can elaborate with targets to clean up old .o files and so on. However, to start, here is a simple Makefile to deal with the myFunction project.
# Makefile to create the basic myFunction project

main.o:  main.cpp myFunction.h
        g++ -c main.cpp

myFunction.o: myFunction.cpp myFunction.h 
        g++ -c myFunction.cpp

main: myFunction.o main.o
        g++ myFunction.o main.o -o main
Now that you understand what is in a Makefile you need to know how to use it. It's quite straightforward - simply enter make target at the system prompt. The make process will report on what it is doing. The following examples illustrate how to compile a single module, and how to link all files.
[26]% make myFunction.o
        g++ -c myFunction.cpp

[27]% make main
        g++ -c main.cpp
        g++ myFunction.o main.o -o main
[28]%
If you enter a target that is already current, "make" reports that.
e.g.
[22]% make main.o
main.o is up to date.
The bottom line here is that Makefiles help you work with multiple C++ files that make up a project. By only compiling the C++ files that you have to compile, you save a lot of time recompiling files needlessly. For greater detail on Makefiles, go back to any of the tutorials given at the start of this explanation.

Sequential search

    Sequential search is the intuitive approach of searching an array for a particular element.  The idea is to start from either the beginning or end of the array and traverse the array element one by one until the target is located or until all the elements in the array have been visited.  Keep in mind that this approach, although easy to understand, could be very expensive.  This is especially true when the array is very large and the element to be searched for is the farthest one from your starting point.

    The following is a sample program demonstrating the idea of sequential search.  The program includes three  files: SeqSearch.h, SeqSearch.cpp, and SeqMain.cpp.

Program listing: SeqSearch.h

// File name: ~ftp/pub/class/170/ftp/cpp/BinarySearch/SeqSearch.h
// Purpose:   Interface for SeqSearch.cpp

#include <iostream>
#include <iomanip>  // needed by setw
#include <cstdlib>  // needed for random number generator
#include <ctime>    // used to seed random numbers

using namespace std;

const int MAXINDEX = 100;  

 // define a type with the name IntArrayType
 // when we need to declare an int array with MAXINDEX elements,
 // we could simple write: IntArrayType IntArray;
 // IntArray is big enough to hold MAXINDEX integers.
typedef int IntArrayType[MAXINDEX];

////////////////////////
// Function prototypes:
////////////////////////

int SeqSearch(IntArrayType IntArray, int Count, int Target);

void GenerateArray(IntArrayType IntArray, int Count);

void PrintArray(IntArrayType IntArray, int Count);

Program listing: SeqSearch.cpp

//File name: ~ftp/pub/class/170/ftp/cpp/BinarySearch/SeqSearch.cpp
// Purpose:  Implementation of SeqSearch.h

#include "SeqSearch.h"

using namespace std;

////////////////////////////////////////////////////////////////////
// Function: SeqSearch(IntArrayType IntArray, int Count, int Target) 
// Purpose:  Given an integer array with "Count" number of elements,
//           try to find "Target".  
// Parameters: IntArray: an integer array to hold MAXINDEX elements
//             Count:    number of elements
//             Target:   the integer you are searching for
// Return:     The position of the Target in the array
//             -1 if not found
////////////////////////////////////////////////////////////////////
int SeqSearch(IntArrayType IntArray, int Count, int Target)
{
   int k;

   for (k = 0; k < Count; k++)
      if (IntArray[k] == Target)
         return k;

   return -1;   // If reach here, Target was not found.
}

//////////////////////////////////////////////////////////////////////
// Function: void GenerateArray(IntArrayType IntArray, int Count);
// Purpose:  Given "Count", generate "Count" number of random integers
//           and put them in IntArray
// Parameters: IntArray: an integer array to hold MAXINDEX elements
//             Count:    number of elements in the array
// Return:     None
//////////////////////////////////////////////////////////////////////
void GenerateArray(IntArrayType IntArray, int Count)
{
   int k;

   srand((unsigned) time(NULL));  // seed the random number generator

   for (k = 0; k < Count; k++)
      IntArray[k] = rand()%1000;
   // Note that rand() returns a random integer between 0 and
   // RAND_MAX.  We use modulo - % - to limit the integer value to a
   // smaller range
}

////////////////////////////////////////////////////////////////////////////
// Function: void PrintArray(IntArrayType IntArray, int Count);
// Purpose:  Print out the content of "IntArray"
// Parameters: IntArray: an integer array to hold MAXINDEX elements
//             Count:    number of elements in the array
// Return:     None
////////////////////////////////////////////////////////////////////////////
void PrintArray(IntArrayType IntArray, int Count)
{
   int k;

   //Print header
   cout << "Ones ->";
   for (k = 0; k < 10; k++)
   {
      cout << setw(4) << k << " |";
   }
   cout << endl;
   cout << "Tens ||" << setw(60) << setfill('=') << '|' << endl << setfill(' ');
   
   for (k = 0; k < Count; k++)
   {
      //Label new rows
      if (k % 10 == 0)
         cout << setw(4) << k/10 << " ||";
      
      //Print the array value
      cout << setw(4) << IntArray[k] << " |";

      //End rows
      if (k % 10 == 9)
         cout << endl;   
   }

   cout << endl;
}

 Program listing: SeqMain.cpp

// File name: ~ftp/pub/class/170/ftp/cpp/BinarySearch/SeqMain.cpp
// Purpose:   Driver program for sequential search
#include "SeqSearch.h"

using namespace std;

int main(void)
{
   IntArrayType IntArray;
   int Num, Position;

   GenerateArray(IntArray, MAXINDEX); // gengerate 100 random integers

   cout << "Here is the content of the array: " << endl;
   PrintArray(IntArray, MAXINDEX);

   cout << "Enter the number to search for: ";
   cin >> Num;

   // Search the IntArray for Num and return its position
   Position = SeqSearch(IntArray, MAXINDEX, Num);

   if (Position == -1)  // not found
      cout << "Cannot find " << Num << "." << endl;
   else
      cout << "Found " << Num << " in position " << Position << ".\n";

   return 0;
}

To try the linear search program just copy, download, or ftp these files. You will need to use separate compilation to make them work.

FTP path: pub/class/170/ftp/cpp/BinarySearch

Files: SeqSearch.h SeqSearch.cpp SeqMain.cpp

cp /net/data/ftp/pub/class/170/ftp/cpp/BinarySearch/SeqSearch.h SeqSearch.h
cp /net/data/ftp/pub/class/170/ftp/cpp/BinarySearch/SeqSearch.cpp SeqSearch.cpp
cp /net/data/ftp/pub/class/170/ftp/cpp/BinarySearch/SeqMain.cpp SeqMain.cpp

Running Result:

hercules[602]% CC -o SeqSearch SeqSearch.cpp SeqMain.cpp
SeqSearch.cpp:
SeqMain.cpp:
hercules[603]% SeqSearch
Here is the content of the array:
Ones ->   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |
Tens ||===========================================================|
   0 ||   7 | 891 | 736 | 275 | 876 |  12 | 413 | 743 | 306 | 843 |
   1 || 702 | 581 | 658 | 298 | 810 | 265 |  19 | 834 | 886 | 534 |
   2 || 322 |  23 | 717 | 943 | 734 | 859 | 112 | 137 |  29 | 151 |
   3 || 439 | 944 | 156 | 842 | 286 | 883 | 730 | 600 | 710 | 178 |
   4 || 665 | 525 |  50 | 863 | 657 | 312 | 772 |  95 | 753 | 724 |
   5 || 305 | 242 | 701 | 702 | 350 | 855 | 331 | 972 | 619 | 118 |
   6 || 651 | 483 | 484 | 865 | 911 | 526 | 497 | 753 | 320 | 869 |
   7 || 249 | 702 | 667 | 759 | 542 | 174 | 539 | 140 | 342 | 404 |
   8 || 368 | 914 | 476 | 159 |  28 | 601 | 549 | 138 | 186 | 247 |
   9 ||  44 | 589 | 144 | 902 |  81 | 245 | 378 | 538 | 792 | 179 |

Enter the number to search for: 95
Found 95 in position 47.
hercules[604]%

Sorting

Sorting information helps people guess where they need to look for what they want. Imagine trying to use a phone book where the names are out of order. Last time you needed a specific piece of information from a textbook you probably turned to the index at the back. Sorted information is also much easier for computers to work with, as we'll see in the next section.

There are many kinds of sorting routines. Some are easy to understand and to program, others are very complex. A couple of simple and well known sorts are bubble sort and selection sort. This section introduces selection sort.

Assume we have an array of n elements and we want to sort in descending order. The selection sort algorithm would go like this:

Set Current_Element to the start of the array
Repeat until Current_Element at the end of the array
     Linear search from Current_Element to the end of the array for the Largest_Element
     If the Largest_Element is not the Current_Element swap them
     Move Current_Element to next element

Pretty simple, huh? This sort always does n searches with an average size of n/2 elements. That means that the time the sort takes grows quickly, (n2)/2, with more elements. That's about four times longer for double the elements. You need to use the sorted array a few times to recoup the cost. (More advanced sorts are much faster, about n*log(n), on average.) Here's the code for a Selection Sort function. You will need it later on in the lab. (Don't worry, there will be a link back to this point.)

    //Prototype for SelSort
    void SelSort(IntArrayType IntArray, int Count);

////////////////////////////////////////////////////////////////////
// Function: SelSort(IntArrayType IntArray, int Count)
// Purpose:  Given an integer array with "Count" number of elements,
//           sort the elements in ascending order
// Parameters: IntArray: an integer array to hold MAXINDEX elements
//             Count:    number of elements
// Return:     None
////////////////////////////////////////////////////////////////////
void SelSort(IntArrayType IntArray, int Count)
{
   int current_ele,search_ele,search_max,temp;

   for (current_ele = 0; current_ele < Count; current_ele++)
   {
      search_max=current_ele;
      // Find index of largest element in unsorted section of elements
      for(search_ele = current_ele+1; search_ele < Count; search_ele++)
	 if(IntArray[search_max] < IntArray[search_ele])
	    search_max=search_ele;

      // Exchange items at position search_max and current_ele
      if (search_max > current_ele)
      {
	 temp=IntArray[search_max];
	 IntArray[search_max]=IntArray[current_ele];
	 IntArray[current_ele]=temp;
      }
   }
}

To try this selection sort implementation just copy, download, or ftp these files.

FTP path: pub/class/170/ftp/cpp/BinarySearch

Files: SelSortMain.cpp SelSort.h SelSort.cpp

cp /net/data/ftp/pub/class/170/ftp/cpp/BinarySearch/SelSortMain.cpp SelSortMain.cpp
cp /net/data/ftp/pub/class/170/ftp/cpp/BinarySearch/SelSort.h SelSort.h
cp /net/data/ftp/pub/class/170/ftp/cpp/BinarySearch/SelSort.cpp SelSort.cpp

Binary Search

    As was mentioned earlier, the idea of binary search is to cut an array (sorted) into two parts and discard the part which could not contain our searching target.  How do we know which part potentially contains our target?  Well, since the array is sorted (generally in ascending order), if one of the two parts of the array, whose largest value is smaller than our target, or  whose smallest value is larger than our target, then this part does not contain our target.  

    Here is the code for Binary Search:

// Code for BinarySearch
int BinarySearch(IntArrayType IntArray, int Low, int High, int Target)
{
   int Mid, Difference;

   while (Low <= High)
   {
      Mid = (Low + High) / 2;
      Difference = IntArray[Mid] - Target;

      if (Difference == 0)         // IntArray[Mid] == Target
         return Mid;
      else if (Difference < 0)     // IntArray[Mid] < Target
         Low = Mid + 1;
      else
         High = Mid - 1;
   }

   return -1;   // If reach here, Target was not found.
}

This is best illustrated with figures.

    Suppose we have an integer array with 10 elements, from 0 to 9, and we would like to search for the value 7:

    

     

     In the beginning, Low = 0 (starting index of the array), High = 9 (ending index of the array) and Mid = (Low+High)/2 = 4. We first check if the value at index 4 is equal to our "Target". IntArray[Mid] = 4 != 7, so we check if the difference between IntArray[Mid] and Target is greater or less than 0. It turns out that it is less than 0, so we update the value for Low: Low = Mid + 1.  See the following figure:

    

     In the above figure,  Low = Mid + 1 = 4 + 1 = 5, High remains unchanged, and Mid = (Low + High)/2 = (5 + 9)/ 2 = 7. The result of this is to ignore the first half of the array (The shaded part) and search for the target 7 in the second half of the array. The reason is that the largest element in the first part of the array, IntArray[4] = 4 is less than Target.

    Again, we check if the value at index Mid matches the Target. It does, so we return the index Mid = 7, which is the location for the target.

    Following is the complete program demonstrating binary search of an integer array. It contains three 3 files:  BinSearch.h, Binsearch.cpp and BinMain.cpp.

Program listing: BinSearch.h

// File name: ~ftp/pub/class/170/ftp/cpp/BinarySearch/BinSearch.h
// Purpose:   interface for BinSearch.cpp

#include <iostream>
#include <iomanip>   // needed for setw
#include <cstdlib>   // needed for random number generator
#include <ctime>     // used to seed random numbers

using namespace std;


const int MAXINDEX = 100;

typedef int IntArrayType[MAXINDEX];


// Function prototypes:

int BinarySearch(IntArrayType IntArray, int Low, int High, int Target);

void GenerateArray(IntArrayType IntArray, int Count);

void PrintArray(IntArrayType IntArray, int Count);

Program listing: BinSearch.cpp

// File name: ~ftp/pub/class/170/ftp/cpp/BinarySearch/BinSearch.cpp
// Purpose    implementatin of BinSearch.h

#include "BinSearch.h"
using namespace std;


////////////////////////////////////////////////////////////////////////////
// Function: void GenerateArray(IntArrayType IntArray, int Count)
// Purpose:    Generate an array of random integers in ascending order
// Parameters: IntArray: Array of integers in acending order
//             Count :   The number of integers in the array 
// Return:     None
////////////////////////////////////////////////////////////////////////////
void GenerateArray(IntArrayType IntArray, int Count)
{
   int k;

   srand((unsigned) time(NULL)); // seed the rand() function

   IntArray[0] = rand()%10; // pick a random number as the first value
   for (k = 1; k < Count; k++)
      // make sure the array is in ascending order
      IntArray[k] = IntArray[k - 1] + rand()%10 + 2;
}

////////////////////////////////////////////////////////////////////////////
// Function: void PrintArray(IntArrayType IntArray, int Count)
// Purpose:    printing the array 
// Parameters: IntArray: Array of integers in acending order
//             Count :   The number of integers in the array 
// Return:     None
////////////////////////////////////////////////////////////////////////////
void PrintArray(IntArrayType IntArray, int Count)
{
   int k;

   //Print header
   cout << "Ones ->";
   for (k = 0; k < 10; k++)
   {
      cout << setw(4) << k << " |";
   }
   cout << endl;
   cout << "Tens ||" << setw(60) << setfill('=')<< '|' << endl << setfill(' ');
   
   for (k = 0; k < Count; k++)
   {
      //Label new rows
      if (k % 10 == 0)
         cout << setw(4) << k/10 << " ||";
      
      //Print the array value
      cout << setw(4) << IntArray[k] << " |";

      //End rows
      if (k % 10 == 9)
         cout << endl;   
   }

   cout << endl;
}



////////////////////////////////////////////////////////////////////////////
// Function: 
//     int BinarySearch(IntArrayType IntArray, int Low, int High, int Target)
// Purpose:    search "Target" in an ordered integer array
// Parameters: IntArray: Array of integers in acending order
//             Low:      The starting index
//             High:     The ending index
//             Target:   The integer value to search
// Return:     The index of the value if found
//             -1 if not found
////////////////////////////////////////////////////////////////////////////
int BinarySearch(IntArrayType IntArray, int Low, int High, int Target)
{
   int Mid, Difference;

   while (Low <= High)
   {
      Mid = (Low + High) / 2;
      Difference = IntArray[Mid] - Target;

      if (Difference == 0)         // IntArray[Mid] == Target
         return Mid;
      else if (Difference < 0)     // IntArray[Mid] < Target
         Low = Mid + 1;
      else
         High = Mid - 1;
   }

   return -1;   // If reach here, Target was not found.
}

    A few comments about the function to generate an array for binary search.  Since binary search requires that the array be sorted in ascending order, when using the random number generator to generate integers, we need to make sure that the integers are sorted. The way to do it is to pick a random number as the first element of the array. The second value is the first one plus a random integer between 0 and 9, plus 2. This way, we have an array of random integers sorted in ascending order without repeating values.

Program listing: BinMain.cpp

// File name:  ~ftp/pub/class/170/ftp/cpp/BinarySearch/BinMain.cpp
// Purpose:    Driver program for binary search program

#include "BinSearch.h"
using namespace std;

int main(void)
{
   IntArrayType IntArray;
   int Num, Pos;

   GenerateArray(IntArray, MAXINDEX); // generate sorted integer array

   cout << "The contents of the array are: " << endl;
   PrintArray(IntArray, MAXINDEX);

   cout << "Enter the integer to search for: " << endl;
   cin >> Num;

   Pos = BinarySearch(IntArray, 0, MAXINDEX - 1, Num);

   if(Pos == -1)
	cout << "Not found" << endl;

   else
	cout << "Number " << Num << " found in Position " << Pos << endl;

   return 0;
}



To try the binary search program just copy, download, or ftp these files. You will need to use separate compilation to make them work.

FTP path: pub/class/170/ftp/cpp/BinarySearch

Files: BinSearch.h BinSearch.cpp BinMain.cpp

cp /net/data/ftp/pub/class/170/ftp/cpp/BinarySearch/BinSearch.h BinSearch.h
cp /net/data/ftp/pub/class/170/ftp/cpp/BinarySearch/BinSearch.cpp BinSearch.cpp
cp /net/data/ftp/pub/class/170/ftp/cpp/BinarySearch/BinMain.cpp BinMain.cpp

Running result:

hercules[560]% CC -o BinSearch BinSearch.o BinMain.o
hercules[561]% ./BinSearch
The contents of the array are:
Ones ->   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |
Tens ||===========================================================|
   0 ||   0 |   2 |  11 |  17 |  28 |  38 |  42 |  47 |  51 |  53 |
   1 ||  57 |  65 |  67 |  70 |  81 |  90 |  97 | 106 | 114 | 122 |
   2 || 131 | 134 | 138 | 140 | 145 | 154 | 163 | 166 | 176 | 183 |
   3 || 191 | 195 | 202 | 209 | 214 | 220 | 222 | 232 | 240 | 249 |
   4 || 252 | 262 | 269 | 277 | 280 | 286 | 292 | 300 | 302 | 312 |
   5 || 318 | 325 | 329 | 331 | 341 | 343 | 351 | 353 | 356 | 358 |
   6 || 363 | 367 | 377 | 381 | 392 | 402 | 410 | 416 | 420 | 429 |
   7 || 439 | 442 | 445 | 447 | 455 | 458 | 464 | 473 | 479 | 490 |
   8 || 492 | 494 | 498 | 507 | 517 | 521 | 528 | 536 | 546 | 550 |
   9 || 561 | 565 | 570 | 581 | 592 | 599 | 605 | 611 | 614 | 623 |

Enter the integer to search for:
447
Number 447 found in Position 73
hercules[562]%

    


Lab Exercise 


This page last modified:
Friday, 25-Sep-2009 17:04:46 CST

CS Dept Home Page
CS Dept Class Files
CS115 Lab Files

Copyright: Department of Computer Science, University of Regina.