CS330 Unix, Linux, and Tokenizing Strings


Highlights of this lab:


Unix Introduction

Brief History

Advantages of Unix

Unix Flavours

There are many different implementations of Unix, and they all differ subtly in the way that they operate. Most modern Unixes try to comply with the Single UNIX Specification (SUS), and most commercial UNIXes are officially registered as SUS compliant.

A few commercial implementations available in the CS Department are:

Linux comes in many different versions, called distributions or distros. Some are free; others require the user to pay for a support contract. Many smaller distributions are based on one of the larger ones. No Linux distros are officially SUS compliant because of the costs involved, but the Linux Standard Base (LSB) includes SUS standards.

Here's a list of some important Linux distributions:


Parts of Unix

Unix is organized at three levels:

  • kernel

“The UNIX kernel is built specifically for a machine when it is installed. It has a record of all the pieces of hardware it needs to talk to and knows what languages they speak (how to turn switches on and off to get a desired result).” http://www.extropia.com/tutorials/unix/kernel.html

  • shell

The Unix shell provides a user interface. “The most basic UNIX shell provides a 'command line' which allows you to type in commands which are translated by the shell into kernel speak and sent off to the kernel.” http://www.extopia.com/tutorials/unix/shells.html

  • tools and applications

These provide additional functionality to the operating system. To see some of the tools that you have access to, look in /bin or /usr/bin.


More on the Unix Shell

There are several different shells, each with its own advantages and disadvantages. For instance, some allow for auto completion using the tab key; others don't.

A few common shells are the following:


To see what shells exist on your current Unix system, try the following command:

$ cat /etc/shells

How do you get your shell?

Before you get a shell, you identify yourself with a login name and password. That login name is looked up in a file called /etc/passwd, which describes each user's account. More specifically, it records your unique numeric user ID, principal group ID, general information, home directory, and shell.

Try the following command:

$ cat /etc/passwd | grep yourusername

The format of the data in the /etc/passwd file is

Name:Password:UserID:PrincipalGroup:Gecos:HomeDirectory:Shell

Note that the attributes are separated by a colon (:). The last attribute on the line is your shell.

If you are feeling daring you can change your shell with the chsh command.



Review of Strings and C Strings

                     C Strings                               Strings
general              fixed length determined when            dynamic length, can change
                     declared, ends in '\0'                  length during the program
#include             #include <cstring>                      #include <string>
declaring            char cString[100];                      string theString;
copying              strncpy(cString,cString2,100);          theString=theString2;
getting a line       cin.getline(cString,100);               getline(cin,theString);
determining length   strlen(cString);                        theString.length();
comparing            if(!strncmp(cString,cString2,100))      if (theString==theString2)

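A handy thing to know is how to convert a String into a C String (for copying, perhaps). Keep in mind that strncpy does not add the terminating '\0' when the source has as many characters as the limit or more. The syntax is: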

strncpy(cString,theString.c_str(),100);

You may also need a review of using getline to read lines until the end of a file. The following is meant as a refresher:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    ifstream inFile("test.txt");
    string strOneLine;

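    // getline returns the stream, which tests false once a read fails,
    // so nothing is printed after the last line has been read
    while (getline(inFile, strOneLine))
    {
       cout << strOneLine << endl;
    }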

    inFile.close();

    return 0;
} 
Note:

Dynamically Allocating C Strings

The following is meant as a review of how to dynamically allocate and free up space.

C++ Style:

char *s4;

s4=new char[strlen("hello") + 1];   //determine size + 1 for null
//Copy the strings
strncpy(s4,"hello",6);

//...

delete[] s4;

C Style:

char *s4;

s4=(char*)malloc(strlen("hello") + 1);   //determine size + 1 for null
//Copy the strings
strncpy(s4,"hello",6);

//...

free(s4);

Well, maybe it's not quite a review. Some of you may not have worked with malloc and free. The reason for introducing it now is that you can reduce the above code to the following:

char *s4;
   
s4=strdup("hello");   //make copy of "hello"

//...

free(s4);

strdup is not a part of the C++ standard, and it was added to the C standard only in C23; it comes from the POSIX standards. If you are programming on a POSIX-style OS such as Linux (the lab), Solaris (Hercules), or Mac OS X, then you can use strdup:

  • strdup(const char *s)
returns a pointer to a new string that is a duplicate of the string pointed to by s. The returned pointer should be released with free() because the space for the new string is obtained using malloc. If the new string cannot be created, a null pointer is returned.

Notes:


Splitting C Strings into Tokens

Sometimes you may want to split a line into tokens or words. To do that, there is a C String function called strtok. The prototype is:
char * strtok (char * str, const char * delimiters)
where str is the line (or C string) that you want to split into tokens or words, and delimiters is a C string in which any one of the characters delimits or marks the boundaries between words.

The following is an example of using strtok:

#include <iostream>
#include <cstring>
using namespace std;

int main(int argc, char *argv[])
{
   char cstr1[]="This is a sample string. Is it working?";
   char delim[]=" ,.-;!?";
   char *token;

   cout << "cstr1 before being tokenized: " << cstr1 << endl << endl;

   //In the first call to strtok, the first argument is the line to be tokenized
   token=strtok(cstr1, delim);
   cout << token << endl;


   //In subsequent calls to strtok, the first argument is NULL
   while((token=strtok(NULL, delim))!=NULL)
   {
         cout << token << endl;
   }
}

The output:

cstr1 before being tokenized: This is a sample string. Is it working?

This
is
a
sample
string
Is
it
working

There are a couple of "catches" with strtok:

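  1. In the first call to strtok, the first argument is the line or C string to be tokenized; in subsequent calls to strtok, the first argument is NULL. Notice the two calls in the sample code above:
       token=strtok(cstr1, delim);
       while((token=strtok(NULL, delim))!=NULL)
  2. The original C string is modified when it is tokenized: the delimiter that ends each token is replaced by a null terminator ('\0'). The following represents what the C string in the sample code will look like after tokenizing:
       This\0is\0a\0sample\0string\0 Is\0it\0working\0\0
     The space after "string." is left in place because strtok overwrites only the first delimiter after each token; the final '\0' is the string's original terminator.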

 


Dynamic Arrays of C Strings

Sometimes you want to have a dynamically created array of C Strings. The following code demonstrates this:

#include <iostream>
#include <cstring>
#include <cstdlib>   //for free
using namespace std;

int main ()
{
  char **words;
  char tempWord[100];
  char endWord[]="330!";

  words = new char *[3]; //allocate pointers to three words

  //--------------
  //get two words from the user input--use strdup to dynamically allocate space

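  cout << "Please input a word (less than 100 characters): ";
  cin.width(sizeof(tempWord));   //read at most 99 characters so tempWord cannot overflow
  cin >> tempWord;
  words[0]=strdup(tempWord);

  cout << "Please input a second word (less than 100 characters): ";
  cin.width(sizeof(tempWord));   //the width resets after each read, so set it again
  cin >> tempWord;
  words[1]=strdup(tempWord);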

  //--------------
  //the third one hard code copy of "330!" (endWord)

  words[2]=strdup(endWord);

  //--------------
  //print and clean up individual words as you go

  for (int i=0; i<3; i++)
  {
     cout << words[i] << endl;
     free(words[i]);   //remember that space was set aside by strdup

  }

  //--------------
  //Clean up the array of words
  delete [] words;     // cleans up words = new char *[3];
}

References and More Info

Extra Info: