CS330 Unix, Linux, and Tokenizing Strings


Highlights of this lab:


Unix Introduction

Brief History

Advantages of Unix

Unix Flavours

There are many different implementations of Unix, and they differ subtly in the way that they operate. Most modern Unixes try to comply with the Single UNIX Specification (SUS), and most commercial UNIXes are officially registered as SUS compliant.

A few commercial implementations available in the CS Department are:

Linux comes in many different versions, called distributions or distros. Some are free; for others, the user must pay for a support contract. Many smaller distributions are based on one of these. No Linux distros are officially SUS compliant because of the costs involved, but the Linux Standard Base (LSB) includes SUS standards.

Here's a list of some important Linux distributions:


Parts of Unix

Unix is organized at three levels:

  • kernel

“The UNIX kernel is built specifically for a machine when it is installed. It has a record of all the pieces of hardware it needs to talk to and knows what languages they speak (how to turn switches on and off to get a desired result).” http://www.extropia.com/tutorials/unix/kernel.html

  • shell

The Unix shell provides a user interface. “The most basic UNIX shell provides a 'command line' which allows you to type in commands which are translated by the shell into kernel speak and sent off to the kernel.” http://www.extropia.com/tutorials/unix/shells.html

  • tools and applications

These provide additional functionality to the operating system. To see some of the tools that you have access to, look in /bin or /usr/bin.


More on the Unix Shell

There are several different shells, each with its own advantages and disadvantages. For instance, some allow for auto completion using the tab key; others don't.

A few common shells are the following:


To see what shells exist on your current Unix system, try the following command:

$ cat /etc/shells

How do you get your shell?

Before you get a shell, you identify yourself with a login name and password. That login name is looked up in a file called /etc/passwd, which describes each user's account. More specifically, it records your unique numeric ID, principal group ID, general information, home directory, and shell.

Try the following command:

$ cat /etc/passwd | grep yourusername

The format of the data in the /etc/passwd file is

Name:Password:UserID:PrincipalGroup:Gecos:HomeDirectory:Shell

Note that the attributes are separated by a : (colon). The last attribute on the line is your shell.
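
Since this lab is also about tokenizing, the colon-separated layout above can be pulled apart in C++. The following is only a sketch: the function names splitPasswdLine and shellOf are made up for this example, and the sample line in the usage note is hypothetical rather than taken from a real /etc/passwd file.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one /etc/passwd-style line into its colon-separated fields.
// std::getline with ':' as the delimiter keeps empty fields in the middle
// of the line (an empty Gecos entry, for instance).
std::vector<std::string> splitPasswdLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ':'))
        fields.push_back(field);
    return fields;
}

// The shell is the seventh (last) field.
// Returns "" if the line does not have exactly seven fields.
std::string shellOf(const std::string& line)
{
    std::vector<std::string> fields = splitPasswdLine(line);
    return fields.size() == 7 ? fields[6] : "";
}
```

For a hypothetical entry such as smithj:x:1001:100:John Smith:/home/smithj:/bin/bash, shellOf returns /bin/bash. getline is used here instead of strtok because strtok treats a run of delimiters as one, so an empty field would shift every field after it.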

If you are feeling daring you can change your shell with the chsh command.



Focus on Unix Commands

Although you can do a lot with a graphical user interface (GUI), it is often necessary to execute commands on the command-line. The basic format is:

prompt$ command-name options arguments

For example:

hercules[1]% g++ -c main.cpp 
a037094[7]% ls -l

A summary of a few commands is provided in tabular format here:

Command Example Comments
ssh ssh smithj@hercules.cs.uregina.ca Secure Shell (ssh) enables you to log in, execute commands, and run applications on a remote system. SSH encrypts any communication between the remote user and a system on your network.
scp
    With Secure Copy (scp) you can copy files between hosts on a network, for example between your local machine and a remote host. scp uses ssh to transfer data and employs the same authentication and encryption.

    scp input.txt smithj@hercules.cs.uregina.ca:CS330lab1/
        Copies the file input.txt from the user's current directory to the user smithj's CS330lab1 directory, located on the hercules host.

    scp -r CS330 smithj@hercules.cs.uregina.ca:classes/
        Uses the -r option, which allows whole directories to be copied, to copy the entire CS330 directory to the classes directory of the user smithj.

touch touch myfile Update the access and modification time of "myfile". If "myfile" does not exist, create it.
mkdir mkdir reports Creates a directory named "reports"
rmdir rmdir letters Erases an empty directory named "letters"
rm
    rm myfile       Erases a file named "myfile"
    rm -r mydir     Erases the directory "mydir" and any contents/subdirectories in it

ls
    ls -F     Lists the working directory with trailing characters for file types. The most common are / for a directory and * for an executable.
    ls -R     Lists the working directory as well as all subdirectories.
    ls -a     Lists all files, including "hidden files".
    ls -l     Lists files with permissions, owner, group, and time stamp.
    ls -i     Lists each file's inode number, a unique number used by the system to identify a specific file.
    ls -t     Lists files by time last modified.

cd
    cd reports     Changes to the "reports" directory, making it the working directory.
    cd             Changes back to the home directory
    cd ..          Moves you up one directory level

pwd pwd Print Working Directory - prints full path to current directory.
cp
    cp lab1 mylab                 Copies file "lab1" to "mylab" file
    cp lab1 mydirectory           Copies "lab1" in your working directory to "mydirectory"
    cp lab1 mydirectory/mylab     Copies "lab1" to "mydirectory" and renames it "mylab".
    cp -r mydirectory dirname     Copies "mydirectory" and all its contents into "dirname".

mv
    mv lab1 lab2                     Renames "lab1" to "lab2"
    mv lab1 labdirectory             Moves "lab1" to the "labdirectory"
    mv lab1 labdirectory/newfile     Moves "lab1" to the "labdirectory" and renames it "newfile"
    mv labdirectory newdirectory     Renames a whole directory to a new directory name

g++ (or CC on Solaris machines)
    g++ -o prog_run main.cpp                  Compiles and links "main.cpp", calls the executable "prog_run"
    g++ -c main.cpp                           Compiles "main.cpp" (creates the object file)
    g++ -o prog_run main.o part1.o part2.o    Links the object files (when you have different files), calls the executable "prog_run"



Focus on Permissions

Each file and directory in Unix contains a set of permissions that determine who can access it and how. There are three levels of access to set:

  1. You can restrict access to yourself alone (user)
  2. You can allow users in a predesignated group to have access (group)
  3. You can permit anyone on your system to have access (world)

How do you view permissions?

The ls command with the -l option allows you to view a file's permissions (among other information).

 $ ls -l mydata
 -rw-r--r-- 1 chris weather 207 Feb 20 11:55 mydata
 
The breakdown of this information is as follows:
  File Type                      -
  Permissions                    rw-r--r--
  Number of Links                1
  Owner Name                     chris
  Group Name                     weather
  Size of File in Bytes          207
  Date and Time Last Modified    Feb 20 11:55
  File Name                      mydata

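Right now, the owner of mydata has read and write permissions, and the group and the world have read permissions. How do I know? The permissions are organized in groups of three: the first three characters apply to the user (owner), the next three to the group, and the last three to the world. Within each group, r means read, w means write, x means execute, and - means that the permission is not granted.

In addition, the first character on the line shows the file type: - for a regular file and d for a directory.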

What would the following permissions represent?

  1. -rwxr--r--
  2. drwxr-xr-x
  3. -rwxrw-r--
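
Reading a permission string is mechanical enough to write down as code. The following is a small sketch, not a standard utility: describeTriplet and describePermissions are names invented for this example, and only the two file types mentioned in this lab (- and d) are recognized.

```cpp
#include <string>

// Describe one rwx triplet, e.g. "rw-" becomes "read write".
std::string describeTriplet(const std::string& t)
{
    std::string out;
    if (t[0] == 'r') out += "read ";
    if (t[1] == 'w') out += "write ";
    if (t[2] == 'x') out += "execute ";
    if (out.empty()) return "none";
    out.pop_back();                 // drop the trailing space
    return out;
}

// Describe a ten-character string as printed by ls -l:
// one file-type character followed by the user, group and world triplets.
std::string describePermissions(const std::string& perms)
{
    if (perms.size() != 10) return "invalid";
    std::string type = (perms[0] == 'd') ? "directory"
                     : (perms[0] == '-') ? "regular file" : "other";
    return type + "; user: "  + describeTriplet(perms.substr(1, 3))
                + "; group: " + describeTriplet(perms.substr(4, 3))
                + "; world: " + describeTriplet(perms.substr(7, 3));
}
```

For the mydata example above, describePermissions("-rw-r--r--") gives "regular file; user: read write; group: read; world: read". Try the three exercise strings by hand first, then check them with the function.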

How does Unix determine who has permissions to access files?

Again it comes down to the /etc/passwd file. In this file, you have a unique numeric id and a principal group id (also numeric). When you create a file, your unique numeric id and principal group id are assigned to that file. When someone later accesses the file, their ids are compared with the ids stored on the file: if the user id matches, the user permissions apply; otherwise, if one of their group ids matches, the group permissions apply; otherwise, the world permissions apply.

You have a principal group id, but you may also belong to other groups that are not your principal group. To see which groups you belong to, try the following command:

$ groups

This command gets its information from the /etc/group file, together with your principal group id from /etc/passwd.
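
The matching of numeric ids described in this section can be sketched in a few lines of C++. This is a simplified model only: it ignores the superuser and any access control lists, and accessClass is a name made up for this example, not a system call.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Decide which set of permissions (user, group or world) applies when a
// user with id userUid, belonging to the groups in userGroups, accesses a
// file owned by fileUid with group fileGid.
std::string accessClass(int fileUid, int fileGid,
                        int userUid, const std::vector<int>& userGroups)
{
    if (userUid == fileUid)
        return "user";              // the owner is checked first
    if (std::find(userGroups.begin(), userGroups.end(), fileGid)
            != userGroups.end())
        return "group";             // any of the user's groups may match
    return "world";                 // everyone else
}
```

Note that the checks are made in order: an owner who also belongs to the file's group still gets the user permissions, not the group permissions.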

How do I set permissions?

To set permissions, you use chmod. There are two main usages of chmod:

Symbolic Permission Mode:

The general format for using the symbolic permission mode is the following:
chmod 'access class' operator 'access type' filename
For example, this would add executable access for the user:
$ chmod u+x testfile
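The access class, operator, and access type are written together with no spaces between them, as in u+x. The following summarizes the values each one can take:
  1. Access Class: u (user), g (group), o (other, that is, world), a (all three)
  2. Operator: + (add the permission), - (remove the permission), = (set exactly these permissions)
  3. Permissions (access type): r (read), w (write), x (execute)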

Given a base permission of -rw------- for a file called "myfile", what would the resulting permission be after the following chmod calls?

  1. chmod u+x myfile
  2. chmod a+x myfile
  3. chmod g+r myfile
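
To check your answers, symbolic mode can be modelled as bit operations on a nine-bit number, one bit per r, w or x in each access class. The sketch below handles a single clause built from the letters u, g, o, a and r, w, x only; applySymbolic is a made-up name, and the real chmod accepts more forms than this.

```cpp
#include <string>

// Apply one symbolic clause such as "u+x", "go-w" or "a=r" to a mode.
// Returns the mode unchanged if the clause cannot be parsed.
unsigned applySymbolic(unsigned mode, const std::string& clause)
{
    std::string::size_type opPos = clause.find_first_of("+-=");
    if (opPos == std::string::npos || opPos == 0) return mode;

    unsigned who = 0;                       // which access classes are named
    for (std::string::size_type i = 0; i < opPos; ++i) {
        switch (clause[i]) {
            case 'u': who |= 0700; break;
            case 'g': who |= 0070; break;
            case 'o': who |= 0007; break;
            case 'a': who |= 0777; break;
            default:  return mode;
        }
    }

    unsigned what = 0;                      // which permissions are named
    for (std::string::size_type i = opPos + 1; i < clause.size(); ++i) {
        switch (clause[i]) {
            case 'r': what |= 0444; break;
            case 'w': what |= 0222; break;
            case 'x': what |= 0111; break;
            default:  return mode;
        }
    }

    unsigned bits = who & what;             // named permissions in named classes
    switch (clause[opPos]) {
        case '+': return mode | bits;           // add
        case '-': return mode & ~bits;          // remove
        default:  return (mode & ~who) | bits;  // '=': set exactly
    }
}
```

The base permission in the exercise, -rw-------, corresponds to the octal value 0600, so a call such as applySymbolic(0600, "u+x") models the first chmod in the list.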

Absolute Permission Mode

Another way to change permissions is by using a numeric (octal) code. Typically, you will use three octal numbers: one for the user, one for the group and one for other (world).

The syntax for using chmod in absolute permission mode is:

chmod 'octal permissions' filename
For example:
$ chmod 744 myfile
Each of the three octal digits encodes the read, write, and execute permissions for one access class: the first digit is for the user, the second for the group, and the third for the world.

The following table summarizes the octal digits and how the permissions are affected:

Octal Binary Permissions
0 000 ---
1 001 --x
2 010 -w-
3 011 -wx
4 100 r--
5 101 r-x
6 110 rw-
7 111 rwx
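
The table above is just the binary representation of each digit: the 4 bit is r, the 2 bit is w, and the 1 bit is x. A short sketch of the conversion follows; digitToTriplet and modeToString are names chosen for this example.

```cpp
#include <string>

// Convert one octal digit (0-7) into its rwx triplet, as in the table.
std::string digitToTriplet(unsigned digit)
{
    std::string t = "---";
    if (digit & 4) t[0] = 'r';
    if (digit & 2) t[1] = 'w';
    if (digit & 1) t[2] = 'x';
    return t;
}

// Convert a three-digit octal mode into the nine permission characters
// shown by ls -l (without the leading file-type character).
std::string modeToString(unsigned mode)
{
    return digitToTriplet((mode >> 6) & 7)      // user
         + digitToTriplet((mode >> 3) & 7)      // group
         + digitToTriplet(mode & 7);            // world
}
```

Remember that in C++ an octal literal starts with 0, so the mode from chmod 744 is written 0744; modeToString(0744) gives rwxr--r--.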

What would the permissions look like on "myfile" after the following chmod calls?

  1. chmod 755 myfile
  2. chmod 644 myfile
  3. chmod 711 myfile



Review of Strings and C Strings

                      C Strings                                  Strings
general               fixed length determined when declared,     dynamic length, can change length
                      ends in '\0'                               during the program
#include              #include<cstring>                          #include<string>
declaring             char cString[100];                         string theString;
copying               strncpy(cString,cString2,100);             theString=theString2;
getting a line        cin.getline(cString,100);                  getline(cin,theString);
determining length    strlen(cString);                           theString.length();
comparing             if(!strncmp(cString,cString2,100))         if (theString==theString2)

A handy thing to know is how to convert a String into a C String (for copying, perhaps?). The syntax is:

strncpy(cString,theString.c_str(),100);
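
One detail to watch with strncpy: if the source is as long as the buffer or longer, strncpy fills the buffer and does not write the terminating '\0'. A minimal sketch of a safer wrapper follows; copyToCString is a name invented for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copy a string into a fixed-size char buffer, truncating if necessary.
// Only destSize - 1 characters are copied and the last byte is always set
// to '\0', so the result is a valid C string even when src is too long.
void copyToCString(char* dest, std::size_t destSize, const std::string& src)
{
    if (destSize == 0) return;
    std::strncpy(dest, src.c_str(), destSize - 1);
    dest[destSize - 1] = '\0';
}
```

For example, with char cString[100]; the call copyToCString(cString, sizeof cString, theString); copies at most 99 characters and always leaves room for the null.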

You may also need a review of using getline to read lines until the end of a file. The following is meant as a refresher:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    ifstream inFile("test.txt");
    string strOneLine;

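    //getline returns the stream, which tests false once a read fails,
    //so the loop ends at end of file without printing an extra blank line
    while (getline(inFile, strOneLine))
    {
       cout << strOneLine << endl;
    }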

    inFile.close();

    return 0;
} 

Dynamically Allocating C Strings

The following is meant as a review of how to dynamically allocate and free up space.

C++ Style:

char *s4;

s4=new char[strlen("hello") + 1];   //determine size + 1 for null
//Copy the strings
strncpy(s4,"hello",6);

//...

delete[] s4;

C Style:

char *s4;

s4=(char*)malloc(strlen("hello") + 1);   //determine size + 1 for null
//Copy the strings
strncpy(s4,"hello",6);

//...

free(s4);

Well, maybe it's not quite a review. Some of you may not have worked with malloc and free. The reason for introducing it now is that you can reduce the above code to the following:

char *s4;
   
s4=strdup("hello");   //make copy of "hello"

//...

free(s4);

strdup is not part of the C++ standard, and it only joined the C standard with C23; it has long been included in the POSIX standards. If you are programming on a POSIX compliant OS such as Linux (the lab), Solaris, or Mac OS X, then you can use strdup:

  • strdup(const char *s)
returns a pointer to a new string that is a duplicate of the string pointed to by s. The returned pointer should be released with free() because the space for the new string is obtained using malloc. If the new string cannot be created, a null pointer is returned.
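
If strdup is not available on your system, its behaviour is easy to reproduce with malloc. The following is a sketch of an equivalent helper, under the assumption that s points to a valid C string; duplicateCString is a made-up name, not a library function.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Return a newly malloc'ed copy of s, or a null pointer if the allocation
// fails. As with strdup, the caller must release the copy with free().
char* duplicateCString(const char* s)
{
    std::size_t size = std::strlen(s) + 1;          // + 1 for the '\0'
    char* copy = static_cast<char*>(std::malloc(size));
    if (copy != NULL)
        std::memcpy(copy, s, size);                 // copies the '\0' too
    return copy;
}
```

Because the memory comes from malloc, release it with free, not with delete[].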



Splitting C Strings into Tokens

Sometimes you may want to split a line into tokens or words. To do that, there is a C String function called strtok. The prototype is:
char * strtok (char * str, const char * delimiters)
where str is the line (or C string) that you want to split into tokens or words, and delimiters is a C string in which any one of the characters delimits or marks the boundaries between words.

The following is an example of using strtok:

#include <iostream>
#include <cstring>
using namespace std;

int main(int argc, char *argv[])
{
   char cstr1[]="This is a sample string. Is it working?";
   char delim[]=" ,.-;!?";
   char *token;

   cout << "cstr1 before being tokenized: " << cstr1 << endl << endl;

   //In the first call to strtok, the first argument is the line to be tokenized
   token=strtok(cstr1, delim);
   cout << token << endl;


   //In subsequent calls to strtok, the first argument is NULL
   while((token=strtok(NULL, delim))!=NULL)
   {
         cout << token << endl;
   }
}

The output:

cstr1 before being tokenized: This is a sample string. Is it working?

This
is
a
sample
string
Is
it
working

There are a couple of "catches" with strtok:

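  1. In the first call to strtok, the first argument is the line or C string to be tokenized; in subsequent calls to strtok, the first argument is NULL. Notice the two calls in the sample code above:
       token=strtok(cstr1, delim);
       while((token=strtok(NULL, delim))!=NULL)
  2. The original C string is modified when it is tokenized: the delimiter that ends each token is replaced by a null terminator ('\0'). After tokenizing, the C string in the sample code looks like this (each \0 stands for a single null character):
       This\0is\0a\0sample\0string\0 Is\0it\0working\0\0
     The space after "string" survives because only the first delimiter after a token is overwritten. The final \0 is the string's original terminator.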
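
If you need to keep the original line intact, one option is to run strtok on a private copy. The following is a sketch of that idea; tokenize is a name made up for this example, and the tokens are returned as Strings so that the copy can be discarded.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split line into tokens without modifying it.
std::vector<std::string> tokenize(const char* line, const char* delims)
{
    // Work on a private copy (including the '\0'), because strtok
    // overwrites delimiters in the buffer it is given.
    std::vector<char> buffer(line, line + std::strlen(line) + 1);

    std::vector<std::string> tokens;
    for (char* token = std::strtok(&buffer[0], delims);   // first call: the buffer
         token != NULL;
         token = std::strtok(NULL, delims))               // later calls: NULL
    {
        tokens.push_back(token);
    }
    return tokens;
}
```

Called with the sample string and delimiters from the program above, tokenize returns the same eight words, and the string passed in is left unchanged.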

 


Dynamic Arrays of C Strings

Sometimes you want to have a dynamically created array of C Strings. The following code demonstrates this:

#include <iostream>
#include <cstring>
#include <cstdlib>   //for free
using namespace std;

int main ()
{
  char **words;
  char tempWord[100];
  char endWord[]="330!";

  words = new char *[3]; //allocate pointers to three words

  //--------------
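  //get two words from the user input--use strdup to dynamically allocate space
  //cin.width(100) stops >> from writing past the end of tempWord; it is set
  //before each read because the width resets after every extraction

  cout << "Please input a word (less than 100 characters): ";
  cin.width(100);
  cin >> tempWord;
  words[0]=strdup(tempWord);

  cout << "Please input a second word (less than 100 characters): ";
  cin.width(100);
  cin >> tempWord;
  words[1]=strdup(tempWord);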

  //--------------
  //the third one hard code copy of "330!" (endWord)

  words[2]=strdup(endWord);

  //--------------
  //print and clean up individual words as you go

  for (int i=0; i<3; i++)
  {
     cout << words[i] << endl;
     free(words[i]);   //remember that space was set aside by strdup

  }

  //--------------
  //Clean up the array of words
  delete [] words;     // cleans up words = new char *[3];
}
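
The two ideas in this lab, strtok and dynamic arrays of C Strings, can be combined to split a line into an array of words. The following is one possible sketch, assuming a POSIX system where strdup is available; splitWords and freeWords are names invented for this example.

```cpp
#include <cstdlib>
#include <cstring>

// Split line into a dynamically allocated array of C strings.
// On return, count holds the number of words. Returns NULL if the line
// could not be copied. Release the result with freeWords.
char** splitWords(const char* line, const char* delims, int& count)
{
    count = 0;

    // Pass 1: count the tokens on a scratch copy (strtok modifies its input).
    char* scratch = strdup(line);
    if (scratch == NULL) return NULL;
    for (char* t = std::strtok(scratch, delims); t != NULL;
         t = std::strtok(NULL, delims))
        ++count;
    std::free(scratch);

    // Pass 2: allocate the array of pointers, then duplicate each token.
    char** words = new char*[count];
    scratch = strdup(line);
    if (scratch == NULL) { delete [] words; count = 0; return NULL; }
    int i = 0;
    for (char* t = std::strtok(scratch, delims); t != NULL;
         t = std::strtok(NULL, delims))
        words[i++] = strdup(t);
    std::free(scratch);
    return words;
}

// Clean up in the same way as the program above.
void freeWords(char** words, int count)
{
    for (int i = 0; i < count; i++)
        std::free(words[i]);    // each word came from strdup (malloc)
    delete [] words;            // the array itself came from new[]
}
```

For instance, splitting "ls -l /usr/bin" on a space yields three words. The line is tokenized twice so that the array can be allocated at exactly the right size; a growing container would avoid the second pass.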
