7.10. SAS

SAS is a software system for data analysis that was first designed in 1966 and has been enhanced many times since. SAS was originally an acronym for Statistical Analysis System. It was developed for statistical needs but has grown into an all-purpose data analysis system.

The basic SAS system provides tools for:

* information storage and retrieval,

* data modification and programming,

* report writing,

* statistical analysis, and

* file handling.

The SAS system has optional enhancements for:

* graphics (SAS/GRAPH),

* forecasting (SAS/ETS),

* data entry (SAS/FSP), and

* other database interfaces (SAS/IMS-DL/I).

SAS runs on IBM-compatible mainframes, VAX/VMS systems, Data General and Prime machines, as well as IBM PCs. Our system configuration for SAS is as follows:

* Machine : MAX --- IBM 4381

* O/S : VM/CMS

* Edit : XEDIT

* Invoke : SAS filename sas

* Extra disk space : TDISK (n

7.10.1. USING SAS

A SAS job consists of two types of steps:

* DATA steps, in which you use a programming language to create a SAS data set, which holds the data in rectangular form in a format unique to SAS, and

* PROC (procedure) steps, which can be thought of as canned software that performs a particular manipulation or analysis of the data in a data set, e.g. printing --- PROC PRINT, frequencies --- PROC FREQ, charts --- PROC CHART, regression analysis --- PROC GLM.

A SAS data set must be created before any analysis is done. It is composed of a set of observations, one for each entity in the analysis. Each observation is a set of data values describing the entity. Each data value is identified by a variable name of 1 to 8 characters, as in this example:

NAME    SEX  AGE  HEIGHT  WEIGHT

Jim     M     33     1.9      98
Eva     F     64     1.0      28
George  M     12     1.0      30

A normal SAS job starts with a DATA step, consisting of a number of statements. Statements contain keywords and SAS names, and end with a semicolon. Comments are enclosed in /* */.

The DATA statement begins creation of a SAS data set. Its form is

DATA [[SASdataset_name] [(dsoptions)]...];

SASdataset_name names the data set(s) being created, while dsoptions give more information about each data set.

DATA SAMPLE (LABEL='PERSONAL DATA');

The INPUT statement allows you to describe data lines and to assign variable names.

INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;

The variables are read in the order in which they occur in the INPUT statement. A $ after a variable name means that the variable is in character format; otherwise it is numeric. Other possible parameters are the column numbers to read data from and input formats, for example packed decimal (PD4.2).
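As a rough illustration of the column-number form, the sketch below reads the sample observations shown earlier from fixed columns. The column positions (1-8, 10, 12-13, 15-17, 19-21) are assumptions chosen for this example only, and the data lines are laid out to match them.

```sas
/* Column input: each variable is read from a fixed range of columns. */
/* The column ranges are assumed for illustration only.               */
DATA SAMPLE;
   INPUT NAME $ 1-8 SEX $ 10 AGE 12-13 HEIGHT 15-17 WEIGHT 19-21;
   CARDS;
Jim      M 33 1.9  98
Eva      F 64 1.0  28
George   M 12 1.0  30
;
RUN;
```

Column input is mainly useful when values are not separated by blanks or when only part of each line is wanted.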

The CARDS statement tells SAS that data lines follow, up to the next line containing a semicolon. It comes after the INPUT statement and any other program statements in the DATA step, and has no parameters.

CARDS;

data line 1

data line 2

.

.

.

data line n

;

The INFILE statement can be used to tell SAS that a CMS file should be used as the source for the data lines. It precedes the INPUT statement. A CMS FILEDEF command should be issued before using this statement.

In CMS: FILEDEF fileref DISK fname ftype fmode

In SAS: INFILE fileref;
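A minimal sketch of the two pieces working together is given below. The fileref PEOPLE and the CMS file PEOPLE DATA A are hypothetical names used only for illustration; the file is assumed to hold one observation per line in the same layout as the sample data.

```sas
/* Issued in CMS before SAS is invoked (hypothetical file name):      */
/*    FILEDEF PEOPLE DISK PEOPLE DATA A                               */
DATA SAMPLE;
   INFILE PEOPLE;                          /* fileref from FILEDEF    */
   INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;   /* same layout as before   */
RUN;

PROC PRINT DATA=SAMPLE;
RUN;
```

No CARDS statement is needed here, because the data lines come from the CMS file rather than from the program itself.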

New variables can be created and old ones manipulated using an extensive set of program statements and functions including IF --- THEN --- ELSE, DO --- WHILE, DO --- UNTIL.

RATIO = WEIGHT/HEIGHT;
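The following sketch combines an assignment with IF --- THEN --- ELSE. It assumes the data set SAMPLE already exists; the data set name GROUPED, the variable GROUP, and the cut-off age of 18 are made up for the example.

```sas
DATA GROUPED;
   SET SAMPLE;                      /* read observations from SAMPLE  */
   RATIO = WEIGHT/HEIGHT;           /* new numeric variable           */
   IF AGE < 18 THEN GROUP = 'CHILD';
   ELSE GROUP = 'ADULT';            /* new character variable         */
RUN;
```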

SAS creates a data set by processing all statements in the DATA step, in order, once for each observation in the data source. The end of a step is marked by a RUN statement or by the next PROC or DATA statement.

START: use INPUT description to read the next observation from the data source
       if EOF, go to next step in the job
       process program statements (IF's, DO's, assignments)
       add observation to the SAS data set
       GOTO START
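One way to watch this loop at work is to write a line to the SAS log on every pass, as in the sketch below. It relies on the PUT statement and the automatic variable _N_ (the number of the current pass through the DATA step), neither of which is otherwise covered in this section.

```sas
DATA SAMPLE;
   INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;   /* read next observation   */
   RATIO = WEIGHT/HEIGHT;                  /* program statement       */
   PUT _N_= NAME= RATIO=;                  /* one log line per pass   */
   CARDS;
Jim M 33 1.9 98
Eva F 64 1.0 28
George M 12 1.0 30
;
RUN;
```

With three data lines, the statements are executed three times and the log should show three PUT lines, one for each observation added to the data set.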

Once the data is in a data set, we can process or analyse it using a PROC step. In the PROC statement, you tell SAS the procedure you wish to run, on which data set it should be run, and with what options. Additional statements specify whether the data should be processed in subsets and which variables should be used. The defaults are the most recently created data set, the entire data set (no subsetting), and all variables.
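As an illustration of overriding each default, the sketch below names the data set explicitly, restricts the listing to three variables with a VAR statement, and processes the data in subsets with a BY statement. It assumes the SAMPLE data set described above; the choice of variables is arbitrary.

```sas
PROC SORT DATA=SAMPLE;        /* BY-group processing needs sorted data */
   BY SEX;
RUN;

PROC PRINT DATA=SAMPLE;       /* explicit data set instead of default  */
   VAR NAME AGE WEIGHT;       /* only these variables are listed       */
   BY SEX;                    /* one listing for each value of SEX     */
RUN;
```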

7.10.2. RUNNING SAS

There are four modes of operation under CMS:

* CMS Interactive

  - SAS Interactive

    * Line oriented

    * Screen oriented --- Display Manager System

  - SAS Non-Interactive --- File input for program

* CMS Batch --- File input for program submitted as CMS batch job.

7.10.2.1. DISPLAY MANAGER SYSTEM

The Display Manager System is a full-screen facility that lets you interact with all parts of your SAS job: program statements, job log, and procedure output. It has a logical screen for each of these, plus a help screen. It is invoked by giving the SAS command from a full-screen terminal.

Normal layout:

* bottom screen is program edit area

* top screen is SAS job log

* when a program is submitted, procedure screen is activated.

The process is controlled by commands which can be entered on the command lines or activated using special function keys:

* KEYS --- displays special function key values

* HELP --- with key word for specific help. Invokes Help screen.

- DMSHELP --- Display Manager System help

- PGMSCR --- Edit screen help

- LOGSCR --- Log screen help

- proc name --- Procedure help

* BYE --- exit SAS.

The program is entered and edited in the Edit screen, then submitted for processing with the SUBMIT command. Errors are listed in the Log screen, and the program can be recovered and edited using the RECALL command in the Edit screen.

EXAMPLES:

DATA SAMPLE (LABEL='PERSONAL DATA');

INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;

RATIO = WEIGHT / HEIGHT;

CARDS;

Jim M 33 6 175

Jane F 45 5.3 120

;

RUN;

Command> SUBMIT

PROC PRINT;

RUN;

Command> SUBMIT

PROC CHART;

HBAR SEX/DISCRETE;

VBAR HEIGHT;

RUN;

Command> SUBMIT

PROC FREQ;

TABLE SEX AGE SEX*AGE;

RUN;

Command> SUBMIT

PROC SORT DATA=SAMPLE OUT=SORTED;

BY SEX;

PROC PRINT DATA=SORTED;

RUN;

Command> SUBMIT

PROC CHART DATA=SORTED;

BY SEX;

HBAR AGE HEIGHT WEIGHT;

TITLE 'BAR CHARTS';

RUN;

Command> SUBMIT

DATA FEMALES (LABEL='FEMALE DATA');

SET SAMPLE;

IF SEX EQ 'F';

TITLE 'FEMALES ONLY';

PROC FREQ;

RUN;

Command> SUBMIT

FILE INPUT EXAMPLES: Use XEDIT to enter the following in a file with file type SAS.

DATA SURFACE;

DO X = -5 TO 5 BY 1;

DO Y = -5 TO 5 BY 1;

Z = X*X - 2*X*Y + 4*Y*Y;

OUTPUT;

END;

END;

PROC PRINT;

PROC PLOT;

PLOT X*Z;

Then run the program by entering SAS filename. The output can be found in files filename SASLOG and filename LISTING.

7.10.3. SAS/GRAPH

SAS/GRAPH is an interactive computer graphics system for producing color plots, bar charts, maps, and other displays on graphics screens or plotters. Graphics devices usable with SAS include Tektronix screens, Hewlett-Packard plotters, and many others. A wide range of fonts is supported. Data must be in a SAS data set before SAS/GRAPH procedures can be used. Procedures include:

* GCHART --- bar, pie, and block charts

* GPLOT --- 2D plots

* GMAP --- maps (from library or of your own devising)

* G3D --- 3D plots

* GSLIDE --- text slides.

The general procedure to create graphics output is to define the hardware being used with the GOPTIONS statement, generate or read data into a data set, then use PROC's to display the data in graphics form.
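The three stages can be sketched as follows. The data set CURVE and its variables X and Y are invented for the example, and DEVICE=TEK4014 is simply one of the devices mentioned in this section; substitute the device actually in use.

```sas
GOPTIONS DEVICE=TEK4014;      /* 1. define the hardware               */

DATA CURVE;                   /* 2. generate data into a data set     */
   DO X = 0 TO 10;
      Y = X*X;
      OUTPUT;                 /* write one observation per value of X */
   END;
RUN;

PROC GPLOT DATA=CURVE;        /* 3. display the data graphically      */
   PLOT Y*X;                  /* Y on the vertical axis, X horizontal */
RUN;
```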

7.10.3.1. GLOBAL STATEMENTS

GOPTIONS --- used to determine the output device and other hardware features.

GOPTIONS DEVICE=HP7475A GPROTOCOL=GSAS7171 NOTEXT82;

TITLE --- used to print up to 10 titles at the top of your plot with specific size, font, and color.

TITLEn options 'TEXT' ;

TITLE1 F=font C=color H=n 'This is a test title';

FOOTNOTE --- used to print up to 10 footnotes at the bottom of a plot with specific size, font, and color.

FOOTNOTEn options 'TEXT' ;

FOOTNOTE2 F=font C=color H=n 'This is footnote two' ;
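A concrete version of these forms might look like the sketch below. The font SIMPLEX, the colors, the heights, and the text are assumptions for illustration; which fonts and colors are actually available depends on the installation and the output device. PROC GSLIDE is used because it displays only the titles and footnotes.

```sas
TITLE1 F=SIMPLEX C=RED H=2 'Height and weight survey';
TITLE2 F=SIMPLEX C=RED H=1 'Preliminary results';
FOOTNOTE1 F=SIMPLEX C=BLUE H=1 'Sample data only';

PROC GSLIDE;      /* text slide made up of the current titles/footnotes */
RUN;
```

TITLE and FOOTNOTE are global statements, so they normally stay in effect for later procedures until they are redefined.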

7.10.3.2. PROCEDURE GPLOT

Although there are many SAS/GRAPH procedures available, the one we will examine is called GPLOT. The general form of the procedure is:

PROC GPLOT;

PLOT y*x / options ;

SYMBOL options;

The options on the plot line include:

* OVERLAY --- used when plotting two or more relations on the same graph

* HZERO --- requests that tick marks on the horizontal axis begin at zero
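A small sketch using both options is shown below. The data set CURVES and the variables X, Y1, and Y2 are made up for the example, and the SYMBOL settings (joined lines, no plotting symbol, two line types) are just one reasonable choice. Note that the options follow a slash on the PLOT statement.

```sas
DATA CURVES;
   DO X = 0 TO 10;
      Y1 = X*X;
      Y2 = 10*X;
      OUTPUT;
   END;
RUN;

PROC GPLOT DATA=CURVES;
   PLOT Y1*X Y2*X / OVERLAY HZERO;   /* both relations on one graph    */
   SYMBOL1 I=JOIN V=NONE L=1;        /* first relation: solid line     */
   SYMBOL2 I=JOIN V=NONE L=2;        /* second relation: line type 2   */
RUN;
```

Without OVERLAY, each relation named on the PLOT statement would be drawn as a separate graph.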

7.10.4. EQUIPMENT NOTES

The Tektronix 4014 and the XT100 are switch-selectable. As of this writing, the B setting gives the Tek4014 and C gives the XT100. The HP7475A plotter is daisy-chained with the switch. The plotter must be on for either terminal to operate.

Tektronix 4014 operation under SAS is obtained by signing on to MAX from the XT100, as MAX doesn't understand the Tek terminal type. The GOPTIONS statement should specify DEVICE=TEK4014. Once a SAS/GRAPH PROC asks you to hit return, switch to the Tek and hit return from there. Once viewing is complete, switch back and continue on the XT100.

For HP7475A operations, sign onto MAX from the XT100 and use

GOPTIONS DEVICE=HP7475A ....

When a SAS/GRAPH PROC is run, the plotter picks up the signals meant for it and removes them from the output stream. Paper mounting and pen changing are as specified in the manual.