The Basics of SAS/PC Version 8.1

Platform: Windows
Level of Difficulty: Beginner

Conventions

Normal, descriptive text is presented in Times New Roman font; text submitted to SAS is in Courier New font; and file locations are in Bold Courier New font.

What is SAS?

SAS (originally an acronym for Statistical Analysis System) is a program designed to perform analysis on large sets of numerical and character data. Its most common use in this environment is the analysis of research data. It consists of base software, as well as many add-on components (a list of the components available to Rutgers users is at http://mssg.rutgers.edu/license/SAS/SAS.htm). For example, the advanced statistical operations SAS is capable of require a component named SAS/STAT, which is available to Rutgers users. The features that make SAS especially useful include:

  • its flexibility in being able to read and utilize various formats of data;
  • its platform-independence; and
  • the extremely powerful statistical and mathematical manipulations it is capable of performing.

How does one use SAS?

SAS is able to run on a variety of platforms, but no matter what platform it's being used on, the language used to create SAS programs and files is the same. SAS can also be run in a variety of styles, or 'modes,' depending on what type of operating system it is being run on. SAS statements and programs themselves are unaffected by the mode one chooses to use, so that a program written in one mode should be useable in another (with the exception of references to an operating system).

The modes most often used at Rutgers include:

  • batch mode: The user writes whole SAS programs, saves them into a file, then runs SAS from a command line prompt. This is the most common mode on UNIX systems.

  • interactive line mode: The user enters commands line by line in response to prompts issued by the SAS System. This is available on the UNIX platform.

  • interactive windowing mode (SAS Display Manager System): The user interacts with SAS through windows using pull-down menus, dialog boxes and icons. This is the version used on Windows and Macintosh. It is also available for use on the UNIX systems, if an X-Windows interface is being used, and is the recommended UNIX mode for those used to using Windows, since it will be more familiar.

Display Manager mode's windows

The SAS Display Manager mode has five windows: Explorer, Editor, Log, Output and Results. In the Explorer window, you can view and manage your SAS files, which are stored in libraries, and create shortcuts to non-SAS files. The Editor window is used to input SAS statements and programs, which are then processed by the SAS System. The Log window contains information about the processing of a given SAS program, including any warning or error messages, or notes for the user to be aware of. The Output window will contain reports--if any--that have been generated by the running of a SAS program. Finally, the Results window helps you navigate and manage output from SAS programs that you submit; you can view, save, and manage individual output items.

System requirements for SAS/PC V. 8.1 are:

MINIMAL: Windows 95 (with Y2K patches) or later Operating System; Intel or Intel-compatible Pentium class processor; 32MB RAM for Windows and NT with 32MB swapfile space, 64MB RAM for Windows NT Server with 64 MB swapfile space; XGA, SVGA or better display; mouse; and the absolute minimum hard drive space required (depending upon which components are installed) is 150 MB. The preferred system configuration is 500 MHz Pentium III or Athlon processor with 128 MB of RAM and a 20 GB hard disk. 256 MB of RAM, a 750 or 950 MHz processor, and a 40 GB hard drive is considered an optimal configuration.

In order to launch SAS 8.1 for Windows:

Click on the Start button, select Programs, then The SAS System. Within that menu, select The SAS System for Windows V8.

In order to close SAS, click on the X box in the upper right-hand corner of the SAS window. This will bring up a dialog box asking if you're sure you want to quit; click on 'OK'.

How does SAS work?

SAS programs are a series of steps, created by the user and submitted to the SAS software for processing. There are only two kinds of steps:

  • DATA steps: create SAS data sets; and
  • PROC steps: process SAS data sets (creating reports, graphs, editing data, sorting data, etc.) and can also create data sets.

For example: Raw data and/or a pre-existing SAS data set are read into a SAS DATA step, turned into a SAS data set, altered or analyzed by a PROC step and then the results are displayed in a report.
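
The flow described above can be sketched as a short program. This is only an illustration: the data set names (readings, readings2, sorted) and the variables are hypothetical, and the raw data are typed inline with DATALINES so the sketch does not depend on any external file.

```sas
/* DATA step: read raw data lines and build a temporary SAS data set. */
data readings;                 /* hypothetical data set name */
   input Site $ Value;         /* one character and one numeric variable */
   datalines;
A 12
B 7
C 19
;
run;

/* DATA step: read a pre-existing SAS data set and add a variable. */
data readings2;
   set readings;
   Doubled = Value * 2;
run;

/* PROC step that creates a data set: a copy sorted by Value. */
proc sort data=readings2 out=sorted;
   by Value;
run;

/* PROC step that produces a report in the Output window. */
proc print data=sorted;
run;
```

Each step ends at its RUN statement, and the Log window reports how many observations and variables each step produced.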

Data Sets

SAS data sets are made up of two parts, the descriptor and data portions of the set. The descriptor portion has attribute information about the data set, such as the number of observations, the size and name of the variables, etc. The data portion contains the data described in the descriptor portion.

Data sets are arranged in rectangular tables, where the vertical columns are called variables, and the horizontal rows are called observations. Each intersection provides the value of a variable for that observation.

Variables: Variables are either numeric or character. In Version 8, character variables can be from 1 to 32,767 characters long (earlier releases limited them to 200). This length also corresponds to their size in bytes (i.e., a 20-character variable is 20 bytes long), the amount of disk space taken up by the variable in each observation. Numeric variables are stored in floating-point form, are 8 bytes long by default, and can hold roughly 15 to 16 significant digits on Windows. Missing values for character data are displayed as a space and missing values for numeric data are displayed as a period in any output generated by various SAS operations.

A note about SAS dates: SAS uses a system for recording dates which counts the number of days a given date is away from January 1, 1960. For example, September 1, 1999 has the SAS date value of 14488. This manner of storing dates allows for numeric calculations using dates (addition, subtraction, etc.), and is also Y2K-bug resistant. Variables containing date information can be converted into SAS dates using the MDY function (see SAS manuals).
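
As a small sketch of the date arithmetic described above, the following step uses the MDY function to build the date from the example and writes the result to the log. The variable names are made up for illustration.

```sas
data _null_;                        /* _null_: run the step without creating a data set */
   d = mdy(9, 1, 1999);             /* arguments are month, day, year */
   put d=;                          /* writes d=14488 to the log */
   put d= date9.;                   /* same value shown with a date format: d=01SEP1999 */
   elapsed = d - mdy(1, 1, 1960);   /* January 1, 1960 is day 0, so this is also 14488 */
   put elapsed=;
run;
```

Because the stored value is just a number, subtracting two SAS dates gives the number of days between them.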

Data set storage

Data sets are stored in SAS data libraries, which are collections of SAS files that are recognized as a unit by SAS. In directory-based systems, such as UNIX and Windows, a library is defined as a collection of SAS files stored in a specific directory. Non-SAS files may exist in this directory. SAS files may not be edited by any of the system's native editors (they must be edited by SAS). In directory-based operating systems, SAS assigns all file extensions automatically.

Data sets have a two-level name, the libref and the SAS-filename, written in the format libref.SAS-filename. The libref is the name SAS gives a particular library. If no library is specified when a SAS data set is created, SAS treats the data set as temporary, and it will not be saved when the SAS session is ended. Such temporary files are stored, for the duration of the SAS session, in the WORK library. The physical location of this library differs from system to system. For example, on RCI, it is defined as the /tmp/ directory, whereas on a standard Windows installation, it might be defined as C:\Windows\Temp\SAS Temporary Files\_TD48219\ (though the _TD48219 portion of the name will vary).

NOTE: These _TD files are automatically deleted when a SAS session is terminated normally. However, when SAS is terminated abnormally (such as the computer 'crashing' while using SAS), these files can accumulate, taking up space and decreasing available resources. Therefore it is wise to manually delete leftovers in the event of an abnormal exit from SAS.

The names of data sets and the names of variables within the sets must:

  • be one to 32 characters in length in Version 8 (librefs are limited to eight characters, as were all names in Version 6);
  • start with a letter or an underscore; and
  • contain only letters, numbers and underscores (i.e., no other special characters).

Fixing problems in programs

When a mistake is made in programming, SAS will often let you know with a message in the log. This will take the form of an ERROR, WARNING or NOTE. The error message will give the location of the error and a message that explains it.

There are several common problems which occur when writing SAS programs. You should run through this list when your program doesn't do what it's supposed to. Even if everything seems okay, you should always read the log file, to make sure nothing went wrong.

  1. Omitted Semicolons: This is one of the most common errors made when creating a SAS program. Every statement must close with a semicolon. To check for this error, scan the log file for statements which appear before the ERROR flag, for a statement that does not have a closing semicolon.

  2. Unbalanced Quotation Marks: You must be careful to always balance any single or double quotes provided in a program. Common signs of unbalanced quotes include a warning that a quoted string has become too long (SAS warns once a quoted string exceeds a fixed length; the Version 6 limit was 200 characters) or a warning that a TITLE statement is ambiguous due to quoted text. Correcting and resubmitting the program will not fix the problem. A special program, called "quote medicine," must be used. Submit the following line:

    *'; *"; run;

    which will restore the SAS system to normal working order, without having to quit and restart it.

  3. Omitting the RUN statement: Each step in a SAS program is compiled and executed independently from every other step and executed when the end of the step is encountered. SAS recognizes the end of the step when it encounters a DATA or PROC statement (which tells it a new step is beginning) or when it encounters a RUN or QUIT statement (which tells it it's done, and there are no more steps coming). In interactive modes (such as the Display Manager mode), a RUN statement is required to signal the end of the final step. If you submit a program without a final run statement, the last step will continue to run, and will not complete whatever it was supposed to do. In order to correct this problem, submit the following line: run;
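
The step boundaries described in item 3 can be seen in a short sketch. The data set name demo is hypothetical; the comments mark where SAS recognizes the end of each step.

```sas
data demo;                 /* a DATA step begins here */
   x = 1;
   output;

proc print data=demo;      /* the PROC statement ends the DATA step, which now executes */

run;                       /* without this final RUN, PROC PRINT would wait and print nothing */
```

Ending every step with its own RUN statement, rather than relying on the next DATA or PROC statement, makes the log easier to read because the notes for each step appear directly after it.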

CREATING AND USING PERMANENT DATA SETS


Defining a library

In order to utilize a directory on your system as a SAS data library, you have to make SAS aware of the library, and name it. This is done with the libname statement. This is a global statement, which may be issued outside of any PROC or DATA steps, and remains in effect until you close your SAS session. In order to define a library for the course of your SAS session, the syntax is as follows:

LIBNAME libref 'SAS-data-library-path';

Keep in mind the restrictions on the libref name listed earlier. 'SAS-data-library-path' is the full local path name for the directory that the libref references. For example:

LIBNAME new 'C:\Data\';

creates a library, called new, which references the Windows directory 'C:\Data'.
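
As a sketch of how the libref is then used, the following program writes one permanent and one temporary data set. It assumes the folder C:\Data already exists; the data set name scores and its contents are hypothetical.

```sas
libname new 'C:\Data\';     /* assumes this folder exists on your machine */

/* Two-level name: a permanent data set stored in C:\Data. */
data new.scores;
   input Name $ Score;
   datalines;
Smith 80
Jones 65
;
run;

/* One-level name: SAS stores this copy in the WORK library, */
/* so it disappears when the session ends.                   */
data scores;
   set new.scores;
run;
```

In a later session, reissuing the same LIBNAME statement makes new.scores available again, while the WORK copy would have to be recreated.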

Keep in mind that a library defined by this statement will exist only so long as the SAS session is open. In order to assign a library permanently in Windows, you must use the Explorer window. Double-click on the Libraries icon, then click on the File pull-down menu and select New. The New Library dialog box will appear.

Put the name of the library in the "Name:" field, and the name of the system folder this corresponds to in the "Path" field. Make sure the "Enable at startup" box is checked, then click "OK."

If you want to see what data sets are inside a given library or specific information about a single data set, you can use the PROC DATASETS procedure:

   PROC DATASETS LIB=<libref>;

      CONTENTS DATA=<member-name> | _ALL_ NODS;

   RUN;

The | symbol is a convention used in SAS documentation to separate alternate options; it is not typed as part of the statement. So after the DATA= segment, one may either give a member-name or the _ALL_ NODS argument. The former would be used to obtain specific, extensive information about a data set and the latter would provide a list of members (files) within the library.
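
A concrete version of this syntax might look as follows. It is a sketch that assumes a data set named temp already exists in the WORK library; substitute your own libref and member name.

```sas
proc datasets lib=work;
   contents data=_all_ nods;   /* list the members of the library, without details */
   contents data=temp;         /* full descriptor information for one data set */
run;
quit;                          /* PROC DATASETS stays active until QUIT is submitted */
```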

SIMPLE STATISTICAL ANALYSES


This section will discuss two simple statistical operations, PROC MEANS and PROC UNIVARIATE.

1. PROC MEANS: The MEANS procedure displays simple descriptive statistics for the numeric variables in a SAS data set. Output from PROC MEANS is concise and information on all variables is presented on one report. Here is the syntax:

   PROC MEANS DATA=<SAS-data-set>;

   RUN;

Where <SAS-data-set> is the name of the SAS data set you wish to analyze.

By default, PROC MEANS:

  • prints summary statistics for all numeric variables in the data set; and
  • prints the following statistics:
      • N, the number of observations with non-missing values for a given variable;
      • MEAN, the arithmetic mean (average);
      • STD, the standard deviation;
      • MIN, the minimum value; and
      • MAX, the maximum value.

EXAMPLE:

This piece of code creates the data set.

data temp;
   infile 'test.dat';
   input @1 Name $10. @11 Score 3.;
run;
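
If the file test.dat is not at hand, an equivalent data set can be built with the data typed inline. This is a sketch under one assumption: the four names and scores are taken from the PROC PRINT listing shown in this document, laid out in the same fixed columns the INPUT statement expects (Name in columns 1-10, Score in columns 11-13).

```sas
data temp;
   input @1 Name $10. @11 Score 3.;   /* same INPUT statement as above */
   datalines;
Chafin    98
Jericho   50
Westbrook 42
Grant     14
;
run;
```

The data lines must start in column 1, since the INPUT statement reads by column position.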

This piece of code analyzes the data.

proc means data = temp;
run;

This is the resulting output.


The SAS System          15:00 Friday, August 13, 1999   1
                Analysis Variable : SCORE

N          Mean       Std Dev       Minimum       Maximum	
4    51.0000000    34.9284984    14.0000000    98.0000000
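
The defaults can also be spelled out or narrowed. The sketch below assumes the temp data set from the example; it names the statistics explicitly, limits the analysis to one variable with a VAR statement, and uses the MAXDEC= option to shorten the printed decimals.

```sas
proc means data=temp n mean std min max maxdec=2;
   var Score;     /* analyze only this variable */
run;
```

Listing statistic keywords on the PROC MEANS statement replaces the default set, so any subset (for example, only N and MEAN) can be requested the same way.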

2. PROC UNIVARIATE: The UNIVARIATE procedure displays simple descriptive statistics for the numerical variables in a SAS data set, including quantiles. Output from PROC UNIVARIATE consists of one page per variable. Here is the syntax:

PROC UNIVARIATE DATA=<SAS-data-set>;				
RUN; 

For each numeric variable, PROC UNIVARIATE provides information on the distribution of values, including the:

  • number of observations with non-missing values;
  • arithmetic mean (average);
  • five lowest values;
  • five highest values;
  • median value (50th percentile);
  • upper and lower quartiles (75th and 25th percentiles);
  • 1st, 5th, 10th, 90th, 95th and 99th percentiles; and
  • the most common value (MODE).

EXAMPLE:

This piece of code analyzes the data using the previously-created data set.

proc univariate data=temp;						
run;

This is a limited version of the output.

The SAS System    15:03 Friday, August 13, 1999   1
                Univariate Procedure
                   Variable=SCORE

                Moments
N                 4  Sum Wgts          4 
Mean             51  Sum             204 	
Std Dev     34.9285  Variance       1220 
Skewness   0.820411  Kurtosis   1.640204 
USS           14064  CSS            3660
CV         68.48725  Std Mean   17.46425
T:Mean=0   2.920252  Pr>|T|       0.0615
Num ^= 0          4  Num > 0           4
M(Sign)           2  Pr>=|M|      0.1250
Sgn Rank          5  Pr>=|S|      0.1250
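
Several entries in the Moments table are simple functions of N, Mean and CSS, and checking them by hand can help in reading the listing. The step below is only a sketch of that arithmetic, using the three values printed above; the variable names are made up.

```sas
data _null_;
   n = 4;  mean = 51;  css = 3660;    /* values copied from the listing */
   variance = css / (n - 1);          /* 1220, the Variance entry */
   std      = sqrt(variance);         /* compare with Std Dev */
   std_mean = std / sqrt(n);          /* compare with Std Mean */
   cv       = 100 * std / mean;       /* compare with CV */
   t        = mean / std_mean;        /* compare with T:Mean=0 */
   put variance= std= std_mean= cv= t=;
run;
```

The results are written to the Log window, where they can be compared with the corresponding table entries.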

Note that SAS is capable of many more advanced statistical operations. You may refer to manuals available for purchase through the SAS Institute, Inc. ( http://www.sas.com/service/doc/intro.html or 919-677-8000) to learn more about these procedures.

PROC PRINT

Another very important procedure is PROC PRINT. This will give you a complete listing of the data values in a given set. The overall procedure syntax is:

PROC PRINT DATA=<SAS-file-name>;
	ID variable(s);
 	SUM variable(s);
 	VAR variable(s);	
RUN;

Where:

  • <SAS-file-name> is the SAS name of the data set being printed;
  • ID identifies observations by the formatted values of the variables that you list instead of by observation numbers;
  • SUM instructs the SAS system to provide totals for the listed variables; and
  • VAR selects the variables that will appear in the final report, and provides for their order.

EXAMPLE:

This code uses the data set created previously.

proc print data=temp;							
	id Name;
	sum Score;
run;

This is the output.

The SAS System            15:24 Friday, August 13, 1999   1

        NAME         SCORE

        Chafin         98
        Jericho        50
        Westbrook      42
        Grant          14
                     =====
                      204
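
The VAR statement, not used in the example above, controls which variables are printed and in what order. A minimal sketch, again assuming the temp data set:

```sas
proc print data=temp noobs;   /* NOOBS suppresses the observation-number column */
   var Score Name;            /* print Score first, then Name */
run;
```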

MACHINE-READABLE DATA

Machine-readable data (MRD), such as information from the Inter-university Consortium for Political and Social Research (ICPSR), is available at http://scc01.rutgers.edu:80/datacenter/search1.htm.

RECOMMENDED DOCUMENTATION

Obtaining manuals for the SAS system is highly recommended. The following are all available from The SAS Institute's publications department ( http://www.sas.com/service/doc/intro.html or 919-677-8000):

  • The Little SAS Book: A Primer, 2nd Edition by Lora D. Delwiche and Susan J. Slaughter. This is a very well written beginner's book, both informative and entertaining.

  • SAS Fundamentals: A Programming Approach, Course Notes. The notes to the beginner's course which SAS offers. A beginning user will find it very helpful.

  • SAS Procedures Guide, Version 6, 3rd Edition; SAS Language and Procedures: Usage Volumes 1 & 2, Version 6, 1st Edition; SAS Language: Reference, Version 6, 1st Edition. These books are very useful reference material for command syntax.

  • SAS/STAT User's Guide, Version 6, 4th Edition, Volumes 1 & 2; Quick Start to Data Analysis with SAS. Excellent resources for statistical analysis.


Copyright © 2008 Rutgers, The State University of New Jersey, NBCS Help Desk. All rights reserved.

webmaster@nbcs.rutgers.edu
01/31/07