The Basics of SAS/PC Version 6.12
Platform: Windows
Level of Difficulty: Beginner

Conventions
Normal, descriptive text will be presented in Times New Roman font, text submitted to SAS will be in Courier New font and file locations will be presented in Bold Courier New font.

What is SAS?
SAS (originally an acronym for Statistical Analysis System) is a program designed to analyze large sets of numeric and character data. Its most common use in this environment is the analysis of research data. It consists of base software as well as numerous add-on components (the list of components available to RU users is at http://mssg.rutgers.edu/license/). For example, the advanced statistical procedures in SAS require a component named SAS/STAT, which is available to Rutgers users. The features that make SAS especially useful include:
How does one use SAS?
SAS runs on a variety of platforms, but whatever the platform, the language used to write SAS programs is the same. SAS can also be run in several styles, or 'modes,' depending on the operating system. The mode does not affect SAS statements and programs themselves, so a program written in one mode should run in another (except for references to the operating system, such as file paths). The modes most often used at Rutgers include:
Display Manager mode's windows
The SAS Display Manager mode has three main windows: Program Editor, Log and Output. The Program Editor window is used to enter SAS statements and programs, which are then submitted to the SAS System for processing. The Log window contains information about the processing of a submitted program, including any error or warning messages and notes the user should be aware of. The Output window contains any reports generated by running a SAS program.
System requirements for SAS/PC V6.12 are:
MINIMAL: Microsoft Windows 3.1 or later with Win32s extensions, Windows NT 3.51 or later, or Windows 95 or later; a 386/33 MHz or faster processor; 8 MB RAM for Windows 3.x or 16 MB RAM for Windows 9x/NT; 15 MB of swap-file space; a VGA, 8514, XGA, SVGA or better display; a mouse; and 77.4 MB (Windows 3.x) or 79.1 MB (Windows 9x/NT) of hard disk space.
Ideally, a PC should have 64 MB of RAM.
To launch SAS 6.12 for Windows on a Windows 95/98/NT operating system:
Click the Start button, select Programs, then The SAS System. Within that menu, select The SAS System for Windows v6.12. To close SAS, click the X (close) button in the far upper right-hand corner of the SAS window.
This brings up a dialog box asking whether you're sure you want to quit; click 'OK'.
How does SAS work?
SAS programs are a series of steps, written by the user and submitted to SAS for processing. There are only two kinds of steps: DATA steps, which read and modify data, and PROC (procedure) steps, which analyze data and produce reports.
For example, raw data and/or a pre-existing SAS data set are read by a DATA step and turned into a SAS data set. That data set is then altered or analyzed by a PROC step, and the results are displayed in a report.
Data Sets
SAS data sets are made up of two parts: the descriptor portion and the data portion. The descriptor portion holds attribute information about the data set, such as the number of observations and the names and lengths of the variables. The data portion contains the data values described by the descriptor portion. Data sets are arranged as rectangular tables in which the vertical columns are called variables and the horizontal rows are called observations. Each intersection holds the value of one variable for one observation.
Variables: Variables are either numeric or character. Character variables can be from 1 to 200 characters long. The length also equals the variable's size in bytes (i.e., a 20-character variable takes up 20 bytes per observation). Numeric variables are stored by default as 8-byte floating-point numbers, which hold roughly 15 to 16 significant decimal digits. In output generated by SAS procedures, missing character values are displayed as a blank and missing numeric values are displayed as a period.
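The following Python model makes the descriptor/data split and the missing-value display concrete. It is a conceptual sketch only and does not show how SAS actually lays out a data set file. The class names, the "num"/"char" labels and the use of None for a missing value are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Union

Value = Union[float, str, None]   # None stands in for a SAS missing value (assumption)

@dataclass
class Variable:
    name: str
    kind: str        # "num" or "char"
    length: int = 8  # bytes; 8 is the default for numeric variables

    def __post_init__(self) -> None:
        # Version 6 limit on character variable length
        if self.kind == "char" and not 1 <= self.length <= 200:
            raise ValueError("character variables are 1 to 200 bytes long")

@dataclass
class DataSet:
    variables: List[Variable]                               # descriptor portion
    rows: List[List[Value]] = field(default_factory=list)  # data portion: one list per observation

    def describe(self) -> dict:
        """Report descriptor information: observation count and variable attributes."""
        return {"observations": len(self.rows),
                "variables": [(v.name, v.kind, v.length) for v in self.variables]}

    def display(self, obs: int, col: int) -> str:
        """Render one cell the way SAS output shows it: missing numeric -> '.', missing character -> blank."""
        var, value = self.variables[col], self.rows[obs][col]
        if value is None:
            return "." if var.kind == "num" else " "
        return format(value, "g") if var.kind == "num" else str(value)
```

Each intersection of a row and a column is one `display(obs, col)` call. The descriptor information is derived from the variables rather than from the values.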
A note about SAS dates: SAS records a date as the number of days between that date and January 1, 1960. For example, September 1, 1999 has the SAS date value 14488. Storing dates this way allows arithmetic on dates (addition, subtraction, etc.) and also makes SAS dates resistant to the Y2K bug. Month, day and year values can be converted into a SAS date with the MDY function (see the SAS manuals).
Data set storage
Data sets are stored in SAS data libraries, which are collections of SAS files that SAS recognizes as a unit. In directory-based systems, such as UNIX and Windows, a library is the collection of SAS files stored in a specific directory. Non-SAS files may also exist in this directory. SAS files cannot be edited with the operating system's own editors; they must be edited through SAS. In directory-based operating systems, SAS assigns all file extensions automatically.
Data sets have a two-level name made up of the libref and the SAS filename, written in the form libref.SAS-filename. The libref is the name by which SAS refers to a particular library. If no libref is given when a SAS data set is created, SAS treats the data set as temporary and does not keep it after the SAS session ends. Such temporary data sets are stored, for the duration of the session, in the WORK library. The physical location of this library differs from system to system. For example, on RCI it is the /tmp/ directory, whereas on a standard Windows installation it might be C:\SAS\SASWORK\#TD83879\ (the '#TD83879' portion changes each time you start a SAS session). To make a data set created in a DATA step permanent, you must give it both a libref (other than WORK) and a filename.
NOTE: These #TD directories are automatically deleted when a SAS session ends normally.
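A SAS date is simply a day count from January 1, 1960, so the arithmetic can be checked outside SAS. The Python sketch below uses the standard `datetime` module. The function name `mdy` mirrors the month, day, year argument order of the SAS MDY function, but it is an illustrative stand-in and not SAS code.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # day 0 in the SAS date system

def mdy(month: int, day: int, year: int) -> int:
    """Return the SAS date value: days since January 1, 1960 (negative for earlier dates)."""
    return (date(year, month, day) - SAS_EPOCH).days

def from_sas_date(value: int) -> date:
    """Convert a SAS date value back into a calendar date."""
    return SAS_EPOCH + timedelta(days=value)
```

`mdy(9, 1, 1999)` gives 14488, and subtracting two SAS dates gives the number of days between them. Dates before 1960 come out negative.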
However, when SAS ends abnormally, these files can accumulate and take up disk space. It is therefore wise to delete the leftovers manually after an abnormal exit from SAS. The names of data sets, of the variables within them, and of librefs must be 1 to 8 characters long, begin with a letter or an underscore (_), and contain only letters, digits and underscores (no blanks).
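SAS Version 6 limits data set names, variable names and librefs to at most eight characters. A data set name given without a libref goes into the temporary WORK library. The Python sketch below expresses both rules. The regular expression is an assumed encoding of the Version 6 naming rule, and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed Version 6 rule: 1-8 characters, starting with a letter or underscore,
# followed only by letters, digits or underscores.
_V6_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")

def is_valid_name(name: str) -> bool:
    """True if name satisfies the Version 6 naming rule above."""
    return _V6_NAME.fullmatch(name) is not None

def split_dataset_name(name: str) -> Tuple[str, str]:
    """Split 'libref.member' into (libref, member); a one-level name goes to WORK.

    SAS names are not case-sensitive, so both parts are upper-cased here.
    """
    parts = name.split(".")
    if len(parts) == 1:
        parts = ["WORK", parts[0]]      # no libref: temporary data set in WORK
    if len(parts) != 2 or not all(is_valid_name(p) for p in parts):
        raise ValueError(f"invalid SAS data set name: {name!r}")
    return parts[0].upper(), parts[1].upper()
```

For example, `split_dataset_name("temp")` resolves to `("WORK", "TEMP")`, a temporary data set. `split_dataset_name("newlib.survey")` resolves to `("NEWLIB", "SURVEY")`, which is permanent as long as NEWLIB points to a real directory.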
Fixing problems in programs
When a program contains a mistake, SAS will often report it with a message in the log. Log messages are labeled ERROR, WARNING or NOTE. An error message gives the location of the error and an explanation of it. Several common problems occur when writing SAS programs; run through this list when your program doesn't do what it's supposed to. Even if everything seems okay, you should always read the log to make sure nothing went wrong.
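Reading a long log is easier if you pull out the labeled messages first. The short Python sketch below groups the lines of a log by label, assuming you have saved the log as a plain-text file. The message pattern, including the optional message number before the colon, is an assumption about the log format, and the helper name `scan_log` is hypothetical.

```python
import re
from typing import Dict, List

# Assumed format: a line starting with ERROR, WARNING or NOTE, optionally
# followed by a message number (e.g. "ERROR 180-322:") and then a colon.
_LABEL = re.compile(r"(ERROR|WARNING|NOTE)(?: [0-9-]+)?:")

def scan_log(log_text: str) -> Dict[str, List[str]]:
    """Group the lines of a saved log by the label they start with."""
    found: Dict[str, List[str]] = {"ERROR": [], "WARNING": [], "NOTE": []}
    for line in log_text.splitlines():
        match = _LABEL.match(line.lstrip())
        if match:
            found[match.group(1)].append(line.strip())
    return found
```

This only finds the first line of each message, so read the full message in the log itself.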
CREATING AND USING PERMANENT DATA SETS
Defining a library
To use a directory on your system as a SAS data library, you have to make SAS aware of the library and give it a name. This is done with the LIBNAME statement. LIBNAME is a global statement, which may be issued outside of any DATA or PROC step and remains in effect until you close your SAS session. To define a library for the course of your SAS session, use the following syntax:
LIBNAME libref 'SAS-data-library-path';
Keep in mind the naming restrictions on librefs listed earlier. 'SAS-data-library-path' is the full local path of the directory that the libref refers to. For example:
LIBNAME sasuser 'C:\SAS\SASUSER';
creates a library called sasuser, which refers to the Windows directory C:\SAS\SASUSER. Keep in mind that a library defined this way exists only as long as the SAS session is open. To assign a library permanently in Windows, use the Libraries dialog box, which can be opened by clicking the filing-cabinet icon on the toolbar or by typing DLGLIB in the command box. There, click the New Library button to get the New Library dialog box, depicted below:
Put the name of the library (in the example, NEWLIB) in the "Library:" field and the name of the corresponding system folder (e.g., C:\SAS\NEWLIB) in the "Folder To Assign:" field. Make sure the "Assign automatically at startup" box is checked, then click "Assign". To see which data sets are in a given library, or detailed information about a single data set, you can use the DATASETS procedure:
PROC DATASETS LIB=<libref>;
CONTENTS DATA=<member-name> | _ALL_ NODS;
RUN;
The | symbol is a convention of SAS documentation that separates alternative options. So after DATA=, you may give either a member name or the _ALL_ NODS argument. The former produces detailed information about a single data set; the latter produces a list of the members (files) in the library.
SIMPLE STATISTICAL ANALYSES
This section discusses two simple statistical procedures, PROC MEANS and PROC UNIVARIATE.
1. PROC MEANS: The MEANS procedure displays simple descriptive statistics for the numeric variables in a SAS data set. Output from PROC MEANS is concise, and information on all variables is presented in one report. Here is the syntax:
PROC MEANS DATA=<SAS-data-set>;
RUN;
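Here <SAS-data-set> is the name of the SAS data set you wish to analyze. By default, PROC MEANS reports the number of nonmissing values (N), the mean, the standard deviation, the minimum and the maximum of each numeric variable.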
EXAMPLE: This piece of code creates the data set.
data temp;
This piece of code analyzes the data.
proc means data = temp;
run;
This is the resulting output.
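The Python sketch below shows how missing values affect the statistics by reproducing the five that PROC MEANS reports by default for a single variable. It is not SAS code. Using None for a SAS missing value is an assumption of the sketch, and the standard deviation uses the n - 1 divisor, which is the SAS default.

```python
import math
from typing import Dict, Optional, Sequence

def means_summary(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Default PROC MEANS statistics for one numeric variable; None stands for a missing value."""
    present = [v for v in values if v is not None]  # missing values are excluded from every statistic
    n = len(present)
    if n == 0:
        return {"N": 0}
    mean = sum(present) / n
    # Sample standard deviation (divisor n - 1); undefined for a single value, shown as NaN here
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / (n - 1)) if n > 1 else float("nan")
    return {"N": n, "Mean": mean, "Std Dev": std,
            "Minimum": min(present), "Maximum": max(present)}
```

N counts only nonmissing values, so a variable with missing values will have a smaller N than the number of observations in the data set.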
2. PROC UNIVARIATE: The UNIVARIATE procedure displays descriptive statistics, including quantiles, for the numeric variables in a SAS data set. Output from PROC UNIVARIATE consists of one page per variable. Here is the syntax:
PROC UNIVARIATE DATA=<SAS-data-set>;
RUN;
For each numeric variable, PROC UNIVARIATE provides information on the distribution of values, including the:
EXAMPLE: This piece of code analyzes the data using the previously created data set.
This is a limited version of the output.
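The quartiles and other percentiles that PROC UNIVARIATE reports depend on which percentile definition is used. The Python sketch below implements the "empirical distribution function with averaging" rule, which SAS documents as definition 5 of the PCTLDEF= option. We assume here that this is the procedure's default. The function name is hypothetical, and the sketch compares the fractional part to zero exactly, whereas SAS may treat near-integer floating-point values differently.

```python
import math
from typing import List, Sequence

def sas_percentile(values: Sequence[float], p: float) -> float:
    """Percentile by the empirical-distribution rule with averaging; p is a fraction in [0, 1]."""
    x: List[float] = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("no nonmissing values")
    if p <= 0:
        return x[0]                   # 0th percentile is the minimum
    if p >= 1:
        return x[-1]                  # 100th percentile is the maximum
    np_ = n * p
    j = math.floor(np_)               # integer part of n*p
    g = np_ - j                       # fractional part of n*p
    if g > 0:
        return x[j]                   # the (j+1)th smallest value (1-based)
    return (x[j - 1] + x[j]) / 2      # average of the jth and (j+1)th smallest values
```

For the values 1 through 10, this gives a median of 5.5, a first quartile of 3 and a third quartile of 8.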
Note that SAS is capable of many more advanced statistical operations. To learn more about these procedures, refer to the manuals available for purchase from SAS Institute Inc. (http://www.sas.com/service/doc/intro.html or 919-677-8000).
PROC PRINT
Another very important procedure is PROC PRINT, which produces a complete listing of the data values in a given data set. The overall syntax of the procedure is:
Where:
EXAMPLE: This code uses the data set created previously.
This is the output.
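The Python sketch below imitates the layout of a default PROC PRINT listing: an Obs column numbered from 1, then one column per variable, with missing numeric values shown as a period and missing character values left blank. The function name and the column spacing are assumptions for illustration. Real PROC PRINT output also includes titles and page formatting that this sketch omits.

```python
from typing import List, Sequence, Union

Cell = Union[float, str, None]   # None stands for a missing value (assumption)

def print_listing(names: Sequence[str], numeric: Sequence[bool],
                  rows: Sequence[Sequence[Cell]]) -> List[str]:
    """Build text lines resembling a PROC PRINT listing: an Obs column, then one column per variable."""
    def show(value: Cell, is_num: bool) -> str:
        if value is None:
            return "." if is_num else ""   # missing numeric prints '.', missing character prints blank
        return format(value, "g") if is_num else str(value)

    table = [["Obs", *names]]
    for obs, row in enumerate(rows, start=1):        # observations are numbered from 1
        table.append([str(obs)] + [show(v, n) for v, n in zip(row, numeric)])
    right = [True, *numeric]                          # right-align Obs and numeric columns
    widths = [max(len(r[c]) for r in table) for c in range(len(right))]
    return ["  ".join(cell.rjust(w) if r_al else cell.ljust(w)
                      for cell, w, r_al in zip(r, widths, right)).rstrip()
            for r in table]
```

Character columns are left-aligned and numeric columns right-aligned, which is how values line up in a SAS listing.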
MACHINE-READABLE DATA
Machine-readable data (MRD), such as data from the Inter-university Consortium for Political and Social Research (ICPSR), is available at http://scc01.rutgers.edu:80/datacenter/search1.htm. For more information on this and other forms of MRD, go to http://www.nbcs.rutgers.edu/newdocs/unx00101/unx00101.html.
RECOMMENDED DOCUMENTATION
Obtaining manuals for the SAS System is highly recommended. The following are all available from SAS Institute's publications department (http://www.sas.com/service/doc/intro.html or 919-677-8000):
webmaster@nbcs.rutgers.edu