The Basics of SAS/PC Version 8.1
Platform: Windows
Level of Difficulty: Beginner

Conventions
Normal, descriptive text will be presented in Times New Roman font, text submitted to SAS will be in Courier New font, and file locations will be presented in Bold Courier New font.

What is SAS?
SAS (originally an acronym for Statistical Analysis System) is a program designed to perform analysis on large sets of numerical and character data. Its most common use in this environment is the analysis of research data. It consists of base software plus many add-on components (the list of those available to RU users is at http://mssg.rutgers.edu/license/SAS/SAS.htm). For example, the advanced statistical operations SAS is capable of require a component named SAS/STAT, which is available to Rutgers users. The features that make SAS especially useful include:
How does one use SAS?
SAS runs on a variety of platforms, but no matter which platform it is used on, the language used to create SAS programs and files is the same. SAS can also be run in a variety of styles, or 'modes,' depending on the type of operating system it is running on. SAS statements and programs themselves are unaffected by the mode one chooses, so a program written in one mode should be usable in another (with the exception of references to an operating system). The modes most often used at Rutgers include:
Display Manager mode's windows
The SAS Display Manager mode has five windows: Explorer, Editor, Log, Output and Results. In the Explorer window, you can view and manage your SAS files, which are stored in libraries, and create shortcuts to non-SAS files. The Editor window is used to input SAS statements and programs, which are then processed by the SAS System. The Log window contains information about the processing of a given SAS program, including any warning or error messages, or notes for the user to be aware of. The Output window contains any reports generated by running a SAS program. Finally, the Results window helps you navigate and manage output from SAS programs that you submit; you can view, save, and manage individual output items.

System requirements for SAS/PC V. 8.1 are:
MINIMAL: Windows 95 (with Y2K patches) or later operating system; Intel or Intel-compatible Pentium-class processor; 32 MB RAM for Windows and NT with 32 MB swapfile space, 64 MB RAM for Windows NT Server with 64 MB swapfile space; XGA, SVGA or better display; mouse; and at least 150 MB of hard drive space (the exact amount depends on which components are installed).
The preferred system configuration is a 500 MHz Pentium III or Athlon processor with 128 MB of RAM and a 20 GB hard disk. 256 MB of RAM, a 750 or 950 MHz processor, and a 40 GB hard drive are considered an optimal configuration.

In order to launch SAS 8.1 for Windows:
Click on the Start button, select Programs, then The SAS System. Within that menu, select The SAS System for Windows V8.

In order to close SAS, click on the X box in the upper right-hand corner of the SAS window. This will bring up a dialog box asking if you're sure you want to quit; click on 'OK'.

How does SAS work?
SAS programs are a series of steps, created by the user and submitted to the SAS software for processing. There are only two kinds of steps: DATA steps and PROC steps.
For example: raw data and/or a pre-existing SAS data set are read into a SAS DATA step and turned into a SAS data set, which is then altered or analyzed by a PROC step, and the results are displayed in a report.

Data Sets
SAS data sets are made up of two parts, the descriptor portion and the data portion. The descriptor portion holds attribute information about the data set, such as the number of observations and the names and sizes of the variables. The data portion contains the data described in the descriptor portion. Data sets are arranged as rectangular tables, in which the vertical columns are called variables and the horizontal rows are called observations. Each intersection gives the value of a variable for that observation.

Variables:
Variables are either numeric or character. Character variables can be from 1 to 32,767 characters long in Version 8 (the limit was 200 in Version 6). This length also corresponds to their byte size (i.e., a 20-character variable is 20 bytes long), which is the amount of disk space each value of the variable takes up. Numeric variables can hold 16 or 17 significant digits (depending on whether the number contains a decimal) and are 8 bytes long by default, because SAS stores numeric values in floating-point form. In any output generated by SAS, missing values for character data are displayed as a blank space and missing values for numeric data are displayed as a period.
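The flow described above (a DATA step that builds a data set, followed by a PROC step that reports on it) can be sketched as follows. This is only an illustration: the data set name survey, the variable names, and the values are all hypothetical. The periods in the raw data are read as missing numeric values.

```sas
/* DATA step: read raw data lines into a temporary data set.      */
/* The name "survey" and all values are hypothetical.             */
data survey;
   input name $ age height;   /* $ marks name as a character variable */
   datalines;
Alice 34 165
Bob . 180
Carol 29 .
;
run;

/* PROC step: list every observation in a report */
proc print data=survey;
run;
```

In the listing, each row is one observation and each column is one variable; the two missing numeric values should appear as periods.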
A note about SAS dates: SAS records a date as the number of days between that date and January 1, 1960. For example, September 1, 1999 has the SAS date value 14488. Storing dates this way allows numeric calculations with dates (addition, subtraction, etc.) and is also Y2K-bug resistant. Variables containing month, day, and year values can be converted into SAS dates using the MDY function (see SAS manuals).

Data set storage
Data sets are stored in SAS data libraries, which are collections of SAS files that SAS recognizes as a unit. In directory-based systems, such as UNIX and Windows, a library is a collection of SAS files stored in a specific directory. Non-SAS files may also exist in this directory. SAS files cannot be edited with the system's native editors; they must be edited within SAS. In directory-based operating systems, SAS assigns all file extensions automatically.

Data sets have a two-level name made up of the libref and the SAS-filename, written in the format libref.SAS-filename. The libref is the name SAS gives a particular library. If no library is specified when a SAS data set is created, SAS treats the data set as temporary, and it will not be saved when the SAS session ends. Such temporary files are stored, for the duration of the SAS session, in the WORK library. The physical location of this library varies from system to system. For example, on RCI, it is defined as the /tmp/ directory, whereas on a standard Windows installation, it might be defined as C:\Windows\Temp\SAS Temporary Files\_TD48219\ (though the _TD48219

NOTE: These _TD files are automatically deleted when a SAS session is terminated normally. However, when SAS is terminated abnormally (such as the computer 'crashing' while using SAS), these files can accumulate, taking up space and decreasing available resources. It is therefore wise to delete any leftovers manually after an abnormal exit from SAS.

The names of data sets and the names of variables within the sets must:
Fixing problems in programs
When a mistake is made in programming, SAS will often let you know with a message in the log. This will take the form of an ERROR, WARNING or NOTE. The message gives the location of the problem and explains it. There are several common problems which occur when writing SAS programs. You should run through this list when your program doesn't do what it's supposed to. Even if everything seems okay, you should always read the log to make sure nothing went wrong.
CREATING AND USING PERMANENT DATA SETS
Defining a library
In order to use a directory on your system as a SAS data library, you have to make SAS aware of the library and name it. This is done with the LIBNAME statement. This is a global statement, which may be issued outside of any PROC or DATA step, and remains in effect until you close your SAS session. In order to define a library for the course of your SAS session, the syntax is as follows:
LIBNAME libref 'SAS-data-library-path';
where libref is the name of the library (keep in mind the restrictions on libref names listed earlier) and 'SAS-data-library-path' is the full local path name for the directory that the libref references. For example:
LIBNAME new 'C:\Data\';
creates a library, called new, which references the Windows directory 'C:\Data'. Keep in mind that a library defined by this statement will exist only as long as the SAS session is open.

In order to assign a library permanently in Windows, you must use the Explorer window. Double click on the Libraries icon, then click on the File pull-down menu and select New. You will get the New Library dialog box, depicted below:
Put the name of the library in the "Name:" field, and the path of the system folder it corresponds to in the "Path" field. Make sure the "Enable at startup" box is checked, then click "OK."

If you want to see which data sets are inside a given library, or specific information about a single data set, you can use the DATASETS procedure:
PROC DATASETS LIB=<libref>;
CONTENTS DATA=<member-name> | _ALL_ NODS;
RUN;
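As a concrete instance of this syntax, the sketch below assigns the libref new from the LIBNAME example earlier, creates a small permanent data set in it, and then asks PROC DATASETS for both kinds of report. It assumes the folder C:\Data already exists; the data set name scores and its contents are hypothetical.

```sas
/* Assign the libref "new" to an existing folder (path from the LIBNAME example) */
libname new 'C:\Data\';

/* Hypothetical permanent data set, so the library has a member to report on */
data new.scores;
   input name $ score;
   datalines;
Ann 90
Bob 85
;
run;

proc datasets lib=new;
   contents data=_all_ nods;   /* list the members of the library     */
   contents data=scores;       /* detailed information on one member  */
run;
quit;                          /* PROC DATASETS stays active until QUIT */
```

Because the data set was created with the two-level name new.scores, it is stored in C:\Data and remains there after the SAS session ends.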
In the syntax above, the | symbol is a SAS documentation convention used to separate alternative options. So after DATA=, one may give either a member-name or _ALL_ followed by the NODS option. The former is used to obtain specific, extensive information about a single data set; the latter provides a list of the members (files) within the library.

SIMPLE STATISTICAL ANALYSES
This section will discuss two simple statistical procedures, PROC MEANS and PROC UNIVARIATE.

1. PROC MEANS: The MEANS procedure displays simple descriptive statistics for the numeric variables in a SAS data set. Output from PROC MEANS is concise, and information on all variables is presented in one report. Here is the syntax:
PROC MEANS DATA=<SAS-data-set>;
RUN;
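Where <SAS-data-set> is the name of the SAS data set you wish to analyze. By default, PROC MEANS analyzes every numeric variable in the data set and reports the number of nonmissing values (N), the mean, the standard deviation, the minimum, and the maximum.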
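A minimal sketch of this syntax in use is shown here. The data set name demo, the variable names, and the values are hypothetical and are not the data used elsewhere in this document. The second PROC MEANS call adds a VAR statement, which is not part of the syntax shown above, to restrict the analysis to one variable.

```sas
/* Hypothetical data: five observations, two numeric variables */
data demo;
   input height weight;
   datalines;
160 55
172 68
181 80
168 62
175 .
;
run;

/* Default statistics for every numeric variable */
proc means data=demo;
run;

/* Same procedure, limited to one variable with a VAR statement */
proc means data=demo;
   var height;
run;
```

Since one weight value is missing, the N reported for weight should be one lower than the N reported for height.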
EXAMPLE: This piece of code creates the data set.
data temp;
This piece of code analyzes the data.
proc means data = temp;
run;
This is the resulting output.
2. PROC UNIVARIATE: The UNIVARIATE procedure displays simple descriptive statistics for the numeric variables in a SAS data set, including quantiles. Output from PROC UNIVARIATE consists of one page per variable. Here is the syntax:
For each numeric variable, PROC UNIVARIATE provides information on the distribution of values, including the:
EXAMPLE: This piece of code analyzes the data using the previously-created
data set.
This is a limited version of the output.
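For readers who want something to submit directly, the following is a self-contained sketch of a PROC UNIVARIATE call. The data set demo and its values are hypothetical, and the VAR statement is an optional addition that limits the analysis to the named variable.

```sas
/* Hypothetical data: six observations of one numeric variable */
data demo;
   input score;
   datalines;
72
85
90
66
78
95
;
run;

/* Detailed distribution statistics, including quantiles, for score */
proc univariate data=demo;
   var score;
run;
```

If the VAR statement is omitted, PROC UNIVARIATE analyzes every numeric variable in the data set.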
Note that SAS is capable of many more advanced statistical operations. You may refer to manuals available for purchase through the SAS Institute, Inc. (http://www.sas.com/service/doc/intro.html or 919-677-8000) to learn more about these procedures.

PROC PRINT
Another very important procedure is PROC PRINT. This will give you a complete listing of the data values in a given data set. The overall procedure syntax is:
Where:
EXAMPLE: This code uses the data set created previously.
This is the output.
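A small self-contained PROC PRINT sketch follows. The data set roster and its values are hypothetical. The NOOBS option and the VAR statement are optional additions chosen for illustration: NOOBS suppresses the observation-number column, and VAR selects which variables are listed and in what order.

```sas
/* Hypothetical data: three observations, one character and two numeric variables */
data roster;
   input name $ age score;
   datalines;
Ann 21 90
Bob 25 85
Cal 23 .
;
run;

/* Full listing of the data set */
proc print data=roster;
run;

/* Listing without the Obs column, showing only two variables */
proc print data=roster noobs;
   var name score;
run;
```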
MACHINE-READABLE DATA
Machine-readable data (MRD), such as information from the Inter-university Consortium for Political and Social Research (ICPSR), is available at http://scc01.rutgers.edu:80/datacenter/search1.htm.

RECOMMENDED DOCUMENTATION
Obtaining manuals for the SAS system is highly recommended. The following are all available from the SAS Institute's publications department (http://www.sas.com/service/doc/intro.html or 919-677-8000):