The most commonly used SAS statistical procedures are MEANS, SUMMARY, FREQ (frequency), and UNIVARIATE. We will take a detailed look at each of them.
Part 1 - PROC MEANS
Syntax:
PROC MEANS <DATA=SAS-data-set>
<statistic-keyword(s)> <option(s)>;
RUN;
where
- SAS-data-set is the name of the data set to be used
- statistic-keyword(s) specify the statistics to compute
- option(s) control the content, analysis, and appearance of output
By default, PROC MEANS prints the n-count (number of nonmissing values), the mean, the standard deviation, and the minimum and maximum values of every numeric variable in a data set. You may not always want these default statistics, so you can specify other statistic keywords. The statistic keywords that can be used with PROC MEANS are:
Descriptive Statistics
Keyword | Description |
CLM | Two-sided confidence limit for the mean |
CSS | Corrected sum of squares |
CV | Coefficient of variation |
KURTOSIS | Kurtosis |
LCLM | One-sided confidence limit below the mean |
MAX | Maximum value |
MEAN | Average |
MIN | Minimum value |
N | Number of observations with nonmissing values |
NMISS | Number of observations with missing values |
RANGE | Range |
SKEWNESS | Skewness |
STDDEV / STD | Standard deviation |
STDERR | Standard error of the mean |
SUM | Sum |
SUMWGT | Sum of the WEIGHT variable values |
UCLM | One-sided confidence limit above the mean |
USS | Uncorrected sum of squares |
VAR | Variance |
Quantile Statistics
Keyword | Description |
MEDIAN / P50 | Median or 50th percentile |
P1 | 1st percentile |
P5 | 5th percentile |
P10 | 10th percentile |
Q1 / P25 | Lower quartile or 25th percentile |
Q3 / P75 | Upper quartile or 75th percentile |
P90 | 90th percentile |
P95 | 95th percentile |
P99 | 99th percentile |
QRANGE | Difference between upper and lower quartiles: Q3-Q1 |
Hypothesis Testing
Keyword | Description |
PROBT | Probability of a greater absolute value for the t value |
T | Student's t for testing the hypothesis that the population mean is 0 |
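The keywords in the tables above are listed on the PROC MEANS statement itself. The following is a minimal sketch, assuming the sample data set sashelp.class that ships with SAS (it is not part of the original example), showing how a mix of descriptive, quantile, and hypothesis-testing keywords replaces the default statistics:

```sas
/* Assumption: sashelp.class is available (standard SAS sample data). */
/* Listing keywords replaces the default N, MEAN, STD, MIN, MAX.      */
proc means data=sashelp.class n mean median q1 q3 clm t probt;
    var height weight;   /* numeric variables in sashelp.class */
run;
```

Only the requested statistics appear in the report; the defaults are printed only when no keyword is given.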
Let us see an example:
Write a program to import an Excel file, examine the contents of the data set, and produce the means of the numeric variables in the data set work.unemployment.
The unemployment data set contains 62000 observations, but we calculated the means for only the first 10 observations by using the OBS= data set option. PROC CONTENTS shows the type of each variable (character or numeric). A simple PROC MEANS step then produces the n-count (number of nonmissing values), the mean, the standard deviation, and the minimum and maximum values of all 8 numeric variables in the data set.
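A program along these lines might look like the sketch below. The file path is a hypothetical placeholder, and DBMS=XLSX assumes that SAS/ACCESS Interface to PC Files is licensed; only the data set name work.unemployment and the use of OBS= come from the text above.

```sas
/* Hypothetical path: replace with the location of your Excel file. */
proc import datafile="/path/to/unemployment.xlsx"
    out=work.unemployment
    dbms=xlsx
    replace;
    getnames=yes;          /* first row holds the variable names */
run;

/* List variable names, types (char/num), and lengths. */
proc contents data=work.unemployment;
run;

/* Default statistics, restricted to the first 10 observations. */
proc means data=work.unemployment(obs=10);
run;
```

Because no VAR statement is given, PROC MEANS analyzes every numeric variable in the data set.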
Write a PROC MEANS step that requests a different descriptive statistic, such as MAX, and uses the MAXDEC= option. Use a VAR statement to limit the variables analyzed, and use a CLASS statement to group the observations.
We will now select only the maximum value of the variables January_Employment, February_Employment, March_Employment and Total_Quarterly_Wages. The MAXDEC= option specifies the maximum number of decimal places displayed in the results. To produce separate analyses of grouped observations, add a CLASS statement to the MEANS procedure. PROC MEANS does not generate statistics for CLASS variables, because their values are used only to categorize data. CLASS variables can be either character or numeric.
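The step described above could be sketched as follows. The four analysis variables are the ones named in the text; the CLASS variable Industry is a hypothetical name used only for illustration, since the grouping variable is not named here.

```sas
/* MAX replaces the default statistics; MAXDEC=0 prints whole numbers. */
proc means data=work.unemployment max maxdec=0;
    var January_Employment February_Employment
        March_Employment Total_Quarterly_Wages;
    class Industry;   /* hypothetical grouping variable */
run;
```

With CLASS, the results for all groups appear in a single table, and the input data does not need to be sorted.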
A BY statement can also be used, but unlike the CLASS statement, the BY statement requires the data to be sorted or indexed by the BY variables. It also produces different output than CLASS: the BY statement produces a separate table for each BY group.
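A minimal sketch of BY-group processing, again assuming a hypothetical grouping variable named Industry, sorts the data first and then runs PROC MEANS with a BY statement:

```sas
/* BY-group processing needs sorted (or indexed) input. */
proc sort data=work.unemployment out=work.unemployment_sorted;
    by Industry;      /* hypothetical grouping variable */
run;

/* One output table is produced per value of Industry. */
proc means data=work.unemployment_sorted max maxdec=0;
    by Industry;
    var January_Employment February_Employment
        March_Employment Total_Quarterly_Wages;
run;
```

Writing the sorted copy to a separate data set with OUT= leaves work.unemployment in its original order.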
The difference between PROC MEANS and PROC SUMMARY is that PROC MEANS produces a report by default (remember that you can use the NOPRINT option to suppress the default report). By contrast, to produce a report in PROC SUMMARY, you must include the PRINT option in the PROC SUMMARY statement.
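The contrast can be sketched as below, assuming the sample data set sashelp.class that ships with SAS. The output data set name work.class_stats and the variable names mean_height and mean_weight are hypothetical.

```sas
/* PROC SUMMARY prints nothing unless PRINT is specified. */
proc summary data=sashelp.class print;
    var height weight;
run;

/* PROC MEANS prints by default; NOPRINT suppresses the report, */
/* so the results are written to an output data set instead.    */
proc means data=sashelp.class noprint;
    var height weight;
    output out=work.class_stats mean=mean_height mean_weight;
run;
```

Apart from the default for printing, the two procedures accept the same statements and compute the same statistics.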