Wednesday, December 2, 2015

Descriptive Statistics in SAS (Part -1) - PROC MEANS and PROC SUMMARY

Data Source: 2015 Google Data for Unemployment

The most commonly used SAS statistical procedures are MEANS, SUMMARY, FREQ and UNIVARIATE. We will take a detailed look at each of them.

Part 1- PROC MEANS

Syntax:
 PROC MEANS <DATA=SAS-data-set>
    <statistic-keyword(s)><option(s)>;
 RUN;

 
where
  • SAS-data-set is the name of the data set to be used
  • statistic-keyword(s) specify the statistics to compute
  • option(s) control the content, analysis, and appearance of output

 By default, PROC MEANS prints the n-count (number of nonmissing values), the mean, the standard deviation, and the minimum and maximum values of every numeric variable in a data set. You may not always want these default statistics, so you can specify other statistic-keywords. Statistic-keywords that can be used with PROC MEANS are:

Descriptive Statistics
Keyword Description
 CLM Two-sided confidence limit for the mean
 CSS Corrected sum of squares
 CV Coefficient of variation
 KURTOSIS Kurtosis
 LCLM One-sided confidence limit below the mean
 MAX Maximum value
 MEAN Average
 MIN Minimum value
 N Number of observations with nonmissing values
 NMISS Number of observations with missing values
 RANGE Range
 SKEWNESS Skewness
 STDDEV / STD  Standard deviation
 STDERR Standard error of the mean
 SUM Sum
 SUMWGT Sum of the Weight variable values.
 UCLM One-sided confidence limit above the mean
 USS Uncorrected sum of squares
 VAR Variance

Quantile Statistics
Keyword Description
 MEDIAN / P50  Median or 50th percentile
 P1 1st percentile
 P5 5th percentile
 P10 10th percentile
 Q1 / P25 Lower quartile or 25th percentile
 Q3 / P75 Upper quartile or 75th percentile
 P90 90th percentile
 P95 95th percentile
 P99 99th percentile
 QRANGE Difference between upper and lower quartiles: Q3-Q1

Hypothesis Testing
Keyword Description
 PROBT  Probability of a greater absolute value for the t value
 T Student's t for testing the hypothesis that the population mean is 0
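
As a minimal sketch of how keywords from the three tables above can be combined, the step below requests a handful of them in one PROC MEANS statement. It assumes the data set work.unemployment used later in this post; the particular choice of keywords is only illustrative.

```sas
/* Listing statistic-keywords replaces the default set */
/* (N, MEAN, STD, MIN, MAX) with only those requested. */
proc means data=work.unemployment n nmiss mean median qrange t probt;
run;
```

Only the requested statistics appear in the report, so any default statistic that is still wanted (for example MEAN) has to be listed explicitly.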


Let us see an example:

Write a program to import an Excel file, examine the contents of the data set, and produce the means of the numeric variables in the data set work.unemployment.

  The unemployment data set contains 62000 observations, but here we compute the statistics for only the first 10 observations by using the OBS= data set option. PROC CONTENTS shows the type of each variable (char/num). A simple PROC MEANS step then produces the n-count (number of nonmissing values), the mean, the standard deviation, and the minimum and maximum values of all 8 numeric variables present in the data set.
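
A program along these lines might look like the sketch below. The file path is hypothetical (the original post does not give it), and DBMS=XLSX assumes an .xlsx workbook and a SAS installation that can read it; adjust both to your environment.

```sas
/* Hypothetical file location: replace with the actual path. */
proc import datafile="/folders/myfolders/unemployment.xlsx"
    out=work.unemployment
    dbms=xlsx
    replace;
    getnames=yes;   /* take variable names from the first row */
run;

/* List variable names, types (char/num) and lengths. */
proc contents data=work.unemployment;
run;

/* Default statistics, restricted to the first 10 observations. */
proc means data=work.unemployment(obs=10);
run;
```

Because no VAR statement is given, PROC MEANS analyzes every numeric variable in the data set.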

 

 

 



Write a PROC MEANS step that requests another descriptive statistic, such as MAX, together with the MAXDEC= option. Use a VAR statement to limit the variables analyzed, and a CLASS statement to group the observations.




We will now select only the maximum value of the variables January_Employment, February_Employment, March_Employment and Total_Quarterly_Wages. The MAXDEC= option specifies the maximum number of decimal places displayed in the results. To produce separate analyses of grouped observations, add a CLASS statement to the MEANS procedure. PROC MEANS does not generate statistics for CLASS variables, because their values are used only to categorize data. CLASS variables can be either character or numeric.
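
A sketch of such a step follows. The four analysis variables are the ones named above; the CLASS variable Ownership is a hypothetical name used only for illustration, since the post does not say which grouping variable was used.

```sas
proc means data=work.unemployment max maxdec=0;
    /* Limit the analysis to these four variables. */
    var January_Employment February_Employment
        March_Employment Total_Quarterly_Wages;
    /* Ownership is a hypothetical grouping variable. */
    class Ownership;
run;
```

With CLASS, the input data does not need to be sorted, and all groups are reported in a single table.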

A BY statement can also be used, but unlike the CLASS statement, the BY statement requires the data to be sorted or indexed by the BY variable(s). It also produces different output from CLASS: the BY statement produces a separate table for each BY group.
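
The BY version of the same analysis could be sketched as below. As before, Ownership is a hypothetical grouping variable, and work.unemployment_sorted is an illustrative name for the sorted copy.

```sas
/* BY-group processing needs the data sorted by the BY variable. */
proc sort data=work.unemployment out=work.unemployment_sorted;
    by Ownership;   /* hypothetical grouping variable */
run;

proc means data=work.unemployment_sorted max maxdec=0;
    by Ownership;
    var January_Employment February_Employment
        March_Employment Total_Quarterly_Wages;
run;
```

Sorting into a separate output data set leaves work.unemployment in its original order.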

 

 


 

 

A summarized output data set can be created by using PROC SUMMARY. When you use PROC SUMMARY, you use the same code to produce the output data set that you would use with PROC MEANS.
The difference between the two procedures is that PROC MEANS produces a report by default (remember that you can use the NOPRINT option to suppress the default report). By contrast, to produce a report in PROC SUMMARY, you must include a PRINT option in the PROC SUMMARY statement.
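
The two steps below sketch this equivalence. Both write the same kind of summary data set; the first also prints a report because of the PRINT option, while the second suppresses the report with NOPRINT. The output data set names and the CLASS variable Ownership are hypothetical.

```sas
/* PROC SUMMARY: no report unless PRINT is specified. */
proc summary data=work.unemployment print max maxdec=0;
    var January_Employment February_Employment
        March_Employment Total_Quarterly_Wages;
    class Ownership;                      /* hypothetical grouping variable */
    output out=work.emp_summary max=;     /* MAX= keeps the original variable names */
run;

/* PROC MEANS: report by default, suppressed here with NOPRINT. */
proc means data=work.unemployment noprint;
    var January_Employment February_Employment
        March_Employment Total_Quarterly_Wages;
    class Ownership;
    output out=work.emp_summary2 max=;
run;
```

Besides the requested statistics, the output data set contains the automatic variables _TYPE_ and _FREQ_, which identify the level of grouping and the number of observations in each group.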