SAS by Stevy: October 2015

String extraction using SAS Scan and Substr functions

In my last blog we saw string extraction using perl function. Today we are going to extract and concatenate character string using SAS Scan and Substr functions.

Syntax:

SCAN(argument,n,<delimiters>)

where
argument specifies the character variable or expression to scan.
n specifies which word to read.
delimiters are special characters that must be enclosed in single quotation marks (' '). If you do not specify delimiters, default delimiters are used.

SUBSTR(argument,position,<n>)
where
argument specifies the character variable or expression to scan.
position is the character position to start from.
n specifies the number of characters to extract. If n is omitted, all remaining characters are included in the substring.

Let us see an example:

In the dataset work.Company there are four columns Name, Age, Sex and SSN.
We wants to create a new data set employeeID with a column named ID which contains unique Identification number for the employees based on their last name and last four digits of their SSN.

Company

Obs	Name	Age	Sex	SSN
1	Morrison, Michael	32	M
2	Rudelich, Herbert	39	M	129-42-9261
3	Vincent, Martina	34	F	074-53-9892
4	Benito, Gisela	32	F	228-88-9649
5	Sirignano, Emily	12	F	442-21-8075
6	Harbinger, Nicholas	36	M	446-93-2122
7	Phillipon, Marie-Odile	28	F	776-84-5391
8	Gunter, Thomas	27	M	929-75-0218

data employeeID;
set company;
ID = (scan(name, 1, ","))!!(substr(SSN,8,4));
run;
proc print data=employeeID;
run;

Explanation:
scan(name, 1, ",")
name is the character variable to scan
1 is the position of the word to read
"," is the delimiter

substr(SSN,8,4)
SSN is the character variable to scan.
8 is the character position to start from.
4 specifies the number of characters to extract

Result

Obs	Name	Age	Sex	SSN	ID
1	Morrison, Michael	32	M		Morrison
2	Rudelich, Herbert	39	M	129-42-9261	Rudelich9261
3	Vincent, Martina	34	F	074-53-9892	Vincent9892
4	Benito, Gisela	32	F	228-88-9649	Benito9649
5	Sirignano, Emily	12	F	442-21-8075	Sirignano8075
6	Harbinger, Nicholas	36	M	446-93-2122	Harbinger2122
7	Phillipon, Marie-Odile	28	F	776-84-5391	Phillipon5391
8	Gunter, Thomas	27	M	929-75-0218	Gunter0218

String Extraction Using Perl Regular Expressions (PRX) in SAS

The PRXMATCH function searches source with the perl-regular-expression and returns the position at which the string begins. If there is no match, PRXMATCH returns a zero.

syntax : PRXMATCH (perl-regular-expression, source)

Suppose we have a list of companies and their addresses :

7‑Eleven, Inc.community relations department dallas, TX 75221-0711

20 Century Fox 10201 pico blvd, los angeles, CA 90064

APACHE prime four business park kingswells scotland, United Kingdom

We wish to extract the names of the companies. The names of the companies starts with either a upper-case letters or numbers while their addresses are in the lower-case. To do so we are going to the string extraction with the help of PRXMATCH Function.

Explanaton:

prxmatch ('/\b[a-z]\w*\b/')

In this example we are using forward slashes (/) as perl dilimiters.

\b is word boundary (a space or end-of-line)

[a-z] matches lower-case letters

\w matches any word character (upper- and lowercase letters, blank and underscore)

* matches the previous subexpression zero or more times

As can be seen in the result PRXMATCH function helped us to extract the names of the companies.

Examining the spread of Ebola in different countries using SAS

Data source : Sub-national time series data on Ebola cases and deaths in Guinea, Liberia, Sierra Leone, Nigeria, Senegal and Mali since March 2014

https://data.hdx.rwlabs.org/dataset/rowca-ebola-cases

Google data sets provides you with current information on the extent of spread of Ebola. We will work on this data set and learn how to

import excel file into SAS
write a basic SAS sql program

Program to import excel file in SAS

proc import DBMS=xls out=work.ebola

datafile="/folders/myfolders/ebola/Ebola1.xls";

run;

After we are done importing the file into SAS, we want to find which country has maximum outbreak of Ebola area-wise .

The following sql lists the extent of spread of Ebola in different countries. The result gives us country name and the count of different localites affected by ebola.

the output is as follows

Thursday, October 29, 2015

String extraction using SAS Scan and Substr functions

SCAN(argument,n,<delimiters>)

Let us see an example:

String Extraction Using Perl Regular Expressions (PRX) in SAS

Examining the spread of Ebola in different countries using SAS

Examining the spread of Ebola in different countries using SAS