Inner Join: SAS Merge & Proc SQL

Folks,

In this blog we will explore the basic concept of Inner Join using SAS Merge & Proc SQL.


An inner join retrieve only the matched rows from the data-sets/tables.

Suppose we have two data-sets/tables Customer & Sales.

So an inner join of Customer and Sales gives the result of Customer intersect Sales, i.e. the inner part of a Venn diagram intersection. (See below image)

innerjoinn.png

In SQL terminology, this is inner join. Equivalent to a merge in a DATA step in SAS.

To achieve the inner join using SAS merge, first we have to understand the concept of IN= option, which is especially useful when merging and concatenating data sets.

IN= option

IN= option tells SAS to create an “indicator variable” having the value 0 or 1 depending on whether or not the current observation comes from the input data set.

  • If the observation does come from the input data-set, then the indicator will be 1.
  • If the observation does not come from the input data set, then the indicator variable value will be 0.

Lets have a example – Here is the Data-sets Customer & Sales.

inner

DATA TEMP;
MERGE CUSTOMER (IN=IN_CUSTOMER ) SALES(IN=IN_SALES);
BY CUSTOMER_ID ;

/* Creating new variables using indicator variables */
CUSTOMER = IN_CUSTOMER ;
SALES = IN_SALES;

RUN;
Proc print data=temp;

See below output. Indicator Sales= 1 When observation coming from the input data-set Sales  else indicator is 0, same applied for customer data set.

full

 

SAS Merge (Inner Join):

Lets go for SAS Merge (Inner Join) using IN= Options.

Prerequisites for a SAS Merge

  • Input data-sets must have at least one common variable to merge with same name (In our case we have CUSTOMER_ID). If not same then use rename.
  • Input data sets must be sorted by the common variable(s) that will be used to merge.

Let see how we can use IN= options to get the common data using Merge.

Select only those observation where IN variables Sales=1 & Customer=1 i.e. common observation from both datasets.joins.png

IN VAR

SAS Merge (Inner Join) –

DATA INNER_JOIN;
MERGE CUSTOMER (IN=IN_CUSTOMER ) SALES(IN=IN_SALES);
BY CUSTOMER_ID ;
IF IN_CUSTOMER = 1 & IN_SALES = 1;
RUN;

Output

output inner join

Inner Join using Proc SQL in SAS:

  • Joining is possible on columns with differing names. (No need to rename)
  • Input data sets doesn’t required sorting by the common variable(s)

1st way using Proc SQL-

 PROC SQL;
 SELECT C.CUSTOMER_ID,C.COUNTRY, S.SALES FROM 
 CUSTOMER AS C, SALES AS S
 WHERE C.CUSTOMER_ID=S.CUSTOMER_ID;
 QUIT;

2nd way using Proc SQl-

PROC SQL;
 SELECT C.CUSTOMER_ID,C.COUNTRY, S.SALES FROM 
 CUSTOMER AS C JOIN SALES AS S
 ON C.CUSTOMER_ID=S.CUSTOMER_ID;
 QUIT;

Proc SQL Output

proc sql


Thanks!

Happy Learning! Your feedback would be appreciated!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s