In this blog we will explore the basic concept of Inner Join using SAS Merge & Proc SQL.
An inner join retrieve only the matched rows from the data-sets/tables.
Suppose we have two data-sets/tables Customer & Sales.
So an inner join of Customer and Sales gives the result of Customer intersect Sales, i.e. the inner part of a Venn diagram intersection. (See below image)
In SQL terminology, this is inner join. Equivalent to a merge in a DATA step in SAS.
To achieve the inner join using SAS merge, first we have to understand the concept of IN= option, which is especially useful when merging and concatenating data sets.
IN= option tells SAS to create an “indicator variable” having the value 0 or 1 depending on whether or not the current observation comes from the input data set.
- If the observation does come from the input data-set, then the indicator will be 1.
- If the observation does not come from the input data set, then the indicator variable value will be 0.
Lets have a example – Here is the Data-sets Customer & Sales.
DATA TEMP; MERGE CUSTOMER (IN=IN_CUSTOMER ) SALES(IN=IN_SALES); BY CUSTOMER_ID ; /* Creating new variables using indicator variables */ CUSTOMER = IN_CUSTOMER ; SALES = IN_SALES; RUN; Proc print data=temp;
See below output. Indicator Sales= 1 When observation coming from the input data-set Sales else indicator is 0, same applied for customer data set.
SAS Merge (Inner Join):
Lets go for SAS Merge (Inner Join) using IN= Options.
Prerequisites for a SAS Merge
- Input data-sets must have at least one common variable to merge with same name (In our case we have CUSTOMER_ID). If not same then use rename.
- Input data sets must be sorted by the common variable(s) that will be used to merge.
Let see how we can use IN= options to get the common data using Merge.
Select only those observation where IN variables Sales=1 & Customer=1 i.e. common observation from both datasets.
SAS Merge (Inner Join) –
DATA INNER_JOIN; MERGE CUSTOMER (IN=IN_CUSTOMER ) SALES(IN=IN_SALES); BY CUSTOMER_ID ; IF IN_CUSTOMER = 1 & IN_SALES = 1; RUN;
Inner Join using Proc SQL in SAS:
- Joining is possible on columns with differing names. (No need to rename)
- Input data sets doesn’t required sorting by the common variable(s)
1st way using Proc SQL-
PROC SQL; SELECT C.CUSTOMER_ID,C.COUNTRY, S.SALES FROM CUSTOMER AS C, SALES AS S WHERE C.CUSTOMER_ID=S.CUSTOMER_ID; QUIT;
2nd way using Proc SQl-
PROC SQL; SELECT C.CUSTOMER_ID,C.COUNTRY, S.SALES FROM CUSTOMER AS C JOIN SALES AS S ON C.CUSTOMER_ID=S.CUSTOMER_ID; QUIT;
Proc SQL Output –
Happy Learning! Your feedback would be appreciated!Follow @shobhitsinghIN