Different Uses of the MEANS Procedure in SAS

Folks,

In this blog we will explore the different uses of the MEANS procedure in SAS.

Descriptive statistics such as the sum, average, minimum, maximum, range and standard deviation provide useful information about numeric data (numeric variables).

The SAS MEANS procedure also provides helpful options for controlling your output.


Means Procedure Basics:

The SAS MEANS procedure is a way to generate summary reports containing descriptive statistics such as the sum, minimum, maximum and mean of your numeric variables. However, the MEANS procedure is much more versatile: it can also create summary output data sets, which can be used in other DATA or PROC steps in SAS.

Procedure Syntax-

proc means <Data=SAS-Data-Set> <statistic-keyword(s)> <option(s)>; 
var variable(s); 
by variable(s); 
class variable(s) <option(s)>; 
id variable(s); 
output  <output-specification(s)>;

Where

  • SAS-data-set is the name of the SAS data set to be used by the MEANS procedure.
  • Statistic-keyword(s) specify the statistics to compute, e.g. min, max, mean, sum.
  • Option(s) control the content, analysis and appearance of the output.
  • VAR identifies the analysis variables and their order in the results.
  • BY calculates separate statistics for each BY group.
  • CLASS identifies variables whose values define subgroups for the analysis report.
  • ID includes additional identification variables in the output dataset.
  • OUTPUT creates an output dataset that contains the specified statistics and identification variables.

Identify Missing Values:

Suppose we have a SAS dataset STUDENTS with 14 observations, 3 numeric variables and 2 character variables (see the CONTENTS procedure output below).

contents

1

We submitted the MEANS procedure code below for the STUDENTS data set without providing any statistic keywords or options.

proc means data=work.students;
run;

Here is the Output of Means Procedure.

out

So by default, as shown above, the MEANS procedure produces N (the number of non-missing observations), Mean, Standard Deviation, Minimum and Maximum for all numeric variables in the input SAS dataset.
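To make the default output concrete, here is a minimal Python sketch (not SAS) that computes the same five statistics by hand. The ages list is made-up sample data and None stands in for a SAS missing value. The point is that N counts only non-missing values, and that the standard deviation uses the n - 1 divisor, which is the PROC MEANS default.

```python
import statistics

# Hypothetical STUDENT_AGE values; None plays the role of a SAS missing value.
ages = [21, 23, None, 22, 25, None, 24]

# Missing values are dropped before any statistic is computed.
present = [a for a in ages if a is not None]

summary = {
    "N": len(present),
    "Mean": statistics.mean(present),
    "Std Dev": statistics.stdev(present),  # sample standard deviation (n - 1 divisor)
    "Minimum": min(present),
    "Maximum": max(present),
}
n_missing = len(ages) - summary["N"]

for name, value in summary.items():
    print(f"{name:8s} {value}")
print("Missing ", n_missing)
```

Comparing N with the total number of observations is exactly how the missing values show up in the report.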

Using statistic keywords in the PROC MEANS statement, we can request additional statistics. Remember that once any statistic keyword is listed, only the listed statistics are printed, so repeat the default ones if you still need them. Also, only numeric variables can be listed in the VAR statement.

Untitled

As already mentioned, the STUDENTS dataset has 14 observations. The count (N) reported for STUDENT_AGE is not 14, so we can say that this variable has some missing values.

Validate Numeric Data Range:

The MEANS procedure can also be used to validate numeric data, because its summary report displays descriptive statistics such as the minimum, maximum and standard deviation.

These show whether the values of a particular numeric variable are within their expected range or not.

Example:

proc means data=work.students;
run;

utl2

The output of the MEANS procedure displays a range of 27 to 399 for the STUDENT_AGE variable, which clearly shows that there is invalid data somewhere in the STUDENT_AGE column. Here we can say that data cleaning is required.
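The same check can be sketched outside SAS. The Python snippet below uses made-up ages and an assumed plausible range of 5 to 100 (my choice for illustration, not something PROC MEANS knows about). It first looks at the minimum and maximum, as the report does, and then lists the rows that need cleaning.

```python
# Hypothetical STUDENT_AGE values; 399 imitates a data-entry error.
ages = [27, 31, 29, 399, 35, 28]

# Assumed plausible range for a student's age (an illustration only).
LOW, HIGH = 5, 100

observed_min, observed_max = min(ages), max(ages)

# Row numbers start at 1, like observation numbers in a SAS data set.
out_of_range = [(row, age) for row, age in enumerate(ages, start=1)
                if not LOW <= age <= HIGH]

print("min =", observed_min, "max =", observed_max)
print("rows needing cleaning:", out_of_range)
```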

Additional Statistics & VAR Statement:

As already mentioned, PROC MEANS prints the N count (number of non-missing values), mean, standard deviation, minimum and maximum of every numeric variable in an input SAS data set.

We can control which variables are included in the report by supplying a VAR statement.
By adding statistic keywords to the PROC MEANS statement we can also request additional statistics. Remember that once any keyword is listed, the default statistics are printed only if you list them too.

Again, only numeric variables can be listed in the VAR statement.

Example: Here we request the n, std, range, skewness and kurtosis statistics in the MEANS procedure output for only the STUDENT_AGE variable in the STUDENTS SAS data set.

proc means data=work.students n std range skewness kurtosis;
var student_age;
run;
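If you want to cross-check these statistics by hand, here is a small Python sketch. It uses the usual bias-corrected sample formulas for skewness and kurtosis, which to my understanding is what PROC MEANS reports under its default VARDEF=DF setting. The ages are made-up values and the describe function is a hypothetical helper name.

```python
import math

def describe(values):
    """Return n, std, range, skewness and kurtosis (needs at least 4 values)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    z = [(x - mean) / std for x in values]  # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"n": n, "std": std, "range": max(values) - min(values),
            "skewness": skew, "kurtosis": kurt}

# Hypothetical ages, evenly spaced so the data is perfectly symmetric.
stats = describe([20, 21, 22, 23, 24])
for name, value in stats.items():
    print(name, round(value, 4))
```

For symmetric data like this the skewness is zero (up to rounding), and a negative kurtosis says the values are flatter than a normal distribution.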

VAR.png

Group Processing – CLASS Statement:

The CLASS statement is used to categorize data in the output. CLASS variables can be either character or numeric, but they should contain discrete values. If a CLASS statement is used, an N Obs statistic is also reported, which is the number of observations in each CLASS group.

CLASS variable(s);

where variable(s) specifies category variables for group processing.

proc means data=work.students n max min std range q1 q3 qrange;
var STUDENT_WEIGHT STUDENT_HIEGHT;
class STUDENT_GENDER;
run;

class.png

Group Processing – BY Statement:

Like the CLASS statement, the BY statement also specifies variables to use for categorizing observations.

BY variable(s);

where variable(s) specifies category variables for group processing.

Note: You have to first sort your data set by the variable or variables you list on the BY statement.

proc sort data=work.students;
by STUDENT_GENDER;
run;

proc means data=work.students max min std range q1 q3 qrange;
var STUDENT_WEIGHT STUDENT_HIEGHT ;
by STUDENT_GENDER;
run;

You now have your descriptive statistics for males and females separately, along with a separate group for the one observation whose STUDENT_GENDER value is missing.

by.png

Difference between Class & By Statements:

  • The CLASS statement is easier to use than the BY statement, as it doesn’t require a sorting step. If you have a very large data set that is not sorted, you may want to use a CLASS statement. However, if the data set is already sorted in the correct order, a BY statement is more efficient.
  • If you are using PROC MEANS to print a report and are not creating a summary output data set, the differences in the printed output between the BY and CLASS statements are basically related to layout: the CLASS statement produces a single large table, while the BY statement produces a separate table for each group.
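Why does BY need sorted data while CLASS does not? Here is a rough Python analogy, not SAS internals. A dictionary collects groups no matter what order the rows arrive in (CLASS-like), while itertools.groupby only merges adjacent rows (BY-like). One difference to keep in mind: on unsorted data SAS stops with an error, whereas groupby silently produces fragmented groups. The rows are made-up values.

```python
from itertools import groupby

# Hypothetical (gender, weight) rows in unsorted order.
rows = [("F", 52), ("M", 70), ("F", 58), ("M", 64), ("F", 49)]

# CLASS-like: accumulate into a dictionary, so input order does not matter.
class_groups = {}
for gender, weight in rows:
    class_groups.setdefault(gender, []).append(weight)

# BY-like: groupby only merges adjacent rows, so unsorted input is fragmented.
unsorted_runs = [(g, [w for _, w in grp])
                 for g, grp in groupby(rows, key=lambda r: r[0])]

# After sorting (the PROC SORT step), each gender forms one contiguous group.
sorted_rows = sorted(rows, key=lambda r: r[0])
by_groups = {g: [w for _, w in grp]
             for g, grp in groupby(sorted_rows, key=lambda r: r[0])}

print(len(unsorted_runs), "fragments without sorting")
print({g: max(w) for g, w in by_groups.items()})
```

Both routes give the same per-group statistics in the end; the BY route simply pays for the sort up front and then reads the data in a single pass.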

Creating Summarized Data Set – PROC MEANS

We can use PROC MEANS to create a new data set that contains summary information such as sums and means. This data set can then be used for further analysis.

OUTPUT OUT= SAS-data-set statistic=variable(s);
where

  • OUT= specifies the name of the output data set
  • statistic= specifies the summary statistic written out
  • variable(s) specifies the names of the variables to create. They hold the statistics for the analysis variables listed in the VAR statement, matched by position.
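The positional matching can be pictured with a short Python sketch. The variable names are the ones used in this post's examples; the dictionary layout is only my illustration of how each new name lines up with a VAR variable, not how SAS stores anything.

```python
# Analysis variables in VAR statement order (names taken from this post's examples).
var_list = ["STUDENT_WEIGHT", "STUDENT_HIEGHT"]

# Names listed after each statistic= keyword in the OUTPUT statement.
output_spec = {
    "max": ["MAX_STUDENT_WEIGHT", "MAX_STUDENT_HIEGHT"],
    "min": ["MIN_STUDENT_WEIGHT", "MIN_STUDENT_HIEGHT"],
}

# The first name receives the statistic of the first VAR variable, and so on.
mapping = {new_name: (stat, source)
           for stat, names in output_spec.items()
           for source, new_name in zip(var_list, names)}

for new_name, (stat, source) in mapping.items():
    print(f"{new_name} = {stat} of {source}")
```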

Example 1: Without specifying any output variable names in the OUTPUT statement.

proc means data=work.students max min noprint ;
var STUDENT_WEIGHT STUDENT_HIEGHT;
output out = my_summary ;
run;
proc print;
run;

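PROC MEANS produces a printed report by default; the NOPRINT option suppresses it. Since no statistic= request is given in the OUTPUT statement, the output data set contains the five default statistics, one per row, identified by the _STAT_ variable.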

out

Example 2: Specifying variable names in the OUTPUT statement.

proc means data=work.students max min noprint ;
var STUDENT_WEIGHT STUDENT_HIEGHT;
output out = my_summary 
             max = MAX_STUDENT_WEIGHT MAX_STUDENT_HIEGHT
             min= MIN_STUDENT_WEIGHT MIN_STUDENT_HIEGHT;
run;

proc print;
run;

out var.png

Example 3: Using the AUTONAME option for variable names in the output data set.

proc means data=work.students n max min std noprint ;
var STUDENT_WEIGHT ;
output out = my_summary 
             n = 
             max = 
             min = 
             std = / autoname;
run;

proc print;
run;

autoname.png

Example 4: Including BY Statement.

Use a BY statement when you would like to output summary statistics for each level of one or more classification variables.

Remember that you have to first sort your data set by the variable or variables you list in the BY statement.

proc means data=work.students max min noprint;
var STUDENT_WEIGHT STUDENT_HIEGHT; 
by student_gender; 
output out = my_summary02 
       max = MAX_STUDENT_WEIGHT MAX_STUDENT_HIEGHT 
       min= MIN_STUDENT_WEIGHT MIN_STUDENT_HIEGHT; 
run;

proc print;
run;

by output.png

In this data set, _FREQ_ represents the number of observations for each value of gender.

Example 5: Including CLASS Statement.

proc means data=work.students max min noprint;
var STUDENT_WEIGHT STUDENT_HIEGHT;
class student_gender;
output out = my_summary03 
             max = MAX_STUDENT_WEIGHT MAX_STUDENT_HIEGHT
             min= MIN_STUDENT_WEIGHT MIN_STUDENT_HIEGHT;
run;
proc print;
run;

output class.png

In the above output, the first observation in this data set, with _TYPE_ equal to 0, holds the statistics for males and females together (the overall summary), and the observations with _TYPE_ equal to 1 hold the statistics for females and males separately.

If you do not want the overall observation (_TYPE_ equal to 0 in the above output) in your output data set, use the NWAY option of PROC MEANS.
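Here is a small Python sketch of how such an output data set is laid out for one CLASS variable. It is only an illustration: summarize is a hypothetical helper, the rows are made-up values, and the nway flag imitates what the NWAY option does (keep only the rows for the class levels).

```python
def summarize(rows, nway=False):
    """Build rows shaped like a PROC MEANS OUT= data set with one CLASS variable."""
    out = []
    if not nway:
        # _TYPE_ = 0: statistics over all observations together.
        weights = [w for _, w in rows]
        out.append({"gender": None, "_TYPE_": 0, "_FREQ_": len(weights),
                    "max_weight": max(weights), "min_weight": min(weights)})
    for gender in sorted({g for g, _ in rows}):
        # _TYPE_ = 1: statistics for one level of the CLASS variable.
        weights = [w for g, w in rows if g == gender]
        out.append({"gender": gender, "_TYPE_": 1, "_FREQ_": len(weights),
                    "max_weight": max(weights), "min_weight": min(weights)})
    return out

# Hypothetical (gender, weight) rows.
rows = [("F", 52), ("M", 70), ("F", 58), ("M", 64), ("F", 49)]
with_overall = summarize(rows)
nway_only = summarize(rows, nway=True)

for record in with_overall:
    print(record)
```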

You can add multiple class variables in the class statement. Adding two classification variables to the CLASS statement enables you to group your analysis into multiple levels.


Thanks!

Happy Learning! Your feedback would be appreciated!

Stock Market Analysis Using R

Folks,

In this blog we will learn how to extract & analyze stock market data using R!

Using the quantmod package, we will first extract the stock data and then create some charts for analysis.

Quantmod – “Quantitative Financial Modeling and Trading Framework for R”!

It is an R package designed to assist the quantitative trader in the development, testing, and deployment of statistically based trading models. It has many features, so check out its link.

Check out this blog for a quantmod getSymbols R Shiny App – Link


R Packages Required:-

install.packages("quantmod")

Extracting Stock Market Data–

Function getSymbols: it loads and manages data from multiple sources.

getSymbols("SYMBOL", src = "SOURCE", from = "YYYY-MM-DD", to = "YYYY-MM-DD")

Some src methods are: yahoo, google, oanda etc.

In this blog we will first extract Bombay Stock Exchange data using the Yahoo Finance source. The Bombay Stock Exchange index symbol is BSESN (written as ^BSESN on Yahoo Finance).

1) Analyze One Month Data of Bombay Stock Exchange- 

library(quantmod)

getSymbols("^BSESN",src="yahoo" , from ="2016-10-23", to = Sys.Date())

View(BSESN)

Here is the BSESN (xts object) output data. It has columns for the Open, High, Low, Close, Volume & Adjusted price.

High refers to the highest price touched during the day, Low refers to the lowest price traded during the day, Close refers to the closing price for that day, and Volume refers to the number of shares traded that day.
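To see how these columns are typically used, here is a tiny Python sketch with made-up numbers (not real BSESN data). It derives two common quantities from daily rows: the intraday range (High minus Low) and the close-to-close percentage return.

```python
# Hypothetical daily rows: (date, open, high, low, close, volume).
days = [
    ("2016-11-07", 100.0, 104.0, 99.0, 103.0, 12000),
    ("2016-11-08", 103.0, 105.0, 101.0, 102.0, 15000),
    ("2016-11-09", 98.0, 100.0, 94.0, 96.9, 30000),
]

# Intraday range: how far the price moved between the day's low and high.
ranges = {date: high - low for date, _open, high, low, _close, _vol in days}

# Close-to-close return: percentage change from the previous day's close.
closes = [row[4] for row in days]
returns = [(today / prev - 1) * 100 for prev, today in zip(closes, closes[1:])]

print(ranges)
print([round(r, 2) for r in returns])
```

A sharp negative return together with a wide range and high volume is the kind of pattern that shows up as a visible dip on a chart.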

1.png

Output Charts:- 

chart_Series(BSESN)

As you can see in the chart below, there was a huge dip after 8 Nov 2016; maybe this is due to demonetization in India.

3.JPG

2) Analyze One Year Data of Bombay Stock Exchange- 

getSymbols("^BSESN",src="yahoo" , from ="2015-10-23", to = Sys.Date())

chart_Series(BSESN,type = "candlesticks")

Output Chart:-

4.JPG

3) Complete Data of Bombay Stock Exchange–

When no from date is given, getSymbols uses its default start date, so this will provide you all available data from 2007 onwards.

getSymbols("^BSESN",src="yahoo")

Quantmod has some other features. For more details, please visit this Link.


Thanks!

Happy Learning! Your feedback would be appreciated!

Exploring Instagram API using R!

Folks,

In this blog we will explore the basics of Instagram API using R.

“instaR” Package in R: Provides an interface to the Instagram API , which allows R users to access public users’ profile data.

Install “instaR” package from CRAN : install.packages("instaR")

Install “RCurl” package from CRAN : install.packages("RCurl")


Step 1: Registering an Application with Instagram.

If you already have an account with Instagram, go to Instagram Developer and register.

1.png

Click the “Register your application” button. After you register as an Instagram developer, you can go to Manage Clients & register a new client.

Note down your App Name, Client ID & Client Secret

2


Step 2: Create OAuth token to Instagram R session.

instaOAuth creates an OAuth access token that enables R to make authenticated calls to the Instagram API.

instaOAuth(client_id, client_secret, scope = "basic")

Scope is related to the access permissions. Read Login Permissions (Scopes) Link

my_app_client_id <- ""
my_app_client_secret <- ""

code 1

Now run the instaOAuth command in R. See the format below.

code 3

To set the ‘redirect_uri’, go to Manage Clients.

Copy and paste http://localhost:1410/ into ‘redirect_uri’ in the Instagram App Settings.

2.png

After setting the redirect URI, press Enter in the R command window. A browser will automatically open the page below & ask for authorization.

r 1

After you authorize the app, the message below will appear.

r 2

See the message “Authentication Complete” below. The connection between R & the Instagram API is now established.

comp.PNG

To extract the token from the OAuth object, just type the command below.

code 5

The token can be saved as a file on disk to be re-used in future sessions.

saving auth

Now we have token = my_access_token. So here we go with some of the basic end points of the Instagram API.

Instagram End Points Links: Here


1) Getting the information from Instagram for token owner.

End Points: https://api.instagram.com/v1/users/self/?access_token=ACCESS-TOKEN

Using R

json1.PNG

Output:

11111111111.png

Using Postman API Client

End Points: https://api.instagram.com/v1/users/self/?access_token=ACCESS-TOKEN

Output: JSON/XML

pstman.png

2) Get the most recent media published on Instagram by token owner.

https://api.instagram.com/v1/users/self/media/recent/?access_token=ACCESS-TOKEN

Using Postman API Client

Output:

json3.png

Actual Image:

sunburn.png


Checking the details of any public image.

Hit this endpoint: https://api.instagram.com/oembed/?url=YOUR URL
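Since the value of url is itself a URL, it is safest to percent-encode it before adding it to the query string. Below is a minimal Python sketch that only builds the request URL (it does not call the API). The media link is a made-up placeholder and build_oembed_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

OEMBED_ENDPOINT = "https://api.instagram.com/oembed/"  # endpoint quoted in this post

def build_oembed_url(media_url):
    """Return the oEmbed request URL with the media link percent-encoded."""
    return OEMBED_ENDPOINT + "?" + urlencode({"url": media_url})

# Hypothetical media link; replace it with the URL of a real public post.
request_url = build_oembed_url("https://www.instagram.com/p/EXAMPLE/")
print(request_url)
```

The resulting string can be pasted into Postman or a browser in the same way as the endpoint above.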

See below gif for explanation. I’m using postman client for API testing.

output_wlPtEf.gif

You can get/post API request in your own way! Explore more End Points here: Link

PS: In Sandbox mode we cannot use extended permissions. Read Login Permissions (Scopes) here Link


Thanks!

Happy Learning!

Mining Facebook Data Using R & Facebook API!

Folks,

In this blog we will learn the basics of extracting Facebook data using R & Facebook API.

Rfacebook Package: Provides an interface to the Facebook API.

The Rfacebook package in R provides functions that allow R to access Facebook’s API to get information about posts, comments, likes, groups that mention specific keywords & much more.

Install “Rfacebook” package from CRAN : install.packages("Rfacebook")


Step 1: Registering an Application with Facebook.

If you already have an account with Facebook, go to FacebookDeveloper and register.

Click “Register Now” button. After you register as a Facebook developer, you can register a new application.

Register a new application

From FacebookDeveloper, click on Apps at the top of the page to go to the application dashboard.

Click the Create New App button near the top. Once you are done with the verification process, your application is created. Note down the App Id & App Secret.

1
Application Dashboard

Site URL on Facebook App Settings: http://localhost:1410 

6.PNG
App Settings

Step 2: Create OAuth token to Facebook R session.

fbOAuth creates a long-lived OAuth access token that enables R to make authenticated calls to the Facebook API.

fbOAuth(app_id = "", app_secret = "YourAppSecret")
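Behind the scenes, the first step of an OAuth flow like this is to open a browser at an authorization URL. The Python sketch below only builds such a URL to show what goes into it; it is not what Rfacebook literally runs. The authorize address and the two scope names come from the fbOAuth source shown later in this post, the parameter names are the standard OAuth 2.0 ones, and "YourAppId" is a placeholder.

```python
from urllib.parse import urlencode

AUTHORIZE_URL = "https://www.facebook.com/dialog/oauth"  # from the fbOAuth source in this post
REDIRECT_URI = "http://localhost:1410/"                  # the Site URL set in App Settings

def build_authorize_url(app_id, scope):
    """Return the URL the browser opens so the user can approve the app."""
    params = {
        "client_id": app_id,
        "redirect_uri": REDIRECT_URI,
        "scope": " ".join(scope),  # assumption: space-separated, as in the OAuth 2.0 spec
        "response_type": "code",
    }
    return AUTHORIZE_URL + "?" + urlencode(params)

# "YourAppId" is a placeholder, not a real App Id.
url = build_authorize_url("YourAppId", ["public_profile", "user_friends"])
print(url)
```

This also shows why the Site URL in the app settings has to match http://localhost:1410/ : it travels along as the redirect_uri.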

4

5

Save “my_oauth” as a file so that it can be re-used in future sessions as the token argument of the functions below.

7.PNG

Now the connection between R & the Facebook API is established. So here we go with some of the basic functions.

If you get the error below while calling a function, please go to the bottom of this blog post for the fix.

Error in callAPI(query, token) :
An active access token must be used to query information about the current user.

function getLikes

getLikes(user, n = n , token): Extract list of liked pages of a Facebook user with page id.

Arguments: user: user name/ID , n: Number of liked pages to return for user.

Use below command in R to get likes.

8.PNG

Here is my result data set “my_likes”, with three variables: id, names & website of pages.

view.PNG

9.PNG
Pages Liked

function getPage 

getPage(page , token, n = n): Extract list of posts from a public Facebook page.

Arguments: page: Page ID or page name, n : Number of posts to return for page.

Example: Extracting 10 posts of Facebook Page “Narendra Modi”. 

Facebook Page Id of this page you can get from this link: http://findmyfbid.com/ . See below the gif.

output_LJA4Du.gif

After getting the page id use below command in R to get posts.

getpage.PNG

Here is my result data set “getpagedata”, with 10 observations & 10 variables, such as the post with its likes_count, share_count & comments_count.

padess.PNG

1a

1b


function searchGroup

searchGroup("text", token, n = n): Find any group with its privacy status & Facebook ID.

Arguments: text: text string, n : Number of groups to return.

Use below command in R to search groups.

groups.PNG

Here is my result data set “search_groups”, with many observations & 3 variables.

11.png
Groups


function getGroup

getGroup(ID, token, n = n): Extract list of posts from a public Facebook group, i.e. a group whose privacy is open.

Arguments: ID: Group ID , n: Number of posts to return for group.

Example: We have to extract 10 posts from the 7th group, “Web Scraping and Data mining”, shown in the groups image above.

Use below command in R to get the group posts.

12

Here is my result data set, with 10 observations & 3 variables.

postttttttttt

12a


function searchPages

searchPages(string, token, n = n): It searches for pages that mention a string/keyword.

Arguments: string: any string , n: Number of pages to return

Example: We have to search 10 pages that mention the string “Sports”.

Use below command in R to search pages.

sports.PNG

Here is my result data set, with 10 observations & 16 variables.

s1

s2


function updateStatus

updateStatus("text", token)

Arguments: text: any string , token

Use below command in R to update Facebook status.

status1.PNG

Result:

status2

Fix for Error in callAPI:

If you are getting the error below, please follow the steps below to fix the issue.

Error in callAPI(query, token) :
An active access token must be used to query information about the current user.

Run the function below first:

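## assumed setup: Rfacebook supplies callAPI(); httr supplies the OAuth helpers and GET()
library(Rfacebook)
library(httr)
fbOAuth <- function(app_id, app_secret, extended_permissions=FALSE, legacy_permissions=FALSE, scope=NULL)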
{
  ## getting callback URL
  full_url <- oauth_callback()
  full_url <- gsub("(.*localhost:[0-9]{1,5}/).*", x=full_url, replacement="\\1")
  message <- paste("Copy and paste into Site URL on Facebook App Settings:",
                   full_url, "\nWhen done, press any key to continue...")
  ## prompting user to introduce callback URL in app page
  invisible(readline(message))
  ## a simplified version of the example in httr package
  facebook <- oauth_endpoint(
    authorize = "https://www.facebook.com/dialog/oauth",
    access = "https://graph.facebook.com/oauth/access_token") 
  myapp <- oauth_app("facebook", app_id, app_secret)
  if (is.null(scope)) {
    if (extended_permissions==TRUE){
      scope <- c("user_birthday", "user_hometown", "user_location", "user_relationships",
                 "publish_actions","user_status","user_likes")
    }
    else { scope <- c("public_profile", "user_friends")}
  
    if (legacy_permissions==TRUE) {
      scope <- c(scope, "read_stream")
    }
  }

  if (packageVersion('httr') < "1.2"){
    stop("Rfacebook requires httr version 1.2.0 or greater")
  }

  ## with early httr versions
  if (packageVersion('httr') <= "0.2"){
    facebook_token <- oauth2.0_token(facebook, myapp,
                                     scope=scope)
    fb_oauth <- sign_oauth2.0(facebook_token$access_token)
    if (GET("https://graph.facebook.com/me", config=fb_oauth)$status==200){
      message("Authentication successful.")
    }
  }

  ## less early httr versions
  if (packageVersion('httr') > "0.2" & packageVersion('httr') <= "0.6.1"){
    fb_oauth <- oauth2.0_token(facebook, myapp,
                               scope=scope, cache=FALSE) 
    if (GET("https://graph.facebook.com/me", config(token=fb_oauth))$status==200){
      message("Authentication successful.")
    } 
  }

  ## httr version from 0.6 to 1.1
  if (packageVersion('httr') > "0.6.1" & packageVersion('httr') < "1.2"){
    Sys.setenv("HTTR_SERVER_PORT" = "1410/")
    fb_oauth <- oauth2.0_token(facebook, myapp,
                               scope=scope, cache=FALSE) 
    if (GET("https://graph.facebook.com/me", config(token=fb_oauth))$status==200){
      message("Authentication successful.")
    } 
  }

  ## httr version after 1.2
  if (packageVersion('httr') >= "1.2"){
    fb_oauth <- oauth2.0_token(facebook, myapp,
                               scope=scope, cache=FALSE) 
    if (GET("https://graph.facebook.com/me", config(token=fb_oauth))$status==200){
      message("Authentication successful.")
    } 
  }

  ## identifying API version of token
  error <- tryCatch(callAPI('https://graph.facebook.com/pablobarbera', fb_oauth),
                    error = function(e) e)
  if (inherits(error, 'error')){
    class(fb_oauth)[4] <- 'v2'
  }
  if (!inherits(error, 'error')){
    class(fb_oauth)[4] <- 'v1'
  }

  return(fb_oauth)
}

111

script.png

Credit for this fix goes to www.listendata.com. Visit the link for more information.


Thanks!

Happy Learning!