Extracting Twitter Trends using R Script

Extracting Twitter Trends using R Script

Folks,

In this blog we will learn how to extract the twitter trends from twitteR Package using R, after that we will learn how to save data into SQL Server using ODBC connection!

R Packages required: 

 

  • install.packages(“twitteR”): It provides an interface to the Twitter web API. For twitteR App & OAuth access setup – Visit this blog.
  • install.packages(“RODBC”) for OBDC connectivity. For ODBC connection setup for SQL Server Database/Oracle – Visit this blog .

Let’s get started!

twitteR package has getTrends function that can be used to extract the twitter trends based on a input parameter (woeid).

A WOEID (Where On Earth IDentifier) is a unique 32-bit reference identifier, originally defined by GeoPlanet and now assigned by Yahoo!, that identifies any feature on Earth.[Source: Wikipedia]

See below the WOEID for INDIA. You can use this link for WOEID look up.

woeid.png

Here is the Script for extracting the twitter trend & saving data into SQL Server for further analysis.

 
#------------------------------------------------------------------------#
# R Script Name: GetTrends.R
# R Script Description: This script collect the twitter trend data 
# from twitter API and dump into database.
#------------------------------------------------------------------------#
#-- Importing Required library
library("twitteR", lib.loc="~/R/win-library/3.3")
library("RODBC", lib.loc="~/R/win-library/3.3")

#-- Provide woeid from internet as per your requirement
#-- Below woeid is for INDIA
woeid <-23424848

#-- Fetching Access Keys for Twitter API, which is present in local file 
OAuth_Location <- "C:\\Users\\lenovo\\Desktop\\OAuth.csv"
twitter_OAuth <- read.csv(file=OAuth_Location, header=TRUE, sep=",",colClasses = "character")

  #-- Calling twitteR OAuth function
  setup_twitter_oauth(twitter_OAuth$Consumer_API_Key, twitter_OAuth$Consumer_API_Secret, 
                      twitter_OAuth$Access_Token, twitter_OAuth$Access_Token_Secret)

  #-- Extracting Trends using getTrends Function
  current_trends  <-  getTrends(woeid) 
  current_trends["trend_date"]  <-  Sys.Date()

  #-- Opening OBBC Connection and Saving Trends in Database
  my_conn  <-  odbcConnect("RSQL", uid="***", pwd="***")
  sqlSave(my_conn, current_trends, tablename = "t_current_trends", append = TRUE)

  #-- Closing/Removing unwanted values and data
  #remove(list=c("current_trends","woeid","OAuth_Location","twitter_OAuth"))
  close(my_conn)

 

Output:- IPL Cricket fever in INDIA, that’s why #KKRvSRH is trending on top today  🙂

trend.png

You can also scheduled this R Script using the windows scheduler, so that you get your trends in scheduled time automatically. For scheduling R script visit this blog.


Thanks!

Happy Learning! Your feedback would be appreciated!

 

Inner Join: SAS Merge & Proc SQL

Folks,

In this blog we will explore the basic concept of Inner Join using SAS Merge & Proc SQL.


An inner join retrieve only the matched rows from the data-sets/tables.

Suppose we have two data-sets/tables Customer & Sales.

So an inner join of Customer and Sales gives the result of Customer intersect Sales, i.e. the inner part of a Venn diagram intersection. (See below image)

innerjoinn.png

In SQL terminology, this is inner join. Equivalent to a merge in a DATA step in SAS.

To achieve the inner join using SAS merge, first we have to understand the concept of IN= option, which is especially useful when merging and concatenating data sets.

IN= option

IN= option tells SAS to create an “indicator variable” having the value 0 or 1 depending on whether or not the current observation comes from the input data set.

  • If the observation does come from the input data-set, then the indicator will be 1.
  • If the observation does not come from the input data set, then the indicator variable value will be 0.

Lets have a example – Here is the Data-sets Customer & Sales.

inner

DATA TEMP;
MERGE CUSTOMER (IN=IN_CUSTOMER ) SALES(IN=IN_SALES);
BY CUSTOMER_ID ;

/* Creating new variables using indicator variables */
CUSTOMER = IN_CUSTOMER ;
SALES = IN_SALES;

RUN;
Proc print data=temp;

See below output. Indicator Sales= 1 When observation coming from the input data-set Sales  else indicator is 0, same applied for customer data set.

full

 

SAS Merge (Inner Join):

Lets go for SAS Merge (Inner Join) using IN= Options.

Prerequisites for a SAS Merge

  • Input data-sets must have at least one common variable to merge with same name (In our case we have CUSTOMER_ID). If not same then use rename.
  • Input data sets must be sorted by the common variable(s) that will be used to merge.

Let see how we can use IN= options to get the common data using Merge.

Select only those observation where IN variables Sales=1 & Customer=1 i.e. common observation from both datasets.joins.png

IN VAR

SAS Merge (Inner Join) –

DATA INNER_JOIN;
MERGE CUSTOMER (IN=IN_CUSTOMER ) SALES(IN=IN_SALES);
BY CUSTOMER_ID ;
IF IN_CUSTOMER = 1 & IN_SALES = 1;
RUN;

Output

output inner join

Inner Join using Proc SQL in SAS:

  • Joining is possible on columns with differing names. (No need to rename)
  • Input data sets doesn’t required sorting by the common variable(s)

1st way using Proc SQL-

 PROC SQL;
 SELECT C.CUSTOMER_ID,C.COUNTRY, S.SALES FROM 
 CUSTOMER AS C, SALES AS S
 WHERE C.CUSTOMER_ID=S.CUSTOMER_ID;
 QUIT;

2nd way using Proc SQl-

PROC SQL;
 SELECT C.CUSTOMER_ID,C.COUNTRY, S.SALES FROM 
 CUSTOMER AS C JOIN SALES AS S
 ON C.CUSTOMER_ID=S.CUSTOMER_ID;
 QUIT;

Proc SQL Output

proc sql


Thanks!

Happy Learning! Your feedback would be appreciated!

Different use of Means Procedure in SAS

Different use of Means Procedure in SAS

Folks,

In this blog we will explore the different use of MEANS procedure in SAS.

Descriptive Statistics such as the Sum, Average, Minimum, Maximum, Range, Standard deviation etc. provide useful information about numeric data (Numeric Variable).

SAS MEANS procedure also provides helpful options for controlling your output.


Means Procedure Basics:

SAS MEANS procedure a way to generate summary reports. Descriptive Statistics such as the sum, min, max and means of your numeric variables data. However, Means procedure is much more versatile and can be used to create output summary data sets, which can be used in other DATA or PROC steps in SAS.

Procedure Syntax-

proc means <Data=SAS-Data-Set> <statistic-keyword(s)> <option(s)>; 
var variable(s); 
by variable(s); 
class variable(s) <option(s)>; 
id variable(s); 
output  <output-specification(s)>;

Where

  • SAS-data-set is the name of the SAS data set to be used for Means Procedure.
  • Statistic-keyword(s) specify the statistics to compute eg. min max mean sum etc.
  • Option(s) control the content, output analysis & appearance of output.
  • VAR identify the analysis variables and their order in the results.
  • BY calculate separate statistics for each BY group.
  • CLASS identify variables whose values define subgroups for the analysis report.
  • ID include additional identification variables in the output dataset.
  • OUTPUT create an output dataset that contains specified statistics and identification variables.

Identify Missing Values:

Suppose we have SAS dataset STUDENTS having 14 Observations, 3 Numeric type & 2 Character type variables.  (See below Content Procedure output)

contents

1

We have submitted below Means procedure code for STUDENT data set  without providing any statistics keywords & options.

proc means data=work.students;
run;

Here is the Output of Means Procedure.

out

So by default as shown above, the MEANS procedure produce N (the number of non-missing observations), Mean, Standard Deviation, Minimum and Maximum for all numeric variables in the input SAS dataset.

Using Options we can request additional statistics. Remember when statistics options are added, you must include those default requests if required. Again, only numeric variables can be added in the var statement.

Untitled

 

As we have already mentioned that STUDENTS dataset has 14 Observations. So here we can say that Variable STUDENT_AGE has some missing values in it, as it’s count is not 14.

 

Validate Numeric Data Range:

MEANS procedure can also be used to validate the numeric data because it produces summary reports displaying descriptive statistics (min, max & std).

It can show whether the values for a particular numeric variable are within their expected range or not.

Example:

proc means data=work.students;
run;

utl2

Output for the MEANS procedure displays a range of 27 to 399 for STUDENT_AGE  variable, which clearly shows that there is invalid data somewhere in the STUDENT_AGE column. Here we can say that data cleaning is required.

Additional Statistics & VAR Statement:

As we have already mentioned that PROC MEANS printsn-count (number of non-missing values), meanstandard deviation, minimum and maximum values of every numeric variable in a input SAS data set.

We can control which variables to include in the report by supplying a VAR statement.
Also selecting options in the PROC MEANS statement we can request additional statistics. Remember when statistics options are added, you must include those default requests if required.

Again, only numeric variables can be added in the var statement.

Example : Here we are requesting n std range skewness kurtosis statistics in Means procedure output for only STUDENT_AGE variable in STUDENTS SAS data set.

proc means data=work.students n std range skewness kurtosis;
var student_age;
run;

VAR.png

Group Processing – CLASS Statement:

It is used to categorize data in the output. It can be either character or numeric, but they should contain discrete values. If a CLASS statement is used, then the N Obs statistic is calculated which is based on the CLASS variables.

CLASS variable(s);

where variable(s) specifies category variables for group processing.

proc means data=work.students n max min std range q1 q3 qrange;
var STUDENT_WEIGHT STUDENT_HIEGHT;
class STUDENT_GENDER;
run;

class.png

Group Processing – BY Statement:

Like the CLASS statement, the BY statement also specifies variables to use for categorizing observations

BY variable(s);

where variable(s) specifies category variables for group processing.

Note: You have to first sort your data set by the variable or variables you list on the BY statement.

proc sort data=work.students;
by STUDENT_GENDER;
run;

proc means data=work.students max min std range q1 q3 qrange;
var STUDENT_WEIGHT STUDENT_HIEGHT ;
by STUDENT_GENDER;
run;

You now have your descriptive statistics for males and females separately. Along with one missing data in STUDENT_GENDER variable.

by.png

Difference between Class & By Statements:

  • CLASS statement is easier to use than the BY statement, as it doesn’t require a sorting step. If you have a very large data set which is not sorted, you may want to use a CLASS statement. However, if the data set is already in the correct sorted order, a BY statement is more efficient.
  • If you are using PROC MEANS to print a report and are not creating a summary output data set, the differences in the printed output between a BY and CLASS statement are basically related to layout. CLASS statement would produce a single large table & BY produce separate groups.

Creating Summarized Data Set – PROC MEANS

We can use PROC MEANS to create a new data set that contains summary information such as sums and means. This data set can then be used for further analysis.

OUTPUT OUT= SAS-data-set statistic=variable(s);
where

  • OUT= specifies the name of the output data set
  • statistic= specifies the summary statistic written out
  • variable(s) specifies the names of the variables to create. It represent the statistics for the analysis variables that are listed in the VAR statement.

Example 1: Without specifying any out variables in Output.

proc means data=work.students max min noprint ;
var STUDENT_WEIGHT STUDENT_HIEGHT;
output out = my_summary ;
run;
proc print;
run;

PROC MEANS produces a report by default, NOPRINT option to suppress the default report.

out

Example 2: Here Specifying variable names in out.

proc means data=work.students max min noprint ;
var STUDENT_WEIGHT STUDENT_HIEGHT;
output out = my_summary 
             max = MAX_STUDENT_WEIGHT MAX_STUDENT_HIEGHT
             min= MIN_STUDENT_WEIGHT MIN_STUDENT_HIEGHT;
run;

proc print;
run;

out var.png

Example 3: Using autoname keyword for variable names in out.

proc means data=work.students n max min std noprint ;
var STUDENT_WEIGHT ;
output out = my_summary 
             n = 
             max = 
             min = 
             std = / autoname;
run;

proc print;
run;

autoname.png

Example 4: Including BY Statement.

When you would like to output summary statistics for each
level of one or more classification variables

Remember you have to first sort your data set by the variable or variables you list on the BY statement.

proc means data=work.student max min noprint ; 
var STUDENT_WEIGHT STUDENT_HIEGHT; 
by student_gender; 
output out = my_summary02 
       max = MAX_STUDENT_WEIGHT MAX_STUDENT_HIEGHT 
       min= MIN_STUDENT_WEIGHT MIN_STUDENT_HIEGHT; 
run;

proc print;
run;

by output.png

In this data set, FREQ represents the number of observations for each value of gender.

Example 5: Including CLASS Statement.

proc means data=work.student max min noprint ;
var STUDENT_WEIGHT STUDENT_HIEGHT;
class student_gender;
output out = my_summary03 
             max = MAX_STUDENT_WEIGHT MAX_STUDENT_HIEGHT
             min= MIN_STUDENT_WEIGHT MIN_STUDENT_HIEGHT;
run;
proc print;
run;

output class.png

Here in the above output first observation in this data set, TYPE equal to 0, is the mean for both males and females (Grand Mean) & where TYPE equal to 1 represent the means for females and males separately.

If you do not want an observation with the grand mean (see above output TYPE equal to 0) in your output data set, use the NWAY option of PROC MEANS.

You can add multiple class variables in the class statement. Adding two classification variables to the CLASS statement enables you to group your analysis into multiple levels.


Thanks!

Happy Learning! Your feedback would be appreciated!

 

 

 

 

Latest Oracle SQL – Scenario Questions for Interview!

Latest Oracle SQL – Scenario Questions for Interview!

Folks,

In this blog we will explore some latest Oracle SQL – Scenario questions for Interview!


Scenario 1:  

Select numeric & character values in separate columns using data present in single column.

Input Data – One column having both numeric & character values in it. See below sample data.

Untitled.png

Output – Show numeric & character values in two separate columns using select query only. See below output.

output

Solution of Scenario 1:

Step 1: Separate the data values using REGEXP_LIKE & digit class [:digit:].

select  null as numeric_value, value as character_value from data WHERE REGEXP_LIKE(value, '[^[:digit:]]')
union all
select value as numeric_value, null as character_value from data WHERE REGEXP_LIKE(value, '[[:digit:]]')

Output of Step 1:

3.png

Step 2 – Remove null values from the data.

select nv.numeric_value, cv.character_value
from

( select  numeric_value , rownum rnum from
(select  null as numeric_value, value as character_value from data WHERE REGEXP_LIKE(value, '[^[:digit:]]')
union all
select value as numeric_value, null as character_value from data WHERE REGEXP_LIKE(value, '[[:digit:]]')
)
where numeric_value is not null ) nv 

full join 

(select  character_value , rownum rnum from
(select  null as numeric_value, value as character_value from data WHERE REGEXP_LIKE(value, '[^[:digit:]]')
union all
select value as numeric_value, null as character_value from data WHERE REGEXP_LIKE(value, '[[:digit:]]')
)
where character_value is not null ) cv

on nv.rnum=cv.rnum

Output Step 2:

4.png


Scenario 2:  

Print data on the basis of Even/Odd data present in Id Column.

Input Data – See below sample data.

Untitled.png

Output Required: Suppose we have to print the Even ID – Values

output.png

Solution of Scenario 2:

Output 2.1:  – With null values

select decode(mod(id,2),0,value, null) as value from data ;

3.png

Output 2.2:  – Without null values

select value
  from data
where id in (
              select decode(mod(id,2),0,id,null) from data
            );

4.png

For Odd – Just change the decode condition, see below code.

-- Without Null Values
select value from data where id in ( select decode(mod(id,2),0,null,id) from data );

-- With Null Values
select decode(mod(id,2),0,null,value) as value from data ; 

 


 

Scenario 3: 

Fetch alternate data from table – Using Even/Odd Rowid.

Suppose we have only one column in the table & we have to fetch alternate records using rowid & rownum concepts.

Input Data – 

Untitled.png

Solution of Scenario 3:

For Even Rowid- 

-- Without Null Values
select value
  from data
where rowid in (
              select decode(mod(rownum,2),0,rowid, null)  from data
            );

-- With Null Values
 select decode(mod(rownum,2),0,value, null) as value from data ;

3

4

For Odd Rowid- 

-- Without Null Values
select value
  from data
where rowid in (
              select decode(mod(rownum,2),0, null,  rowid)  from data
            );

-- With Null Values
 select decode(mod(rownum,2),0, null,  value) as value from data ;

3.png


 

Scenario 4: 

Print like this using select query.


2 * 1 = 2
2 * 2 = 4
2 * 3 = 6
2 * 4 = 8
2 * 5 = 10
2 * 6 = 12
2 * 7 = 14
2 * 8 = 16
2 * 9 = 18
2 * 10 = 20

Solution of Scenario 4:

SELECT '2 * ' || rownum  || ' = ' || rownum *2 as t
  FROM DUAL
CONNECT BY rownum  <= 10

output.png


 

Scenario 5:

Print how many ‘e’ in ‘elephant’ using select query.

Solution

 

select length('elephant') - length(replace('elephant', 'e', '')) from dual;

--  case-insensitive matching.
SELECT REGEXP_COUNT ('Elephant', 'e', 1, 'i')
FROM dual;

--  case-sensitive matching.
SELECT REGEXP_COUNT ('Elephant', 'e', 1, 'c')
FROM dual;
 

Output – Using length & Replace

Untitled

Output – Using Regexp_count

4    3


 

Scenario 6:

Input Data –                              Output Required –

Untitled.png output.png

 

 

 

 

 

 

Solution of Scenario 6:

 
select 
city,
Count ( case gender when 'M' then 1 end) as Male_Count,
Count ( case gender when 'F' then 1 end) as Female_Count,
Count(*) as Total_Count
from DATA
group by city;
 

3.png


 

Scenario 7:

See below data table. Update product_name ‘CAR’ with ‘BIKE’ & ‘BIKE’ with ‘CAR’ using single update statement.

Untitled.png

Solution of Scenario 7:

 

-- Using Decode Statement
update DATA
set product_name = decode (product_name,'CAR', 'BIKE','CAR')

-- Using Case Statement
update DATA
set product_name = ( case product_name 
                          when 'CAR' then 'BIKE'
                          when 'BIKE' then 'CAR' END)
 

Scenario 8:

8.1) In ‘abcde12xys2254’ string, replace all numeric data with null.

 
select regexp_replace('abcde12xys2254', '[0-9]', '') from dual;
 

3.png

8.2) In ‘abcde12xys2254’ string, replace all character data with null.

 
select regexp_replace('abcde12xys2254', '[^0-9]', '') from dual;
 

4.png


 

If you have optimized answers for these scenarios, then please comment.

Thanks! Happy Learning! Your feedback would be appreciated!

Stock Market Analysis Using R

Stock Market Analysis Using R

Folks,

In this blog we will learn how to extract & analyze the Stock Market data using R!

Using quantmod package first we will extract the Stock data after that we will create some charts for analysis.

Quantmod – “Quantitative Financial Modeling and Trading Framework for R”!

R Package designed to assist the quantitative trader in the development, testing, and deployment of statistically based trading models. It has many features so check out its link.

Check out this blog for Quantmod getSymol R Shiny App – Link


R Packages Required:-

install.packages("quantmod")

Extracting Stock Market Data–

Functions getSymbols: It load and manage data from Multiple Sources.

getSymbols(“SYMBOL”, src=”SOURCE” , from =”YYYY-MM-DD”, to = “YYYY-MM-DD”)

Some src methods are: yahoo, google, oanda etc.

In this blog we will first extract Bombay Stock Exchange Data using yahoo finance source. Bombay Stock Exchange Index/Symbol – BSESN 

1) Analyze One Month Data of Bombay Stock Exchange- 

library(quantmod)

getSymbols("^BSESN",src="yahoo" , from ="2016-10-23", to = Sys.Date())

View(BSESN)

Here is the BSESN (xts Object) Output Data. Here you can see different columns having data for Open, High, Low, Close, Volume & Adjusted Stock Price.

High refers to the highest price of the stock touched the same day, Low refer to the lowest price the stock was traded on the same day, Close refers to the closing price of that particular stock and the Volume refer to the number of share traded that day.

1.png

Output Charts:- 

chart_Series(BSESN)

As you can see in below chart there was huge dip after 8 Nov 2016, may be this is due to demonetization in India.

3.JPG

2) Analyze One Year Data of Bombay Stock Exchange- 

getSymbols("^BSESN",src="yahoo" , from ="2015-10-23", to = Sys.Date())

chart_Series(BSESN,type = "candlesticks")

Output Chart:-

4.JPG

2) Complete Data of Bombay Stock Exchange– 

It will provide you all data after 2007.

getSymbols("^BSESN",src="yahoo")

Quantmod has some other features. For more details, please visit this Link.


Thanks!

Happy Learning! Your feedback would be appreciated!

Interactive Maps in R: Leaflet

Interactive Maps in R: Leaflet

Folks,

Leaflet is one of the most popular open-source JavaScript libraries for interactive maps.

There are many ways to visualize latitude and longitude data on map using R, such as using ggmaps or RgoogleMaps packages. But these packages generate static maps images only. Leaflet allow users to zoom in or zoom out in a very interactive  way.

In this blog we will learn how to create a interactive map using Leaflet in R & also we will learn how to map and style – latitude and longitude data using R & Leaflet package!

Basics of Leaflet in R – bigdataenthusiast.com/2016/12/12/Leaflet.html


R Packages Required:  

install.packages("leaflet")

1) Creating a map using leaflet –

R Code – 

Code 0.JPG

Here leaflet() initializes the leaflet workspace & addTiles() will bring default OpenStreetMap tiles. OpenStreetMap is a free open-source service to create a free editable map of the world.

Output –

R Viewer (Snapshot)- It allow users to zoom in or zoom out in a very interactive way.

out.JPG

I have published this output on RPubs. Click on below link to see interactive output!

Output Link – rpubs.com/BdataEnthusiast/InteractiveMap

2) Creating a map with single marker –

Suppose user want to mark 28.61293° N, 72.229564° E “India Gate” co-ordinates on Map.

R Code – 

code-1

Here addMarkers – Add markers to the map eg. geo co-ordinates, Popup, link. etc.

Output –

I have published this output on RPubs. Click on below link to see interactive output!

Link – rpubs.com/BdataEnthusiast/InteractiveMap01

Output GIF –

ezgif.com-crop.gif

2) Creating a map with multiple marker –

Suppose user want to mark multiple co-ordinates on Map.

Eg. – Below is R dataframe (india_smart_cities) having  latitude & longitude of 30 proposed smart cities in India.

data.jpg

R Code – 

code multiple.JPG

Output– Click on below link to see interactive output!

Link – rpubs.com/BdataEnthusiast/IndiaSmartCities


If you also share your interactive map outside of the RStudio environment, just click on the Save as Web Page option in Export. It will generate an HTML file.

2


 

Check out this awesome leaflet R Shiny App Blog. This basic R Shiny App allows you to locate your geographic coordinates on Leaflet interactive map.

Check out the Leaflet R Shiny App here shinyapps.io/LeafletShinyR/

leafletshinyr

For more details of R leaflet package, please visit this Link.


Thanks!

Happy Learning! Your feedback would be appreciated!