Extracting Twitter Trends using R Script

Extracting Twitter Trends using R Script


In this blog we will learn how to extract the twitter trends from twitteR Package using R, after that we will learn how to save data into SQL Server using ODBC connection!

R Packages required: 


  • install.packages(“twitteR”): It provides an interface to the Twitter web API. For twitteR App & OAuth access setup – Visit this blog.
  • install.packages(“RODBC”) for OBDC connectivity. For ODBC connection setup for SQL Server Database/Oracle – Visit this blog .

Let’s get started!

twitteR package has getTrends function that can be used to extract the twitter trends based on a input parameter (woeid).

A WOEID (Where On Earth IDentifier) is a unique 32-bit reference identifier, originally defined by GeoPlanet and now assigned by Yahoo!, that identifies any feature on Earth.[Source: Wikipedia]

See below the WOEID for INDIA. You can use this link for WOEID look up.


Here is the Script for extracting the twitter trend & saving data into SQL Server for further analysis.

# R Script Name: GetTrends.R
# R Script Description: This script collect the twitter trend data 
# from twitter API and dump into database.
#-- Importing Required library
library("twitteR", lib.loc="~/R/win-library/3.3")
library("RODBC", lib.loc="~/R/win-library/3.3")

#-- Provide woeid from internet as per your requirement
#-- Below woeid is for INDIA
woeid <-23424848

#-- Fetching Access Keys for Twitter API, which is present in local file 
OAuth_Location <- "C:\\Users\\lenovo\\Desktop\\OAuth.csv"
twitter_OAuth <- read.csv(file=OAuth_Location, header=TRUE, sep=",",colClasses = "character")

  #-- Calling twitteR OAuth function
  setup_twitter_oauth(twitter_OAuth$Consumer_API_Key, twitter_OAuth$Consumer_API_Secret, 
                      twitter_OAuth$Access_Token, twitter_OAuth$Access_Token_Secret)

  #-- Extracting Trends using getTrends Function
  current_trends  <-  getTrends(woeid) 
  current_trends["trend_date"]  <-  Sys.Date()

  #-- Opening OBBC Connection and Saving Trends in Database
  my_conn  <-  odbcConnect("RSQL", uid="***", pwd="***")
  sqlSave(my_conn, current_trends, tablename = "t_current_trends", append = TRUE)

  #-- Closing/Removing unwanted values and data


Output:- IPL Cricket fever in INDIA, that’s why #KKRvSRH is trending on top today  🙂


You can also scheduled this R Script using the windows scheduler, so that you get your trends in scheduled time automatically. For scheduling R script visit this blog.


Happy Learning! Your feedback would be appreciated!


Sentimental Analysis in R

Sentimental Analysis in R


In this blog we will do the sentimental analysis of Trump & Clinton Tweets using R!

We will use Microsoft Cognitive Services (Text Analytics API) in R to calculate sentimental scores of tweets!

Step 1) Twitter Data Extraction

Extract tweets of Trump & Clinton using twitteR Package.

setup_twitter_oauth(Consumer_API_Key, Consumer_API_Secret, Access_Token, Access_Token_Secret)

clinton_tweets = searchTwitter("Hillary Clinton+@HillaryClinton", n=200, lang="en")
trump_tweets = searchTwitter("Donald Trump+@realDonaldTrump", n=200, lang="en")

trump_tweets_df = do.call("rbind", lapply(trump_tweets, as.data.frame))
trump_tweets_df = subset(trump_tweets_df, select = c(text))

clinton_tweets_df = do.call("rbind", lapply(clinton_tweets, as.data.frame))
clinton_tweets_df = subset(clinton_tweets_df, select = c(text))

If you are new to twitteR package, please visit this blog & learn how to setup twitter application & Oauth in R.

Step 2) Cleaning of Tweets

Cleaning both dataframe – trump_tweets_dfclinton_tweets_df.

Below is the just sample code for cleaning text in R.

# Removing blank spaces, punctuation, links, extra spaces, special characters and other unwanted things.
clinton_tweets$text = gsub("[:blank:]", "", clinton_tweets$text)
clinton_tweets$text = gsub("[[:punct:]]", "", clinton_tweets$text)
clinton_tweets$text = gsub("[:cntrl:]", "", clinton_tweets$text)
clinton_tweets$text = gsub("[[:digit:]]", "", clinton_tweets$text)
clinton_tweets$text = gsub("[:blank:]", "", clinton_tweets$text)
clinton_tweets$text = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ",  clinton_tweets$text)
clinton_tweets$text = gsub("@\\w+", "", clinton_tweets$text)
clinton_tweets$text = gsub("http\\w+", "", clinton_tweets$text)

# Removing Duplicate tweets
clinton_tweets["DuplicateFlag"] = duplicated(clinton_tweets$text)
clinton_tweets = subset(clinton_tweets, clinton_tweets$DuplicateFlag=="FALSE")
clinton_tweets = subset(clinton_tweets, select = -c(DuplicateFlag))

Here is the snapshot of cleaned data frames trump_tweets_dfclinton_tweets_df.


Step 3) Calculate Sentimental Scores

We will use Microsoft Cognitive Services (Text Analytics API) in R to calculate sentimental scores of tweets. If you are new to Microsoft Cognitive Services, please visit this blog .

Calculating sentimental scores for trump_tweets_df – 


# Creating the request body for Text Analytics API
trump_tweets_df["language"] = "en"
trump_tweets_df["id"] = seq.int(nrow(trump_tweets_df))
request_body_trump = trump_tweets_df[c(2,3,1)]

# Converting tweets dataframe into JSON
request_body_json_trump = toJSON(list(documents = request_body_trump))

# Calling text analytics API
result_trump = POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
body = request_body_json_trump,
add_headers(.headers = c("Content-Type"="application/json","Ocp-Apim-Subscription-Key"="my_Subscription-Key")))

Output = content(result_trump)

score_output_trump = data.frame(matrix(unlist(Output), nrow=100, byrow=T))
score_output_trump$X1 =  as.numeric(as.character(score_output_trump$X1))
score_output_trump$X1 = as.numeric(as.character(score_output_trump$X1)) *10
score_output_trump["Candidate"] = "Trump"

Calculating sentimental scores for clinton_tweets_df – 

# Creating the request body for Text Analytics API
clinton_tweets_df["language"] = "en"
clinton_tweets_df["id"] = seq.int(nrow(clinton_tweets_df))
request_body_clinton = clinton_tweets_df[c(2,3,1)]

# Converting tweets dataframe into JSON
request_body_json_clinton = toJSON(list(documents = request_body_clinton))

result_clinton = POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
body = request_body_json_clinton,
add_headers(.headers = c("Content-Type"="application/json","Ocp-Apim-Subscription-Key"="my-Subscription-Key")))

Output_clinton = content(result_clinton)

score_output_clinton = data.frame(matrix(unlist(Output_clinton), nrow=100, byrow=T))
score_output_clinton$X1 =  as.numeric(as.character(score_output_clinton$X1))
score_output_clinton$X1 = as.numeric(as.character(score_output_clinton$X1)) *10

score_output_clinton["Candidate"] = "Clinton"

Here is the snapshot of sentimental scores of trump_tweets_dfclinton_tweets_df.

Where X1 is Sentimntal Score & Where X2 is ID of tweets present in dataframe.

Here scores close to 10 indicate positive sentiment, while scores close to 1 indicate negative sentiment


Snapshot –


Step 4) Sentimental Analysis Output

Boxplot for the sentimental scores.

final_score = rbind(score_output_clinton,score_output_trump)


cols = c("#7CAE00", "#00BFC4")
names(cols) = c("Clinton", "Trump")

# boxplot
ggplot(final_score, aes(x=final_score$Candidate, y=X1, group=final_score$Candidate)) +
geom_boxplot(aes(fill=final_score$Candidate)) +
scale_fill_manual(values=cols) +
position=position_jitter(width=0.5), alpha=0.3)

Box Plot –

Here you can see that Trump Median(7.1) > Hillary Median (6.7)


Here scores close to 10 indicate positive sentiment, while scores close to 1 indicate negative sentiment

Summary of Sentimental Scores –



Happy Learning! Your feedback would be appreciated!

Visualization of Tweets on Google Maps

Visualization of Tweets on Google Maps


In this blog we will learn how to visualize tweets on Google Maps using R!

Twitter App required to get started with this. If you don’t have Twitter Application, go to Twitter Developer link and sign in with your credentials, after that go to twitter Apps & click on “Create New App”button for creating new application.


Once you’ve done this, make a note of your Keys & Access Token.

  • Consumer Key (API Key)
  • Consumer Secret (API Secret)
  • Access Token
  • Access Token Secret

R Packages required: 

install.packages(“ggmap”): It allows us to access maps from the Google Maps API.

ggmap package has a function get_map that can download maps from Google Maps API.

install.packages(“twitteR“): It provides an interface to the Twitter web API.

twitteR package has searchTwitter function that can search tweets based on a supplied search string.

Extracting Maps: 

Extraction of India map using get_map using below coordinates. Coordinates extracted using this Blog .


R Commands: 


I have selected the maptype = “terrain“, other maptype eg. satellite & roadmap.

Here is the output of ggmap(india_map).


Extracting Tweets: 

Twitter OAuth Setup:


Extracting 500 tweets having tag @narendramodi within a 1000 km radius of the given latitude/longitude.

20.593684° N, 78.96288° E is of India Location: Extracted using this Blog.


Here is the output of View(my_tweets_df). 500 Observation & 16 Variable.



Excluding rows from data frame where longitude/latitude=NA & taking only last two columns (unique data).


Here is the final output of View(my_tweets_df). 206 Observation & 2 Variable.


Mapping of Tweets & Google Map: 

Converting both columns of my_tweets_df as numeric.


Mapping Map with my_tweets_df .


Here is the output of plot.

Seems like more tweets (having tag @narendramodi) are coming from Mumbai & Delhi.



Happy Learning!