Sentiment Analysis in R

Folks,

In this post we will perform sentiment analysis on tweets about Trump and Clinton using R.

We will use the Text Analytics API from Microsoft Cognitive Services to calculate a sentiment score for each tweet.

Step 1) Twitter Data Extraction

Extract tweets about Trump and Clinton using the twitteR package.

library(twitteR)
setup_twitter_oauth(Consumer_API_Key, Consumer_API_Secret, Access_Token, Access_Token_Secret)

clinton_tweets = searchTwitter("Hillary Clinton+@HillaryClinton", n=200, lang="en")
trump_tweets = searchTwitter("Donald Trump+@realDonaldTrump", n=200, lang="en")

trump_tweets_df = do.call("rbind", lapply(trump_tweets, as.data.frame))
trump_tweets_df = subset(trump_tweets_df, select = c(text))

clinton_tweets_df = do.call("rbind", lapply(clinton_tweets, as.data.frame))
clinton_tweets_df = subset(clinton_tweets_df, select = c(text))

If you are new to the twitteR package, please visit this blog to learn how to set up a Twitter application and OAuth in R.
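The four values passed to setup_twitter_oauth() are secrets, so it can help to keep them out of the script. Below is a minimal sketch that reads them from environment variables. The helper name read_twitter_credentials and the variable names are assumptions made for illustration; twitteR does not require them.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables so they never appear in the script itself.
# The variable names are assumptions, not something twitteR prescribes.
read_twitter_credentials <- function(
    vars = c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
             "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET")) {
  values <- Sys.getenv(vars, unset = NA)
  missing <- vars[is.na(values) | values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  unname(values)
}

# Usage (requires the variables to be set, for example in ~/.Renviron):
# creds <- read_twitter_credentials()
# setup_twitter_oauth(creds[1], creds[2], creds[3], creds[4])
```

The function stops with a message naming any variable that is unset, which is easier to diagnose than an OAuth failure later on.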

Step 2) Cleaning of Tweets

Clean both data frames, trump_tweets_df and clinton_tweets_df.

Below is sample code for cleaning text in R, shown for clinton_tweets_df. Apply the same steps to trump_tweets_df.

# Remove retweet headers, mentions, links, punctuation, digits, control characters and extra spaces.
# Retweet headers, mentions and links go first, while "@" and "://" are still intact.
clinton_tweets_df$text = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", clinton_tweets_df$text)
clinton_tweets_df$text = gsub("@\\w+", "", clinton_tweets_df$text)
clinton_tweets_df$text = gsub("http\\S+", "", clinton_tweets_df$text)
clinton_tweets_df$text = gsub("[[:punct:]]", "", clinton_tweets_df$text)
clinton_tweets_df$text = gsub("[[:digit:]]", "", clinton_tweets_df$text)
clinton_tweets_df$text = gsub("[[:cntrl:]]", " ", clinton_tweets_df$text)
# POSIX character classes must be written with double brackets, e.g. [[:blank:]].
# Collapse runs of spaces and tabs into a single space instead of deleting them.
clinton_tweets_df$text = gsub("[[:blank:]]+", " ", clinton_tweets_df$text)

# Remove duplicate tweets
clinton_tweets_df = subset(clinton_tweets_df, !duplicated(text))
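
Since both data frames need the same cleaning, the steps can be wrapped in a function and applied to each. The sketch below is one possible arrangement, not the code used for this post. The names clean_tweet_text and clean_tweets_df are hypothetical, and the toy tweets are made up.

```r
# Hypothetical helper: clean a character vector of tweet texts.
clean_tweet_text <- function(text) {
  # Retweet/via headers, mentions and links first, while "@" and "://" are intact
  text <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", text, perl = TRUE)
  text <- gsub("@\\w+", " ", text)
  text <- gsub("http\\S+", " ", text)
  # Then punctuation, digits and control characters
  text <- gsub("[[:punct:]]", "", text)
  text <- gsub("[[:digit:]]", "", text)
  text <- gsub("[[:cntrl:]]", " ", text)
  # Collapse whitespace and trim the ends
  text <- gsub("[[:space:]]+", " ", text)
  trimws(text)
}

# Hypothetical helper: clean the text column, then drop empty and duplicate rows.
clean_tweets_df <- function(df) {
  df$text <- clean_tweet_text(df$text)
  df <- df[df$text != "" & !duplicated(df$text), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# Toy input, made up for illustration
toy <- data.frame(
  text = c("RT @someone: Great rally today! http://t.co/abc123",
           "Great rally today!",
           "@user 2016 is close..."),
  stringsAsFactors = FALSE
)
cleaned <- clean_tweets_df(toy)
print(cleaned)
```

With helpers like these, each data frame is cleaned with one call, for example trump_tweets_df = clean_tweets_df(trump_tweets_df).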

Here is a snapshot of the cleaned data frames trump_tweets_df and clinton_tweets_df.

Step 3) Calculate Sentiment Scores

We will call the Microsoft Cognitive Services Text Analytics API from R to calculate a sentiment score for each tweet. If you are new to Microsoft Cognitive Services, please visit this blog.

Calculating sentiment scores for trump_tweets_df:

library(jsonlite)
library(httr)

# Creating the request body for Text Analytics API
trump_tweets_df["language"] = "en"
trump_tweets_df["id"] = seq.int(nrow(trump_tweets_df))
request_body_trump = trump_tweets_df[c(2,3,1)]

# Converting tweets dataframe into JSON
request_body_json_trump = toJSON(list(documents = request_body_trump))

# Calling text analytics API
result_trump = POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
body = request_body_json_trump,
add_headers(.headers = c("Content-Type"="application/json","Ocp-Apim-Subscription-Key"="my_Subscription-Key")))

Output = content(result_trump)

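# Flatten the returned documents into a two-column table: X1 = score, X2 = tweet id.
# ncol=2 avoids hard-coding the number of tweets that were scored.
score_output_trump = data.frame(matrix(unlist(Output$documents), ncol=2, byrow=TRUE))
# Convert the score to numeric and rescale it from 0-1 to 0-10
score_output_trump$X1 = as.numeric(as.character(score_output_trump$X1)) * 10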
score_output_trump["Candidate"] = "Trump"
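
The flattening step can also be written so that it picks out fields by name rather than by position. The sketch below assumes the parsed response is a list with a documents element whose entries each hold a score and an id, which is what the X1 and X2 columns in this post suggest. The helper name parse_sentiment_response and the fake response are made up for illustration.

```r
# Hypothetical helper: turn a parsed sentiment response into a data frame.
# Assumes each entry of parsed$documents has a `score` and an `id` field.
parse_sentiment_response <- function(parsed, candidate) {
  docs <- parsed$documents
  data.frame(
    # Rescale the score from 0-1 to 0-10, as in the post
    X1 = vapply(docs, function(d) as.numeric(d$score), numeric(1)) * 10,
    X2 = vapply(docs, function(d) as.character(d$id), character(1)),
    Candidate = rep(candidate, length(docs)),
    stringsAsFactors = FALSE
  )
}

# Made-up response for illustration; real scores come from the API.
fake_response <- list(
  documents = list(list(score = 0.9, id = "1"),
                   list(score = 0.25, id = "2")),
  errors = list()
)
parsed_scores <- parse_sentiment_response(fake_response, "Trump")
print(parsed_scores)
```

Because the fields are read by name, the result does not depend on the number of tweets or on the order of the fields in the response.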

Calculating sentiment scores for clinton_tweets_df:


# Creating the request body for Text Analytics API
clinton_tweets_df["language"] = "en"
clinton_tweets_df["id"] = seq.int(nrow(clinton_tweets_df))
request_body_clinton = clinton_tweets_df[c(2,3,1)]

# Converting tweets dataframe into JSON
request_body_json_clinton = toJSON(list(documents = request_body_clinton))

result_clinton = POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
body = request_body_json_clinton,
add_headers(.headers = c("Content-Type"="application/json","Ocp-Apim-Subscription-Key"="my-Subscription-Key")))

Output_clinton = content(result_clinton)

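# Flatten the returned documents into a two-column table: X1 = score, X2 = tweet id.
# ncol=2 avoids hard-coding the number of tweets that were scored.
score_output_clinton = data.frame(matrix(unlist(Output_clinton$documents), ncol=2, byrow=TRUE))
# Convert the score to numeric and rescale it from 0-1 to 0-10
score_output_clinton$X1 = as.numeric(as.character(score_output_clinton$X1)) * 10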

score_output_clinton["Candidate"] = "Clinton"
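
The Trump and Clinton steps build the request body in the same way, so that part can be factored into a small function. This is a sketch under the assumption that the body needs the language, id and text columns in that order, as in the code above. The name build_documents and the sample texts are hypothetical.

```r
# Hypothetical helper: build the `documents` table for a vector of tweet texts.
build_documents <- function(texts, language = "en") {
  data.frame(
    language = rep(language, length(texts)),
    id = seq_along(texts),
    text = texts,
    stringsAsFactors = FALSE
  )
}

# Sample texts, made up for illustration
docs <- build_documents(c("Great rally today", "Not a good debate"))
print(docs)

# Serialize to JSON only if jsonlite is installed
if (requireNamespace("jsonlite", quietly = TRUE)) {
  request_body_json <- jsonlite::toJSON(list(documents = docs))
  print(request_body_json)
}
```

The resulting JSON can be passed as the body argument of POST(), exactly as request_body_json_trump and request_body_json_clinton are above.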

Here is a snapshot of the sentiment scores for trump_tweets_df and clinton_tweets_df.

X1 is the sentiment score and X2 is the ID of the tweet in the data frame.

The API's scores are multiplied by 10 in the code above, so scores close to 10 indicate positive sentiment, while scores close to 0 indicate negative sentiment.

Snapshot –

Step 4) Sentiment Analysis Output

Boxplot of the sentiment scores.


final_score = rbind(score_output_clinton,score_output_trump)

library(ggplot2)

cols = c("#7CAE00", "#00BFC4")
names(cols) = c("Clinton", "Trump")

# Boxplot of sentiment scores per candidate, with the individual scores jittered on top.
# Columns are referenced by name inside aes() rather than with final_score$...
ggplot(final_score, aes(x = Candidate, y = X1)) +
  geom_boxplot(aes(fill = Candidate)) +
  scale_fill_manual(values = cols) +
  geom_jitter(colour = "gray40",
              position = position_jitter(width = 0.5), alpha = 0.3)

Box Plot –

The boxplot shows that the median score for Trump (7.1) is higher than the median score for Clinton (6.7).

As before, scores close to 10 indicate positive sentiment, while scores close to 0 indicate negative sentiment.

Summary of Sentiment Scores –
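
A per-candidate summary of this kind can be computed with tapply(). The sketch below uses made-up scores purely to show the call; they are not the results from this post. With the real data, final_score would take the place of toy_scores.

```r
# Made-up scores for illustration only; not the results from this post.
toy_scores <- data.frame(
  X1 = c(7.5, 6.0, 8.0, 5.5, 6.5, 7.0),
  Candidate = c("Trump", "Trump", "Trump", "Clinton", "Clinton", "Clinton"),
  stringsAsFactors = FALSE
)

# Min, quartiles, median, mean and max of the scores, per candidate
score_summary <- tapply(toy_scores$X1, toy_scores$Candidate, summary)
print(score_summary)

# Medians only, for comparison with the middle line of each box in the boxplot
score_medians <- tapply(toy_scores$X1, toy_scores$Candidate, median)
print(score_medians)
```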

Thanks!

Happy Learning! Your feedback would be appreciated!

13 thoughts on “Sentimental Analysis in R”

lemur78 says:

November 8, 2016 at 1:31 pm

in Step 2) you’ve missed _df in clinton_tweets -> there is no clinton_tweets$text, there is clinton_tweets_df$text

1. Shobhit says:
  
  November 8, 2016 at 1:58 pm
  
  Yes you are correct, it is because i have provided only reference cleaning code in my blog. Thanks!
  
  1. amina bahri says:
    
    March 6, 2019 at 9:56 pm
    
    I got this error: Error in tweets1["DuplicateFlag"] = duplicated(tweets1$text) :
    replacement has length zero. How can I correct it?
    
Saurabh Jain says:

November 9, 2016 at 12:22 pm

HI ,

I got below mentioned error while executing below-mentioned line …Secondly in your code u have mentioned removal of duplicates for clinton not for trump ??

LIne NO
clinton_tweets = subset(clinton_tweets, select = -c(DuplicateFlag))

Details of Error
Error in subset.default(clinton_tweets, select = c(DuplicateFlag)) :
argument “subset” is missing, with no default

1. Shobhit says:
  
  November 20, 2016 at 11:47 am
  
  Yes i have provided only sample code for cleaning data. For both data frames Clinton & trump we have to perform similar cleaning process.
  
  Regarding error – please check your dataframe & provide column names accordingly.
  
Parag Verma says:

November 25, 2016 at 5:37 pm

Hi Shobhit,
I am trying to do Twitter data mining in R. However, when I try to connect using access keys, I get an OAuth authentication error. Please help. I am also sharing the code I am using:

library(twitteR)
Consumer_API_Key<-'dACKFLWwCVF8Y9Ji7'
Consumer_API_Secret<-'YEBmYl7jQNU509Wndqv0tDNwN7xW8Tjmk'
Access_Token<-'113567783-Kuy7T3cQSgUNioyycKkwLskNh'
Access_Token_Secret<-'VI3ROoOy7bS47Njn3IbSy8PfTqNFUXV'

setup_twitter_oauth(Consumer_API_Key, Consumer_API_Secret, Access_Token, Access_Token_Secret)

Saurabh says:

December 19, 2016 at 8:05 am

Hi , small question … why u have not use corpus function for data cleansing ??

Akram Sayadi says:

February 23, 2017 at 3:53 pm

HI
https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment this api return
{
“statusCode”: 404,
“message”: “Resource not found”
}

1. khairul amri says:
  
  February 19, 2018 at 7:33 am
  
  The URL depends on your subscription to the Microsoft Cognitive Services API. You also need to append the key to get a result.
  