Sentiment Analysis in R

Folks,

In this blog we will do sentiment analysis of Trump & Clinton tweets using R!

We will use Microsoft Cognitive Services (Text Analytics API) in R to calculate sentiment scores for the tweets!


Step 1) Twitter Data Extraction

Extract tweets about Trump & Clinton using the twitteR package.

library(twitteR)
setup_twitter_oauth(Consumer_API_Key, Consumer_API_Secret, Access_Token, Access_Token_Secret)

clinton_tweets = searchTwitter("Hillary Clinton+@HillaryClinton", n=200, lang="en")
trump_tweets = searchTwitter("Donald Trump+@realDonaldTrump", n=200, lang="en")

trump_tweets_df = do.call("rbind", lapply(trump_tweets, as.data.frame))
trump_tweets_df = subset(trump_tweets_df, select = c(text))

clinton_tweets_df = do.call("rbind", lapply(clinton_tweets, as.data.frame))
clinton_tweets_df = subset(clinton_tweets_df, select = c(text))

If you are new to the twitteR package, please visit this blog & learn how to set up a Twitter application & OAuth in R.
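To see what the do.call("rbind", lapply(...)) idiom does without needing Twitter credentials, here is a minimal sketch on made-up data. The fake_tweets list and its fields are hypothetical stand-ins, not real twitteR status objects.

```r
# Hypothetical stand-ins for tweets: plain named lists, not real twitteR status objects
fake_tweets <- list(
  list(text = "First example tweet", screenName = "user_a"),
  list(text = "Second example tweet", screenName = "user_b")
)

# Turn each element into a one-row data frame, then stack the rows with rbind
fake_tweets_df <- do.call("rbind", lapply(fake_tweets, as.data.frame, stringsAsFactors = FALSE))

# Keep only the text column, as in the extraction step
fake_tweets_df <- subset(fake_tweets_df, select = c(text))
print(fake_tweets_df)
```

twitteR also ships twListToDF(), which performs the same list-to-data-frame conversion in one call.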

Step 2) Cleaning of Tweets

Clean both data frames: trump_tweets_df & clinton_tweets_df.

Below is sample code for cleaning the text in R, shown for clinton_tweets_df. Apply the same steps to trump_tweets_df.

# Removing retweet markers, mentions and links first, while "@" and "http://" are still intact
clinton_tweets_df$text = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", clinton_tweets_df$text)
clinton_tweets_df$text = gsub("@\\w+", "", clinton_tweets_df$text)
clinton_tweets_df$text = gsub("http\\S+", "", clinton_tweets_df$text)

# Removing punctuation, digits and control characters
clinton_tweets_df$text = gsub("[[:punct:]]", "", clinton_tweets_df$text)
clinton_tweets_df$text = gsub("[[:digit:]]", "", clinton_tweets_df$text)
clinton_tweets_df$text = gsub("[[:cntrl:]]", " ", clinton_tweets_df$text)

# Collapsing extra spaces (a character class needs double brackets: [[:blank:]], not [:blank:])
clinton_tweets_df$text = gsub("[[:blank:]]+", " ", clinton_tweets_df$text)
clinton_tweets_df$text = trimws(clinton_tweets_df$text)

# Removing duplicate tweets
clinton_tweets_df = subset(clinton_tweets_df, !duplicated(clinton_tweets_df$text))
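Since the same cleaning has to run on both data frames, one option is to wrap it in a small helper. This is only a sketch: clean_tweet_text is a hypothetical name, the sample strings are made up, and perl = TRUE is added on the first pattern as an assumption to make the non-capturing group explicit.

```r
# Hypothetical helper that applies the cleaning steps to a character vector
clean_tweet_text <- function(x) {
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", x, perl = TRUE)  # retweet / via markers
  x <- gsub("@\\w+", "", x)             # remaining @mentions
  x <- gsub("http\\S+", "", x)          # links
  x <- gsub("[[:punct:]]", "", x)       # punctuation
  x <- gsub("[[:digit:]]", "", x)       # digits
  x <- gsub("[[:cntrl:]]", " ", x)      # control characters
  x <- gsub("[[:space:]]+", " ", x)     # collapse runs of whitespace
  trimws(x)
}

# Made-up sample tweets
sample_text <- c("RT @someone: Great rally today!! http://t.co/abc123",
                 "Vote   now @user 2016")
cleaned <- clean_tweet_text(sample_text)
print(cleaned)
```

With a helper like this, each data frame can be cleaned in one line, for example trump_tweets_df$text = clean_tweet_text(trump_tweets_df$text).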

Here is a snapshot of the cleaned data frames trump_tweets_df & clinton_tweets_df.

data.png

Step 3) Calculate Sentiment Scores

We will use Microsoft Cognitive Services (Text Analytics API) in R to calculate sentiment scores for the tweets. If you are new to Microsoft Cognitive Services, please visit this blog.

Calculating sentiment scores for trump_tweets_df –

library(jsonlite)
library(httr)

# Creating the request body for Text Analytics API
trump_tweets_df["language"] = "en"
trump_tweets_df["id"] = seq.int(nrow(trump_tweets_df))
request_body_trump = trump_tweets_df[c(2,3,1)]

# Converting tweets dataframe into JSON
request_body_json_trump = toJSON(list(documents = request_body_trump))

# Calling text analytics API
result_trump = POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
  body = request_body_json_trump,
  add_headers(.headers = c("Content-Type"="application/json","Ocp-Apim-Subscription-Key"="my_subscription_key")))

Output = content(result_trump)

# Each returned document holds a score and an id, so fill the matrix two columns at a time
# (this avoids hard-coding the number of tweets)
score_output_trump = data.frame(matrix(unlist(Output$documents), ncol=2, byrow=T))
# Convert the score to numeric and rescale it from 0-1 to 0-10
score_output_trump$X1 = as.numeric(as.character(score_output_trump$X1)) *10
score_output_trump["Candidate"] = "Trump"
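The matrix(unlist(...)) step relies on every returned document holding exactly a score and an id. A slightly more explicit way to build the same table is sketched below on a mocked response. Here mock_output and its values are made up, and the documents/score/id layout is assumed from the output described in this post.

```r
# Mocked stand-in for content(result_trump); the values are made up
mock_output <- list(
  documents = list(
    list(score = 0.91, id = "1"),
    list(score = 0.12, id = "2")
  ),
  errors = list()
)

# Pull each field out by name instead of relying on position
mock_scores <- data.frame(
  X1 = vapply(mock_output$documents, function(d) d$score, numeric(1)) * 10,
  X2 = vapply(mock_output$documents, function(d) d$id, character(1)),
  stringsAsFactors = FALSE
)
mock_scores["Candidate"] <- "Trump"
print(mock_scores)
```

Extracting by name keeps X1 numeric from the start, so no as.character/as.numeric round trip is needed.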

Calculating sentiment scores for clinton_tweets_df –


# Creating the request body for Text Analytics API
clinton_tweets_df["language"] = "en"
clinton_tweets_df["id"] = seq.int(nrow(clinton_tweets_df))
request_body_clinton = clinton_tweets_df[c(2,3,1)]

# Converting tweets dataframe into JSON
request_body_json_clinton = toJSON(list(documents = request_body_clinton))

# Calling text analytics API
result_clinton = POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
  body = request_body_json_clinton,
  add_headers(.headers = c("Content-Type"="application/json","Ocp-Apim-Subscription-Key"="my_subscription_key")))

Output_clinton = content(result_clinton)

# Each returned document holds a score and an id, so fill the matrix two columns at a time
score_output_clinton = data.frame(matrix(unlist(Output_clinton$documents), ncol=2, byrow=T))
# Convert the score to numeric and rescale it from 0-1 to 0-10
score_output_clinton$X1 = as.numeric(as.character(score_output_clinton$X1)) *10

score_output_clinton["Candidate"] = "Clinton"
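The request body is built the same way for both candidates, so the shared part could be factored into a helper. The sketch below is one possible shape: build_sentiment_request is a hypothetical name, demo_df is made-up data, and sending the id as a string (to match the API schema) is an assumption that differs slightly from the integer id used above.

```r
library(jsonlite)

# Hypothetical helper: adds the language and id fields and returns the JSON request body
build_sentiment_request <- function(tweets_df, language = "en") {
  body <- data.frame(
    language = language,
    id = as.character(seq_len(nrow(tweets_df))),
    text = tweets_df$text,
    stringsAsFactors = FALSE
  )
  toJSON(list(documents = body))
}

# Made-up data frame with the same single text column as the cleaned tweets
demo_df <- data.frame(text = c("good day", "bad day"), stringsAsFactors = FALSE)
demo_json <- build_sentiment_request(demo_df)
cat(demo_json, "\n")
```

The result can be passed as the body argument of POST() in place of request_body_json_trump or request_body_json_clinton.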

Here is a snapshot of the sentiment scores for trump_tweets_df & clinton_tweets_df.

Here X1 is the sentiment score & X2 is the ID of the tweet in the data frame.

Scores close to 10 indicate positive sentiment, while scores close to 0 indicate negative sentiment (the API returns scores between 0 and 1, and the code above multiplies them by 10).

scores.png

Snapshot –

sample.png

Step 4) Sentiment Analysis Output

Boxplot of the sentiment scores.


final_score = rbind(score_output_clinton,score_output_trump)

library(ggplot2)

cols = c("#7CAE00", "#00BFC4")
names(cols) = c("Clinton", "Trump")

# boxplot (refer to columns by name inside aes() rather than with final_score$...)
ggplot(final_score, aes(x=Candidate, y=X1)) +
  geom_boxplot(aes(fill=Candidate)) +
  scale_fill_manual(values=cols) +
  geom_jitter(colour="gray40",
    position=position_jitter(width=0.5), alpha=0.3)

Box Plot –

Here you can see that the median score for Trump (7.1) is higher than the median score for Clinton (6.7).

score.png
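The medians read off the boxplot can also be computed directly. The sketch below uses made-up scores purely for illustration (they are not the values from this analysis); with the real data, final_score would take the place of demo_scores.

```r
# Made-up scores purely for illustration; real values come from the API calls
demo_scores <- data.frame(
  X1 = c(6, 7, 8, 5, 7.5, 9),
  Candidate = c("Clinton", "Clinton", "Clinton", "Trump", "Trump", "Trump"),
  stringsAsFactors = FALSE
)

# Median score per candidate
medians <- aggregate(X1 ~ Candidate, data = demo_scores, FUN = median)
print(medians)
```

Swapping FUN = median for FUN = mean gives the per-candidate averages in the same layout.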

Here scores close to 10 indicate positive sentiment, while scores close to 0 indicate negative sentiment.

Summary of Sentiment Scores –

mean.png


Thanks!

Happy Learning! Your feedback would be appreciated!

Microsoft Cognitive Services (Text Analytics API) in R

Folks,

In this blog we will explore Microsoft Cognitive Services (Text Analytics API) in R!

This API can detect sentiment, key phrases, topics, and language from your text.

Click here & register for a free subscription to Microsoft Cognitive Services (Text Analytics).

Here is my free subscription: 5,000 free transactions per month.

free-subs

After registering, please copy the Subscription Key. This key provides access to the API.


Detect Sentiments 

Request URL:

https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment

Request Headers:

Content-Type (optional): Media type of the body sent to the API.
Ocp-Apim-Subscription-Key: Subscription key which provides access to this API.
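The two headers above can be prepared once and reused for every call. The sketch below assumes the key is stored in an environment variable; TEXT_ANALYTICS_KEY is a hypothetical name chosen for illustration, and the fallback string is only a placeholder.

```r
library(httr)

# TEXT_ANALYTICS_KEY is a hypothetical environment variable name;
# the fallback value is a placeholder, not a working key
subscription_key <- Sys.getenv("TEXT_ANALYTICS_KEY", unset = "my_subscription_key")

# Build the request headers once so they can be passed to each POST() call
api_headers <- add_headers(.headers = c(
  "Content-Type" = "application/json",
  "Ocp-Apim-Subscription-Key" = subscription_key
))
```

Reading the key from the environment keeps it out of scripts that get shared or published.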

Request Body:

{
  "documents": [
    {
      "language": "string",
      "id": "string",
      "text": "string"
    }
  ]
}

R Commands & Output:

R Packages required: httr & jsonlite.

library(httr)
library(jsonlite)

# Below is the Request body for the API having text id 1 = Negative sentiments, id 2 = Positive sentiments

request_body <- data.frame(
  language = c("en","en"),
  id = c("1","2"),
  text = c("This is wasted! I'm angry","This is awesome! Good Job Team! appreciated"),
  stringsAsFactors = FALSE
)

# Converting the Request body(Dataframe) to Request body(JSON)

request_body_json <- toJSON(list(documents = request_body), auto_unbox = TRUE)

# Below we are calling API (Adding Request headers using add_headers)

result <- POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
  body = request_body_json,
  add_headers(.headers = c("Content-Type"="application/json","Ocp-Apim-Subscription-Key"="my_subscription_key")))
Output <- content(result)

# Show Output
Output

Output Score:

id "1" - score 0.2324503
id "2" - score 0.9998128

Scores close to 1 indicate positive sentiment, while scores close to 0 indicate negative sentiment.

score.png
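To turn the raw scores into labels, one simple option is a cutoff. The sketch below uses the two scores from the output above. The 0.5 cutoff and the label_sentiment name are assumptions made for illustration, not something the API defines.

```r
# Hypothetical helper: label a score as positive or negative using an assumed cutoff
label_sentiment <- function(score, cutoff = 0.5) {
  ifelse(score >= cutoff, "positive", "negative")
}

# Scores from the example output, named by document id
example_scores <- c("1" = 0.2324503, "2" = 0.9998128)
labels <- label_sentiment(example_scores)
print(labels)
```

A different cutoff, or a third "neutral" band around the middle, may suit some data better.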

Detect Language

Request URL:

https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/languages[?numberOfLanguagesToDetect]

Request Parameters:

numberOfLanguagesToDetect - (Optional) Number of languages to detect. Set to 1 by default.
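The optional parameter is appended to the request URL as a query string. A minimal sketch of building that URL in base R follows; build_languages_url is a hypothetical helper name.

```r
base_url <- "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/languages"

# Hypothetical helper: append numberOfLanguagesToDetect to the request URL
build_languages_url <- function(n = 1) {
  sprintf("%s?numberOfLanguagesToDetect=%d", base_url, as.integer(n))
}

languages_url <- build_languages_url(2)
print(languages_url)
```

The returned string can be used as the first argument of POST().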

Request Headers:

Content-Type (optional): Media type of the body sent to the API.
Ocp-Apim-Subscription-Key: Subscription key which provides access to this API.

Request Body:

{
  "documents": [
    {
      "id": "string",
      "text": "string"
    }
  ]
}

R Commands & Output:

R Packages required: httr & jsonlite.

# Below is the Request body for the API
request_body <- data.frame(
id = "1",
text = "भारतीय धर्म में निर्वाण मुक्ति है",
stringsAsFactors = FALSE
)

# Converting the Request body(Dataframe) to Request body(JSON)
request_body_json <- toJSON(list(documents = request_body), auto_unbox = TRUE)

# Below we are calling API (Adding Request headers using add_headers)
# Here parameter numberOfLanguagesToDetect=1

result <- POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/languages?numberOfLanguagesToDetect=1",
body = request_body_json ,
add_headers(.headers = c("Content-Type"="application/json","Ocp-Apim-Subscription-Key"="my_subscription_key"))
)
Output <- content(result)
Output

Output Detected Language:

name - "Hindi"
iso6391Name - "hi"
score - 1

A score close to 1 indicates high certainty that the identified language is correct.

You can set numberOfLanguagesToDetect & text as per your requirement. The API can also detect multiple languages; see the example below, where numberOfLanguagesToDetect = 2.

# Below is the Request body for the API
request_body <- data.frame(
id = "1",
text = "Nirvana is most commonly associated with Buddhism भारतीय धर्म में निर्वाण मुक्ति है",
stringsAsFactors = FALSE
)

request_body_json <- toJSON(list(documents = request_body), auto_unbox = TRUE)

result <- POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/languages?numberOfLanguagesToDetect=2",
body = request_body_json ,
add_headers(.headers = c("Content-Type"="application/json","Ocp-Apim-Subscription-Key"="my_subscription_key"))
)
Output <- content(result)
Output
3.png
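When more than one language comes back, the parsed output can be flattened into a small table. The sketch below works on a mocked response: the detectedLanguages field name and nesting are assumed from the v2.0 response format, and the English entry and its score are made up for illustration.

```r
# Mocked stand-in for content(result); layout assumed, English values made up
mock_output <- list(
  documents = list(
    list(
      id = "1",
      detectedLanguages = list(
        list(name = "Hindi", iso6391Name = "hi", score = 1),
        list(name = "English", iso6391Name = "en", score = 1)
      )
    )
  ),
  errors = list()
)

# Flatten the detected languages of the first document into a data frame
langs <- mock_output$documents[[1]]$detectedLanguages
detected <- data.frame(
  name = vapply(langs, function(l) l$name, character(1)),
  iso6391Name = vapply(langs, function(l) l$iso6391Name, character(1)),
  score = vapply(langs, function(l) l$score, numeric(1)),
  stringsAsFactors = FALSE
)
print(detected)
```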

For more details & APIs, please visit this link.

You can also use this R package for Microsoft Cognitive Services (Text Analytics API).


Thanks!

Happy Learning! Your feedback would be appreciated!