
Personality Prediction Using Twitter Data

Use of social media is increasing rapidly. A wide variety of information is shared through social media such as Facebook and Twitter. Information about users and what they express through status updates is an important asset for research in the field of behavioral learning and human personality. Similar research has been conducted in this field, and the body of work continues to grow. This study attempts to build a system that can predict a person's personality based on that person's social media information. The personality model used in this project is the Big Five personality model. While previous studies used older machine learning algorithms to build their models, this project tries to implement some deep learning architectures and to compare them through a comprehensive analysis of the resulting accuracy.

1.1 Overview

Social media is a place where users represent themselves to the world. A social media account is private and personal, so it can reflect its owner's personal life. Activities on social media such as posting, commenting and updating a status can reveal personal information. Text posted by users can be analyzed to extract information about them. In this project we predict the personality of a user from this data.

Social media websites are now the most popular destination for Internet users, providing social scientists with a great opportunity to understand online behaviour. A growing number of research papers relate to social media, a small number of which focus on personality prediction. In this project we focus on the Big Five personality traits. The project determines personality traits based on a user's Twitter feed.

1.2 Motivation

We have chosen to work with Twitter since we feel it is a better approximation of public sentiment than conventional internet articles and web blogs. The reason is that the amount of relevant data is much larger for Twitter than for traditional blogging sites. Moreover, the response on Twitter is more prompt and also more general, since the number of users who tweet is substantially larger than the number who write web blogs on a daily basis.

Analysis of public sentiment is highly critical for macro-scale socioeconomic phenomena such as predicting the stock market value of a particular firm. This could be done by analysing overall public sentiment towards that firm over time and using tools from economics to find the correlation between public sentiment and the firm's stock market value.

PICT, Department of Computer Engineering 2018-19

Personality Prediction System using Twitter Data

1.3 Problem Definition and Objectives

1.3.1 Problem Definition

A user's personality can be accurately predicted with the Big Five personality model from the Twitter feed available on the user's Twitter profile. We propose a system that uses this available data, built as a software application that predicts the personality of any person who has a Twitter account.

1.3.2 Objectives

The objectives of our project are to:

• Correctly predict users' personality.

• Analyse and classify the personalities of a given set of people or an individual using data mining and machine learning concepts.

1.4 Project Scope & Limitations

1.4.1 Scope

• The results can be put to practical use in the fields of information retrieval, content selection, product positioning and psychological assessment of the user.

• It is an attempt at 'people profiling' with the help of tag words and emoticons, which can be a very useful tool.

1.4.2 Limitations

• The system can have problems recognizing things like sarcasm and irony, negations, jokes, and exaggerations, the sorts of things a person would have little trouble identifying. Failing to recognize these can skew the results.

• Opinions and sentiment are inherently subjective, vary from person to person, and can even be outright irrational. It is critical to mine a large and relevant sample of data when attempting to measure sentiment.

CHAPTER 2

LITERATURE SURVEY

2.1 Personality Classification using Tweeter Tweets

In Personality Classification using Tweeter Tweets[1], Nadeem Ahmed Jawaid Siddique demonstrated, as the work's primary contribution, the psychological profiling of users on the basis of a data set. This profiling can be a very useful tool for career progression, job satisfaction and setting preferences in different interfaces. The proposed approach predicts a user's personality by mapping the information publicly available on the user's personal Twitter account using the DISC (Dominance, Influence, Compliance, Steadiness) assessment.

The study[1] performed a broader, larger and more relevant personality assessment than the media and polls have. The Twitter sentiment/texts are expanded by classifying tweets into four categories: Dominance, Influence, Submission, and Compliance (DISC). DISC has proven predictive validity and compatibility with earlier knowledge (Social Sciences Marketing), with understandable dimensions. Text mining and sentiment analysis were performed for each user based on his/her recent tweets.

2.2 Personality Prediction System from Facebook Users

In Personality Prediction System from Facebook Users[2], Tommy Tandera, Hendro, Derwin Suhartono, Rini Wongso, and Yen Lina Prasetio attempted to build a system that can predict a person's personality based on Facebook user information. The personality model used in this research is the Big Five personality model. They implemented several deep learning architectures and compared them through a comprehensive analysis of the resulting accuracy. The results outperformed the accuracy of previous similar research, with an average accuracy of 74.17 percent[2].

The dataset used in this study is divided into two parts. The first dataset, obtained from myPersonality, consists of data on 250 Facebook users with approximately 10,000 statuses, each with a personality label based on the Big Five Personality Traits model. The second dataset consists of the statuses of 150 Facebook users, collected manually.

Linguistic features such as LIWC and SPLICE were used. In this research, machine learning algorithms were applied and compared with deep learning algorithms.

2.3 Personality and patterns of Facebook usage

In Personality and patterns of Facebook usage[3], Bachrach Y, Kosinski M, Graepel T, Kohli P, and Stillwell D. showed how users' activity on Facebook relates to their personality, as measured by the standard Five Factor Model. The study showed significant relationships between personality traits and various features of Facebook profiles. It then showed how multivariate regression allows prediction of the personality traits of an individual user given their Facebook profile. The best accuracy of such predictions is achieved for Extraversion and Neuroticism, and the lowest accuracy is obtained for Agreeableness, with Openness and Conscientiousness lying in the middle.

2.4 How well do your Facebook status updates express your personality?

In this study[6], Farnadi G, Zoghbi S, Moens M, and De Cock M. explored the use of machine learning (ML) techniques to automatically infer users' personality traits based on their Facebook status updates (i.e., text messages used to communicate with friends).

A binary classifier is trained for each trait to separate the users displaying the trait from those who do not. A variety of features are used as input for the classifiers: (1) features related to the text of statuses (e.g., vocabulary and writing style), (2) features about the user's social network (e.g., network size and density) and (3) temporal factors (e.g., frequency of updating status).

CHAPTER 3

SOFTWARE REQUIREMENTS SPECIFICATION

3.1 Assumptions and Dependencies

• Probability of occurrence of one word will not be affected by presence

or absence of another word in the text.

• Data provided is subjective.

• Prediction is done on the basis of information provided in the current

tweets of the individuals and not from any past information.
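The first assumption above is the conditional-independence assumption behind naive Bayes. As a minimal statement, assuming a tweet is represented by its words $w_1, \dots, w_n$ and $y$ denotes a class label (both symbols are introduced here purely for illustration), it can be written as:

```latex
P(w_1, \dots, w_n \mid y) = \prod_{i=1}^{n} P(w_i \mid y)
```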

3.2 Functional Requirements

3.2.1 System Feature 1

• The classification algorithm should be chosen so that its accuracy in predicting the behaviour of the user is high compared to the alternatives.

• It should be able to work with large as well as small datasets.

3.2.2 System Feature 2

The classification algorithm should be trained in the early stages of development.

3.2.3 System Feature 3

The classification algorithm should be tested in the early stages of development.

3.2.4 System Feature 4

Tweets should be preprocessed, for example by removing URLs, tagging emoticons, tokenizing, and removing stopwords.
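The four preprocessing steps named in System Feature 4 could look roughly like the following. This is a minimal sketch using only the Python standard library; the stopword list, the emoticon tags (`EMO_POS`, `EMO_NEG`) and the function name `preprocess` are assumptions made for illustration and are not taken from the project's code.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real system would likely
# use a fuller list such as the one shipped with NLTK.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "this", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Illustrative emoticon tags; the tag names are assumptions.
EMOTICONS = {":)": "EMO_POS", ":-)": "EMO_POS", ":D": "EMO_POS",
             ":(": "EMO_NEG", ":-(": "EMO_NEG"}
TOKEN_RE = re.compile(r"[A-Za-z_']+")


def preprocess(tweet):
    """Return the cleaned token list for one tweet."""
    text = URL_RE.sub(" ", tweet)                 # 1. remove URLs
    for emoticon, tag in EMOTICONS.items():       # 2. tag emoticons
        text = text.replace(emoticon, " " + tag + " ")
    tokens = TOKEN_RE.findall(text)               # 3. tokenize
    # 4. lowercase ordinary words (emoticon tags stay as-is), drop stopwords
    tokens = [t if t.startswith("EMO_") else t.lower() for t in tokens]
    return [t for t in tokens if t not in STOPWORDS]
```

For example, `preprocess("This is great :) see https://t.co/abc")` drops the URL and the stopwords and keeps the emoticon as a tag.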

3.3 External Interface Requirements

3.3.1 User Interfaces

The user interface consists of a web application. The main window will contain a search bar and a search button which, when clicked, will retrieve the tweets of the person whose username was entered.

The interface will visualize the features in the form of a graph.

1. Push button for retrieving the user's tweets and predicting behaviour

2. Visual graphs to show results

3. Help button

3.3.2 Software Interfaces

1. Operating System: any one of macOS, Linux or Windows

2. Jupyter Notebook

3. Data set in text form

4. Programming Language: Python

3.3.3 Communication Interfaces

The system will communicate with the help of the Flask framework.

3.4 Nonfunctional Requirements

3.4.1 Performance Requirements

For this prototype version, we will keep monitoring whether the system crashes, hangs, or encounters an operating system error. We will also monitor the performance of the system in terms of how efficiently its different components are integrated.

3.4.2 Safety Requirements

The only safety requirement is that weekly backups of the database should take place.

3.4.3 Security Requirements

There are no specific security requirements; anyone can access and use the portal.

3.4.4 Software Quality Attributes

1. The solution should be reliable: the user should be able to count on the product running with all the features mentioned in this document available and executing correctly. It should be tested and debugged completely, and all exceptions should be handled properly.

2. The solution should be able to reach the desired level of accuracy, keeping in mind that this prototype version is meant to prove the concept of the project.

3.5 System Requirements

3.5.1 Database Requirements

A database in text file (.csv) format is required.

3.5.2 Software Requirements

1. Twitter API: Twython API

2. NLTK and scikit-learn libraries

3. Jupyter Notebook

4. Ubuntu OS

3.5.3 Hardware Requirements

Core i3 processor

8 GB RAM

3.6 Analysis Models: SDLC Model to be applied

Figure 3.1: SDLC Model

CHAPTER 4

SYSTEM DESIGN

4.1 System Architecture

Figure 4.1: System Architecture

4.2 Mathematical Model

Let $S$ be a system, $S = \{I, O, F, DB, Twpi\}$, where

$I$ = Input Data
$O$ = Output Data
$F$ = Naive Bayes Classifier
$DB$ = Database
$Twpi$ = Twitter API

$I = \{U, T\}$

$U$ = User Details, $T$ = Tweets

$O = \{O, C, E, A, N\}$

$O$ = Openness
$C$ = Conscientiousness
$E$ = Extraversion
$A$ = Agreeableness
$N$ = Neuroticism

Functions

Function                                Description            Mapping
$f: u \to Twpi$                         User Authentication    one-to-one
$g: Twpi \to T$                         Tweets Retrieval       one-to-many
$h: (T, \text{Preprocessing}) \to t$    Preprocessed Tweets    one-to-one
$j: (t, F) \to O$                                              one-to-many

Table 4.1: Table to test captions and labels
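The four mappings $f$, $g$, $h$ and $j$ can be read as a pipeline from a user to a set of trait scores. The sketch below assumes each stage is passed in as a plain Python callable; the `fake_*` stages are hypothetical stand-ins that only illustrate the data flow and do not reproduce the project's implementation.

```python
def predict_traits(user, authenticate, fetch_tweets, preprocess, classify):
    """Compose f, g, h and j from the mathematical model."""
    api = authenticate(user)                    # f: u -> Twpi
    tweets = fetch_tweets(api)                  # g: Twpi -> T (one user, many tweets)
    cleaned = [preprocess(t) for t in tweets]   # h: (T, Preprocessing) -> t
    return classify(cleaned)                    # j: (t, F) -> O


# Hypothetical stand-ins so the sketch runs without network access.
def fake_authenticate(user):
    return {"user": user}


def fake_fetch(api):
    return ["Loving the new book!", "Long day at work"]


def fake_preprocess(tweet):
    return tweet.lower().split()


def fake_classify(token_lists):
    # Placeholder for the Naive Bayes classifier F: returns equal scores.
    return {trait: 0.2 for trait in "OCEAN"}


scores = predict_traits("some_user", fake_authenticate, fake_fetch,
                        fake_preprocess, fake_classify)
```

Passing the stages in as arguments keeps the composition testable offline; in a real deployment each stand-in would be replaced by the corresponding component.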

4.3 Data Flow Diagrams

4.3.1 DFD Level 0

Figure 4.2: DFD Level 0

4.3.2 DFD Level 1

Figure 4.3: DFD Level 1

4.4 UML Diagrams

Figure 4.4: Activity Diagram

4.5 Sequence Diagram

Figure 4.5: Sequence Diagram

CHAPTER 5

PROJECT PLAN

5.1 Project Estimate

5.1.1 Reconciled Estimates

5.1.1.1 Cost Estimates

1. Real-time data was collected by us from Twitter.

2. Required software was available for free.

3. Required hardware was available with us.

5.1.1.2 Time Estimates

1. The estimated project duration is around 5-6 months.

5.2 Project Resources

5.2.1 Human Resources

Internal Guide: Prof. P. P. Joshi

Group Members: Kanchan Khadse, Navneet Iyer, Unnati Kolhe, Shreyas Umare

5.2.2 Software Resources

1. Jupyter Notebook

2. Netbeans 8.1

3. Sublime Text version 3

5.2.3 Hardware Resources

1. 8 GB RAM

2. Core i3 processor

5.3 Risk Management

5.3.1 Risk Identification

Risks were identified by reviewing the scope document, the requirements specification and the schedule. Answers to a questionnaire revealed some risks. The following risk identification questionnaire can be used:

1. Are requirements fully understood by the software engineering team and its customers?

2. Have customers been involved fully in the definition of requirements?

3. Do end-users have realistic expectations?

4. Are end-users enthusiastically committed to the project and the system/product to be built?

5. Does the software engineering team have the right mix of skills?

6. Is the number of people on the project team adequate to do the job?

7. Do all customer/user constituencies agree on the importance of the project and on the requirements for the system/product to be built?

5.3.2 Risk Analysis

Risk 1

Sr No.   Point              Description
1        Risk ID            1
2        Risk Description   Images in Tweets
3        Category           RunTime Environment
4        Source             Input Handler
5        Probability        Medium
6        Impact             High
7        Response           By-passed
8        Strategy           Code Modification
9        Status             Occurred

Risk 2

Sr No.   Point              Description
1        Risk ID            2
2        Risk Description   Videos in Tweets
3        Category           RunTime Environment
4        Source             Input Handler
5        Probability        Medium
6        Impact             High
7        Response           By-passed
8        Strategy           Code Modification
9        Status             Occurred

5.4 Project Schedule

5.4.1 Project Task Set

The major tasks in the project stages are:

• Task 1: Creating an interface for the user to enter the Twitter handle and to show the desired result.

• Task 2: Training the classifier on the data sets.

• Task 3: Receiving the user's tweets and classifying them based on the knowledge learned by the classifier.

• Task 4: Calculating the amount of presence of each type of personality.

• Task 5: Visualization of results.
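Task 4 can be sketched as a simple tally. The example assumes the classifier emits one trait label per tweet, which the report does not state explicitly; the function name `trait_percentages` and the single-letter trait codes are illustrative.

```python
from collections import Counter

TRAITS = ["O", "C", "E", "A", "N"]  # Big Five trait codes


def trait_percentages(predicted_labels):
    """Return the percentage share of each trait among per-tweet labels."""
    counts = Counter(predicted_labels)
    total = sum(counts[t] for t in TRAITS)
    if total == 0:
        # No usable predictions: report every trait as 0 percent.
        return {t: 0.0 for t in TRAITS}
    return {t: 100.0 * counts[t] / total for t in TRAITS}


shares = trait_percentages(["O", "E", "E", "N", "O", "E", "A", "E"])
```

A dictionary of shares like this is the kind of input a pie chart for Task 5 would need.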

5.4.2 Task Network

Figure 5.1: Task Network

5.4.3 Timeline Chart

Figure 5.2: Gantt Chart

5.5 Team Organization

5.5.1 Team structure

The team structure for the project has been identified and the roles defined.

• Guide: To mentor the project throughout the development span.

• External Guide: To review the work done and suggest required improvements.

• Mentor: To help in finalizing the problem statement and arranging the dataset.

• Project Members:

1. Team member 1 (Navneet Iyer)

2. Team member 2 (Kanchan Khadse)

3. Team member 3 (Unnati Kolhe)

4. Team member 4 (Shreyas Umare)

5.5.2 Management reporting and communication

A timetable was put up and sufficient time was provided for:

1. Deciding the domain.

2. Designing the final problem statement with the help of mentor.

3. Submitting group details and problem statement.

4. Submission of abstract of the project.

5. Submission of Synopsis and Mathematical Model.

6. Literature survey.

7. Review 1 with the guide and experts.

8. Implementation of some part of the project.

9. Review 2 with guide and experts.

10. Submission of project lab assignments including UML diagrams.

11. Updating project workbook regularly.

12. Submission of preliminary project report.

13. Review 3 with the guide and experts.

14. Implementation of the project.

15. Review 4 with the guide and experts.

16. Submission of Project Report

CHAPTER 6

PROJECT IMPLEMENTATION

6.1 Overview of Project Modules

• Web module

It provides the entry point for the user. The user can enter the Twitter handle of the desired person in the text field provided.

• Training module

The system must be trained in order to accomplish the classification task. For training, a real-time dataset of tweets was used, in which each entry was given the correct categorical label. The system was made to classify based on this initial knowledge. The entire training process was automated. The end result of the training phase is the learned terms and knowledge that the system gained during training.

• Deployment module

The deployment module performs real-time analysis of tweets. The web module is connected to the deployment module.

6.2 Tools and Technologies Used

• Tools

1. Editing Tools

Jupyter Notebook was used for writing and editing the code.

2. Twython

Twython, a pure Python wrapper for the Twitter API, was used to access tweets.

3. Web Development Tools

The Netbeans IDE was used for designing the web module (HTML, CSS).

• Technologies

1. Client side technologies

HTML and CSS were used to design the interface and web pages.

2. Server side technologies

For the server side of the interface, JavaScript and the Flask framework were used to connect the back end and the front end.

3. Programming technologies

(a) Training and testing deployment module

The Python programming language is used for implementation.

(b) Server side web module technology

For the server side of the interface, JavaScript is used.
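Tweet retrieval through Twython, listed under Tools, might be wrapped roughly as follows. This is a sketch under assumptions: the helper `fetch_tweet_texts` and the `FakeClient` class are hypothetical, and the client is passed in so the example runs offline. The only Twython call relied on is `get_user_timeline`, which is expected to return a list of tweet dictionaries with a `"text"` field.

```python
def fetch_tweet_texts(client, screen_name, count=200):
    """Return the text of a user's most recent tweets.

    `client` is expected to behave like a `twython.Twython` instance, i.e. to
    offer `get_user_timeline(screen_name=..., count=...)` returning a list of
    tweet dictionaries with a "text" field.
    """
    timeline = client.get_user_timeline(screen_name=screen_name, count=count)
    return [tweet["text"] for tweet in timeline]


class FakeClient:
    """Offline stand-in (hypothetical) so the sketch runs without credentials."""

    def get_user_timeline(self, screen_name, count):
        sample = [{"text": "First tweet"}, {"text": "Second tweet"}]
        return sample[:count]


texts = fetch_tweet_texts(FakeClient(), "some_handle", count=2)
```

With real credentials, the stand-in would presumably be replaced by a `Twython` object constructed from the application key, application secret and OAuth tokens; how the project stored those credentials is not described in the report.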

6.3 Algorithm Details

6.3.1 Bayes' theorem

Bayes' theorem is named after Thomas Bayes (1701–1761). In general, Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event. It therefore fits machine learning well, because that is what machine learning does: it makes predictions for the future based on prior experience. Mathematically, Bayes' theorem can be written as follows:
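In its standard form, for two events $A$ and $B$ with $P(B) > 0$ (the symbols are chosen here for illustration), the theorem reads:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

Here $P(A)$ is the prior probability of $A$, $P(B \mid A)$ is the likelihood of observing $B$ given $A$, and $P(A \mid B)$ is the posterior.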

6.3.2 Gaussian Naive Bayes

When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution. Gaussian Naive Bayes is one such classifier model. Besides Gaussian Naive Bayes, there are also Multinomial Naive Bayes and Bernoulli Naive Bayes. We chose Gaussian Naive Bayes because it is the simplest and the most popular one.

In Gaussian Naive Bayes, the continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution, also called a Normal distribution. When plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values, as shown below:

The likelihood of the features is assumed to be Gaussian; hence, the conditional probability is given by:
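In the usual notation, assuming $x_i$ denotes the value of the $i$-th feature, $y$ a class, and $\mu_y$ and $\sigma_y^2$ the mean and variance of that feature within class $y$ (symbols introduced here for illustration), the Gaussian likelihood is:

```latex
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}}
                \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)
```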

• In spite of their apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering. They require a small amount of training data to estimate the necessary parameters.

• Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods. The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a one-dimensional distribution. This in turn helps to alleviate problems stemming from the curse of dimensionality.
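To make the independent per-feature estimation concrete, here is a minimal from-scratch Gaussian Naive Bayes in pure Python. It is a sketch, not the project's code: the class name `TinyGaussianNB`, the variance floor and the toy data are all assumptions. The report lists Scikit among its libraries but does not say which class was actually used.

```python
import math
from collections import defaultdict


class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes for numeric feature vectors."""

    def fit(self, X, y, var_floor=1e-9):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            n = len(rows)
            # Each feature's mean and variance is estimated on its own.
            means = [sum(col) / n for col in zip(*rows)]
            variances = [
                sum((v - m) ** 2 for v in col) / n + var_floor
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def predict(self, row):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            # Sum of per-feature Gaussian log-likelihoods (independence assumption).
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(row, means, variances)
            )
        return max(self.stats, key=log_posterior)


# Toy data: two hypothetical numeric features per user; labels are made up.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
y = ["low", "low", "low", "high", "high", "high"]
model = TinyGaussianNB().fit(X, y)
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied, and the small variance floor guards against division by zero when a feature is constant within a class.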

CHAPTER 7

SOFTWARE TESTING

7.1 Goals of Testing

• Unit Testing

Unit testing is performed to test modules against the detailed design. Inputs to the process are usually compiled modules from the coding process. The modules are assembled into a larger unit during the unit testing process. Testing has been performed on each phase of project design and coding. We test the module interface to ensure the proper flow of information into and out of the program unit. By examining the local data structures, we make sure that temporarily stored data maintains its integrity throughout the algorithm's execution. Finally, all error-handling paths are also tested.

• System Testing

We perform system testing to find errors resulting from unanticipated interactions between the sub-systems and system components. Once the source code is generated, the software must be tested to detect and rectify all possible errors before it is delivered to the customers. To find errors, a series of test cases must be developed that ultimately uncovers the existing errors. Different software techniques can be used for this process. These techniques provide systematic guidance for designing tests that:

– Exercise the internal logic of the software components.

– Exercise the input and output domains of a program to uncover errors in program function, behavior and performance.

We test the software using two methods:

– White Box testing: Internal program logic is exercised using this test case design technique.

– Black Box testing: Software requirements are exercised using this test case design technique.

Both techniques help in finding the maximum number of errors with minimal effort and time.

7.2 Test Objective

The test objective is the overall goal and intended achievement of the test execution. The objective of testing is to find as many software defects as possible and to ensure that the software under test is bug free before release. To define the test objectives, the following two steps should be carried out:

1. List all the software features (functionality, performance, GUI, ...) which may need to be tested.

2. Define the target or goal of the test based on the above features.

7.3 Testing Strategy

The test strategy is a critical step in making a test plan. A test strategy document is a high-level document, usually developed by the test manager. This document:

• Defines the project's testing objectives and the means to achieve them

• Determines testing effort and costs

7.4 Test cases & Test Results

• Enter the Twitter user handle.

• Click on Search.

• Finally, the output will be a pie chart, as can be seen in the screenshots below.

This is the first test case, in which we enter the Twitter handle @jk_rowling.

Figure 7.1: Input Page

We get the output as follows:

Figure 7.2: Output Page

This is the final output, which tells us the percentage of each of the Big Five personality traits:

Figure 7.3: Pie-Chart

CHAPTER 8

RESULTS

8.1 Outcomes

This is how the output looks for the Twitter handle of Barack Obama:

Figure 8.1: Input

Figure 8.2: Pie-Chart

This is how the output looks for the Twitter handle of Katy Perry:

Figure 8.3: Input

Figure 8.4: Pie-Chart

CHAPTER 9

CONCLUSIONS

9.1 Conclusions

In this project, we have shown that a user's Big Five personality traits can be predicted from the public information they share on Twitter. Our subjects completed a personality test and, through the Twitter API, we collected publicly accessible information from their profiles. After processing this data, we found many small correlations in the data. Using the profile data as a feature set, we were able to train one machine learning algorithm, Gaussian Naive Bayes, to predict scores on each of the five personality traits. A second insight is that user personality can be easily and effectively predicted from public data, which suggests future directions in a variety of areas, including:

1) Marketing: Since there is a relationship between marketing strategies and consumer personality, one could select ads to which a user is likely to be most receptive.

2) User Interface Design: One could match not just content but also the basic "look and feel" of a social media site to personality traits.

3) Recommender Systems: Given the well-established relationship between personality and music taste, music recommender systems might improve their predictions by also considering user personality.

9.2 Future Work

Future developments of this study may utilize a larger training and testing data set, which will expose the system to a wider variety of tweets. Improving the n-gram normalization functions may also increase the system's accuracy, since this allows the system to recognize and assess more words.

9.3 Applications

• Thousands of text documents can be processed for sentiment in seconds, compared to the hours it would take a team of people to complete the task manually.

• With sentiment data about established and new products, it is easier to estimate the customer retention rate.

• Sentiment analysis can also be used to gather feedback from the employees of a company and analyze their emotions and attitude towards their job, in order to determine whether they are satisfied with it.

• To forecast market movement based on news, blogs and social media sentiment.

• To identify clients with negative sentiment in social media or news and to increase the margin on transactions with them as protection against default.
