Analyzing the Impact of Body Shaming on Twitter: A Study Using Naive Bayes Classifier and Machine Learning

INTRODUCTION
Information is now readily available thanks to recent technological developments, and people express their thoughts in a variety of ways. Social media has had a major impact on daily relationships, language, culture, and lifestyle. Twitter, founded by Jack Dorsey, is still widely used today. However, many Twitter users publish unfavorable remarks, information, or viewpoints that can offend others, such as body shaming.
Body shaming can be extremely damaging, leading to depression, low self-esteem, and an increased risk of suicide. Sentiment analysis can be used to examine Twitter user behavior by identifying whether tweets are positive or negative. A confusion matrix can be used to assess the accuracy, precision, and recall of the Naïve Bayes Classifier technique.
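As context for the confusion-matrix evaluation mentioned above, a minimal sketch of how accuracy, precision, and recall fall out of the four cells of a 2×2 matrix; the counts below are invented for illustration, not the study's results:

```python
# Deriving accuracy, precision, and recall from a 2x2 confusion matrix.
# The counts below are invented for illustration, not the study's results.
tp, fp = 40, 10   # true positives, false positives
fn, tn = 5, 45    # false negatives, true negatives

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # share of all correct predictions
precision = tp / (tp + fp)                    # correctness of positive predictions
recall    = tp / (tp + fn)                    # coverage of actual positives
```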

METHOD
The Naïve Bayes Classifier is a classification approach built on the foundation of Bayes' theorem. Bayes' theorem, a classification technique that uses probability and statistics to predict future probabilities based on historical data, was developed by the English scientist Thomas Bayes. The Naïve Bayes Classifier's fundamental characteristic is its strong (naïve) assumption that each condition or event is independent. Olson and Delen (2008) state that Naïve Bayes determines, for each decision class, the likelihood that the class is correct given the information object vector. The algorithm takes the independence of object features for granted. The frequencies in the "master" decision table are used to determine the probabilities that go into the final estimate. This method has the benefit of requiring little training data to estimate the parameters needed for classification. Because it presumes that variables are independent, classification requires only the variance of each variable within a class rather than the whole covariance matrix.
The Naïve Bayes classification method builds a set of probabilities by counting the frequencies and combinations of values in the input dataset. The approach relies on Bayes' theorem and assumes that all attributes are independent given the value of the class variable. The foundation of Naïve Bayes is the simplification that, given the output value, attribute values are conditionally independent. In other words, the likelihood of observing them together, given the output value, is the product of the individual probabilities. Naïve Bayes has the benefit of requiring very little training data to estimate the parameters needed for classification, and it frequently outperforms expectations in a wide variety of real-world scenarios.
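The conditional-independence product can be made concrete with a toy sketch; the priors and per-word likelihoods below are invented numbers, not values estimated from the study's data:

```python
# Toy illustration of the Naive Bayes product rule: under conditional
# independence, P(w1, w2 | C) = P(w1 | C) * P(w2 | C). All numbers invented.
p_neg, p_pos = 0.5, 0.5                # class priors P(C)
p_words_given_neg = 0.30 * 0.20        # P("gendut"|neg) * P("dasar"|neg)
p_words_given_pos = 0.02 * 0.01        # P("gendut"|pos) * P("dasar"|pos)

# Unnormalized posteriors; the shared evidence term cancels when comparing.
score_neg = p_neg * p_words_given_neg
score_pos = p_pos * p_words_given_pos
label = "negative" if score_neg > score_pos else "positive"
```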
Bayes' theorem is given by the following equation:

P(H|X) = P(X|H) · P(H) / P(X)

Understanding the Naïve Bayes approach requires recognizing that the classification process needs a set of cues to identify the proper class for the sample being studied. The Naïve Bayes approach therefore modifies Bayes' theorem as follows:

P(C|F1, …, Fn) = P(C) · P(F1, …, Fn|C) / P(F1, …, Fn)

where C stands for the class and F1 through Fn for the distinguishing cues needed to classify something. The formula states that the likelihood of a sample with a specific set of characteristics entering class C (the posterior) is the likelihood of class C before that sample is observed (the prior), multiplied by the likelihood of the sample's characteristics occurring in class C, divided by the likelihood of those characteristics occurring globally (the evidence). The formula can therefore also be expressed as:

Posterior = (Prior × Likelihood) / Evidence

The evidence value stays the same for every class in a sample. To establish which class a sample is assigned to, its posterior value is compared with the posterior values of the other classes. Naïve Bayes is used, among other applications, as a probabilistic machine learning technique and in support of automated medical diagnosis.
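A worked arithmetic example of the posterior formula (prior × likelihood ÷ evidence), with made-up probabilities:

```python
# Worked Bayes' theorem arithmetic with invented numbers:
# posterior = prior * likelihood / evidence.
prior = 0.4          # P(C): class probability before seeing the sample
likelihood = 0.5     # P(F | C): probability of the features given the class
evidence = 0.25      # P(F): global probability of the features

posterior = prior * likelihood / evidence   # P(C | F)
```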
Benefits of Naïve Bayes:
1. Both quantitative and qualitative data may be employed.
2. It needs only a small amount of information.
3. It does not require a large amount of training data.
4. It is simple to use.
5. The code is straightforward when applied in programming.
6. Both binary and multiclass classification are possible.
7. Document classification can be tailored to a person's needs.
Disadvantages of Naïve Bayes:
1. The predicted probability will be zero if any conditional probability is zero.
2. Because there is frequently some correlation between variables, the assumption of their independence can reduce accuracy.
3. A single probability cannot be used to gauge its correctness; additional evidence is required to validate its accuracy.
4. Making decisions requires knowledge of the situation at hand or of previous events. This prior knowledge is vital to its success, and gaps in it can reduce its efficacy.
5. It can only work with words; it is not capable of working with images.
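Disadvantage 1, the zero-probability problem, is commonly handled with add-one (Laplace) smoothing; a sketch with invented counts:

```python
# The zero-probability problem and the add-one (Laplace) smoothing fix.
# All counts are invented for illustration.
count_word_in_class = 0      # the word never appeared in this class's training data
total_words_in_class = 40    # total word occurrences observed for the class
vocab_size = 100             # number of distinct words in the vocabulary

# Unsmoothed estimate: one unseen word zeroes the entire product of likelihoods.
p_unsmoothed = count_word_in_class / total_words_in_class

# Laplace smoothing keeps every likelihood strictly positive.
p_smoothed = (count_word_in_class + 1) / (total_words_in_class + vocab_size)
```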

Research Flow
The process used in this research has multiple stages. The approach starts with gathering Twitter data and continues with preprocessing, initial data processing, the Naïve Bayes Classifier algorithm, and performance analysis. The figure displays the study procedure as a flowchart.

A. Phase of Data Collection:
This study's first phase involves gathering tweets from Twitter that include body-shaming terms such as "slim," "bald," "chubby," "snub-nosed," and so on. The data obtained does not include pictures and is restricted to statements in the Indonesian language that contain body-shaming terminology.
The Twitter application is used in the tweet data collection process: tweet data is retrieved from the Twitter server through Twitter, yielding 1,000 data points in total. Preprocessing, which includes cleaning and tokenization, is the following step. The next step is cross-validation: in this stage, the dataset is divided into two parts, with 80 percent used as training data and 20 percent as testing data.
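The study performs the 80/20 split inside RapidMiner; a plain-Python sketch of an equivalent single shuffle split over a placeholder corpus (not the study's actual tweets):

```python
# 80/20 train/test split, mirroring the proportions described in the text.
# The corpus here is a placeholder, not the study's 1,000 crawled tweets.
import random

tweets = [f"tweet {i}" for i in range(1000)]
random.seed(42)          # fixed seed for a reproducible shuffle
random.shuffle(tweets)

cut = int(0.8 * len(tweets))          # 80 percent boundary
train_data, test_data = tweets[:cut], tweets[cut:]
```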
The implementation of the Naïve Bayes Classifier technique is the next step; in this stage, weighting is done through the RapidMiner software. The performance analysis of the Naïve Bayes Classifier technique (accuracy, precision, and recall) follows. Making decisions based on the outcomes of the performance-stage computations is the last step. The following figure shows the steps involved in the data collection stage: Figure 3 Data Collection Process

B. Initial Data Processing
In this study, RapidMiner is used for sentiment analysis, and the Naïve Bayes Classifier approach is then employed to classify the gathered data. The initial data imported from Twitter into Excel is shown below: Table 1 Initial Data. The process comprises several analysis stages. 1. Data Collection Process (Crawling): RapidMiner is the data processing tool used for the sentiment analysis data collection and Twitter data crawling in this research; at this point, the data entered is unprocessed. The cleaning and tokenizing procedures are the two data preprocessing steps used in this investigation. Sentiment analysis is then performed on the cleaned data: the polarity analysis procedure, labeled using RapidMiner, uses the Extract Sentiment and Generate Attributes operators to classify the data into Positive (P) and Negative (N). Finally, the sentiment of the tweets is determined by testing the outcomes of the data analysis.
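Cleaning and tokenization are performed in RapidMiner in the study; a minimal Python sketch of the same idea (URL and mention removal, punctuation stripping, lowercasing, whitespace tokenization) on a made-up tweet:

```python
# Minimal cleaning-and-tokenizing sketch; the tweet text is invented.
import re

tweet = "@user Kamu gendut banget!! https://t.co/xyz"

clean = re.sub(r"https?://\S+", "", tweet)     # remove URLs
clean = re.sub(r"@\w+", "", clean)             # remove user mentions
clean = re.sub(r"[^\w\s]", "", clean).lower()  # strip punctuation, lowercase
tokens = clean.split()                          # tokenize on whitespace
```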

C. Naïve Bayes Classifier
The Naïve Bayes Classifier approach is a text classification technique based on comparing keyword probabilities between training and test documents. After multiple phases of comparison, the category with the highest likelihood is assigned to a new document. The classification procedure for the Naïve Bayes Classifier using the RapidMiner software is shown below: Figure 10 Naïve Bayes Classifier. The data that made it through the preprocessing stage and became clean data now goes through the classification procedure. The Naïve Bayes classification algorithm processes the data: the machine is first trained to identify patterns in the available data or documents, and it is then able to classify the data into two groups, positive and negative.
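The train-then-classify loop described above can be sketched from scratch; the tiny labeled tweets, add-one smoothing, and log-space scoring below are illustrative choices, not the study's RapidMiner configuration:

```python
# From-scratch Naive Bayes train-then-classify sketch on invented tweets.
from collections import Counter
from math import log

train = [("dasar gendut jelek", "negative"),
         ("kamu keren sekali", "positive"),
         ("gendut banget sih", "negative"),
         ("penampilanmu bagus", "positive")]

priors = Counter()
word_counts = {"positive": Counter(), "negative": Counter()}
for text, label in train:
    priors[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Pick the class with the highest log posterior score."""
    best_label, best_score = None, float("-inf")
    for label in priors:
        total = sum(word_counts[label].values())
        score = log(priors[label] / sum(priors.values()))   # log prior
        for word in text.split():
            # add-one smoothing keeps unseen words from producing log(0)
            score += log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```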

Analysis Outcomes
After completing sentiment analysis using RapidMiner, the breakdown of the original (training) data can be outlined as it moves through several stages, from crawling to classification with the Naïve Bayes Classifier algorithm. The data history is as follows: • Crawling and data cleansing: the crawled data totaled 1,000 Twitter data points, made up of training or raw data acquired via RapidMiner. Following the cleaning procedure, 329 Twitter data points remained as testing or test data.

• Testing the Naïve Bayes Classifier Algorithm
The algorithm testing yielded the following results: 80.55 percent accuracy, 100 percent positive precision, 80.43 percent negative precision, 3.03 percent positive recall, and 100 percent negative recall. When compared to other classifier models, the Naïve Bayes Classifier performs excellently: according to Xhemali, Hinde, and Stone in their study "Naïve Bayes vs. Decision Trees vs. Neural Networks in the Classification of Training Web Pages," the Naïve Bayes Classifier has a greater accuracy rate than other classifier models. The terms of Bayes' theorem are: P(H|X), the probability that hypothesis H holds given X (posterior probability); P(H), the prior probability of hypothesis H; P(X|H), the probability of X given hypothesis H; and P(X), the prior probability of X.
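The five reported figures (accuracy plus per-class precision and recall) all derive from a single 2×2 confusion matrix; a sketch with invented counts, which do not reproduce the study's numbers:

```python
# Per-class precision and recall from one confusion matrix.
# Counts are invented and do not reproduce the study's figures.
tp, fn = 20, 5    # actual positives: predicted positive / predicted negative
fp, tn = 10, 65   # actual negatives: predicted positive / predicted negative

accuracy      = (tp + tn) / (tp + fn + fp + tn)
precision_pos = tp / (tp + fp)   # positive precision
precision_neg = tn / (tn + fn)   # negative precision
recall_pos    = tp / (tp + fn)   # positive recall
recall_neg    = tn / (tn + fp)   # negative recall
```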

Figure 4 Data Collection and Twitter Data Crawling Processes

2. Preprocessing Data. Data preprocessing, which includes all forms of processing done on raw data to prepare it for further processing such as data visualization and model building, is part of data preparation.

Figure 5 Cleaning Process

Diantoro, Rinaldo, Sitorus, Rohman | Digitus: Journal of Computer Science Applications

Table 2
Total Word Occurrences