Practical Analysis of Consumer Reviews - Part I
Text Analysis for Consumer Reviews | Beyond Sentiment Analysis
Can sentiment analysis solve business problems?
No.
Why?
Knowing whether a review is positive or negative does not, by itself, solve the business problem.
What if you could identify what is good and what is bad?
Yes, that would help. The business could then focus on the specific areas that matter to the end consumer.
This article focuses on going beyond standard sentiment analysis by combining it with custom classifiers built from a limited data set.
Before going beyond sentiment analysis, let's understand what sentiment analysis is and how to do it.
Wikipedia
Sentiment analysis (also known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information.
Advantages of custom classification
Standard text classification models such as sentiment analysis are not always sufficient to get a holistic view when analyzing unstructured data such as the voice of the customer. A custom classifier, on the other hand, lets you sort such data into the more nuanced categories you care about. For example, using a custom classifier, a fast food chain owner can sort reviews into categories like price, ambiance, staff behavior, and food quality to gain better insight into which aspects of the business need to improve.
Action plan
1- Build a sentiment analysis algorithm
2- Build a custom classifier
3- Create an interactive dashboard
Let's build a basic sentiment analysis algorithm using Python (scikit-learn)
Below are the quick steps to build a basic sentiment analysis algorithm using scikit-learn.
You can find the complete code here.
Check the data set
Our data set consists of two columns, the review and its sentiment, across 65,000 rows. It is used to train and test the classification model.
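import pandas as pd
import matplotlib.pyplot as plt
# Load the labeled reviews and plot how many rows fall into each sentiment class
data = pd.read_excel('sentimanet.xlsx', sheet_name='Sheet1')
plt.figure(figsize=(10, 4))
data.Sentiment.value_counts().plot(kind='bar');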
This data set is roughly balanced across the two classes (positive and negative), so we can go ahead and build our binary classification model. If your data set is imbalanced, there are several ways to address that, such as SMOTE, oversampling, or undersampling.
Text cleaning is very important for getting the best out of the machine learning model that we are going to build.
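As a rough illustration of the simplest of those options, random oversampling, the sketch below duplicates minority-class rows until every class matches the largest one. It uses only the standard library; the function name `random_oversample` and the toy reviews are made up for this example, and a dedicated library such as imbalanced-learn would be the more usual choice on real data.

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=42):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    # Every class is grown to the size of the largest class
    target = max(len(rows) for rows in by_class.values())
    out_texts, out_labels = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for text in rows + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels

# Toy, made-up data: four positive reviews and one negative review
texts = ["great fit", "love it", "nice fabric", "good price", "too tight"]
labels = [1, 1, 1, 1, 0]
balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

If you do resample, apply it to the training split only, so duplicated rows never end up in the test set.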
import re
from bs4 import BeautifulSoup
from nltk.corpus import stopwords  # needs nltk.download('stopwords') once

REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\[\]\|@,;]')
BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z #+_]')
STOPWORDS = set(stopwords.words('english'))

def clean_text(text):
    text = BeautifulSoup(text, "lxml").text  # HTML decoding
    text = text.lower()  # lowercase text
    text = REPLACE_BY_SPACE_RE.sub(' ', text)  # replace REPLACE_BY_SPACE_RE symbols with a space
    text = BAD_SYMBOLS_RE.sub('', text)  # delete symbols that match BAD_SYMBOLS_RE
    text = ' '.join(word for word in text.split() if word not in STOPWORDS)  # delete stopwords
    return text

data['review'] = data['Review'].apply(clean_text)
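To see what these steps do to a single review, here is a dependency-free sketch of the same idea. It is a simplification rather than the cleaning code itself: it skips the BeautifulSoup HTML decoding, and `SMALL_STOPWORDS` is a tiny hand-picked stand-in for NLTK's English stopword list, so results on real data will differ.

```python
import re

REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\[\]\|@,;]')
BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z #+_]')
# Assumption: a tiny stand-in for NLTK's English stopword list
SMALL_STOPWORDS = {"the", "is", "a", "but", "it", "and", "for", "this"}

def clean_text_simple(text):
    text = text.lower()                        # lowercase
    text = REPLACE_BY_SPACE_RE.sub(' ', text)  # punctuation such as , ( ) becomes a space
    text = BAD_SYMBOLS_RE.sub('', text)        # drop anything else that is not 0-9, a-z, space, #, + or _
    return ' '.join(w for w in text.split() if w not in SMALL_STOPWORDS)

sample = "The fit is GREAT, but the price (at $40) is too high!"
cleaned = clean_text_simple(sample)
print(cleaned)
```

Note that the cleaned text keeps the aspect words ("fit", "price") that the custom classifier relies on later.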
Train-Test Split
from sklearn.model_selection import train_test_split

X = data.review
y = data.Sentiment
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
Model Building
We tried several models and picked the best one based on accuracy.
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Bag-of-words counts, then TF-IDF weighting, then multinomial Naive Bayes
nb = Pipeline([('vect', CountVectorizer()),
               ('tfidf', TfidfTransformer()),
               ('clf', MultinomialNB()),
              ])
nb.fit(X_train, y_train)
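For intuition about what the final `clf` step does, below is a minimal, standard-library sketch of multinomial Naive Bayes with additive (Laplace) smoothing. It is not scikit-learn's implementation: it works on raw word counts instead of TF-IDF weights, and `train_nb`, `predict_nb`, and the four training reviews are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values())
        denom = total + alpha * len(vocab)  # smoothing keeps unseen words from zeroing a class
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
            },
        }
    return model

def predict_nb(model, doc):
    """Return the label with the highest log posterior; out-of-vocabulary words are ignored."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for word in doc.split():
            if word in params["log_likelihood"]:
                score += params["log_likelihood"][word]
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up training reviews: 1 = positive, 0 = negative
docs = ["great fit love", "love fabric", "bad fit tight", "poor fabric bad"]
labels = [1, 1, 0, 0]
model = train_nb(docs, labels)
print(predict_nb(model, "love great"), predict_nb(model, "bad tight"))
```

Words that appear mostly in one class pull the prediction toward that class, while words shared by both classes (such as "fit" here) carry little weight.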
Model Validation
%%time
from sklearn.metrics import accuracy_score, classification_report

y_pred = nb.predict(X_test)
print('accuracy %s' % accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

accuracy 0.7948247174249814
              precision    recall  f1-score   support

           0       0.77      0.86      0.82      9129
           1       0.83      0.72      0.77      8300

   micro avg       0.79      0.79      0.79     17429
   macro avg       0.80      0.79      0.79     17429
weighted avg       0.80      0.79      0.79     17429

Wall time: 527 ms
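The per-class numbers in a report like this all derive from counts of correct and incorrect predictions. The sketch below shows the arithmetic for one class; the counts are invented for illustration (they are not this model's confusion matrix), and `prf` is a hypothetical helper name.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 for one class from its true positive,
    false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Invented counts: 80 correct hits, 20 false alarms, 40 misses
precision, recall, f1 = prf(tp=80, fp=20, fn=40)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

Read this way, the recall of 0.72 for class 1 in the report means the model misses roughly 28% of the reviews that truly belong to class 1.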
We now have a reasonably accurate sentiment classification model (about 79% accuracy on the test set). Let's pickle it for reuse.
import pickle
Pkl_Filename = "Sentiment.pkl"
with open(Pkl_Filename, 'wb') as file:  # save the fitted pipeline
    pickle.dump(nb, file)
with open(Pkl_Filename, 'rb') as file:  # load it back
    Sentiment = pickle.load(file)
Sentiment
Now we can use this pickled model to score the sentiment of our data set.
Let's build a custom classifier to classify reviews
We can use a Python library called "fuzzywuzzy", which provides simple but very effective fuzzy string matching.
# import fuzzywuzzy library
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
Define keywords for each category
fit = ['fit','fitting','small','too','tight','large','lose','confortable','stretchy','tailored','strappy']
price = ['expensive','price','cheep','bucks','buy','sell','purchase','too','price','cost','amount','buks']
material = ['strech','material','fabric','composition','feel','premium']
Let's create a scoring function using both the custom classifier and the sentiment analyzer.
def aspects(sentense):
    sentiment = Sentiment.predict([sentense])[0]  # predict expects a list of documents, not a bare string
    fit_score = process.default_scorer(sentense, fit)
    mat_score = process.default_scorer(sentense, material)  # the keyword list above is named `material`
    price_score = process.default_scorer(sentense, price)
    return sentiment, fit_score, mat_score, price_score
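To see the idea behind keyword-based aspect scoring without installing anything, here is a minimal sketch that uses the standard library's `difflib` as a stand-in for fuzzywuzzy. It is a different scorer from the one in `aspects` and will not produce the same numbers; `aspect_score` and the shortened keyword lists are illustrative assumptions. Each word in the review is compared with each keyword, and the best similarity, scaled to 0-100, becomes the category score.

```python
from difflib import SequenceMatcher

# Shortened, illustrative keyword lists (assumptions for this sketch)
KEYWORDS = {
    "fit": ["fit", "tight", "loose", "small", "large"],
    "price": ["price", "expensive", "cheap", "cost"],
    "material": ["material", "fabric", "stretch"],
}

def aspect_score(sentence, keywords):
    """Best word-to-keyword similarity found in the sentence, on a 0-100 scale."""
    best = 0.0
    for word in sentence.lower().split():
        for keyword in keywords:
            best = max(best, SequenceMatcher(None, word, keyword).ratio())
    return round(best * 100)

# "fabrik" is deliberately misspelled to show the fuzzy match
review = "the fabrik feels cheap"
scores = {name: aspect_score(review, kws) for name, kws in KEYWORDS.items()}
print(scores)
```

Because the match is fuzzy, a misspelling such as "fabrik" still scores highly against "fabric", which is why this kind of matching suits noisy review text.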
Let's check the final output data set we get once we apply the scoring function.
We now have an analyzed data set, ready for building an interactive dashboard in the next story.