Analysis of Reclame Aqui Comments: Discovering Topics and Grammatical Classes with Python
Hello, everyone! :)
Today, I will share my experience analyzing user comments about Avenue Securities posted on Reclame Aqui, a Brazilian complaint platform. The goal was to discover the topics in the comments and the grammatical classes of their words, to help build a controlled vocabulary dictionary.
Let’s get into it!
First, I decided not to run a sentiment analysis: since the data consists of complaints, the sentiment would be predictably negative and tell us little. Instead, I chose to analyze the topics and the frequency of words in different grammatical classes. That was quite a challenge, because a single word can belong to several grammatical classes depending on the context.
Since the website’s API was inaccessible, I used the Octoparse tool to scrape the comments, which made collecting the data much easier. With the data in hand, I started programming in Python.
The first step was pre-processing the data and running a topic analysis with an LDA (Latent Dirichlet Allocation) model. Then, to analyze the grammatical classes, I used the spaCy library. However, spaCy's part-of-speech tagging was often inaccurate on these Brazilian Portuguese comments, so I had to reclassify many words manually.
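For readers who want to picture these steps, here is a minimal sketch and not the code from my repository. It assumes gensim for the LDA model and a spaCy Portuguese pipeline such as `pt_core_news_sm`; the post does not depend on either choice. The function names, the tiny stop-word list and the `MANUAL_POS` table are hypothetical and only there for illustration.

```python
import re

# Tiny illustrative stop-word list (assumption); a real run needs a fuller Portuguese list.
STOPWORDS = {"a", "o", "de", "da", "do", "e", "que", "em", "para", "com", "não", "um", "uma"}


def preprocess(comment):
    """Lowercase, keep alphabetic tokens, drop stop words and tokens of 1-2 letters."""
    tokens = re.findall(r"[^\W\d_]+", comment.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def fit_lda(token_lists, num_topics=5):
    """Fit an LDA model with gensim (assumed library); returns (model, dictionary)."""
    from gensim import corpora, models  # third-party, imported only when needed

    dictionary = corpora.Dictionary(token_lists)
    corpus = [dictionary.doc2bow(tokens) for tokens in token_lists]
    lda = models.LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,  # assumption: tune this for your own data
        passes=10,
        random_state=42,
    )
    return lda, dictionary


def tag_tokens(text, nlp):
    """Return (word, coarse POS tag) pairs from an already loaded spaCy pipeline."""
    return [(tok.text.lower(), tok.pos_) for tok in nlp(text) if tok.is_alpha]


# Hypothetical manual corrections: word -> tag that replaces the automatic one.
MANUAL_POS = {"conta": "NOUN", "saque": "NOUN"}


def apply_overrides(tagged, overrides=MANUAL_POS):
    """Replace the automatic tag of every word listed in the override table."""
    return [(word, overrides.get(word, pos)) for word, pos in tagged]
```

To try it, load a pipeline with `nlp = spacy.load("pt_core_news_sm")` (after `python -m spacy download pt_core_news_sm`) and call `apply_overrides(tag_tokens(comment, nlp))`. Keep in mind that a word-level override table like this ignores context, so it forces one class per word even when the word really changes class from one sentence to another.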
Next, I calculated the frequency of the words in each grammatical class and created charts to visualize the results. To relate the topic and grammatical class analyses, I made a heatmap that shows the frequency of grammatical classes in the keywords of dominant topics.
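A rough sketch of this counting step could look like the code below. It is an illustration under assumptions, not the exact code from the repository: the helper names are hypothetical, the sample words and tags are made up, and the plot uses plain matplotlib, which may differ from the charting library used in the project.

```python
from collections import Counter, defaultdict


def frequency_by_class(tagged):
    """Map each grammatical class to a Counter of the words tagged with it."""
    freq = defaultdict(Counter)
    for word, pos in tagged:
        freq[pos][word] += 1
    return freq


def topic_class_matrix(topic_keywords, word_pos):
    """Count, per topic, how many keywords fall into each grammatical class.

    Returns (topics, classes, matrix), where matrix[i][j] is the number of
    keywords of topics[i] tagged as classes[j]. Untagged keywords are skipped.
    """
    topics = sorted(topic_keywords)
    classes = sorted(
        {word_pos[w] for words in topic_keywords.values() for w in words if w in word_pos}
    )
    matrix = []
    for topic in topics:
        counts = Counter(word_pos[w] for w in topic_keywords[topic] if w in word_pos)
        matrix.append([counts[c] for c in classes])
    return topics, classes, matrix


def plot_heatmap(topics, classes, matrix):
    """Draw the topic-by-class matrix as a heatmap with matplotlib."""
    import matplotlib.pyplot as plt  # third-party, imported only when plotting

    fig, ax = plt.subplots()
    image = ax.imshow(matrix, cmap="viridis", aspect="auto")
    ax.set_xticks(range(len(classes)))
    ax.set_xticklabels(classes)
    ax.set_yticks(range(len(topics)))
    ax.set_yticklabels([f"Topic {t}" for t in topics])
    fig.colorbar(image, ax=ax, label="keyword count")
    return fig


# Made-up sample data, only to show the shapes involved.
tagged = [("conta", "NOUN"), ("abrir", "VERB"), ("conta", "NOUN"), ("saque", "NOUN")]
freq = frequency_by_class(tagged)

topic_keywords = {0: ["conta", "abrir"], 1: ["saque", "conta", "demora"]}
word_pos = {"conta": "NOUN", "abrir": "VERB", "saque": "NOUN", "demora": "NOUN"}
topics, classes, matrix = topic_class_matrix(topic_keywords, word_pos)
```

With the sample data, `matrix` has one row per topic and one column per grammatical class, which is the shape `plot_heatmap(topics, classes, matrix)` expects.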
Throughout the project, I realized how complex Natural Language Processing in Portuguese is. Despite the challenges of analyzing this kind of text data, I gained valuable insights for creating a controlled vocabulary dictionary.
I hope you enjoyed this journey through comment analysis! It was an incredible experience, and I learned a lot throughout the process. Explore my repository and the complete code to delve into the technical details.
See you next time!