Analysis of Reclame Aqui Comments: Discovering Topics and Grammatical Classes with Python

Luana Nova
2 min read · Mar 1, 2024


The image displays two graphics related to Natural Language Processing (NLP). On the left, there’s a word cloud with terms like ‘Natural’, ‘Language’, ‘Processing’, ‘Statistical’, ‘Machine Learning’, and ‘NLP’ in varying sizes indicating their frequency or importance. On the right, a Venn diagram intersects the fields of ‘Computação’ (Computing), ‘Linguística’ (Linguistics), and ‘Inteligência Artificial’ (Artificial Intelligence) with NLP at the convergence of all three.

Hello, everyone! :)

Today, I will share my experience analyzing user comments from Avenue Securities on Reclame Aqui, a Brazilian complaint platform. The goal was to discover the topics and grammatical classes of words to help build a controlled vocabulary dictionary.

Let’s get into it!

First, I decided against a sentiment analysis: since we are dealing with complaints, the sentiment is almost always negative and would tell us little. Instead, I chose to analyze the topics and the frequency of words in different grammatical classes. It was quite a challenge, as a single word can belong to several grammatical classes depending on the context.

I used the Octoparse tool to scrape the comments, which greatly facilitated the process since the website’s API was inaccessible. With the data in hand, I started programming in Python.

The first step was pre-processing the data and running a topic analysis with an LDA (Latent Dirichlet Allocation) model. Then, to analyze the grammatical classes, I used the spaCy library. However, spaCy's tagging was often inaccurate for the Brazilian Portuguese in these comments, so I had to reclassify many words manually.
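The manual reclassification step can be pictured as a lookup table applied on top of the tagger's output. This is only a minimal sketch, not the code from my repository: the sample tokens, their tags, and the `MANUAL_OVERRIDES` table are hypothetical, and the (token, tag) pairs stand in for what a tagger such as spaCy would return.

```python
# Hypothetical tagger output: (token, part-of-speech tag) pairs.
# In a real pipeline these would come from the tagger; the values here are made up.
tagged_tokens = [
    ("conta", "VERB"),      # "account", wrongly tagged as a verb
    ("bloqueada", "ADJ"),
    ("saque", "PROPN"),     # "withdrawal", wrongly tagged as a proper noun
    ("demora", "NOUN"),
]

# Hypothetical manual corrections, keyed by lowercase token.
MANUAL_OVERRIDES = {
    "conta": "NOUN",
    "saque": "NOUN",
}


def reclassify(pairs, overrides):
    """Return the pairs with the tag replaced wherever an override exists."""
    return [(tok, overrides.get(tok.lower(), tag)) for tok, tag in pairs]


corrected = reclassify(tagged_tokens, MANUAL_OVERRIDES)
```

Keying the table by token alone ignores context, so a word that really does change class from one sentence to another would need a more specific key or a case-by-case review.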

Next, I calculated the frequency of the words in each grammatical class and created charts to visualize the results. To relate the topic and grammatical class analyses, I made a heatmap that shows the frequency of grammatical classes in the keywords of dominant topics.
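To give an idea of the numbers behind those charts, here is a small sketch of the counting step. It assumes the words are already tagged and that each topic has a list of keywords; the sample data and the names `tagged_words` and `topic_keywords` are hypothetical and not taken from my actual dataset.

```python
from collections import Counter

# Hypothetical (word, grammatical class) pairs after tagging and manual correction.
tagged_words = [
    ("conta", "NOUN"), ("saque", "NOUN"), ("bloquear", "VERB"),
    ("conta", "NOUN"), ("demorado", "ADJ"), ("resolver", "VERB"),
]

# Hypothetical top keywords for each dominant topic (e.g. taken from an LDA model).
topic_keywords = {
    "topic_0": ["conta", "bloquear", "saque"],
    "topic_1": ["demorado", "resolver", "conta"],
}

# Word frequency within each grammatical class.
freq_by_class = {}
for word, pos in tagged_words:
    freq_by_class.setdefault(pos, Counter())[word] += 1

# One class per word, used to label the topic keywords.
word_class = dict(tagged_words)

# Topic x class count matrix: the numbers a heatmap would display.
classes = sorted(freq_by_class)
matrix = {
    topic: [sum(1 for w in words if word_class.get(w) == c) for c in classes]
    for topic, words in topic_keywords.items()
}
```

Each row of `matrix` is one topic and each column one grammatical class (in the order of `classes`), so the rows can be handed to a plotting function such as seaborn's `heatmap` to draw the chart.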

Throughout the project, I realized how complex Natural Language Processing in Portuguese is. Despite the challenges of analyzing text data, I gained valuable insights for creating a controlled vocabulary dictionary.

I hope you enjoyed this journey through comment analysis! It was an incredible experience, and I learned a lot along the way. Explore my repository and the complete code to dig into the technical details.

See you next time!
