Sentiment Analysis: First Steps With Python’s NLTK Library

The idea behind the TF-IDF approach is that words that appear often in an individual document but rarely across the rest of the documents contribute more to classification. Next, we remove the single characters left behind after stripping the special characters, using re.sub(r'\s+[a-zA-Z]\s+', ' ', […]
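
To make the two ideas above concrete, here is a minimal sketch using only the standard library. The function names `clean` and `tf_idf` are hypothetical, the `\W` substitution is an assumption about how the special characters were removed, and the weighting `tf * log(N / df)` is one common unsmoothed variant of TF-IDF, not necessarily the exact formula the article or any particular library uses.

```python
import math
import re
from collections import Counter


def clean(text):
    """Lowercase, replace non-word characters, then drop stray single letters."""
    # Assumption: special characters are replaced with spaces via \W.
    text = re.sub(r"\W", " ", text.lower())
    # Remove single letters that are surrounded by whitespace (e.g. the "s"
    # left over after "David's" becomes "david s").
    text = re.sub(r"\s+[a-zA-Z]\s+", " ", text)
    # Collapse runs of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [clean(doc).split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    reviews = ["The movie was great", "The movie was awful"]
    print(clean("David's movie"))
    for doc_weights in tf_idf(reviews):
        print(doc_weights)
```

With this weighting, a term that occurs in every document (such as "movie" in the two sample reviews) gets a weight of zero, while a term unique to one review ("great" or "awful") gets a positive weight, which is the behaviour the TF-IDF description relies on.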