Updated: 2023-02-18 12:08:21
Your issue is very basic: you are setting up your problem wrong. Ideally you want a 50-50 split of positives and negatives in your training data. A Naive Bayes classifier estimates class priors and per-class word likelihoods from the training counts, so a heavily skewed label distribution pushes its predictions toward the majority class.
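One rough way to get closer to a 50-50 split is to downsample the larger class. The sketch below is only an illustration: it assumes your data is a list of (text, label) pairs, the helper name `balance_by_downsampling` is made up, and "happy" is a stand-in for whatever your positive label is called.

```python
import random
from collections import defaultdict


def balance_by_downsampling(examples, seed=0):
    """Return a shuffled subset with the same number of examples per label.

    `examples` is a list of (text, label) pairs. Both this helper and the
    data layout are hypothetical, chosen only for illustration.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    # Every class is cut down to the size of the rarest class.
    smallest = min(len(group) for group in by_label.values())

    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Toy data: 6 "sad" tweets and 2 "happy" tweets (invented for the example).
tweets = [(f"sad tweet {i}", "sad") for i in range(6)] + [
    (f"happy tweet {i}", "happy") for i in range(2)
]
balanced = balance_by_downsampling(tweets)
```

Downsampling throws data away, so with very few positives you are usually better off collecting more positive examples first.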
I am guessing that in your case, where you have only one positive comment, the classifier could fit the training data very easily: the class prior and nearly every predictor point toward the negative class.
If you use no positive comments at all, you are basically saying that the only possible outcome is "sad", and that is exactly what your model predicts.
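To see why this happens, here is a tiny multinomial Naive Bayes written from scratch. It is a sketch with add-one smoothing, not what your library does internally, and the function names and toy sentences are invented for the example. Trained on "sad" examples only, it has a single class to choose from, so every input comes back as "sad".

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Fit a tiny multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior from label frequencies
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Training data with no positive examples at all.
only_sad = [("i feel awful today", "sad"), ("this is terrible", "sad")]
sad_only_prediction = predict(
    train_naive_bayes(only_sad), "what a wonderful happy day"
)

# The same data plus two positive examples.
mixed = only_sad + [
    ("i love this wonderful day", "happy"),
    ("so happy today", "happy"),
]
mixed_prediction = predict(train_naive_bayes(mixed), "what a wonderful happy day")
```

Here `sad_only_prediction` is "sad" even though the sentence is clearly positive, while `mixed_prediction` is "happy" once the model has seen both classes.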
As for your main issue, try using a different data set. Where are you getting your tweets from, and are they sufficiently diverse?