Naive Bayes Classifiers - TECHCODEVIEW.COM - AI-ML-DS with Python

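Naive Bayes classifiers are a family of algorithms based on Bayes' theorem. Despite the "naive" assumption of feature independence, these classifiers are widely used in machine learning for their simplicity and efficiency. This article covers the theory, implementation, and applications, and shows why they remain useful in practice despite that oversimplified assumption.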

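What are Naive Bayes Classifiers?

Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem. It is not a single algorithm but a family of algorithms that share a common principle: every pair of features being classified is assumed to be independent of each other, given the class label.

One of the simplest and most effective classification algorithms, the Naive Bayes classifier helps build machine learning models quickly and makes fast predictions.

The Naive Bayes algorithm is used for classification problems and is especially common in text classification, where the data is high-dimensional (each word represents one feature). Typical uses include spam filtering, sentiment detection, and rating classification. Its main advantage is speed: training and prediction are fast even on high-dimensional data.

The model predicts the probability that an instance with a given set of feature values belongs to a class, so it is a probabilistic classifier. It is called "naive" because it assumes that each feature is independent of the presence of any other feature; in other words, each feature contributes to the prediction without regard to the others. In the real world this condition is rarely satisfied. The algorithm uses Bayes' theorem for both training and prediction.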

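Why is it Called Naive Bayes?

The "Naive" part of the name refers to the simplifying assumption made by the classifier: the features used to describe an observation are assumed to be conditionally independent, given the class label. The "Bayes" part refers to Reverend Thomas Bayes, an 18th-century statistician and theologian who formulated Bayes' theorem.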

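Consider a fictional dataset that describes the weather conditions for playing a game of golf. Given the weather conditions, each tuple classifies the conditions as fit ("Yes") or unfit ("No") for playing golf. Here is a tabular representation of the dataset.

	Outlook	Temperature	Humidity	Windy	Play Golf
0	Rainy	Hot	High	False	No
1	Rainy	Hot	High	True	No
2	Overcast	Hot	High	False	Yes
3	Sunny	Mild	High	False	Yes
4	Sunny	Cool	Normal	False	Yes
5	Sunny	Cool	Normal	True	No
6	Overcast	Cool	Normal	True	Yes
7	Rainy	Mild	High	False	No
8	Rainy	Cool	Normal	False	Yes
9	Sunny	Mild	Normal	False	Yes
10	Rainy	Mild	Normal	True	Yes
11	Overcast	Mild	High	True	Yes
12	Overcast	Hot	Normal	False	Yes
13	Sunny	Mild	High	True	No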

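The dataset is divided into two parts: the feature matrix and the response vector.

The feature matrix contains all the vectors (rows) of the dataset, where each vector consists of the values of the dependent features. In the above dataset, the features are "Outlook", "Temperature", "Humidity", and "Windy".
The response vector contains the value of the class variable (the prediction or output) for each row of the feature matrix. In the above dataset, the class variable name is "Play Golf".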

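Assumptions of Naive Bayes

The fundamental Naive Bayes assumptions are:

Feature independence: The features of the data are conditionally independent of each other, given the class label.
Continuous features are normally distributed: If a feature is continuous, it is assumed to be normally distributed within each class.
Discrete features have multinomial distributions: If a feature is discrete, it is assumed to have a multinomial distribution within each class.
Features are equally important: All features are assumed to contribute equally to the prediction of the class label.
No missing data: The data should not contain any missing values.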

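With relation to our dataset, this concept can be understood as follows:

First, we assume that no pair of features is dependent. For example, the temperature being "Hot" has nothing to do with the humidity, and the outlook being "Rainy" has no effect on the wind. Hence, the features are assumed to be independent.
Second, each feature is given the same weight (or importance). For example, knowing only the temperature and humidity cannot predict the outcome accurately. No attribute is irrelevant, and each is assumed to contribute equally to the outcome.

The assumptions made by Naive Bayes are generally not correct in real-world situations. In fact, the independence assumption almost never holds exactly, yet the method often works well in practice. Before moving to the Naive Bayes formula, it is important to know Bayes' theorem.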

Bayes' Theorem

Bayes' theorem gives the probability of an event occurring given that another event has already occurred. It is stated mathematically as:

P(A|B) = \frac{P(B|A)P(A)}{P(B)}

where A and B are events and P(B) ≠ 0.

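Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence.
P(A) is the prior of A (the prior probability, i.e. the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, event B).
P(B) is the marginal probability, i.e. the probability of the evidence.
P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
P(B|A) is the likelihood, i.e. the probability of observing the evidence given that the hypothesis A is true.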
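
As a quick numeric check of the theorem, the sketch below applies it to the golf dataset above, taking A = "Play Golf is Yes" and B = "Outlook is Sunny". The function name `bayes_posterior` is an illustrative choice rather than part of any library, and the three input probabilities are counted by hand from the table.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B); requires P(B) != 0."""
    if evidence == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood * prior / evidence


# Counted by hand from the 14-row golf dataset:
p_sunny_given_yes = Fraction(3, 9)   # P(B|A): 3 of the 9 "Yes" rows are Sunny
p_yes = Fraction(9, 14)              # P(A): 9 of the 14 rows are "Yes"
p_sunny = Fraction(5, 14)            # P(B): 5 of the 14 rows are Sunny

p_yes_given_sunny = bayes_posterior(p_sunny_given_yes, p_yes, p_sunny)
print(p_yes_given_sunny)  # 3/5
```

The result agrees with a direct count: of the five Sunny rows, three are labeled Yes.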

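Now, with regard to our dataset, we can apply Bayes' theorem in the following way:

P(y|X) = \frac{P(X|y)P(y)}{P(X)}

where y is the class variable and X is a dependent feature vector (of size n):

X = (x_1, x_2, x_3, \ldots, x_n)

To be clear, an example of a feature vector and the corresponding class variable is (see the first row of the dataset):

X = (Rainy, Hot, High, False)
y = No

So here P(y|X) means the probability of not playing golf given that the outlook is rainy, the temperature is hot, the humidity is high, and there is no wind.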

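Now it is time to apply the naive assumption to Bayes' theorem, namely independence among the features. We therefore split the evidence into its independent parts.

If any two events A and B are independent, then:

P(A,B) = P(A)P(B)

Hence, we reach the result:

P(y|x_1,\ldots,x_n) = \frac{P(x_1|y)P(x_2|y)\ldots P(x_n|y)P(y)}{P(x_1)P(x_2)\ldots P(x_n)}

which can be expressed as:

P(y|x_1,\ldots,x_n) = \frac{P(y)\prod_{i=1}^{n}P(x_i|y)}{P(x_1)P(x_2)\ldots P(x_n)}

Now, as the denominator remains constant for a given input, we can remove that term:

P(y|x_1,\ldots,x_n) \propto P(y)\prod_{i=1}^{n}P(x_i|y)

Next, we need to create a classifier model. For this, we compute the probability of the given set of inputs for every possible value of the class variable y and pick the output with the maximum probability. This can be expressed mathematically as:

y = \arg\max_{y} P(y)\prod_{i=1}^{n}P(x_i|y)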
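
The decision rule above can be sketched in a few lines. This is a minimal illustration, not a library API: the function name `predict_class`, the dictionary layout, and the toy spam/ham probabilities are all assumptions made up for the example. Summing logarithms instead of multiplying probabilities gives the same argmax and avoids numerical underflow when there are many features.

```python
import math


def predict_class(priors, cond_probs, x):
    """Return the class y maximizing log P(y) + sum_i log P(x_i | y).

    priors:     {class: P(y)}
    cond_probs: {class: [{value: P(x_i = value | y)} for each feature i]}
    x:          sequence of observed feature values
    Returns None if every class has zero probability for x.
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        score = math.log(prior)
        for i, value in enumerate(x):
            p = cond_probs[y][i].get(value, 0.0)
            # A zero probability rules the class out entirely.
            score += math.log(p) if p > 0 else -math.inf
        if score > best_score:
            best_class, best_score = y, score
    return best_class


# Hypothetical two-feature example (numbers invented for illustration).
priors = {"spam": 0.4, "ham": 0.6}
cond_probs = {
    "spam": [{"offer": 0.7, "hello": 0.3}, {"link": 0.8, "no_link": 0.2}],
    "ham": [{"offer": 0.1, "hello": 0.9}, {"link": 0.3, "no_link": 0.7}],
}
print(predict_class(priors, cond_probs, ("offer", "link")))  # spam
```

For the input ("offer", "link") the unnormalized scores are 0.4 × 0.7 × 0.8 = 0.224 for spam and 0.6 × 0.1 × 0.3 = 0.018 for ham, so spam is selected.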

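So, finally, we are left with the task of calculating P(y) and P(x_i | y).

Note that P(y) is also called the class probability and P(x_i | y) is called the conditional probability.

The different naive Bayes classifiers differ mainly in the assumptions they make regarding the distribution of P(x_i | y).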

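Let us apply the above formula manually to our weather dataset. For this, we need to do some precomputations on the dataset.

We need to find P(x_i | y_j) for each x_i in X and each y_j in y. All these calculations are shown in the tables below.

So, in the figure above (tables 1 to 4), we have calculated P(x_i | y_j) manually for each x_i in X and each y_j in y. For example, the probability that the temperature is cool given that golf is played is P(Temperature = Cool | Play Golf = Yes) = 3/9.

We also need the class probabilities P(y), which are calculated in table 5. For example, P(Play Golf = Yes) = 9/14.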
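
The precomputed tables can also be reproduced programmatically. The sketch below counts frequencies in the 14-row golf dataset; the names `rows`, `class_counts`, `cond_counts`, `class_prob`, and `cond_prob` are illustrative choices, not part of the original article.

```python
from collections import Counter
from fractions import Fraction

FEATURES = ("Outlook", "Temperature", "Humidity", "Windy")

# (Outlook, Temperature, Humidity, Windy, Play Golf), rows 0-13 of the dataset
rows = [
    ("Rainy", "Hot", "High", False, "No"),
    ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Sunny", "Mild", "High", True, "No"),
]

# Class counts give P(y); (feature, value, class) counts give P(x_i | y).
class_counts = Counter(row[-1] for row in rows)
cond_counts = Counter(
    (FEATURES[i], row[i], row[-1]) for row in rows for i in range(len(FEATURES))
)


def class_prob(label):
    """Relative frequency of a class label, i.e. P(y)."""
    return Fraction(class_counts[label], len(rows))


def cond_prob(feature, value, label):
    """Relative frequency of a feature value within a class, i.e. P(x_i | y)."""
    return Fraction(cond_counts[(feature, value, label)], class_counts[label])


print(class_prob("Yes"))                        # 9/14
print(cond_prob("Temperature", "Cool", "Yes"))  # 1/3, i.e. 3/9
print(cond_prob("Outlook", "Overcast", "No"))   # 0
```

The last line shows a count of zero: no Overcast day in the dataset is labeled No.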

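So now we are done with the precomputations and the classifier is ready.

Let us test it on a new set of features (call it today):

today = (Sunny, Hot, Normal, False)

The probability of playing golf is given by:

P(Yes | today) = \frac{P(Sunny Outlook | Yes)P(Hot Temperature | Yes)P(Normal Humidity | Yes)P(No Wind | Yes)P(Yes)}{P(today)}

and the probability of not playing golf is given by:

P(No | today) = \frac{P(Sunny Outlook | No)P(Hot Temperature | No)P(Normal Humidity | No)P(No Wind | No)P(No)}{P(today)}

Since P(today) is common to both probabilities, we can ignore it and find the proportional probabilities:

P(Yes | today) \propto \frac{3}{9} \cdot \frac{2}{9} \cdot \frac{6}{9} \cdot \frac{6}{9} \cdot \frac{9}{14} \approx 0.02116

and

P(No | today) \propto \frac{2}{5} \cdot \frac{2}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{5}{14} \approx 0.0046

Now, since

P(Yes | today) + P(No | today) = 1

these numbers can be converted into probabilities by making their sum equal to 1 (normalization):

P(Yes | today) = \frac{0.02116}{0.02116 + 0.0046} \approx 0.82

and

P(No | today) = \frac{0.0046}{0.02116 + 0.0046} \approx 0.18

Since

P(Yes | today) > P(No | today)

the prediction is that golf would be played: "Yes".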
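
The arithmetic of this worked example can be checked with exact fractions. The sketch below hard-codes the conditional probabilities counted from the dataset for today = (Sunny, Hot, Normal, False), including P(Sunny | No) = 2/5 (two of the five "No" rows are Sunny); the variable names are illustrative.

```python
from fractions import Fraction as F

# Unnormalized scores P(y) * prod_i P(x_i | y) for today = (Sunny, Hot, Normal, False).
# Factor order: Outlook, Temperature, Humidity, Windy, class prior.
score_yes = F(3, 9) * F(2, 9) * F(6, 9) * F(6, 9) * F(9, 14)
score_no = F(2, 5) * F(2, 5) * F(1, 5) * F(2, 5) * F(5, 14)

# Normalize so that the two posteriors sum to 1.
total = score_yes + score_no
p_yes = score_yes / total
p_no = score_no / total

print(round(float(score_yes), 5), round(float(score_no), 5))  # 0.02116 0.00457
print(round(float(p_yes), 2), round(float(p_no), 2))          # 0.82 0.18
print("Yes" if p_yes > p_no else "No")                        # Yes
```

Normalizing changes the scale of the two scores but not their order, so the predicted class can be read off the unnormalized scores as well.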

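The method discussed above is applicable to discrete data. In the case of continuous data, we need to make some assumptions regarding the distribution of the values of each feature. The different naive Bayes classifiers differ mainly in the assumptions they make regarding the distribution of P(x_i | y).

Types of Naive Bayes Model

There are three types of Naive Bayes model:

Gaussian Naive Bayes Classifier

In Gaussian Naive Bayes, the continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution. A Gaussian distribution is also called a normal distribution. When plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values.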

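The conditional probability table for the Outlook feature, computed from the dataset, is as follows:

	Yes	No	P(Outlook | Yes)	P(Outlook | No)
Sunny	3	2	3/9	2/5
Overcast	4	0	4/9	0/5
Rainy	2	3	2/9	3/5
Total	9	5	100%	100%

The likelihood of the features is assumed to be Gaussian, so the conditional probability is given by:

P(x_i | y) = \frac{1}{\sqrt{2\pi\sigma_{y}^{2}}} \exp\left(-\frac{(x_i-\mu_{y})^2}{2\sigma_{y}^{2}}\right)

Next, we look at an implementation of the Gaussian Naive Bayes classifier using scikit-learn.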
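
Before turning to the library, the Gaussian likelihood formula above can be evaluated directly. The sketch below is a minimal illustration; the function name `gaussian_likelihood` and the sample mean and variance are assumptions chosen for the example, not values from the article.

```python
import math


def gaussian_likelihood(x, mean, variance):
    """Evaluate the normal density P(x_i | y) for one feature and one class.

    mean and variance are the per-class estimates (mu_y and sigma_y^2);
    variance must be positive.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return coefficient * math.exp(exponent)


# Hypothetical class statistics for a continuous feature (made-up numbers).
mean_y, variance_y = 5.0, 0.25

# The density is highest at the mean and falls off symmetrically on both sides.
at_mean = gaussian_likelihood(5.0, mean_y, variance_y)
below = gaussian_likelihood(4.5, mean_y, variance_y)
above = gaussian_likelihood(5.5, mean_y, variance_y)
print(at_mean, below, above)
```

In Gaussian Naive Bayes, these per-feature densities take the place of the frequency-table entries in the product of P(y) and the P(x_i | y) terms.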

Python

# Load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# Store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# Split X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Train the model on the training set
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = gnb.predict(X_test)

# Compare actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print('Gaussian Naive Bayes model accuracy(in %):', metrics.accuracy_score(y_test, y_pred) * 100)

Output:

Gaussian Naive Bayes model accuracy(in %): 95.0

Multinomial Naive Bayes

Feature vectors represent the frequencies with which certain events have been generated by a multinomial distribution. This is the event model typically used for document classification.

Bernoulli Naive Bayes

In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term occurrence features (i.e. whether a word occurs in a document or not) are used rather than term frequencies (i.e. the frequency of a word in the document).

Advantages of Naive Bayes Classifier

Easy to implement and computationally efficient.
Effective in cases with a large number of features.
Performs well even with limited training data.
Performs well in the presence of categorical features. For numerical features, the data is assumed to come from normal distributions.

Disadvantages of Naive Bayes Classifier

Assumes that features are independent, which may not always hold in real-world data.
Can be influenced by irrelevant attributes.
May assign zero probability to unseen events, leading to poor generalization.

Applications of Naive Bayes Classifier

Spam Email Filtering: Classifies emails as spam or non-spam based on features.
Text Classification: Used in sentiment analysis, document categorization, and topic classification.
Medical Diagnosis: Helps in predicting the likelihood of a disease based on symptoms.
Credit Scoring: Evaluates the creditworthiness of individuals for loan approval.
Weather Prediction: Classifies weather conditions based on various factors.

As we reach the end of this article, here are some important points to consider:

In spite of their apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering.
They require a small amount of training data to estimate the necessary parameters.
Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods. The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a one-dimensional distribution. This in turn helps to alleviate problems stemming from the curse of dimensionality.

Conclusion

In conclusion, Naive Bayes classifiers, despite their simplified assumptions, prove effective in various applications, showing notable performance in document classification and spam filtering. Their efficiency, speed, and ability to work with limited data make them valuable in real-world scenarios, compensating for their naive independence assumption.

Frequently Asked Questions on Naive Bayes Classifiers

What is a real example of Naive Bayes?
Naive Bayes is a simple probabilistic classifier based on Bayes' theorem. It assumes that the features of a given data point are independent of each other, which is often not the case in reality. However, despite this simplifying assumption, Naive Bayes has been shown to be surprisingly effective in a wide range of applications.

Why is it called Naive Bayes?
Naive Bayes is called naive because it assumes that the features of a data point are independent of each other. This assumption is often not true in reality, but it does make the algorithm much simpler to compute.

What is an example of a Bayes classifier?
A Bayes classifier is a type of classifier that uses Bayes' theorem to compute the probability of a given class for a given data point. Naive Bayes is one of the most common types of Bayes classifiers.

What is better than Naive Bayes?
There are several classifiers that are better than Naive Bayes in some situations. For example, logistic regression is often more accurate than Naive Bayes, especially when the features of a data point are correlated with each other.

Can Naive Bayes probability be greater than 1?
No, the probability of an event cannot be greater than 1. The probability of an event is a number between 0 and 1, where 0 indicates that the event is impossible and 1 indicates that the event is certain.