Probability

　TABLE OF CONTENTS

　　・The word “Chance”

　　・Axioms of Probability

Mathematics articles that help in reading this article
・Numerical Computation: Ep. 1, Ep. 5, Ep. 14, Ep. 15

Kaya

Let’s think about chance and probability this time.

The word “Chance”

Nayumi

Do you mean things like the chance of winning the lottery?

Kaya

Yes, exactly. Chance is a word that has become part of everyday language. So let’s start with some concrete examples and explore the kinds of meanings this word carries.

　The chance of winning a lottery can be calculated if you know the total number of tickets and how many of them are winners. More precisely, the chance of winning is given by the ratio of winning tickets to the total number of tickets.
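As a minimal sketch of this ratio, the Python snippet below computes the chance of winning for a hypothetical lottery. The ticket counts and the helper name `chance_of_winning` are made up for illustration, and the ratio only makes sense under the assumption that every ticket is equally likely to be drawn.

```python
from fractions import Fraction

def chance_of_winning(winning_tickets: int, total_tickets: int) -> Fraction:
    """Ratio of winning tickets to all tickets (assumes every ticket is equally likely)."""
    if total_tickets <= 0 or not 0 <= winning_tickets <= total_tickets:
        raise ValueError("need total_tickets > 0 and 0 <= winning_tickets <= total_tickets")
    return Fraction(winning_tickets, total_tickets)

# Hypothetical lottery: 5 winning tickets out of 1,000.
p = chance_of_winning(5, 1000)
print(p, float(p))
```

Using `Fraction` keeps the result exact (1/200) instead of a rounded decimal.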

　The chance of rain in a weather forecast is not something we can calculate as easily as a lottery. Even so, you don’t need to know how the chance of rain is calculated in order to use it as a guide—when the chance is high, you take an umbrella with you. In other words, when thinking about chance, there are situations where knowing the calculation method is not essential.
　Even if we don’t know how it is calculated, we still understand that the numerical value of a chance is constrained to some extent. For example, if a weather forecast said “a 200% chance of rain,” most people would immediately find that strange. Because the chance of rain must fall between 0% and 100%, we can see that a “chance” cannot take just any value.

　Finally, there are many words that describe how often something happens, such as “probably,” “likely,” “almost,” or “rarely.” When people use these expressions, they are usually describing how often something occurs based on past experience. However, if you were suddenly asked exactly which trials it occurred on (say, “the 3rd and 7th of 20 trials”), most people could not answer. In other words, people unconsciously smooth out and summarize their past experiences to estimate how often events occur. This, too, can be regarded as a kind of probabilistic way of thinking.
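The summarizing described here can be pictured with a small sketch: a detailed record of 20 trials (hypothetical, with the event occurring on the 3rd and 7th trials) is reduced to a single relative frequency. The record and the function name `relative_frequency` are illustrative assumptions, not part of the original discussion.

```python
from fractions import Fraction

# Hypothetical record of 20 trials: True where the event occurred (the 3rd and 7th trials).
record = [i in (3, 7) for i in range(1, 21)]

def relative_frequency(outcomes: list[bool]) -> Fraction:
    """Summarize a detailed record as (number of occurrences) / (number of trials)."""
    return Fraction(sum(outcomes), len(outcomes))

freq = relative_frequency(record)
print(freq)
```

Once summarized as 1/10, the frequency no longer says which trials the event occurred on, which matches the kind of memory described above.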

Kaya

In this way, the word chance and the way of thinking behind it are used in many different forms in everyday life. By contrast, in mathematics, probability is defined rigorously on the basis of the axioms of probability.

Axioms of Probability

Nayumi

What’s an axiom?

Kaya

Well, let me first introduce the terms proposition and axiom.

Proposition: A statement or equation that is definitely either true or false. Important propositions that have been proved are called theorems.

Axiom: A proposition that is accepted without proof and serves as a starting point for proving other propositions.

　Just as you prepare ingredients such as flour and sugar before baking a cake, in mathematics you first lay down a collection of axioms for the subject you want to discuss, and then prove various propositions from them.

Nayumi

Hm, so the axioms of probability are the starting point for thinking about probability.

Kaya

Exactly. The axioms of probability were introduced in 1933 by the Russian mathematician A. N. Kolmogorov.

　\( \Omega \) is a set consisting of elements called elementary events, and this set is referred to as the sample space. The set \( \mathfrak{F} \), known as the event space, is a collection of subsets of \( \Omega \), and its elements are called events. A triple \( \left( \Omega, \mathfrak{F}, P \right) \), where \( P \) assigns to each event its probability, is called a probability space if it satisfies the following axioms of probability [A1]–[A6].

Axioms of Probability:
[A1] The union, difference, and intersection of any two elements in \( \mathfrak{F} \) are also contained in \( \mathfrak{F} \).
[A2] \( \Omega \in \mathfrak{F} \).
[A3] For any element \( A \in \mathfrak{F} \), a non-negative real number \( P(A) \), called the probability of the event \( A \), is assigned.
[A4] \( P(\Omega) = 1 \).
[A5] If two elements \( A \) and \( B \) in \( \mathfrak{F} \) are disjoint, then \[ P \left( A \cup B \right) = P \left( A \right) + P \left( B \right) \] In this case, \( A \) and \( B \) are said to be mutually exclusive.
[A6] For any decreasing sequence of elements in \( \mathfrak{F} \), \[ A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots \] if \[ \bigcap _{i=1} ^{\infty} A_i = \varnothing \] then \[ \lim _{i \to \infty} P \left( A_i \right) = 0 \]
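To see the axioms in a concrete setting, here is a sketch that assumes the simplest kind of probability space: a finite sample space (one roll of a fair die), the event space taken to be all subsets of it, and each event's probability taken to be its share of the six outcomes. This example and the names `OMEGA`, `F` and `P` are illustrative choices that mirror the notation here; the script checks [A1] through [A5] by brute force.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example (not from the article): one roll of a fair die.
OMEGA = frozenset(range(1, 7))

# Event space F: every subset of OMEGA (the power set), 2**6 = 64 events.
F = [frozenset(c) for r in range(len(OMEGA) + 1) for c in combinations(sorted(OMEGA), r)]
F_SET = set(F)

def P(event: frozenset) -> Fraction:
    """Classical probability: outcomes in the event / all outcomes."""
    return Fraction(len(event), len(OMEGA))

# [A1] union, difference and intersection of any two events are events
assert all((a | b) in F_SET and (a - b) in F_SET and (a & b) in F_SET for a in F for b in F)
# [A2] the sample space itself is an event
assert OMEGA in F_SET
# [A3] every event is assigned a non-negative real number
assert all(P(a) >= 0 for a in F)
# [A4] the sample space has probability 1
assert P(OMEGA) == 1
# [A5] additivity for disjoint (mutually exclusive) events
assert all(P(a | b) == P(a) + P(b) for a in F for b in F if not (a & b))
# [A6] is not tested: with finitely many outcomes, a decreasing sequence whose
# intersection is empty must eventually equal the empty set, so P(A_n) is eventually 0.
print(len(F), "events; A1-A5 hold")
```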

　\( \mathfrak F \) is the letter F written in Fraktur script. Fraktur is a style of lettering that was used in Germany for printing up until around the time of World War II, and it is still used in mathematics today, where a large number of distinct symbols are needed.

　Next, we will prove from the axioms of probability that the following six propositions hold:
[1] If \( A \subset B \), then \( P(A) \leq P(B) \).
[2] \( P \left( \varnothing \right) = 0 \)
[3] \( P \left( A^c \right) = P \left( \Omega - A \right) = 1 - P \left( A \right) \)
Here, \( A^c \) is called the complementary event of \( A \).
[4] \( 0 \leq P \left( A \right) \leq 1 \)
[5] \( P \left( A \cup B \right) = P \left( A \right) + P \left( B \right) - P \left( A \cap B \right) \)
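[6] If a sequence of events \( A_1, A_2, \dots, A_n, \dots \) in \( \mathfrak{F} \) is mutually exclusive (that is, \( A_i \cap A_j = \varnothing \) for \( i \neq j \)) and its union \( \bigcup _{i=1} ^{\infty} A_i \) also belongs to \( \mathfrak{F} \), then \[ P \left( \bigcup _{i=1} ^{\infty} A_i \right) = \sum _{i=1} ^{\infty} P \left( A_i \right).\]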

Nayumi

That’s quite a lot.

Kaya

Well, none of the individual proofs are particularly difficult. Let’s go through them one by one from the top. We’ll start with [1] and [2].

Proof of [1]
　Let \( A, B \in \mathfrak{F} \) and suppose \( A \subset B \). By Axiom A1, \( B - A \in \mathfrak{F} \), and \( B = A \cup (B - A) \) where \( A \) and \( B - A \) are disjoint. So by Axioms A3 and A5 we have \[ P(B) = P(A) + P(B - A) \ge P(A). \] Therefore, statement [1] holds.
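The decomposition \( B = A \cup (B - A) \) can be checked exhaustively on a small example. The sketch below assumes a single roll of a fair die with classical probability (an illustrative assumption, as are the names `EVENTS`, `P` and `pairs`) and verifies, for every pair of events with \( A \subset B \), both the identity \( P(B) = P(A) + P(B - A) \) and statement [1].

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(1, 7))  # assumed example: one roll of a fair die
EVENTS = [frozenset(c) for r in range(len(OMEGA) + 1) for c in combinations(sorted(OMEGA), r)]

def P(event: frozenset) -> Fraction:
    """Classical probability on the die."""
    return Fraction(len(event), len(OMEGA))

# All pairs with A a subset of B (Python's <= on sets means "subset or equal").
pairs = [(a, b) for a in EVENTS for b in EVENTS if a <= b]

# The decomposition used in the proof: P(B) = P(A) + P(B - A) ...
assert all(P(b) == P(a) + P(b - a) for a, b in pairs)
# ... and hence statement [1]: P(A) <= P(B).
assert all(P(a) <= P(b) for a, b in pairs)
print(len(pairs), "pairs checked")
```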

Proof of [2]
　Let \( A \in \mathfrak{F} \). By Axiom A1, \( \varnothing = A - A \in \mathfrak{F} \). Since \( A = A \cup \varnothing \) and \( A \cap \varnothing = \varnothing \), by Axiom A5 we obtain \[ \begin{aligned} P(A) &= P(A) + P(\varnothing) \\\\ 0 &= P(\varnothing) \end{aligned} \] Therefore, statement [2] holds.

Nayumi

Both [1] and [2] hinge on Axiom A5, don’t they?

Kaya

That’s right. In the proof of [1], we also use Axiom A3, which states that the probability of an event always takes a non-negative real value. Next is the proof of [3], and it’s almost the same as these.

Proof of [3]
　Let \( A \in \mathfrak F. \) By Axioms A1 and A2, \( \Omega - A \in \mathfrak F, \) and \( \Omega = A \cup \left( \Omega - A \right) \) where \( A \) and \( \Omega - A \) are disjoint. So by Axioms A4 and A5 we have \[ \begin{align} P \left( \Omega \right) &= P \left( A \right) + P \left( \Omega - A \right) = 1 \\\\ P \left( \Omega - A \right) &= 1 - P \left( A \right) \end{align}\] Therefore, statement [3] holds.
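Statement [3] is handy when an event is awkward to count directly but its complement is easy. As a small illustration under assumed conditions (two rolls of a fair die, all 36 ordered outcomes equally likely; the example and variable names are ours), the sketch computes the chance of at least one six as 1 minus the chance of no six, and compares it with a direct count.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two rolls of a fair die, 36 equally likely ordered outcomes.
OMEGA = list(product(range(1, 7), repeat=2))

def P(event: list) -> Fraction:
    """Classical probability: outcomes in the event / all outcomes."""
    return Fraction(len(event), len(OMEGA))

no_six = [w for w in OMEGA if 6 not in w]            # event A: no six at all
at_least_one_six = [w for w in OMEGA if 6 in w]      # complement: Omega - A

via_complement = 1 - P(no_six)   # statement [3]: P(A^c) = 1 - P(A)
direct = P(at_least_one_six)     # counting the complement directly
assert via_complement == direct
print(via_complement)
```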

Nayumi

Axiom A4—the property that the probability of the sample space is 1—is being used here, right? And \( A^c \) is called the complementary event, but that’s the same notation as the complement of a set, isn’t it?

Kaya

That’s right. If we take \( \Omega \) as the universal set, then \( A^c = \Omega - A \) is just the complement of \( A, \) so we use the same notation.

Nayumi

I see.

Kaya

Next, statement [4] can be derived using [1] together with Axioms A3 and A4, as follows.

Proof of [4]
　Let \( A \in \mathfrak F .\) Since \( A \subset \Omega, \) by Axiom A3, Axiom A4, and statement [1], we have \[ 0 \leq P \left( A \right) \leq P \left( \Omega \right) = 1\] Therefore, statement [4] holds.

Nayumi

[4] means that probabilities range from 0 to 1—that is, from 0% to 100%.

Kaya

That’s right. Statement [5] can also be derived from Axiom A5.

Proof of [5]
　Let \( A,B \in \mathfrak F .\) Since \( A \cup B = A \cup \left( B - A \right) ,\) by Axiom A5 we have \[ P \left( A \cup B \right) = P \left( A \right) + P \left( B - A \right) \ \ \ldots (1) \] Also, since \( B = \left( A \cap B \right) \cup \left( B - A \right), \) again by Axiom A5 we obtain \[ P \left( B \right) = P \left( A \cap B \right) + P \left( B - A \right) \ \ \ldots (2) \] Subtracting \( (2) \) from \( (1) ,\) we get \[ \begin{align} P \left( A \cup B \right) - P \left( B \right) &= P \left( A \right) - P \left( A \cap B \right) \\\\ P \left( A \cup B \right) &= P \left( A \right) + P \left( B \right) - P \left( A \cap B \right) \end{align}\] Therefore, statement [5] holds.
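Here is a small numerical check of [5], assuming one roll of a fair die with classical probability (the events are chosen for illustration): \( A \) is “even” and \( B \) is “at least 4”. Adding \( P(A) \) and \( P(B) \) counts the outcomes 4 and 6 twice, and subtracting \( P(A \cap B) \) removes that double count.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # assumed example: one roll of a fair die

def P(event: frozenset) -> Fraction:
    """Classical probability on the die."""
    return Fraction(len(event), len(OMEGA))

A = frozenset(w for w in OMEGA if w % 2 == 0)  # even: {2, 4, 6}
B = frozenset(w for w in OMEGA if w >= 4)      # at least 4: {4, 5, 6}

lhs = P(A | B)                  # P({2, 4, 5, 6})
rhs = P(A) + P(B) - P(A & B)    # 1/2 + 1/2 - 1/3
assert lhs == rhs
print(lhs)
```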

Nayumi

[5] has the same form as the formula for the cardinality of finite sets that we studied earlier.

Kaya

Yes. It’s a matter of subtracting the part that gets counted twice. Finally, let’s prove [6].

Proof of [6]
　Let \( \mathbb{N} \) denote the set of all natural numbers. Let \( \left\{ A_i \right\} _{i \in \mathbb N} \) be a sequence of mutually exclusive events in \( \mathfrak{F} \) whose union \( A = \bigcup _{i=1} ^{\infty} A_i \) also belongs to \( \mathfrak{F} \). (Axiom A1 only guarantees that unions of finitely many events belong to \( \mathfrak{F} \), so this has to be assumed.) Define \[ R_n = \bigcup _{i=n} ^{\infty} A_i \quad \left( n \in \mathbb N \right)\] Since the \( A_i \) are mutually exclusive, \( R_1 = A \) and \( R_n = A - \left( A_1 \cup \cdots \cup A_{n-1} \right) \) for \( n \ge 2 \), so each \( R_n \) belongs to \( \mathfrak{F} \) by Axiom A1. Then \( \left\{ R_n \right\} _{n \in \mathbb N} \) is a decreasing sequence in \( \mathfrak{F} \), and \[ \bigcap_{n=1}^{\infty} R_n = \varnothing . \] Indeed, suppose there exists an element \[ x \in \bigcap_{n=1}^{\infty} R_n . \] Then for every \( n \), \[ x \in R_n = \bigcup_{i=n}^{\infty} A_i \quad \text{and} \quad x \in R_{n+1} = \bigcup_{i=n+1}^{\infty} A_i . \] Now suppose \( x \in A_n \). Since the family \( \left\{ A_i \right\} _{i \in \mathbb N} \) is mutually exclusive, \[ x \notin A_i \quad (i \ge n+1). \] Hence \( x \notin R_{n+1} \), which contradicts \( x \in R_{n+1} \). Therefore \( x \notin A_n \). Since this holds for every \( n \), there exists no \( i \) such that \( x \in A_i \). On the other hand, \[ x \in \bigcap_{n=1}^{\infty} R_n \subset R_1 = \bigcup_{i=1}^{\infty} A_i , \] so there must exist some \( i \) such that \( x \in A_i \), which is a contradiction. Hence, \[ \bigcap_{n=1}^{\infty} R_n = \varnothing . \]
By Axiom A6, \[ \lim_{n \to \infty} P(R_n) = 0. \] Now fix \( n \in \mathbb N \). The set \( A \) is the union of the mutually exclusive events \( A_1, \dots, A_n \) and \( R_{n+1} \), and every finite union of these belongs to \( \mathfrak F \) by Axiom A1. Applying Axiom A5 repeatedly, we obtain \[ P \left( \bigcup _{i=1} ^{\infty} A_i \right) = \sum _{i=1} ^n P \left( A_i \right) + P \left( R_{n+1} \right) \] The left-hand side does not depend on \( n \), and \( P(R_{n+1}) \to 0 \) as \( n \to \infty \). Therefore the partial sums converge to the left-hand side, that is, \[ P \left( \bigcup _{i=1} ^{\infty} A_i \right) = \sum _{i=1} ^{\infty} P \left( A_i \right) \] Therefore, statement [6] holds.
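Statement [6] involves infinitely many events, so code cannot verify it directly, but it can show the finite identity behind it. Writing \( A = \bigcup_{i=1}^{\infty} A_i \), the set \( A \) splits into the mutually exclusive pieces \( A_1, \dots, A_n \) and \( R_{n+1} \), so Axiom A5 gives \( P(A) = \sum_{i=1}^{n} P(A_i) + P(R_{n+1}) \), and Axiom A6 is what makes the tail \( P(R_{n+1}) \) shrink to 0. The sketch assumes a standard coin-tossing model: toss a fair coin until the first head, with \( A_i \) = “first head on toss \( i \)”, \( P(A_i) = (1/2)^i \), \( P(R_{n+1}) = (1/2)^n \), and \( P(A) = 1 \). The model and the helper names `p_A` and `p_tail` are illustrative assumptions.

```python
from fractions import Fraction

# Assumed model: toss a fair coin until the first head.
# A_i = "first head on toss i"            -> P(A_i) = (1/2)**i
# R_{n+1} = union of A_{n+1}, A_{n+2}, ... -> P(R_{n+1}) = (1/2)**n
half = Fraction(1, 2)

def p_A(i: int) -> Fraction:
    """P(A_i) in this model."""
    return half ** i

def p_tail(n: int) -> Fraction:
    """P(R_{n+1}) in this model: the first n tosses are all tails."""
    return half ** n

for n in (1, 5, 10, 20):
    partial = sum(p_A(i) for i in range(1, n + 1))  # P(A_1) + ... + P(A_n)
    assert partial + p_tail(n) == 1                 # P(A) = 1 splits exactly (finite additivity)
    print(n, float(partial), float(p_tail(n)))
```

The printed tail values shrink toward 0 as n grows, while the partial sum plus the tail stays exactly 1.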

Nayumi

Hmm, constructing a decreasing sequence in \( \mathfrak F \) using the union of the \( A_i \) is quite clever.

Kaya

Indeed. It makes the role of Axiom A6 very clear.

Nayumi

With that, we've proved everything from [1] through [6], haven't we?

Kaya

Yes. Good work. Let's stop here for today.

References:
[1] 宮西正宜 and 24 others, 高等学校 数学A 改訂版 (High School Mathematics A, Revised Edition), 新興出版社啓林館 (Shinko Shuppansha Keirinkan), December 10, 2008
[2] A. N. Kolmogorov, Foundations of the Theory of Probability, translated by Nathan Morrison, Chelsea Publishing Company, New York, 1950
[3] Wikipedia, “Fraktur”, https://en.wikipedia.org/wiki/Fraktur, March 9, 2026
