2017
If you've ever taken some lessons on competitive programming, chances are that you have already heard about one of the most famous formula: the Mobius inversion. This article is aimed to provide some basic insight on what is the Mobius inversion, as well as how to apply it in various programming tasks.
If you are not familiar with the linear sieve and multiplicative functions, I recommend that you read about them first from the previous essay.
I will introduce some frequently used notations and lemmas first.
The following functions are all multiplicative functions, where \( p \) is a prime number and \( k \) is a positive integer.
I have some my unofficial names for these frequently used conclusions. If you happen to know any more commonly used name for them, you are more than welcome to tell me.
Proof: We know that \( \frac{p}{q}-\lfloor \frac{p}{q} \rfloor <1 \). Hence \( \frac{p}{qr}-\frac{\lfloor \frac{p}{q} \rfloor}{r}<\frac{1}{r} \). Since the fraction part of \( \frac{\lfloor \frac{p}{q} \rfloor}{r} \) cannot exceed \( \frac{r-1}{r} \), we achieve the conclusion.
Proof: Denote \( d(i)=\lfloor \frac{n}{i} \rfloor \). For every \( i \) not exceeding \( \sqrt{n} \), there will be no more than \( \sqrt{n} \) values in the domain of \( d(i) \), so there can be at most \( \sqrt{n} \) different values for \( d(i) \). For the rest part of \( i \) greater than \( \sqrt{n} \), we can infer that \( d(i)<\sqrt{n} \) and is a positive integer, so there can be at most \( \sqrt{n} \) different values for \( d(i) \). Therefore we know that there are at most \( 2\sqrt{n} \) different elements in the sequence.
Since the Euler–Mascheroni constant states that
\[ \lim_{n \rightarrow \infty}\sum_{i=1}^n\frac{1}{i}=\ln n+\gamma\text{,} \]we know that \( \lim_{n \rightarrow \infty}\sum_{i=1}^n\frac{n}{i}=n \ln n+n\gamma \), so the sum of the original sequence is approximate to \( O(n \log n) \).
Moreover, it is actually possible to exploit the property that the sequence has at most \( 2\sqrt{n} \) different elements, and write a loop that runs in \( O(\sqrt{n}) \) complexity to iterate through every possible value of \( d(i) \), using the fact that the greatest integer \( x \) satisfying \( d(x)=d(i) \) is \( \lfloor \frac{n}{\lfloor \frac{n}{i} \rfloor} \rfloor \). The piece of code below demonstrates one way to program it.
The proof is more mathematical, which I will skip here.
According to Wikipedia, the Mobius inversion states that:
If \( g(n)=\sum_{d|n}f(d) \) for every positive integer \( n \), then \( f(n)=\sum_{d|n}g(d)\mu(\frac{n}{d}) \), where \( \mu(x) \) is the Mobius function, which is multiplicative and satisfies \( f(1)=1, f(p)=-1, f(p^k)=0 \) for any prime number \( p \) and any integer \( k \geq 2 \). {\it It is worth noting that you can pre-compute all values of the Mobius function from \( 1 \) to \( n \) using the linear sieve.}
However, the definition alone does not mean much (unless the problem statement explicitly gives you something like \( g(n)=\sum_{d|n}f(d) \). In that case, well...). There is one important property that is probably more useful than the definition:
\[ \displaystyle \sum_{d|n}\mu(d)=\epsilon(n)=[n=1] \]In order to prove this property, we have to use the transitive property of multiplicative functions to show that \( f(n)=\sum_{d|n}\mu(d) \) is multiplicative. After that, we can see \( f(1)=1 \) and \( f(p^k)=0 \), which means \( f(x)=\epsilon(x) \). Now you may ask: why is this property important? I think it is best to show the reasons in some basic examples below.
Now apply the Mobius inversion on \( [gcd(i,j)]=1 \), we have
\[ \displaystyle f(n)=\sum_{i=1}^n\sum_{j=1}^n\sum_{d|gcd(i,j)}\mu(d) \]which is the same as
\[ \displaystyle f(n)=\sum_{i=1}^n\sum_{j=1}^n\sum_{d=1}^n[d|gcd(i,j)]\mu(d) \]Notice that \( [d|gcd(i,j)]=[d|i][d|j] \). Therefore
\[ \displaystyle f(n)=\sum_{i=1}^n\sum_{j=1}^n\sum_{d=1}^n[d|i][d|j]\mu(d) \]We can change the order of summing things up, so
\[ \displaystyle f(n)=\sum_{d=1}^n\mu(d)\left(\sum_{i=1}^n[d|i]\right)\left(\sum_{j=1}^n[d|j]\right) \]We know that \( \sum_{i=1}^n[d|i]=\sum_{j=1}^n[d|j]=\lfloor \frac{n}{d} \rfloor \). Thus
\[ \displaystyle f(n)=\sum_{d=1}^n\mu(d)\lfloor \frac{n}{d} \rfloor ^2 \]Then we can simply loop through \( 1 \) to \( n \) to compute the formula. While we can optimize this loop to \( O(\sqrt{n}) \) using the harmonic lemma, we still have to use the linear sieve to pre-compute the values of \( \mu(d) \). Therefore, the overall complexity of the algorithm is \( O(n) \).
Let \( k=gcd(i,j) \). We can then loop for \( k \) first.
\[ \displaystyle f(n)=\sum_{k=1}^nk\sum_{i=1}^n\sum_{j=1}^n[gcd(i,j)=k] \]Now assume \( i=ak,j=bk \), and then
\[ \displaystyle f(n)=\sum_{k=1}^nk\sum_{a=1}^{\lfloor\frac{n}{k}\rfloor}\sum_{b=1}^{\lfloor\frac{n}{k}\rfloor}[gcd(a,b)=1] \]It can be observed that the last part of the formula is the same as the one in example 1. Applying the same procedure, we have
\[ \displaystyle f(n)=\sum_{k=1}^nk\sum_{d=1}^{\lfloor\frac{n}{k}\rfloor}\mu(d)\lfloor \frac{n}{kd} \rfloor ^2 \]According to the harmonic lemma, \( O(\sum_{k=1}^n\lfloor\frac{n}{k}\rfloor)=O(n \log n) \). Therefore, if we compute the sum above with brute force, the overall complexity will be \( O(n \log n) \).
This is, however, not the best complexity we can achieve with the Mobius inversion. Now let \( l=kd \), and we can loop for \( l \) first.
\[ \displaystyle f(n)=\sum_{l=1}^n\lfloor \frac{n}{l} \rfloor^2\sum_{k|l}k\mu(\frac{l}{k}) \]Applying the transitive property of multiplicative functions, we can see that \( g(l)=\sum_{k|l}k\mu(\frac{l}{k}) \) is multiplicative. Moreover, we have \( g(p^k)=p^k-p^{k-1} \). If you have somewhat studied about the number theory, you may figure out that \( g(l) \) is actually the Euler's totient function. Regardless of whether you know about the coincidence or not, \( g(l) \) can be pre-computed with the linear sieve, and \( f(n)=\sum_{l=1}^n\lfloor \frac{n}{l} \rfloor^2g(l) \), which can be simply computed in \( O(n) \) complexity.
Since \( lcm(i,j)=\frac{ij}{gcd(i,j)} \), let \( k=gcd(i,j) \). We can then loop for \( k \) first.
\[ \displaystyle f(n)=\sum_{k=1}^n\sum_{i=1}^n\sum_{j=1}^n[gcd(i,j)=k]\frac{ij}{k} \]Now assume \( i=ak,j=bk \), and then
\[ \begin{align*} \displaystyle f(n)&=\sum_{k=1}^n\sum_{a=1}^{\lfloor\frac{n}{k}\rfloor}\sum_{b=1}^{\lfloor\frac{n}{k}\rfloor}[gcd(a,b)=1]abk\\ &=\sum_{k=1}^nk\sum_{a=1}^{\lfloor\frac{n}{k}\rfloor}a\sum_{b=1}^{\lfloor\frac{n}{k}\rfloor}b[gcd(a,b)=1]\\ &=\sum_{k=1}^nk\sum_{a=1}^{\lfloor\frac{n}{k}\rfloor}a\sum_{b=1}^{\lfloor\frac{n}{k}\rfloor}b\sum_{d=1}^{\lfloor\frac{n}{k}\rfloor}[d|a][d|b]\mu(d)\\ &=\sum_{k=1}^nk\sum_{d=1}^{\lfloor\frac{n}{k}\rfloor}\mu(d)\left(\sum_{a=1}^{\lfloor\frac{n}{k}\rfloor}[d|a]a\right)\left(\sum_{b=1}^{\lfloor\frac{n}{k}\rfloor}[d|b]b\right) \end{align*} \]Now let's proceed to assume \( a=dp,b=dq \), then
\[ \begin{align*} \displaystyle f(n)&=\sum_{k=1}^nk\sum_{d=1}^{\lfloor\frac{n}{k}\rfloor}\mu(d)\left(\sum_{p=1}^{\lfloor\frac{n}{kd}\rfloor}dp\right)\left(\sum_{q=1}^{\lfloor\frac{n}{kd}\rfloor}dq\right)\\ &=\sum_{k=1}^nk\sum_{d=1}^{\lfloor\frac{n}{k}\rfloor}\mu(d)d^2\left(\sum_{p=1}^{\lfloor\frac{n}{kd}\rfloor}p\right)\left(\sum_{q=1}^{\lfloor\frac{n}{kd}\rfloor}q\right)\\ \end{align*} \]Now notice \( \sum_{p=1}^{\lfloor\frac{n}{kd}\rfloor} p=\frac{(1+\lfloor\frac{n}{kd}\rfloor)(\lfloor\frac{n}{kd}\rfloor)}{2} \), and we have
\[ \displaystyle f(n)=\sum_{k=1}^nk\sum_{d=1}^{\lfloor\frac{n}{k}\rfloor}\mu(d)d^2\left(\frac{(1+\lfloor\frac{n}{kd}\rfloor)(\lfloor\frac{n}{kd}\rfloor)}{2}\right)^2 \]Now following the example above, let \( l=kd \), and then
\[ \displaystyle f(n)=\sum_{l=1}^n\left(\frac{(1+\lfloor\frac{n}{l}\rfloor)(\lfloor\frac{n}{l}\rfloor)}{2}\right)^2\sum_{d|l}\mu(d)ld \]Notice that \( g(l)=\sum_{d|l}\mu(d)ld \) is also multiplicative, and \( g(p^k)=p^k-p^{k+1} \). We can pre-compute the values of \( g(l) \) via the linear sieve, and \( f(n)=\sum_{l=1}^n\left(\frac{(1+\lfloor\frac{n}{l}\rfloor)(\lfloor\frac{n}{l}\rfloor)}{2}\right)^2g(l) \). The overall complexity will be \( O(n) \).
We simply have
\[ \begin{align*} \displaystyle f(n)&=\sum_{d}\sum_{a\in A\land d|a}\sum_{b\in A\land d|b}[gcd(\frac{a}{d},\frac{b}{d})=1]\frac{ab}{d}\\ &=\sum_{d}\sum_{a\in A\land d|a}\sum_{b\in A\land d|b}\sum_{l}[l|\frac{a}{d}][l|\frac{b}{d}]\mu(l)\frac{ab}{d}\\ &=(\sum_{d}\frac{1}{d}\sum_{l}\mu(l))(\sum_{a\in A}\sum_{b\in A}[dl|a][dl|b]ab) \end{align*} \]Let \( t=dl \), then
\[ \displaystyle f(n)=(\sum_{t}\sum_{l|t}\frac{l}{t}\mu(l))(\sum_{a\in A\land t|a}a)^2 \]Now, similar to the example above \( \displaystyle g(t)=\sum_{l|t}\frac{l}{t}\mu(l) \) is multiplicative with \( g(p^k)=p^{-k}-p^{-k+1} \), and thus can be computed with \( O(V) \), where \( V \) is the range span of \( A \). The latter part of the formula \( \displaystyle (\sum_{a\in A\land t|a}a)^2 \) is also fairly easy to figure out if you keep track of the number of the elements equal to \( v \) as \( cnt(v) \), since
\[ \displaystyle \sum_{a\in A\land t|a}a=\sum_{t|v}cnt(v)v\\ =\sum_{i=1}^{\frac{V}{t}}cnt(it)it \]and with the harmonic lemma, the above computation can be done in \( O(V\log V) \). Hence the overall complexity is \( O(N+V\log V) \).
It is known that all examples above can be further optimized to \( O(n^{\frac{2}{3}}) \). The exact technique will be introduced in the Dirichlet Convolution essay.