Correlation between columns in a DataFrame

Updated: 2022-12-09 16:19:37

np.correlate calculates the (unnormalized) cross-correlation between two 1-dimensional sequences:

z[k] = sum_n a[n] * conj(v[n+k])
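
A quick check of the formula above, as a sketch that assumes NumPy's default `mode='valid'`. When both sequences have the same length, only the zero-lag term (k = 0) is produced, and it is just the plain sum of products with `v` conjugated. Nothing is centered or normalized. The arrays are made-up example data.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.5])

# Equal lengths + default mode='valid' -> a single value: sum_n a[n] * conj(v[n])
z0 = np.correlate(a, v)
manual = np.sum(a * np.conj(v))
print(z0, manual)  # [3.5] 3.5

# For complex input the second sequence is conjugated, not used as-is.
ac = np.array([1 + 1j, 2 - 1j])
vc = np.array([1j, 1.0])
print(np.correlate(ac, vc), np.sum(ac * np.conj(vc)))
```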

By contrast, df.corr (by default) calculates the Pearson correlation coefficient.
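
To make the comparison concrete, here is a small sketch (with hypothetical data) that computes the Pearson coefficient by hand, subtracting the means and normalizing by the root of the product of the sums of squares, and checks it against `df.corr()`, whose default `method` is `'pearson'`.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 5.0, 9.0]})

x, y = df["x"].to_numpy(), df["y"].to_numpy()
dx, dy = x - x.mean(), y - y.mean()  # subtract the means
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

r_pandas = df.corr().loc["x", "y"]  # Pearson by default
print(r_manual, r_pandas)
```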

The correlation coefficient (if it exists) is always between -1 and 1 inclusive. The cross-correlation is not bounded.
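
A minimal illustration of the difference in range, using made-up data: scaling one series multiplies the zero-lag cross-correlation by the same factor, while the Pearson coefficient (here via `np.corrcoef`) stays at 1 for a perfectly linear relationship.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
results = []
for scale in (1.0, 10.0, 100.0):
    b = scale * a
    xcorr = np.correlate(a, b)[0]       # grows with the scale of the data
    pearson = np.corrcoef(a, b)[0, 1]   # stays at 1.0 (up to rounding)
    results.append((scale, xcorr, pearson))
    print(scale, xcorr, pearson)
```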

The formulas are related, but notice that the cross-correlation formula above neither subtracts the means nor divides by the standard deviations, both of which are part of the formula for the Pearson correlation coefficient.
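
One way to see the relationship, sketched with random example data: if you first standardize each series (subtract its mean, divide by its population standard deviation, `ddof=0`), the zero-lag cross-correlation divided by the number of samples equals the Pearson coefficient. The helper `standardize` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = 0.5 * a + rng.normal(size=50)

def standardize(x):
    # Hypothetical helper: subtract the mean, divide by the population (ddof=0) std
    return (x - x.mean()) / x.std()

n = len(a)
# Zero-lag cross-correlation of the standardized series, averaged over n samples
r_from_xcorr = np.correlate(standardize(a), standardize(b))[0] / n
r_pearson = np.corrcoef(a, b)[0, 1]
print(r_from_xcorr, r_pearson)
```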

The standard deviations of df['a'] and df['b'] are zero, and that is what causes df.corr to be NaN everywhere: the Pearson formula divides by the standard deviations, so it is undefined for a constant column.
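
A small reproduction with a hypothetical frame: columns `a` and `b` are constant, so every correlation involving them comes out as NaN, while the varying column `c` is unaffected when paired with itself.

```python
import pandas as pd

# Hypothetical frame: 'a' and 'b' are constant (zero std), 'c' varies
df = pd.DataFrame({"a": [1.0, 1.0, 1.0], "b": [2.0, 2.0, 2.0], "c": [1.0, 2.0, 3.0]})
print(df.std())

corr = df.corr()
print(corr)
```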

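From the comment below, it sounds like you are looking for beta. It is related to Pearson's correlation coefficient, but instead of dividing by the product of the standard deviations,

rho(a, b) = cov(a, b) / (std(a) * std(b))

you divide by the variance of a:

beta = cov(a, b) / var(a)

You can compute beta with np.cov: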
cov = np.cov(a, b)
beta = cov[1, 0] / cov[0, 0]
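
As a sanity check on that formula, here is an illustration with made-up data. beta = cov(a, b) / var(a) is the least-squares slope of b regressed on a, so when b is an exact linear function of a, beta recovers the slope. `np.polyfit` is used here only for comparison.

```python
import numpy as np

a = np.array([1.0, 2.0, 4.0, 7.0])
b = 3.0 * a + 1.0  # exact linear relation with slope 3

cov = np.cov(a, b)  # [[var(a), cov(a, b)], [cov(b, a), var(b)]]
beta = cov[1, 0] / cov[0, 0]
slope, intercept = np.polyfit(a, b, 1)  # least-squares line b ~ slope * a + intercept
print(beta, slope)
```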


import numpy as np
import matplotlib.pyplot as plt
np.random.seed(100)


def geometric_brownian_motion(T=1, N=100, mu=0.1, sigma=0.01, S0=20):
    """
    http://***.com/a/13203189/190597 (unutbu)
    """
    dt = float(T) / N
    t = np.linspace(0, T, N)
    W = np.random.standard_normal(size=N)
    W = np.cumsum(W) * np.sqrt(dt)  # standard brownian motion ###
    X = (mu - 0.5 * sigma ** 2) * t + sigma * W
    S = S0 * np.exp(X)  # geometric brownian motion ###
    return S

N = 10 ** 6
a = geometric_brownian_motion(T=1, mu=0.1, sigma=0.01, N=N)
b = geometric_brownian_motion(T=1, mu=0.2, sigma=0.01, N=N)

cov = np.cov(a, b)
print(cov)
# [[ 0.38234755  0.80525967]
#  [ 0.80525967  1.73517501]]
beta = cov[1, 0] / cov[0, 0]
print(beta)
# 2.10609347015

plt.plot(a)
plt.plot(b)
plt.show()

The ratio of the two mus is 2, and beta is about 2.1.

You could also compute it with df.corr, though this is a much more roundabout way of doing it (but it is nice to see that the results are consistent):

import pandas as pd
df = pd.DataFrame({'a': a, 'b': b})
# corr(a, b) * std(a) * std(b) = cov(a, b); dividing by var(a) gives beta
beta2 = (df.corr() * df['b'].std() * df['a'].std() / df['a'].var()).loc['a', 'b']
print(beta2)
# 2.10609347015
assert np.allclose(beta, beta2)
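
A more direct pandas route, as a sketch on made-up data rather than the GBM series above: skip the correlation matrix and divide the covariance by the variance. `DataFrame.cov`, `Series.cov` and `Series.var` all default to `ddof=1`, matching `np.cov`, so the three results should agree.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the two series a and b
rng = np.random.default_rng(1)
a = rng.normal(size=1000)
b = 2.0 * a + rng.normal(scale=0.1, size=1000)
df = pd.DataFrame({"a": a, "b": b})

# Covariance divided by variance, straight from the DataFrame
beta_pd = df.cov().loc["a", "b"] / df["a"].var()
# Same thing via Series.cov, without building the full matrix
beta_series = df["a"].cov(df["b"]) / df["a"].var()

cov = np.cov(a, b)
beta_np = cov[1, 0] / cov[0, 0]
print(beta_pd, beta_series, beta_np)
```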