且构网

Sharing the things programmers run into in development...

Efficiently calculating a large similarity matrix

Updated: 2022-02-17 09:28:08

Here are some bits and pieces of an answer. There are still too many gaps in what you've told us to permit a good answer, but you can fill those in yourself. From everything you've told us, I don't think the major part of your task is to calculate a large similarity matrix efficiently; I think the major parts are to retrieve values from such a matrix efficiently and to update the matrix efficiently.

We've already determined that the matrix is sparse and symmetric; it would be useful to know how sparse. Sparsity and symmetry together reduce the storage requirements considerably, but we don't know by how much.
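To make the storage point concrete, here is a minimal sketch of one way to hold a sparse, symmetric matrix: keep only the non-zero entries, and keep each unordered pair once. The class name `SparseSymmetricMatrix` and its methods are hypothetical, invented for illustration; your data may well call for a different layout.

```python
class SparseSymmetricMatrix:
    """Keeps only non-zero entries, one per unordered pair (i, j)."""

    def __init__(self):
        self._data = {}  # maps (low_id, high_id) -> similarity

    @staticmethod
    def _key(i, j):
        # Symmetry: (i, j) and (j, i) share a single stored entry.
        return (i, j) if i <= j else (j, i)

    def set(self, i, j, value):
        key = self._key(i, j)
        if value == 0:
            # Sparsity: zeros are represented by absence, not stored.
            self._data.pop(key, None)
        else:
            self._data[key] = value

    def get(self, i, j):
        return self._data.get(self._key(i, j), 0.0)

    def stored_entries(self):
        return len(self._data)

    def density(self, n):
        """Fraction of the n*(n+1)/2 distinct pairs (diagonal included) stored."""
        return len(self._data) / (n * (n + 1) / 2)


sim = SparseSymmetricMatrix()
sim.set(7, 3, 0.42)
sim.set(3, 9, 0.10)
print(sim.get(3, 7))         # same entry as (7, 3)
print(sim.get(1, 2))         # never set, so treated as zero
print(sim.stored_entries())
```

Measuring `density(n)` on a sample of your users would answer the "how sparse" question and tell you roughly how much storage you actually need.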

You've told us a bit about updates to user profiles, but does your similarity matrix have to be updated as frequently? My expectation (another assumption) is that similarity measures do not change quickly or sharply when a user modifies their profile. From this I hypothesise that working with a similarity measure that is a few minutes (even a few hours) out of date won't do any serious harm.

I think all this takes us into the domain of databases, which should support fast access to stored similarity measures at the volumes you indicate. I'd be looking to do batch updates of the measures, and only of the measures for users whose profiles have changed, at an interval that suits your demands and the availability of computing power.
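A rough sketch of that idea, using Python's built-in `sqlite3` module as a stand-in for whatever database you choose. Everything specific here is an assumption made for illustration: the table layout, the function names, the representation of a profile as a set of tags, and the Jaccard index as the similarity measure (you haven't told us what yours is).

```python
import sqlite3


def jaccard(a, b):
    """Assumed similarity measure: Jaccard index of two sets of profile tags."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def create_store(conn):
    # One row per unordered pair, stored with user_a < user_b.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS similarity ("
        " user_a INTEGER NOT NULL,"
        " user_b INTEGER NOT NULL,"
        " score REAL NOT NULL,"
        " PRIMARY KEY (user_a, user_b))"
    )
    # Secondary index so rows can also be found quickly by the second user.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_similarity_b ON similarity (user_b)"
    )


def batch_update(conn, profiles, changed_users):
    """Recompute only the rows involving a user whose profile changed."""
    with conn:  # the whole batch commits (or rolls back) as one transaction
        for u in changed_users:
            conn.execute(
                "DELETE FROM similarity WHERE user_a = ? OR user_b = ?", (u, u)
            )
        rows, done = [], set()
        for u in changed_users:
            for v, tags in profiles.items():
                pair = (min(u, v), max(u, v))
                if v == u or pair in done:
                    continue
                done.add(pair)
                score = jaccard(profiles[u], tags)
                if score > 0:  # keep the table sparse: skip zero scores
                    rows.append((pair[0], pair[1], score))
        conn.executemany("INSERT INTO similarity VALUES (?, ?, ?)", rows)


def lookup(conn, i, j):
    row = conn.execute(
        "SELECT score FROM similarity WHERE user_a = ? AND user_b = ?",
        (min(i, j), max(i, j)),
    ).fetchone()
    return row[0] if row else 0.0


conn = sqlite3.connect(":memory:")
create_store(conn)
profiles = {1: {"python", "sql"}, 2: {"python", "go"}, 3: {"rust"}}
batch_update(conn, profiles, changed_users=list(profiles))  # initial build

profiles[3] = {"go", "rust"}                    # user 3 edits their profile
batch_update(conn, profiles, changed_users=[3])  # later, in the next batch
print(lookup(conn, 3, 2))
```

Each batch costs on the order of (number of changed users) × (total users) comparisons rather than a full rebuild, and reads between batches simply see slightly stale scores.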

As for the initial creation of the first version of the similarity matrix: so what if it takes a week in the background? You're only going to do it once.