且构网

Sharing the ins and outs of software development...

Logging to a database instead of log files

Updated: 2023-02-02 23:29:38

My company has been logging some structured traffic info straight into a MySQL log database. This database is replicated downstream to another database, and all analytics run off that final replica. Our site sustains quite a bit of traffic. So far, the setup doesn't seem to have any major problems. However, our IT department has growing concerns about the scalability of the current setup and is suggesting that we offload the log info onto "proper" log-files. The log-files would then be reinserted into the same downstream database tables. Which brings me to this question. :)
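To make the IT department's proposal concrete, here is a minimal Python sketch of its two halves: the application appends one JSON object per traffic event to a log file, and a separate batch job reloads that file into the downstream table. It uses `sqlite3` as a stand-in for MySQL, and the table name `traffic_log` and its columns are hypothetical; with MySQL you would swap in your driver's connection and its placeholder style.

```python
import json
import os
import sqlite3
import tempfile


def append_event(log_path, event):
    """Write path: append one structured traffic event as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def load_log_file(log_path, conn):
    """Batch job: insert every event in the log file into the downstream table.

    The hypothetical traffic_log table is assumed to exist already.
    """
    with open(log_path, encoding="utf-8") as f:
        rows = [
            (e["ts"], e["path"], e["user_id"])
            for e in (json.loads(line) for line in f if line.strip())
        ]
    conn.executemany(
        "INSERT INTO traffic_log (ts, path, user_id) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE traffic_log (ts TEXT, path TEXT, user_id INTEGER)")
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "traffic.log")
        append_event(log_path, {"ts": "2009-01-01T00:00:00", "path": "/", "user_id": 1})
        append_event(log_path, {"ts": "2009-01-01T00:00:05", "path": "/about", "user_id": 2})
        print(load_log_file(log_path, conn))  # 2
```

Reading the whole file in one pass assumes it has been rotated (closed for writing) before the load job picks it up; otherwise a partially written last line would fail to parse.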

Here are some of the pros and cons I see regarding log-files vs. log-db (relational):

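  • Log-files are fast, reliable, and scalable (at least, I have heard that Yahoo! makes heavy use of log files for its click-tracking analytics).
  • Log-files are easy for sys-admins to maintain.
  • Log-files can be very flexible, since you can write almost anything to them.
  • Log-files require heavy parsing and potentially a map-reduce type of setup for data extraction.
  • log-db structures are much closer to your application, which makes the turnaround time for some features shorter. This can be a blessing or a curse; probably a curse in the long run, since you will most likely end up with a highly coupled application and analytics code base.
  • log-db can reduce logging noise and redundancy, since log-files are insert-only whereas log-db lets you do updates and associated inserts (normalization, if you dare).
  • log-db can also be fast and scalable if you use database partitioning and/or multiple log databases (with the data rejoined via downstream replication).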

I think my situation calls for some stress tests on the log database. That way, I will at least know how much headroom I have.
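One way to start that stress testing is a tiny insert-throughput harness. This sketch uses a file-backed `sqlite3` database purely so it runs anywhere; its numbers say nothing about MySQL, so treat it as a template to point at a copy of the real log database through your MySQL driver. It compares committing after every row with committing in batches, since per-row commits are a common write bottleneck. The `traffic_log` table is hypothetical.

```python
import os
import sqlite3
import tempfile
import time


def insert_throughput(conn, n_rows, batch_size):
    """Insert n_rows synthetic log rows, committing every batch_size rows.

    Returns the observed rate in rows per second.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS traffic_log (ts REAL, path TEXT, user_id INTEGER)"
    )
    start = time.perf_counter()
    for i in range(n_rows):
        conn.execute(
            "INSERT INTO traffic_log (ts, path, user_id) VALUES (?, ?, ?)",
            (time.time(), "/", i),
        )
        if (i + 1) % batch_size == 0:
            conn.commit()
    conn.commit()  # flush any final partial batch
    elapsed = time.perf_counter() - start
    return n_rows / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for batch_size in (1, 100, 1000):
            # A fresh database file per run so earlier runs don't skew later ones.
            conn = sqlite3.connect(os.path.join(tmp, f"stress_{batch_size}.db"))
            rate = insert_throughput(conn, 500, batch_size)
            conn.close()
            print(f"batch_size={batch_size:>4}: {rate:,.0f} rows/sec")
```

For a realistic headroom figure, the rows should resemble real traffic records and the test should include concurrent writers, which this single-connection sketch does not.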

Recently, I've been looking into some key-value / document-based databases like Redis, Tokyo Cabinet, and MongoDB. These fast-inserting databases could potentially be the sweet spot, since they provide persistence, high (write) throughput, and querying capabilities to varying degrees. They can make the data-extraction process much simpler than parsing and map-reducing through gigs of log files.

In the long run, I believe it is crucial to have a robust analytics data warehouse. Freeing application data from analytic data and vice versa can be a big WIN.

Lastly, I would just like to point out there are many similar / closely related questions here on *** in case you want to broaden your discussion.

rsyslog looks very interesting. It gives you the ability to write directly to MySQL. If you are using Ruby, you should have a look at the logging gem. It provides multi-target logging capabilities. It's really nice.
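Python's standard `logging` module offers a comparable multi-target idea: one logger can fan out to several handlers. The sketch below attaches a `FileHandler` and a small custom `logging.Handler` subclass that inserts each record into a SQL table, roughly what writing syslog output straight to MySQL does at the system level. `SQLiteHandler`, `build_logger`, and the `app_log` table are hypothetical names, `sqlite3` stands in for MySQL, the sketch assumes single-threaded use (a `sqlite3` connection is tied to its creating thread by default), and none of this is the Ruby gem's API.

```python
import logging
import os
import sqlite3
import tempfile


class SQLiteHandler(logging.Handler):
    """Hypothetical handler that writes each log record as a row in a SQL table."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS app_log "
            "(created REAL, level TEXT, logger TEXT, message TEXT)"
        )

    def emit(self, record):
        try:
            self.conn.execute(
                "INSERT INTO app_log (created, level, logger, message) VALUES (?, ?, ?, ?)",
                (record.created, record.levelname, record.name, record.getMessage()),
            )
            self.conn.commit()
        except Exception:
            self.handleError(record)  # a logging failure must not crash the app


def build_logger(name, log_path, conn):
    """Attach two targets to one logger: a plain log file and a database table."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    for old in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(old)
        old.close()
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(file_handler)
    logger.addHandler(SQLiteHandler(conn))
    return logger


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "traffic.log")
        logger = build_logger("traffic", log_path, conn)
        logger.info("GET /about user=%d", 42)
        for handler in list(logger.handlers):  # release the file before cleanup
            logger.removeHandler(handler)
            handler.close()
        with open(log_path, encoding="utf-8") as f:
            print(f.read().strip())
    print(conn.execute("SELECT level, message FROM app_log").fetchall())
    # [('INFO', 'GET /about user=42')]
```

Committing on every record keeps the sketch simple but adds a database round trip to each log call; a production setup would more likely buffer records, and the standard library's `logging.handlers.QueueHandler` is one way to move that work off the request path.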