且构网

Sharing all things programming and development

Trying to understand eBay's schema

Updated: 2022-10-15 15:28:11



I want to build a site similar to eBay (a mini version of it, using a LAMP stack as the basic setup, I guess, unless you suggest something else), and I'm wondering how they built their system. The part I understand least is how they manage their categories. They presumably have one piece of code for search, one for posting items for sale, and one for displaying the items. But how do they create and store the template for each category? And what is the database structure behind their setup? Finally, they have a huge number of categories and subcategories. Let's say somebody posts an item inside the following category (this is most likely the process eBay used to add categories): Motors > Parts & Accessories > Racing Parts

A few days later, people request more subcategories under "Racing Parts":

  • Accessories
  • Auto Racing Parts
  • Fasteners, Fluids & Gaskets
  • Kart Racing Parts
  • Safety Equipment
  • Other

So now there is a new level under Racing Parts that looks like this:

  • Motors > Parts & Accessories > Racing Parts > Accessories
  • Motors > Parts & Accessories > Racing Parts > Fasteners, Fluids & Gaskets, etc.

What happens to the existing listings that were posted before the new subcategories were added? Do they get moved to a subcategory? Does eBay force new items to be listed in the subcategories and remove the old posting form for "Racing Parts"? If they do, and the subcategory a user needs is missing, the user may get confused and not post, and eBay loses money. And if they don't remove the general Racing Parts posting form, users will post in a category that is too generic, and it becomes difficult to use the "Refine search" option, because the forms all have different fields that eBay could filter by.
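To make the scenario concrete, here is a minimal sketch of one common way to model such a hierarchy: an adjacency list, where each category stores only its parent's ID. This is not necessarily what eBay does; the `Category` and `CategoryTree` names and their methods are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Category:
    """One node in a hypothetical category tree (adjacency-list style)."""
    id: int
    name: str
    parent_id: Optional[int] = None


class CategoryTree:
    def __init__(self) -> None:
        self._nodes: Dict[int, Category] = {}
        self._next_id = 1

    def add(self, name: str, parent_id: Optional[int] = None) -> int:
        """Add a category under parent_id (None means top level); return its id."""
        if parent_id is not None and parent_id not in self._nodes:
            raise KeyError(f"unknown parent category {parent_id}")
        cat_id = self._next_id
        self._next_id += 1
        self._nodes[cat_id] = Category(cat_id, name, parent_id)
        return cat_id

    def path(self, cat_id: int) -> str:
        """Walk parent links up to the root and render a breadcrumb."""
        names: List[str] = []
        current: Optional[int] = cat_id
        while current is not None:
            node = self._nodes[current]
            names.append(node.name)
            current = node.parent_id
        return " > ".join(reversed(names))

    def subtree_ids(self, cat_id: int) -> List[int]:
        """Return cat_id followed by the ids of all of its descendants."""
        result = [cat_id]
        for node in list(self._nodes.values()):
            if node.parent_id == cat_id:
                result.extend(self.subtree_ids(node.id))
        return result


tree = CategoryTree()
motors = tree.add("Motors")
parts = tree.add("Parts & Accessories", motors)
racing = tree.add("Racing Parts", parts)
# Subcategories added later. In this sketch, adding them does not touch
# listings that already reference the `racing` id.
accessories = tree.add("Accessories", racing)
kart = tree.add("Kart Racing Parts", racing)

print(tree.path(accessories))
print(tree.subtree_ids(racing))
```

Under this assumption, older listings could stay attached to "Racing Parts" while a browse or search on that category queries every ID returned by `subtree_ids`, so old and new listings show up together. Whether to migrate old listings or retire the generic posting form remains a product decision that the data structure does not settle.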

If you guys have any ideas, please let me know. I'm really confused about how they do it and would really like to understand :)

Here is Randy Shoup on eBay's Architecture

He mainly talks about scalability, availability, manageability, etc. The schema is something that you have to devise on your own, based on your specific requirements. Slides

From his chat transcript:

"Is it even a relational database, or is it really different?

It is very different. It's a search engine like Google or Yahoo! developed by the same people that developed the AltaVista search engine, and as with many search engines, it's developed on similar principles, which is that it's an inverted index. There's a set of documents with IDs, keywords are indexed into those documents, and query operations happen by intersecting lists or vectors of those keywords, very simply, and there's a lot more detail about how that works. The challenge for -- just as an aside, the challenge for an eBay-style search engine is that our users expect the search engine to be updated in essentially near real-time. When somebody bids on an item, that changes the price, and price is a filter that people are very interested in querying on. So it actually means that the style -- the sort of classic web search engine style of "you build the index in a kind of batch mode and then upload it to the search engine" is something that doesn't really work for us. It needs to be a lot more real-time. So I will talk a little bit about how that real-time system works in my asynchrony section, but anyhow, to finish the thought on scalability for search, the idea is that the search engine can be horizontally split. So there is this overall search index of whatever size it is. We divide it up into chunks of ten or twenty or sixty or a hundred, and divide the infrastructure that way. And then we have an aggregator piece, which now does do scatter/gather over all those different parts of the index. So somebody queries for "iPod" or "Mickey Mouse" or "Wii" and the aggregator sends the query to each one of the different splits or shards and gets the results back and aggregates them and sends them back to the user."
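The two ideas in that quote, an inverted index queried by intersecting keyword lists and an aggregator doing scatter/gather over shards, can be sketched in a few lines. This is a toy illustration and not eBay's implementation: the `Shard` and `Aggregator` classes, the whitespace tokenizer, and the "document ID modulo shard count" routing rule are all assumptions made for the example, and it ignores the near-real-time update problem entirely.

```python
from collections import defaultdict
from typing import Dict, List, Set


class Shard:
    """One horizontal slice of the index: keyword -> set of document IDs."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def index(self, doc_id: int, text: str) -> None:
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Intersect the posting sets of every keyword in the query."""
        words = query.lower().split()
        if not words:
            return set()
        # .get avoids creating empty entries in the defaultdict on lookups.
        sets = [self.postings.get(w, set()) for w in words]
        return set.intersection(*sets)


class Aggregator:
    """Scatter a query to every shard, then gather and merge the results."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def index(self, doc_id: int, text: str) -> None:
        # Assumed routing rule: document ID modulo shard count.
        self.shards[doc_id % len(self.shards)].index(doc_id, text)

    def search(self, query: str) -> List[int]:
        hits: Set[int] = set()
        for shard in self.shards:        # scatter (sequential here for simplicity)
            hits |= shard.search(query)  # gather
        return sorted(hits)


agg = Aggregator(num_shards=3)
agg.index(1, "Apple iPod nano 8GB")
agg.index(2, "Mickey Mouse plush toy")
agg.index(3, "Nintendo Wii console")
agg.index(4, "iPod classic 160GB")

print(agg.search("ipod"))          # [1, 4]
print(agg.search("mickey mouse"))  # [2]
```

Because each document lives in exactly one shard, a multi-keyword intersection can be computed locally per shard and the aggregator only has to union the partial results. A real system would fan the query out in parallel and merge ranked results rather than bare IDs.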