且构网


Apache Storm compared to Hadoop

Updated: 2023-02-27 13:20:35



How does Storm compare to Hadoop? Hadoop seems to be the de facto standard for open-source, large-scale batch processing. Does Storm have any advantages over Hadoop, or are they completely different?

Why don't you share your opinion?

Twitter Storm has been touted as "real-time Hadoop". That is more of a marketing take, meant for easy consumption.

They are superficially similar, since both are distributed application solutions. Apart from the typical distributed architectural elements, such as master/slave roles and ZooKeeper-based coordination, the comparison falls off a cliff for me.

Twitter Storm is more like a pipeline for processing data as it arrives. The pipes connect the various computing nodes that receive data, compute, and deliver output (in Storm's lingo, these nodes are spouts and bolts). Extend this analogy to complex pipeline wiring that can be re-engineered when required, and you get Twitter Storm.
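
To make the pipe analogy concrete, here is a minimal single-process sketch of the spout/bolt idea. It is not Storm's API: `SentenceSpout`, `SplitBolt`, `CountBolt`, and `run_topology` are hypothetical names, and a real topology runs many spout and bolt tasks in parallel across a cluster. What it shows is the shape of the computation: each tuple flows through the wiring as soon as it is emitted, and an updated result comes out immediately.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


class SentenceSpout:
    """Hypothetical source of the stream: emits one sentence tuple at a time."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self._sentences = sentences

    def next_tuple(self) -> Iterator[str]:
        # A real spout would read from a queue or feed; here we replay a list.
        yield from self._sentences


class SplitBolt:
    """Hypothetical bolt: receives a sentence and emits one tuple per word."""

    def process(self, sentence: str) -> List[str]:
        return sentence.lower().split()


class CountBolt:
    """Hypothetical bolt: keeps a running count per word."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, word: str) -> int:
        self.counts[word] += 1
        return self.counts[word]


def run_topology(spout: SentenceSpout) -> Iterator[Tuple[str, int]]:
    """Wire spout -> SplitBolt -> CountBolt and yield each updated count."""
    split, count = SplitBolt(), CountBolt()
    for sentence in spout.next_tuple():
        for word in split.process(sentence):
            # The result for this word is available now, not at the end of a job.
            yield word, count.process(word)


if __name__ == "__main__":
    for word, n in run_topology(SentenceSpout(["the cow jumped", "over the moon"])):
        print(word, n)
```

Running it prints one line per word as it passes through: `the 1` comes out first, and `the 2` appears as soon as the second sentence arrives.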

In a nutshell, it processes data as it arrives; there is no waiting for a batch to accumulate.
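
Even a streaming system spends some time on each event; the point is that results do not wait for a batch to fill up. The toy model below compares when each event's result becomes available under the two styles. Its assumptions are purely illustrative: events arrive at the listed times (in seconds), each event costs a fixed `cost` seconds, the batch job starts only when a `window`-second collection period ends, and queueing and network delays are ignored. The function names are hypothetical.

```python
from typing import List


def stream_ready_times(arrivals: List[float], cost: float) -> List[float]:
    """Per-event processing: each result is ready `cost` seconds after arrival
    (assumes a worker is always free)."""
    return [t + cost for t in arrivals]


def batch_ready_times(arrivals: List[float], cost: float, window: float) -> List[float]:
    """Batch processing: the job starts when the window closes, processes every
    event in one pass, and all results become visible when the job finishes."""
    if any(not 0 <= t < window for t in arrivals):
        raise ValueError("this toy model assumes all events fall inside one window")
    finish = window + cost * len(arrivals)
    return [finish] * len(arrivals)


def latencies(arrivals: List[float], ready: List[float]) -> List[float]:
    """Delay between an event arriving and its result being available."""
    return [r - t for t, r in zip(arrivals, ready)]


if __name__ == "__main__":
    arrivals = [0.0, 10.0, 50.0]
    print("stream:", latencies(arrivals, stream_ready_times(arrivals, cost=0.5)))
    print("batch: ", latencies(arrivals, batch_ready_times(arrivals, cost=0.5, window=60.0)))
```

With these numbers every streamed event is ready 0.5 s after it arrives, while the batched events wait 61.5, 51.5, and 11.5 s respectively, because nothing is visible until the window closes and the whole batch is processed.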

Hadoop, however, is different in this respect, primarily because of HDFS. It is a solution geared toward distributed storage and tolerance to outages at many scales (disks, machines, racks, etc.).
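
One way HDFS tolerates outages at different scales is by replicating each block across machines and racks. The sketch below is a simplified, deterministic take on the default placement for a replication factor of 3 as described in the HDFS Architecture guide: one replica on the writer's node (assuming the writer is itself a DataNode), one on a node in a different rack, and one on another node in that same remote rack. Real HDFS picks nodes randomly and also considers free space and load; the node and rack names and the helper functions here are hypothetical.

```python
from typing import Dict, List, Set


def place_replicas(writer: str, topology: Dict[str, str]) -> List[str]:
    """Choose 3 nodes for a block written from `writer`.

    `topology` maps node name -> rack name. Candidates are scanned in sorted
    order instead of randomly, so the result is deterministic.
    """
    local_rack = topology[writer]
    nodes = sorted(topology)
    # Second replica: a node on a different (remote) rack.
    second = next((n for n in nodes if topology[n] != local_rack), None)
    if second is None:
        raise ValueError("need at least two racks")
    remote_rack = topology[second]
    # Third replica: a different node on the same remote rack.
    third = next((n for n in nodes if topology[n] == remote_rack and n != second), None)
    if third is None:
        raise ValueError("remote rack needs at least two nodes")
    return [writer, second, third]


def rack_nodes(rack: str, topology: Dict[str, str]) -> Set[str]:
    """All nodes on one rack, i.e. what a rack-level outage takes down."""
    return {n for n, r in topology.items() if r == rack}


def block_survives(replicas: List[str], failed: Set[str]) -> bool:
    """The block is still readable if at least one replica's node is up."""
    return any(n not in failed for n in replicas)


if __name__ == "__main__":
    topology = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack2", "dn4": "rack2"}
    replicas = place_replicas("dn1", topology)
    print("replicas:", replicas)
    print("machine dn1 down:", block_survives(replicas, {"dn1"}))
    print("rack1 down:", block_survives(replicas, rack_nodes("rack1", topology)))
    print("rack2 down:", block_survives(replicas, rack_nodes("rack2", topology)))
```

In this layout the block lands on `dn1`, `dn3`, and `dn4`, so losing a single machine or an entire rack still leaves at least one readable replica; only losing all three replica nodes makes the block unavailable.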

MapReduce (M/R) is built to leverage data locality on HDFS to distribute computational jobs. Together, they do not provide a facility for real-time data processing. But that is not always a requirement when you are searching through large data sets (think of the needle-in-a-haystack analogy).
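
A toy illustration of the two ideas in this paragraph: map tasks are scheduled, when possible, on a node that already holds a replica of their input block (data locality), and the job runs over all of the data before producing an answer (batch, not real time). The job here is a needle-in-a-haystack search. This is a single-process sketch, not Hadoop's API; the block contents, replica locations, and function names are hypothetical, and the scheduler (least-loaded node among the replica holders) is a simplification of what Hadoop actually does.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: block id -> (lines stored in the block, nodes holding a replica)
Blocks = Dict[str, Tuple[List[str], List[str]]]
Match = Tuple[str, int]  # (block id, line number within the block)


def assign_tasks(blocks: Blocks, nodes: List[str]) -> Dict[str, str]:
    """Pick a node for each block's map task, preferring nodes that store the block."""
    load = {n: 0 for n in nodes}
    assignment: Dict[str, str] = {}
    for block_id, (_, replicas) in sorted(blocks.items()):
        local = [n for n in replicas if n in load]
        # Fall back to any node (a remote read) if no replica holder is in `nodes`.
        candidates = local or nodes
        chosen = min(candidates, key=lambda n: (load[n], n))
        load[chosen] += 1
        assignment[block_id] = chosen
    return assignment


def map_block(block_id: str, lines: List[str], needle: str) -> List[Tuple[str, Match]]:
    """Map: emit (needle, (block_id, line_no)) for each line containing the needle."""
    return [(needle, (block_id, i)) for i, line in enumerate(lines) if needle in line]


def reduce_matches(pairs: Iterable[Tuple[str, Match]]) -> Dict[str, List[Match]]:
    """Reduce: group all matches by key."""
    grouped: Dict[str, List[Match]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sorted(values) for key, values in grouped.items()}


def run_job(blocks: Blocks, nodes: List[str], needle: str) -> Tuple[Dict[str, str], Dict[str, List[Match]]]:
    assignment = assign_tasks(blocks, nodes)
    intermediate: List[Tuple[str, Match]] = []
    for block_id, (lines, _) in sorted(blocks.items()):
        # On a real cluster this map task would run on assignment[block_id].
        intermediate.extend(map_block(block_id, lines, needle))
    # The answer exists only after every block has been mapped and reduced.
    return assignment, reduce_matches(intermediate)


if __name__ == "__main__":
    blocks: Blocks = {
        "b1": (["hay", "hay needle hay"], ["dn1", "dn2"]),
        "b2": (["hay", "hay"], ["dn2", "dn3"]),
        "b3": (["needle"], ["dn3", "dn1"]),
    }
    assignment, result = run_job(blocks, ["dn1", "dn2", "dn3"], "needle")
    print(assignment)
    print(result)
```

In the example every map task is placed on a node that already stores its block (`b1` on `dn1`, `b2` on `dn2`, `b3` on `dn3`), so no block has to be copied over the network, and the two matches are reported together only once the whole job has finished.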

In short, Twitter Storm is a distributed real-time data processing solution, and I don't think we should compare the two. Twitter built it because it needed a facility to process small tweets, but a humongous number of them, in real time.

See HStreaming if you are compelled to compare it with something.