Due to Spark's lazy evaluation, the coalesce results in reduced parallelism of the read operation.
It has nothing to do with laziness. coalesce intentionally doesn't create an analysis barrier:
However, if you're doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1). To avoid this, you can call repartition. This will add a shuffle step, but means the current upstream partitions will be executed in parallel (per whatever the current partitioning is).
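A minimal sketch of the behavior described above, using a synthetic spark.range source standing in for a real parallel read (the output path, partition count, and object name are illustrative, not from the original post). Because coalesce(1) inserts no shuffle boundary, Spark plans the source, the projection, and the write into a single stage, so the whole pipeline runs as one task:

```scala
import org.apache.spark.sql.SparkSession

object CoalesceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("coalesce-vs-repartition")
      .master("local[8]")
      .getOrCreate()

    // 8 upstream partitions standing in for a parallel read.
    val df = spark.range(0L, 1000000L, 1L, 8)

    // coalesce(1) adds no shuffle boundary, so the projection above it is
    // squeezed into the same single task as the write.
    df.selectExpr("id * 2 AS doubled")
      .coalesce(1)
      .write.mode("overwrite").parquet("/tmp/coalesce-out")

    spark.stop()
  }
}
```

Checking the Spark UI (or df.coalesce(1).rdd.getNumPartitions) confirms the stage runs with a single task rather than the original eight.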
So just follow the documentation and use repartition instead of coalesce.
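For contrast, a sketch of the repartition variant, reusing the df assumed in the previous example: the shuffle that repartition introduces acts as a stage boundary, so the projection still runs with the original parallelism and only the post-shuffle write stage uses a single task.

```scala
// repartition(1) inserts a shuffle: the selectExpr stage keeps all
// 8 upstream tasks, and only the write after the shuffle runs as one task.
df.selectExpr("id * 2 AS doubled")
  .repartition(1)
  .write.mode("overwrite").parquet("/tmp/repartition-out")
```

The trade-off is the cost of the shuffle itself, which is usually worth paying when the upstream work is expensive relative to moving the data once.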