且构网

Unable to delete AWS SQS messages from a Spark application running on EMR

Updated: 2021-10-23 18:48:16

After A LOT of debugging and testing, I finally managed to figure out what the problem was.

As expected, it was not a permission problem. The problem was that the EC2 instances started by EMR, on which the Spark application runs, come with a specific version of all the AWS packages for Java (the AWS SDK for Java, including the SQS package). The path containing these packages is added to the Hadoop, YARN and Spark classpaths. So when my application started, it used the packages already present on the machine instead of the ones bundled with my application, and I received an error. (The error was logged in the YARN log; it took me some time to figure that out.)
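One way to confirm this kind of conflict (not something the answer itself shows) is to ask the JVM where a class was actually loaded from. The following is a minimal Java sketch that uses only the JDK's `CodeSource` API; the class name `WhichJar` is hypothetical, and the default lookup target `com.amazonaws.services.sqs.AmazonSQS` is just an example class from the AWS SDK for Java 1.x.

```java
import java.net.URL;
import java.security.CodeSource;

/**
 * Diagnostic sketch (hypothetical helper): report which jar or directory a class
 * was loaded from, to see whether an AWS SDK class came from the cluster's
 * preinstalled packages or from the application's uber jar.
 */
public class WhichJar {

    /** Returns the location a class was loaded from, or a short explanation. */
    public static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // JDK classes (bootstrap loader) and some dynamically defined classes have no location
                return "no code source (JDK or dynamically defined class)";
            }
            URL location = source.getLocation();
            return location.toString();
        } catch (ClassNotFoundException e) {
            return "not found on classpath";
        }
    }

    public static void main(String[] args) {
        // Example default; pass any fully qualified class names as arguments instead
        String[] names = args.length > 0
                ? args
                : new String[] {"com.amazonaws.services.sqs.AmazonSQS"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Calling `WhichJar.locate(...)` from inside the Spark driver and logging the result should show whether the SQS classes resolve to the cluster's SDK directory or to the uber jar.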

I am using the maven shade plugin to build the uber jar for my application, so I thought I could try to shade (relocate) the AWS packages. This would let me encapsulate the dependencies inside my application. Sadly, this DID NOT work. It appears that Amazon uses reflection inside the packages and has hardcoded the names of some classes, which renders the shading useless. (The hardcoded classes were not found in my shaded packages.)

So after some more frustration I found the following solution:

  1. Create an EMR step that downloads my uber jar from S3 to the machine.
  2. Create a Spark application step with the following spark-submit options:

--driver-class-path /path_to_your_jar/myapp.jar --class com.myapp.startapp
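The two steps above boil down to two command lines. The Java sketch below assembles them as argument lists, under several assumptions that are not from the answer: both steps are run through EMR's `command-runner.jar`, step 1 uses the AWS CLI (`aws s3 cp`) to fetch the jar, the bucket name is a placeholder, and the driver runs in client mode so that it lives on the node where step 1 saved the jar. The class `EmrSteps` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the command lines behind the two EMR steps.
 * Each list would become the argument list of a step (for example, the Args of a
 * command-runner.jar step); paths and the bucket name are placeholders.
 */
public class EmrSteps {

    /** Step 1: copy the uber jar from S3 to the local disk of the node running the step. */
    public static List<String> downloadJarStep(String s3Uri, String localJar) {
        return Arrays.asList("aws", "s3", "cp", s3Uri, localJar);
    }

    /** Step 2: spark-submit with the uber jar added to the driver classpath. */
    public static List<String> sparkSubmitStep(String localJar, String mainClass) {
        return Arrays.asList(
                "spark-submit",
                // Assumption: client mode, so the driver runs on the node that holds localJar
                "--deploy-mode", "client",
                "--driver-class-path", localJar,
                "--class", mainClass,
                // spark-submit still expects the application jar as its positional argument
                localJar);
    }

    public static void main(String[] args) {
        String jar = "/path_to_your_jar/myapp.jar";
        System.out.println(String.join(" ", downloadJarStep("s3://my-bucket/myapp.jar", jar)));
        System.out.println(String.join(" ", sparkSubmitStep(jar, "com.myapp.startapp")));
    }
}
```

The client-mode assumption matters because `--driver-class-path` points at a local file: if the driver were launched on a different node, that path would not exist there.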

Here the key is the --driver-class-path option. You can read more about it here. Basically, I am adding my uber jar to the Spark driver classpath, so that the application uses my dependencies rather than the versions preinstalled on the cluster.
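Why this helps can be seen with a toy model of class lookup: classpath entries are searched in order and the first entry that contains the class wins. This Java sketch is an illustration under assumptions, not Spark's or the JVM's actual code. The "without" ordering stands in for parent-first class loader delegation, where the cluster's SDK is consulted before the application jar; the entry `/cluster/aws-sdk/*` and the class `ClasspathOrder` are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (assumption, not real JVM code): a lookup that scans classpath
 * entries in order and returns the first entry that provides the class.
 */
public class ClasspathOrder {

    /** Returns the first classpath entry containing className, or null if none does. */
    public static String resolve(List<String> classpath,
                                 Map<String, Set<String>> classesPerEntry,
                                 String className) {
        for (String entry : classpath) {
            Set<String> classes = classesPerEntry.getOrDefault(entry, Collections.emptySet());
            if (classes.contains(className)) {
                return entry; // first match wins; later entries are never consulted
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String sqsClass = "com.amazonaws.services.sqs.AmazonSQSClient";
        Map<String, Set<String>> contents = new HashMap<>();
        // Hypothetical entries: the cluster's preinstalled SDK and the application's uber jar
        contents.put("/cluster/aws-sdk/*", Collections.singleton(sqsClass));
        contents.put("/path_to_your_jar/myapp.jar",
                new HashSet<>(Arrays.asList(sqsClass, "com.myapp.startapp")));

        // Without --driver-class-path: the cluster SDK effectively comes first
        List<String> before = Arrays.asList("/cluster/aws-sdk/*", "/path_to_your_jar/myapp.jar");
        // With --driver-class-path: the uber jar is ahead of the cluster SDK
        List<String> after = Arrays.asList("/path_to_your_jar/myapp.jar", "/cluster/aws-sdk/*");

        System.out.println("without --driver-class-path: " + resolve(before, contents, sqsClass));
        System.out.println("with --driver-class-path:    " + resolve(after, contents, sqsClass));
    }
}
```

Running `main` prints `/cluster/aws-sdk/*` for the first ordering and `/path_to_your_jar/myapp.jar` for the second: the same class name resolves to a different SDK version purely because of classpath order.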

So far this is the only acceptable solution that I have found. If you know of another or a better one, please write a comment or an answer.

I hope that this answer can be of use to some unfortunate soul. It would have saved me several excruciating days.