且构网

How can MATLAB code be used in a mapper (Hadoop)?

Updated: 2022-10-23 18:34:35

I have MATLAB code that processes images, and I want to create a Hadoop mapper that uses it. I came across the following options but am not sure which one is best, since installing the MATLAB Compiler Runtime on every slave node of the cluster is very difficult for me:

  1. Manually convert the MATLAB code to C++ with OpenCV and call the resulting exe/dll (supplying the appropriate parameters) from the mapper. I am not sure about this one, since every node in the cluster runs Linux rather than Windows.

  2. Use Hadoop Streaming. But Hadoop Streaming requires an executable as the mapper, and an executable built from MATLAB also requires the MATLAB Compiler Runtime, which is very difficult to install on every slave node.

  3. Convert the code to C/C++ automatically and build an exe from it. I am not sure this is right either: the exe may still require the MATLAB runtime, or the conversion may introduce compiler issues that are very difficult to fix.

  4. Use MATLAB Java Builder. But the resulting jar file will need the runtime too.

Any suggestions?

Thanks in advance.

As you probably already suspect, this is inherently difficult to do because of MATLAB's runtime requirement. I had a similar experience (having to distribute the runtime libraries) when attempting to run MATLAB code over Condor.

As far as the options you list are concerned, option #1 will work best. Also, you will probably not be able to avoid working with Linux.

However, if you don't want to lose the convenience provided by higher level software (such as MATLAB, Octave, Scilab and others) you could try Hadoop streaming in combination with Octave executable scripts.

Hadoop streaming does not care about the nature of the executable: according to the streaming documentation (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html), it can be an executable script or an executable file.

All it requires is an "executable" that can a) read from stdin and b) send its output to stdout.

GNU Octave programs can be turned into executable scripts (in Linux) with the ability to read from stdin and send the output to stdout (http://www.gnu.org/software/octave/doc/interpreter/Executable-Octave-Programs.html).

As a simple example consider this:

Create a file (for example "al.oct") with the following contents:

#!/bin/octave -qf
# Note: on my installation I had to use "#!/etc/alternatives/octave -qf" instead.
Q = fread(stdin); # Read everything from stdin; standard Octave / MATLAB code from here on
disp(Q);          # Write the result to stdout

Now from the command prompt issue the following command:

chmod +x al.oct

al.oct is now executable, and you can run it with "./al.oct". To see where stdin and stdout fit in (so that you can use it with Hadoop), try this:

cat al.oct | ./al.oct | sort

In other words: "cat" the file al.oct, pipe its output to the executable script al.oct, and then pipe the output of al.oct to the sort utility. This is just an example; we could "cat" any file, but since we know that al.oct is a simple text file, we use it as the input.
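The same pipe test can be stretched into a local dry run of a complete streaming job, with sort standing in for the shuffle step between map and reduce. Here is a minimal sketch of that idea. The mapper and reducer functions are hypothetical word-count stand-ins written in shell (they are not from the original answer); in practice the mapper slot would hold the executable Octave script:

```shell
#!/bin/sh
# Local dry run of a streaming-style pipeline: mapper | sort | reducer.
# Both functions are hypothetical stand-ins used only for illustration.

# Hypothetical mapper: emit "<word><TAB>1" for every word read from stdin.
mapper() {
    tr -s '[:space:]' '\n' | awk 'NF { printf "%s\t1\n", $0 }'
}

# Hypothetical reducer: sum the counts per key; relies on sorted input,
# so that all lines for one key arrive next to each other.
reducer() {
    awk -F '\t' '
        $1 != key { if (NR > 1) printf "%s\t%d\n", key, sum; key = $1; sum = 0 }
        { sum += $2 }
        END { if (NR > 0) printf "%s\t%d\n", key, sum }
    '
}

# sort plays the role of the shuffle between the map and reduce phases.
printf 'b a\na c a\n' | mapper | sort | reducer
```

Anything that works in this pipeline only touches stdin and stdout, which is the whole contract a streaming mapper has to honour, so swapping "mapper" for "./al.oct" should be a fair first check before involving the cluster.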

Of course, Octave may not support everything your MATLAB code tries to call, but this could be an alternative way of using Hadoop Streaming without losing the convenience and power of higher-level code.
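Once the script behaves under a local pipe, submitting it to the cluster might look roughly like the sketch below. The option names (-input, -output, -mapper, -reducer, -file) are the ones shown in the streaming documentation linked above, but STREAMING_JAR, INPUT_DIR and OUTPUT_DIR are placeholders of mine, not values from the question, and the jar location differs between Hadoop versions. The script only prints the command so that it can be inspected first:

```shell
#!/bin/sh
# Sketch: assemble a Hadoop Streaming invocation that uses al.oct as the mapper.
# STREAMING_JAR, INPUT_DIR and OUTPUT_DIR are assumed placeholders; adjust them
# to your own installation and data.
STREAMING_JAR="${STREAMING_JAR:-$HADOOP_HOME/hadoop-streaming.jar}"
INPUT_DIR="${INPUT_DIR:-myInputDirs}"
OUTPUT_DIR="${OUTPUT_DIR:-myOutputDir}"

# -file ships al.oct to the task nodes together with the job.
# /bin/cat is used as a pass-through reducer.
build_cmd() {
    echo hadoop jar "$STREAMING_JAR" \
        -input "$INPUT_DIR" -output "$OUTPUT_DIR" \
        -mapper al.oct -reducer /bin/cat -file al.oct
}

# Print the command for inspection instead of running it.
build_cmd
```

Note that -file only distributes the script itself. Octave still has to be installed on every node, at the path named in the shebang line, which is usually a much lighter requirement than the MATLAB runtime. Newer Hadoop releases also tend to prefer the generic -files option over -file, so check the documentation for the version you run.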