Updated: 2023-12-03 13:19:34
I have a system that is supposed to take large files containing documents, split them into individual documents, and create document objects to be persisted with JPA (or at least that is assumed in this question).
Each file contains between 1 and 100,000 documents. The files come in various types.
Now the biggest concern is that the specification forbids accessing local files, at least in the way that I'm used to.
I could save the files to a database table, but is that really a good way to do it? The files can be up to 2 GB, and accessing a file from the database would require downloading the whole file, either into memory or onto disk.
My first thought was to separate this process from the application server and use a more traditional approach, but I've been thinking about how to keep it on the application server for future purposes such as clustering.
My questions are basically
Here I sketch a few more proposals and consider the following concerns:
With JCA
JCA connectors belong to the Java EE stack and permit inbound/outbound connectivity from/to the EJB world. JDBC and JMS are usually implemented as JCA connectors. An inbound JCA connector can use threads (through the work abstraction) and transactions. It can then forward any processing to a message-driven bean (MDB).
With plain threads
We can achieve more or less the same as the JCA way, using threads that are launched from a servlet context listener (or possibly an EJB timer).
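As a rough illustration of the plain-thread approach, here is a minimal sketch that uses only the JDK so that it stays self-contained. `FilePoller`, `JobSource`, and `JobHandler` are hypothetical names, not part of any Java EE API. The assumption is that `start()` and `stop()` would be called from `ServletContextListener.contextInitialized` and `contextDestroyed`, and that the handler would delegate to a session bean.

```java
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical abstraction: returns the next pending job (e.g. a file name), if any.
interface JobSource {
    Optional<String> nextJob();
}

// Hypothetical abstraction: processes one job; in Java EE this could call a session bean.
interface JobHandler {
    void handle(String job);
}

// Polls a JobSource on a single background thread and hands each job to a JobHandler.
class FilePoller {
    private final JobSource source;
    private final JobHandler handler;
    private final long periodMillis;
    private ScheduledExecutorService executor;

    FilePoller(JobSource source, JobHandler handler, long periodMillis) {
        this.source = source;
        this.handler = handler;
        this.periodMillis = periodMillis;
    }

    // Assumed to be called from contextInitialized(...).
    synchronized void start() {
        if (executor != null) {
            return; // already running
        }
        executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(this::pollOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Drains all currently pending jobs. Runtime exceptions are caught here because
    // an exception escaping a scheduled task would cancel all later runs.
    void pollOnce() {
        try {
            Optional<String> job;
            while ((job = source.nextJob()).isPresent()) {
                handler.handle(job.get());
            }
        } catch (RuntimeException e) {
            System.err.println("Polling failed: " + e);
        }
    }

    // Assumed to be called from contextDestroyed(...).
    synchronized void stop() {
        if (executor == null) {
            return; // not running
        }
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        executor = null;
    }
}
```

With a single polling thread there is no contention over which thread picks up a job; running several pollers would require some form of job locking in the `JobSource`.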
With JMS
To avoid the need for several concurrent polling threads and the problem of job acquisition/locking, the actual processing can be done asynchronously using JMS. JMS can also be useful for splitting the processing into smaller tasks.
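To illustrate the idea of splitting the processing into smaller tasks, here is a JDK-only sketch. `TaskSplitter` and the `sender` callback are hypothetical; the assumption is that in a real deployment the callback would wrap a JMS producer sending one message per batch, and an MDB would consume each message. The batching logic itself does not depend on JMS.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class TaskSplitter {
    // Groups documents into batches of at most batchSize elements and passes each
    // batch to sender (a stand-in for "send one JMS message").
    // Takes an Iterator so the whole file never has to be held in memory.
    // Returns the number of batches handed to sender.
    static <T> int split(Iterator<T> documents, int batchSize, Consumer<List<T>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (documents.hasNext()) {
            batch.add(documents.next());
            if (batch.size() == batchSize) {
                sender.accept(batch);
                batches++;
                batch = new ArrayList<>(batchSize); // fresh list; the sent one is not reused
            }
        }
        if (!batch.isEmpty()) {
            sender.accept(batch); // last, possibly smaller, batch
            batches++;
        }
        return batches;
    }
}
```

The batch size is a trade-off: smaller batches spread better across consumers, while larger ones reduce per-message overhead.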
With ESB
Many projects have emerged in the past years to deal with integration: JBI, ServiceMix, OpenESB, Mule, Spring Integration, Java CAPS, BPEL. Some are technologies, some are platforms, and there is some overlap between them. They all come with a wealth of connectors to route, transform, and orchestrate message flows. IMHO, messages are supposed to be small pieces of information, and it may be hard to use these technologies to process your large data files. The Enterprise Integration Patterns website is an excellent source for more information.
IMO, the approach that best fits the Java EE philosophy is JCA, but the effort to invest is relatively high. In your case, using plain threads that delegate further processing to a stateless session bean (SLSB) may be the easiest solution. The JMS approach (close to the proposition of P. Thivent) can be interesting if the processing pipeline gets more complicated. Using an ESB seems overkill to me.