且构网

Executing SQL on CSV files via JDBC

Updated: 2023-01-23 09:36:41


I need to run an SQL query against CSV files (comma-separated text files). My SQL is predefined by another tool and cannot be changed. It may contain nested SELECTs (subqueries) and table aliases in the FROM clause.

For my task I have found the following open-source options (open source is a project requirement). The first three are libraries that provide JDBC drivers:

  1. CsvJdbc
  2. XlSQL
  3. JBoss Teiid
  4. Create an Apache Derby DB, load all CSVs as tables and execute the query.

These are the problems I encountered:

  1. CsvJdbc does not accept the syntax of my SQL (the query uses subqueries and table aliases). Furthermore, it has not been maintained since 2004.
  2. I could not get XlSQL to work: it depends on a SAX parser that throws exceptions when parsing other documents. It has likewise seen no changes since 2004.
  3. I have not checked whether Teiid supports the syntax, but it seems like overkill. It needs several entities to be defined (virtual databases, bindings). On the mailing list I was told that the latest release supports creating the required objects at runtime. Has anyone used it for such a simple task? (Normally it connects to several types of data sources, such as CSV, XML or other databases, and creates a single virtual, unified one.)
  4. Can this even be done easily?

Of the four options I considered or tried, only 3 and 4 seem viable to me. Any advice on these, or on any other way to query my CSV files?

Cheers

If your SQL is predefined and cannot be changed, your best option is to load your CSV into a database and run queries against it.

Apache Derby is a viable option, as are MySQL (which even has a CSV storage engine) and PostgreSQL.

Does your SQL use any proprietary functions / extensions? If so, that may limit your choices.
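
As a rough illustration of the "load the CSV into a database" approach, here is a minimal sketch that uses only standard JDBC. The class `CsvTableLoader` and its methods are hypothetical names made up for this example, not part of Derby or any other library. It assumes the first line of the file is a header, that fields contain no quoted commas, that table and column names come from a trusted source, and that every column can be stored as `VARCHAR(255)`. If the predefined SQL relies on numeric or date columns, the table would need real column types instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of any library.
class CsvTableLoader {

    // Builds a CREATE TABLE statement with one VARCHAR(255) column per header
    // name (assumption: all values are treated as text).
    static String createTableSql(String table, List<String> columns) {
        String cols = columns.stream()
                .map(c -> c.trim() + " VARCHAR(255)")
                .collect(Collectors.joining(", "));
        return "CREATE TABLE " + table + " (" + cols + ")";
    }

    // Builds a parameterized INSERT with one placeholder per column.
    static String insertSql(String table, int columnCount) {
        String marks = String.join(", ", Collections.nCopies(columnCount, "?"));
        return "INSERT INTO " + table + " VALUES (" + marks + ")";
    }

    // Creates the table from the CSV header and inserts the remaining lines.
    // Naive parsing: splits on commas and does not handle quoted fields.
    // Returns the number of data rows loaded.
    static int load(Connection conn, String table, Path csv)
            throws IOException, SQLException {
        try (BufferedReader in = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String header = in.readLine();
            if (header == null) {
                throw new IOException("Empty CSV file: " + csv);
            }
            List<String> columns = Arrays.asList(header.split(","));
            try (Statement st = conn.createStatement()) {
                st.executeUpdate(createTableSql(table, columns));
            }
            int rows = 0;
            try (PreparedStatement ps =
                         conn.prepareStatement(insertSql(table, columns.size()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.isEmpty()) {
                        continue; // skip blank lines
                    }
                    String[] fields = line.split(",", -1); // keep trailing empty fields
                    for (int i = 0; i < columns.size(); i++) {
                        // Missing trailing fields are stored as NULL.
                        ps.setString(i + 1, i < fields.length ? fields[i].trim() : null);
                    }
                    ps.addBatch();
                    rows++;
                }
                ps.executeBatch();
            }
            return rows;
        }
    }
}
```

With the Derby jar on the classpath, a connection obtained from `DriverManager.getConnection("jdbc:derby:memory:csvdb;create=true")` could be passed to `load` once per CSV file, after which the unmodified predefined query can be run through an ordinary `Statement`. Derby also ships a built-in import procedure, `SYSCS_UTIL.SYSCS_IMPORT_TABLE`, which may be a better fit when the target tables already exist with proper column types.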