
How to speed up bulk inserts from CSV to MS SQL Server using pyodbc

Updated: 2022-11-28 16:45:54


Update: As noted in the comment from @SimonLang, BULK INSERT under SQL Server 2017 and later apparently does support text qualifiers in CSV files (ref: here).


BULK INSERT will almost certainly be much faster than reading the source file row-by-row and doing a regular INSERT for each row. However, both BULK INSERT and BCP have a significant limitation regarding CSV files in that they cannot handle text qualifiers (ref: here). That is, if your CSV file does not have qualified text strings in it ...

1,Gord Thompson,2015-04-15
2,Bob Loblaw,2015-04-07


... then you can BULK INSERT it, but if it contains text qualifiers (because some text values contain commas) ...

1,"Thompson, Gord",2015-04-15
2,"Loblaw, Bob",2015-04-07


... then BULK INSERT cannot handle it. Still, it might be faster overall to pre-process such a CSV file into a pipe-delimited file ...

1|Thompson, Gord|2015-04-15
2|Loblaw, Bob|2015-04-07


... or a tab-delimited file (where → represents the tab character) ...

1→Thompson, Gord→2015-04-15
2→Loblaw, Bob→2015-04-07


... and then BULK INSERT that file. For the latter (tab-delimited) file the BULK INSERT code would look something like this:

import pypyodbc  # the same code also works with pyodbc (identical connect/cursor API)
conn_str = "DSN=myDb_SQLEXPRESS;"
cnxn = pypyodbc.connect(conn_str)
crsr = cnxn.cursor()
# Ask SQL Server to read the tab-delimited file directly from disk.
# Note: the path is resolved on the *server*, not the client.
sql = """
BULK INSERT myDb.dbo.SpikeData123
FROM 'C:\\__tmp\\biTest.txt' WITH (
    FIELDTERMINATOR='\\t',
    ROWTERMINATOR='\\n'
    );
"""
crsr.execute(sql)
cnxn.commit()
crsr.close()
cnxn.close()
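The pre-processing step described above (rewriting a qualified CSV as a tab-delimited file) can be sketched with Python's standard `csv` module. This is only one way to do it, and the file names are hypothetical; the sketch also creates its own sample input so it can run standalone:

```python
import csv

# Create a sample CSV with text qualifiers (commas inside quoted values)
with open("biTest.csv", "w", newline="", encoding="utf-8") as f:
    f.write('1,"Thompson, Gord",2015-04-15\n2,"Loblaw, Bob",2015-04-07\n')

# Rewrite it as a tab-delimited file that BULK INSERT can handle.
# Tabs or newlines *inside* field values would still break BULK INSERT,
# so replace them defensively with spaces.
with open("biTest.csv", newline="", encoding="utf-8") as fin, \
        open("biTest.txt", "w", newline="", encoding="utf-8") as fout:
    for row in csv.reader(fin):  # csv.reader copes with the "..." qualifiers
        cleaned = [field.replace("\t", " ").replace("\n", " ") for field in row]
        fout.write("\t".join(cleaned) + "\n")
```

After this conversion, the embedded commas are plain data and the tab is an unambiguous field terminator, so the BULK INSERT statement above can load the file as-is.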

注意:如评论中所述,仅执行 BULK INSERT 语句如果SQL Server实例可以直接读取源文件,则适用。对于源文件在远程客户端上的情况,请参见此答案

Note: As mentioned in a comment, executing a BULK INSERT statement is only applicable if the SQL Server instance can directly read the source file. For cases where the source file is on a remote client, see this answer.
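For that remote-client case, one common alternative (an assumption on my part, not part of the original answer) is a parameterized INSERT sent via pyodbc's fast_executemany, which requires pyodbc 4.0.19+ and a Microsoft ODBC driver for SQL Server. The DSN and table name below reuse the hypothetical ones from the example above, and the column names are made up for illustration:

```python
import csv


def bulk_load_via_executemany(csv_path, dsn="DSN=myDb_SQLEXPRESS;"):
    """Load a qualified CSV from the client into SpikeData123.

    The DSN, table, and column names are hypothetical placeholders.
    Requires pyodbc 4.0.19+ with a Microsoft ODBC driver for SQL Server.
    """
    import pyodbc  # imported here so the helper can be defined without the driver

    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))  # csv.reader copes with "..." text qualifiers

    cnxn = pyodbc.connect(dsn)
    crsr = cnxn.cursor()
    crsr.fast_executemany = True  # send rows as bulk parameter arrays
    crsr.executemany(
        "INSERT INTO myDb.dbo.SpikeData123 (ID, FullName, TheDate) "
        "VALUES (?, ?, ?)",
        rows,
    )
    cnxn.commit()
    crsr.close()
    cnxn.close()
```

Unlike BULK INSERT, the file here is read on the client and only parameter data crosses the wire, so no server-side file access is needed.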