Updated: 2023-02-12 22:16:01
You can download the file as a byte stream with requests and wrap it in io.BytesIO(), like so:
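import io

import requests
from pyPdf import PdfFileReader  # legacy package; the current pypdf release names this class PdfReader

url = 'http://www.arkansasrazorbacks.com/wp-content/uploads/2017/02/Miami-Ohio-Game-2.pdf'
r = requests.get(url)
r.raise_for_status()  # stop here on an HTTP error instead of parsing an error page

# Wrap the downloaded bytes in an in-memory, file-like object
f = io.BytesIO(r.content)

reader = PdfFileReader(f)
# Extract the text of the first page and split it into lines
contents = reader.getPage(0).extractText().split('\n')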
f is a file-like object that you can use just as if you had opened a PDF file from disk. This way the file exists only in memory and is never saved locally.
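To illustrate what "file-like" means here, the following is a minimal sketch that needs no network access. The payload bytes are a made-up stand-in for r.content (not a valid PDF), and looks_like_pdf is a hypothetical helper that is not part of requests or pyPdf; it only shows that a BytesIO object supports the same read/seek/tell calls a PDF parser would use on a real file.

```python
import io

# Stand-in for r.content: bytes shaped like the start of a PDF (not a valid document).
payload = b"%PDF-1.4\n% placeholder bytes, not a real PDF\n"


def looks_like_pdf(stream):
    """Return True if the stream starts with the PDF magic bytes.

    Hypothetical helper for illustration; the stream position is restored afterwards.
    """
    start = stream.tell()
    stream.seek(0)
    header = stream.read(5)
    stream.seek(start)
    return header == b"%PDF-"


f = io.BytesIO(payload)
first_line = f.readline()  # reads like a file opened in 'rb' mode
position = f.tell()        # the position advances, as with a real file
f.seek(0)                  # rewind before handing the object to a parser
is_pdf = looks_like_pdf(f)
```

A check like this may be worth running before parsing, since a server can return an HTML error page where a PDF was expected.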
To get the text out of the PDF file, you can use PyPdf.