Updated: 2023-09-18 22:19:10
The problem with your program is that you are attempting to create a new QApplication for every URL you fetch.
Instead, you should create one QApplication, and handle all the loading and processing of web pages within the WebPage class itself. The key concept is to use the loadFinished signal to create a loop by fetching a new URL after the current one has been loaded and processed.
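To make the control flow concrete, here is a minimal Qt-free sketch of that signal-driven loop. The `Signal` and `FakePage` classes are hypothetical stand-ins invented for illustration, not part of PyQt, and the example.com URLs are placeholders. A real page returns from load() immediately and emits loadFinished later from the event loop, whereas this sketch emits at once:

```python
class Signal:
    """Tiny stand-in for a Qt signal (hypothetical, not the PyQt API)."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        for slot in list(self._slots):
            slot(*args)


class FakePage:
    """Pretends to load URLs, one at a time, driven by its own signal."""

    def __init__(self):
        self.loadFinished = Signal()
        self.loadFinished.connect(self.handleLoadFinished)
        self.loaded = []       # URLs processed so far, in order
        self.finished = False  # set once the URL iterator is exhausted
        self._current = None

    def start(self, urls):
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        # Returns False when there is nothing left to fetch.
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        self.load(url)
        return True

    def load(self, url):
        # Assumption: the "load" completes instantly, so we emit right away.
        self._current = url
        self.loadFinished.emit(True)

    def handleLoadFinished(self, ok):
        # Process the current page, then ask for the next one.
        self.loaded.append(self._current)
        if not self.fetchNext():
            self.finished = True  # real code would quit the application here


page = FakePage()
page.start(['http://example.com/a', 'http://example.com/b', 'http://example.com/c'])
print(page.loaded)
```

Each finished load triggers the next fetch, so only one page object (and, in the real scripts, one QApplication) is needed for the whole run.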
The two demo scripts below (for PyQt4 and PyQt5) are simplified examples that show how to structure the program. Hopefully, it should be fairly obvious how to adapt them for your own use:

import sys
from PyQt4 import QtCore, QtGui, QtWebKit

class WebPage(QtWebKit.QWebPage):
    def __init__(self):
        super(WebPage, self).__init__()
        self.mainFrame().loadFinished.connect(self.handleLoadFinished)

    def start(self, urls):
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.mainFrame().load(QtCore.QUrl(url))
        return True

    def processCurrentPage(self):
        url = self.mainFrame().url().toString()
        html = self.mainFrame().toHtml()
        # do stuff with html...
        print('loaded: [%d bytes] %s' % (self.bytesReceived(), url))

    def handleLoadFinished(self):
        self.processCurrentPage()
        if not self.fetchNext():
            QtGui.qApp.quit()

if __name__ == '__main__':

    # generate some test urls
    urls = []
    url = 'http://pyqt.sourceforge.net/Docs/PyQt4/%s.html'
    for name in dir(QtWebKit):
        if name.startswith('Q'):
            urls.append(url % name.lower())

    app = QtGui.QApplication(sys.argv)
    webpage = WebPage()
    webpage.start(urls)
    sys.exit(app.exec_())

Here is a PyQt5/QWebEngine version of the above script:

import sys
from PyQt5 import QtCore, QtWidgets, QtWebEngineWidgets

class WebPage(QtWebEngineWidgets.QWebEnginePage):
    def __init__(self):
        super(WebPage, self).__init__()
        self.loadFinished.connect(self.handleLoadFinished)

    def start(self, urls):
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.load(QtCore.QUrl(url))
        return True

    def processCurrentPage(self, html):
        url = self.url().toString()
        # do stuff with html...
        print('loaded: [%d chars] %s' % (len(html), url))
        if not self.fetchNext():
            QtWidgets.qApp.quit()

    def handleLoadFinished(self):
        self.toHtml(self.processCurrentPage)

if __name__ == '__main__':

    # generate some test urls
    urls = []
    url = 'http://pyqt.sourceforge.net/Docs/PyQt5/%s.html'
    for name in dir(QtWebEngineWidgets):
        if name.startswith('Q'):
            urls.append(url % name.lower())

    app = QtWidgets.QApplication(sys.argv)
    webpage = WebPage()
    webpage.start(urls)
    sys.exit(app.exec_())