且构网

Sharing the ins and outs of software development...

Selenium app redirects to a Cloudflare page when hosted on Heroku

Updated: 2023-09-12 08:11:22

I have made a Discord bot that uses Selenium to access a website and get information. When I run my code locally I don't have any problems, but when I deploy to Heroku, the first URL I request redirects me to the page "Attention Required! | Cloudflare".

I have tried:

And many others, all with the same settings that I use:

import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# The lines below run inside a method of the bot class (hence self.driver).
options = Options()
options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
# Hide the usual automation hints from the page.
options.add_experimental_option("excludeSwitches", ["enable-logging", "enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--headless")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
self.driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), options=options)
# Override the user agent reported by the browser.
self.driver.execute_cdp_cmd('Network.setUserAgentOverride', {
    "userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})

But this does not work: the code only runs correctly on my local machine.

PS: locally I'm on Windows

Source of the page I'm redirected to: https://gist.github.com/rafalou38/9ae95bd66e86d2171fc8a45cebd9720c

If the Selenium-driven, ChromeDriver-initiated browsing context is being redirected to the "Attention Required! | Cloudflare" page, this implies that Cloudflare is blocking your program from accessing the AUT (Application under Test).


Analysis

Cloudflare can deny access for several reasons, including:

  • Cloudflare is trying to counter a possible dictionary attack.
  • Your system's IP address has been blacklisted by Cloudflare because the system was used to mine Bitcoin or Monero.

In these cases you are eventually redirected to a CAPTCHA page.
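
To notice the redirect from inside the bot, one option is to compare the page title against the title quoted in the question. This is a minimal sketch: the helper name `is_cloudflare_block_page` is hypothetical, and it assumes the block page always carries the title "Attention Required! | Cloudflare", which Cloudflare may change.

```python
# Title of the block page, as quoted in the question.
CLOUDFLARE_BLOCK_TITLE = "Attention Required! | Cloudflare"


def is_cloudflare_block_page(title: str) -> bool:
    """Return True if the given page title matches the Cloudflare block page.

    Hypothetical helper: it only compares titles and does not inspect the
    page body, so it is a heuristic rather than a reliable detector.
    """
    return title.strip().lower() == CLOUDFLARE_BLOCK_TITLE.lower()
```

With a live driver this could be called as `is_cloudflare_block_page(driver.title)` right after `driver.get(...)`, so the bot can log the block instead of trying to parse the wrong page.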


Solution

In these cases, a potential solution is to use undetected-chromedriver to initialize the Chrome browsing context.

undetected-chromedriver is an optimized Selenium ChromeDriver patch designed not to trigger anti-bot services such as Distill Network, Imperva, DataDome and Botprotect.io. It automatically downloads the driver binary and patches it.

  • Code Block:

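    import undetected_chromedriver as uc

    # Use the ChromeOptions class that ships with undetected-chromedriver.
    options = uc.ChromeOptions()
    options.add_argument("--start-maximized")

    # uc.Chrome downloads a ChromeDriver binary and patches it before launching Chrome.
    driver = uc.Chrome(options=options)
    driver.get('https://bet365.com')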
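
On Heroku the same approach would probably need the container-oriented flags from the question. The following is only a sketch under those assumptions: the function names `build_chrome_arguments` and `create_driver` are hypothetical, `GOOGLE_CHROME_BIN` is the environment variable used in the question, and headless mode is generally easier for anti-bot services to detect, so this may still be blocked.

```python
import os


def build_chrome_arguments(headless: bool = True) -> list:
    """Return the Chrome flags from the question that matter in a container."""
    arguments = ["--start-maximized", "--no-sandbox", "--disable-dev-shm-usage"]
    if headless:
        # A Heroku dyno has no display, so Chrome has to run headless there.
        arguments.append("--headless")
    return arguments


def create_driver(headless: bool = True):
    """Create an undetected-chromedriver instance using the flags above."""
    # Imported here so the flag helper can be used without the library installed.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    for argument in build_chrome_arguments(headless):
        options.add_argument(argument)

    # Point at the Chrome binary installed on the dyno, if the variable is set.
    chrome_binary = os.environ.get("GOOGLE_CHROME_BIN")
    if chrome_binary:
        options.binary_location = chrome_binary

    return uc.Chrome(options=options)
```

Keeping the flag list in its own function makes it easy to run without `--headless` on a local machine while debugging.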
    


Alternate Solution

An alternate solution would be to whitelist your IP address through the Project Honey Pot website. The end-to-end process is detailed in the video titled "Attention Required one more step captcha CloudFlare Error".