且构网

Azure compute service worker goes "Busy" following scale up

Updated: 2022-10-15 11:37:14

I'm running one service in Azure with 4 worker instances. When I scale up to 5 worker instances, the first instance that had started goes into the "busy" state. Why is that? What happens during a scale-up? Does Azure re-run all the startup tasks? I'm very confused and can't seem to find any documentation on this.

After scaling up to 5 instances the first instance changes its status to:

Busy (Waiting for role to start... Application startup tasks are running. [2014-08-12T18:36:52Z])

And the Java process that was running there stops. Why would this happen?!

Any help would be appreciated.

Startup.cmd

```bat
REM Log the startup date and time.
ECHO Startup.cmd: >> "%TEMP%\StartupLog.txt" 2>&1
ECHO Current date and time: >> "%TEMP%\StartupLog.txt" 2>&1
DATE /T >> "%TEMP%\StartupLog.txt" 2>&1
TIME /T >> "%TEMP%\StartupLog.txt" 2>&1

REM Allow inbound ICMPv6 echo requests (ping).
netsh advfirewall firewall add rule name="ICMPv6 echo" dir=in action=allow enable=yes protocol=icmpv6:128,any

ECHO Starting WebService >> "%TEMP%\StartupLog.txt" 2>&1
REM Skip the launch if a java.exe process is already running.
tasklist /FI "IMAGENAME eq java.exe" 2>NUL | find /I /N "java.exe" >NUL 2>&1
if "%ERRORLEVEL%"=="0" GOTO running

REM Launch the service in the background so this startup task can finish.
START /B java -jar WEB-SERVICE-1_0--SNAPSHOT.jar app.properties >> "%TEMP%\StartupLog.txt" 2>&1

:running
REM Report success explicitly. "SET %ERRORLEVEL% = 0" only expands ERRORLEVEL
REM and defines an unrelated variable; EXIT /B 0 sets the script's exit code.
EXIT /B 0
```

During a scale operation Azure will send a RoleEnvironmentTopologyChange via the Changing event to all existing instances. This lets those instances discover the new role instance in order to allow communication between the instances. Note that this only happens if you have an internal endpoint defined (if you turn on RDP then you implicitly get an internal endpoint).
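To check whether a role is affected at all, it is enough to look at its ServiceDefinition.csdef. The Python sketch below is a minimal illustration, assuming the classic Cloud Services schema namespace and element names (`WorkerRole`/`WebRole`, `Endpoints/InternalEndpoint`, and `Imports/Import` with `moduleName="RemoteAccess"` for RDP). The function name and the sample service definition are hypothetical; compare the namespace against your own file before relying on it.

```python
import xml.etree.ElementTree as ET

# Namespace of classic Cloud Services definition files (assumed; compare with your own .csdef).
NS = "{http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition}"


def roles_with_internal_endpoint(csdef_xml: str) -> dict:
    """Return {role name: bool}: True if the role declares an InternalEndpoint
    explicitly or gets one implicitly by importing the RemoteAccess (RDP) module."""
    root = ET.fromstring(csdef_xml)
    result = {}
    for tag in ("WebRole", "WorkerRole"):
        for role in root.findall(NS + tag):
            explicit = role.find(f"{NS}Endpoints/{NS}InternalEndpoint") is not None
            via_rdp = any(
                imp.get("moduleName") == "RemoteAccess"
                for imp in role.findall(f"{NS}Imports/{NS}Import")
            )
            result[role.get("name")] = explicit or via_rdp
    return result


# Hypothetical service definition: one worker with RDP enabled, one without.
SAMPLE = """<ServiceDefinition name="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WorkerRole name="JavaWorker" vmsize="Small">
    <Imports>
      <Import moduleName="RemoteAccess" />
      <Import moduleName="RemoteForwarder" />
    </Imports>
  </WorkerRole>
  <WorkerRole name="PlainWorker" vmsize="Small">
    <Endpoints>
      <InputEndpoint name="Http" protocol="http" port="80" />
    </Endpoints>
  </WorkerRole>
</ServiceDefinition>"""

print(roles_with_internal_endpoint(SAMPLE))  # {'JavaWorker': True, 'PlainWorker': False}
```

A role that comes back `True` will see a topology change on every scale operation, so its Changing handler is the place to look next.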

By default these topology changes won't affect running instances. However, if you subscribe to the Changing event and set e.Cancel = true, the role instance will recycle and run your startup tasks again.
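The behaviour described above can be modelled in a few lines. This is not the Azure SDK: `TopologyChange`, `ConfigSettingChange` and `deliver_changes` are hypothetical stand-ins for `RoleEnvironmentTopologyChange`, `RoleEnvironmentConfigurationSettingChange` and the runtime's handling of the Changing event, and the handler's return value plays the part of `e.Cancel`. It contrasts a handler that always cancels (so a scale-out recycles every existing instance) with one that, as an assumed policy, requests a recycle only when a configuration setting changes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union


# Hypothetical stand-ins for the .NET RoleEnvironmentTopologyChange and
# RoleEnvironmentConfigurationSettingChange types (a model, not the Azure SDK).
@dataclass(frozen=True)
class TopologyChange:
    role_name: str


@dataclass(frozen=True)
class ConfigSettingChange:
    setting_name: str


Change = Union[TopologyChange, ConfigSettingChange]
# A Changing handler returns the value it would assign to e.Cancel.
ChangingHandler = Callable[[Sequence[Change]], bool]


def deliver_changes(changes: Sequence[Change], handler: ChangingHandler) -> str:
    """Model the outcome: Cancel=True recycles the instance (startup tasks run
    again); otherwise the instance keeps running."""
    return "recycle" if handler(changes) else "keep running"


def always_cancel(changes: Sequence[Change]) -> bool:
    # The pattern to look for: recycling on every change, including scale-outs.
    return True


def cancel_only_for_settings(changes: Sequence[Change]) -> bool:
    # Assumed policy: recycle only when a configuration setting changed.
    return any(isinstance(c, ConfigSettingChange) for c in changes)


scale_out = [TopologyChange(role_name="JavaWorker")]
print(deliver_changes(scale_out, always_cancel))             # recycle
print(deliver_changes(scale_out, cancel_only_for_settings))  # keep running
```

With the selective handler, a scale-out no longer restarts existing instances, while a settings change still does.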

For more information on the topology change see http://azure.microsoft.com/blog/2011/01/04/responding-to-role-topology-changes/.

So there are two issues here:

  1. Why is your role not able to recover from a recycle? This is a significant issue and one you must fix in order to have a reliable service. You can start with the troubleshooting workflows at http://blogs.msdn.com/b/kwill/archive/2013/08/09/windows-azure-paas-compute-diagnostics-data.aspx, and in particular Scenario 3 at http://blogs.msdn.com/b/kwill/archive/2013/09/06/troubleshooting-scenario-3-role-stuck-in-busy.aspx.
  2. Why are you recycling your role instances in response to a topology change? Check your Changing event handler and make sure you aren't setting e.Cancel=true.