Publishing has halted on the client session

28 May 2019 21:14 #7416 by sjscheider
We are starting to see the error in our Windows environment as we increased the number of devices we connect to.

I've been trying to implement the setting in kb.opclabs.com/QuickOPC-UA:_How_to_enable_extended_tracing but we are using .NET Core and JSON settings. Is there a way to set these parameters via the JSON config?

25 May 2019 03:40 #7415 by support
Hello.

In the remainder of my answer, I will assume that when we talk about “connecting/disconnecting” devices, it refers to the OPC UA communication between the QuickOPC application and the OPC UA server in the device. I just want to be sure that it is not a different kind of “disconnection”, for example, a permanent connection between QuickOPC and some OPC UA server, which *then* connects itself to the device, with disconnections of the communication link between the OPC UA server and the device. In such a case let me know, because some of the answers may be different.

1.1: See
opclabs.doc-that.com/files/onlinedocs/QuickOpc/Latest/User%2...onParameters~RetrialDelay.html
and
opclabs.doc-that.com/files/onlinedocs/QuickOpc/Latest/User%2...meters~ReconnectionPeriod.html

1.2: So far I see no reason to modify them, but you have better knowledge about what your application needs, and whether the defaults are suitable.
1.3: There is no limit. As long as you are subscribed to at least one monitored item in the device, the reconnection attempts will continue.

2. No. Assuming that your code has subscribed to at least one monitored item in the device, you do not need to do anything. When there is a disconnect, you will receive an error through the DataChangeNotification event. When the reconnection finally succeeds, you will receive valid data through the DataChangeNotification event again.

3. If the browse is needed for the application logic, then it is correct. If you are doing it “just in case”, it is unnecessary and should be removed.

4. The code is correct, if you are using one EasyUAClient instance per device. It is possible, but not necessary to split it this way. You can use one EasyUAClient for all devices, in which case this situation should be handled by calling UnsubscribeMultipleMonitoredItems instead, for the monitored items that reside in that particular device (and, no Dispose, in that case).
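
If a single EasyUAClient is shared by all devices, the application has to remember which monitored item handles belong to which device so that only those can be passed to UnsubscribeMultipleMonitoredItems when a device is removed. A minimal sketch of such bookkeeping follows; the DeviceHandleRegistry class and its members are hypothetical names introduced for illustration and are not part of QuickOPC, and the sketch assumes the handles are the integers returned by the subscribe calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of QuickOPC): remembers which monitored item
// handles belong to which device endpoint when one client serves all devices.
public sealed class DeviceHandleRegistry
{
    private readonly Dictionary<string, List<int>> _handlesByEndpoint =
        new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);

    // Number of devices that currently have at least one recorded handle.
    public int DeviceCount => _handlesByEndpoint.Count;

    // Record a handle returned by a subscribe call for the given endpoint.
    public void Add(string endpointUrl, int handle)
    {
        if (!_handlesByEndpoint.TryGetValue(endpointUrl, out var handles))
        {
            handles = new List<int>();
            _handlesByEndpoint[endpointUrl] = handles;
        }
        handles.Add(handle);
    }

    // Forget the device and return its handles, so that the caller can pass
    // them to client.UnsubscribeMultipleMonitoredItems(...) without Dispose.
    // Returns an empty array for an endpoint that was never registered.
    public int[] Remove(string endpointUrl)
    {
        if (!_handlesByEndpoint.TryGetValue(endpointUrl, out var handles))
            return Array.Empty<int>();
        _handlesByEndpoint.Remove(endpointUrl);
        return handles.ToArray();
    }
}
```

With the one-client-per-device layout from the original question this bookkeeping is not needed, because UnsubscribeAllMonitoredItems followed by Dispose already covers exactly one device.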

Best regards

23 May 2019 16:25 #7414 by sjscheider
OK, thanks for the information. I'm going to hold off on this level of debugging until I've confirmed with you that I'm following your recommendation. I'd prefer to harden my code before investigating yours.

On issue #3 from the previous post, I wanted to provide an update. When I run outside the debugger, it appears that the issue does NOT occur. I'm going to continue to test over a longer period of time but I think running outside the debugger has resolved the issue.

I've also removed the call to browse all actively subscribing devices every minute which has simplified the code.

Here are the scenarios that I'm hoping you can verify that I'm handling in the proper manner.
  1. The device is off or disconnected when the client app starts up and eventually comes online. - Currently the code is attempting to browse the root node every minute. From your previous response I'm under the assumption this is not the recommended approach. But before I change the logic, I have a few questions.
    1. Which parameters on EasyUAClient control the retry logic?
    2. What is your recommendation on modifying any of these? My assumption is to not change these, but I'd like to confirm.
    3. Is there a maximum time after which the code stops attempting to connect to the device? Some of the devices may be offline for days/weeks.

  2. The client loses connection with the device because the device is turned off or restarted, or there is a network issue. - Currently the code is attempting to browse the root node every minute and resubscribing to the 3 nodes once the device is back online. Is this the recommended approach?

  3. A new device is added. - Currently the code checks for new devices added to the config, then browses the device's root node, and then subscribes to the 3 nodes that need to be monitored. I believe this is correct; I'm just including it to make sure I'm not doing something foolish.

  4. One of the devices is removed and should no longer be polled/subscribed to. - Currently the code will unsubscribe from all subscriptions and then dispose of the client object.
                client.UnsubscribeAllMonitoredItems();
                client.Dispose();

21 May 2019 19:55 #7407 by support
Thank you for the answers. It's not, however, quite clear what is happening. If you are willing to help with (hopefully) finding the true cause of the issue, please revert the changes made to work around the issue, and then reproduce the issue while collecting the information as described here: kb.opclabs.com/Collecting_information_for_troubleshooting .

Yes, retrials are done automatically. When an error occurs, a DataChangeNotification is generated for *all* subscribed monitored items. When/if the retrials finally succeed, a DataChangeNotification without an error (and with the actual data) is generated. The timing of the retrials can be influenced by parameters on EasyUAClient.

Given the above, your code should not do things like unsubscribing when an error occurs, etc. - that would just duplicate the logic and create problems. But, it can, of course, have a logic that in case of errors does something differently regarding the functioning of your application. What that would entail, however, I cannot tell - it totally depends on the purpose of the application.
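
One possible form of such application-level logic is sketched below: the application keeps its subscriptions untouched and only records, per device, whether the latest notification carried data or an error. The DeviceStatusTracker class is a hypothetical helper, not a QuickOPC type; the sketch assumes the notification handler passes in the endpoint and the success flag of each DataChangeNotification.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of QuickOPC): derives an online/offline flag
// per device from the outcome of each notification. It never unsubscribes or
// resubscribes; reconnecting is left entirely to the client library.
public sealed class DeviceStatusTracker
{
    private readonly Dictionary<string, bool> _online =
        new Dictionary<string, bool>();

    // Call from the notification handler with the device endpoint and whether
    // the notification carried data (true) or an error (false).
    // Returns true only when the status changed (or was seen for the first
    // time), so that an error reported for every monitored item of one device
    // produces a single status change.
    public bool Report(string endpointUrl, bool succeeded)
    {
        bool known = _online.TryGetValue(endpointUrl, out bool wasOnline);
        _online[endpointUrl] = succeeded;
        return !known || wasOnline != succeeded;
    }

    // A device that has never reported anything is treated as offline.
    public bool IsOnline(string endpointUrl) =>
        _online.TryGetValue(endpointUrl, out bool online) && online;
}
```

Because an error notification arrives for every subscribed monitored item, returning a "changed" flag lets the caller log or alert once per device rather than once per node.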

Best regards

21 May 2019 17:36 #7405 by sjscheider
Here are the answers to your questions. I'll provide more details when I have them.
  1. Yes, I'm making a browse call to each device every minute. I've just added logic to only make this call to devices I have not received a message from in the previous minute. I've also set the following parameter to 5 minutes. I believe the default was 1 minute.
    EasyUAClient.AdaptableParameters.SessionParameters.SessionTimeout = 5 * 60 * 1000;
  2. I've seen the issue mainly when running under the debugger.
  3. I'm going to run a few tests over the next couple days to see if the issue occurs when running normally, outside of the debugger.
  4. The server is custom code running on custom hardware. The SDK being used is:
    RTA (Real Time Automation – www.rtautomation.com ) OPC UA Server – Embedded Device Profile

One additional question. It is very common in our application that one or a few devices are offline or turned off. On startup, our client app attempts to connect to the many devices that it is configured to receive data from. When this communication fails because the device is offline/turned off, what is the recommended way of reconnecting to the device? I know the documentation indicates retry logic is built in. Should we handle devices we've lost comm with differently than devices we've never established comm with?

19 May 2019 08:38 #7396 by support
Hello.

Of course this should not be happening. It looks like one side (most likely the server) thinks that the other has stopped communicating, and then disposes of the session and considers new requests with the old session ID as invalid.

Questions:

1. Are you saying that making a browse every minute "resolves" the issue?
2. When you see this error, are you running under debugger, or without a debugger?
3. If possible, try the opposite method - run without the debugger if you were running with it before. Or run under the debugger (if available) if you were running outside of the debugger before. What are the results? The reason I want to test this out is because we have a separate, quite different set of timing parameters under the debugger - to see if it has an influence. The settings under the debugger are designed to allow setting breakpoints etc. without breaking the comms.
4. What is the OPC UA server you are connecting to?

Thank you

17 May 2019 17:35 #7392 by sjscheider
I just wanted to update the status on this so others may learn from what we've learned. I also have another connection-related question that I hope you can answer.

First, we've opted, for now, to run the OPC Labs OPC-UA code outside of Docker as a .NET Core 2.2 app in Windows. We also got the app to run without issue inside Windows Moby containers on a Windows host. Ultimately we'd like to get our code running in Linux containers on a Linux host, but we've run into this issue that we don't believe is related to OPC Labs at the moment. If we get it working on Linux, we'll update this issue.

Second, we are now experiencing another issue when subscribing to devices that are turned on but not reporting any changes because the devices are not connected to any machinery at the time or the machinery is off. Since we don't have any way to know which devices are in this state, we connect to all devices and subscribe to 3 complex data nodes. One of the nodes has a sampling interval of 500 ms and for the other 2 the interval is 2000 ms.

After about 2 minutes of inactivity, the following exception is reported in our ServerConditionChanged handler. It appears we receive this event 4 times:
"opc.tcp://10.0.0.158:4840" Disconnecting; *** Failure -2146232832 (0x80131600): OPC-UA service result - An error specific to OPC-UA service occurred.
  - Connected: False, Succeeded: False, StatusInfo: Error - ErrorMessage: OPC-UA service result - An error specific to OPC-UA service occurred.
---- SERVICE RESULT ----
Status Code: {BadSessionIdInvalid} = 0x80250000 (2149908480)
Description: BadSessionIdInvalid
Additional Info: <ExceptionTrace>

We then receive this event 4 times:
"opc.tcp://10.0.0.158:4840" Disconnected(10000); *** Failure -2146232832 (0x80131600): OPC-UA service result - An error specific to OPC-UA service occurred.
  - Connected: False, Succeeded: False, StatusInfo: Error - ErrorMessage: OPC-UA service result - An error specific to OPC-UA service occurred.
---- SERVICE RESULT ----
Status Code: {BadSessionIdInvalid} = 0x80250000 (2149908480)
Description: BadSessionIdInvalid
Additional Info: <ExceptionTrace>

Is there a way to prevent these events from occurring? One thing we experimented with is browsing one of the nodes on all devices every minute. What other options do we have?

14 May 2019 14:05 #7360 by support
Thank you for the additional information.

We do not have experience with running under Docker, and we do not test with it as part of our development procedure. But even if we did, it is unlikely that we would be able to resolve any issues that only appear under Docker; they most likely have to do with the environment, which we cannot influence.

As I wrote, this looks like a server-side or communication issue. You can try:
1. Modify some QuickOPC parameters to be more "forgiving", to see if it helps. For example, it is possible that the additional 5000 milliseconds mentioned in the earlier post was simply not a long enough wait, and if we had waited a bit more, the communication would have proceeded well. All in all I do not think this is going to really help, because the problems will still be there and even if the comm does not break, it would "pause" until resolved, anyway.
2. Deploy a network analysis tool in the container (not sure how - or possibly on the host), with the intent to analyze the communication using Wireshark and determine "who is to blame".

Best regards

13 May 2019 20:03 #7357 by sjscheider
Over the last few days we've run a couple of experiments. Originally, we were running the code in a Docker Linux container on Windows and we experienced the issues originally posted. So next we tried running the code as an app on the Windows box and it worked without issues.

Next we tried to run on a Linux server inside Linux Docker containers and we experienced the same issues as originally posted. So it appears the issue has to do with running in Linux or in Docker containers.

Our app needs to communicate with another app that runs inside a Linux Docker container, so we would like to use Linux Docker containers. Do you have any experience running OPC Labs under Linux Docker containers? What about Windows Docker containers?

08 May 2019 07:21 - 08 May 2019 07:22 #7343 by support
Hello.

First, it is very good that you are testing the e.Exception first. And, it is very good that you re-tested with the recent version - because we had some updates in a related area.

There seem to be basically two issues, but they are likely to have a common cause. It looks like either a communication problem (network) or a server-side problem. In both cases, it is the client side (QuickOPC) detecting that the responses from the server do not come as expected. For example, in the case of the first error you have reported, the server is expected to reply to a publish request within 10,000 milliseconds (10*1000) at the latest; we gave it an additional 5000 milliseconds (to account for scheduling/processing/transport delays), but then, in a total of 16,533 milliseconds, no response had arrived. This first error has to do with subscription-level checking. The other error is similar, but related to session-level checking.
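
The numbers in the first error can be worked through as plain arithmetic. The sketch below is only an illustration of that calculation, not a description of how QuickOPC implements the check; the PublishDeadline class and its parameter names are hypothetical.

```csharp
using System;

// Illustrative arithmetic only; the names are hypothetical and do not
// correspond to QuickOPC parameters.
public static class PublishDeadline
{
    // Latest acceptable response time: expected reply time plus grace period.
    public static int DeadlineMs(int expectedMs, int graceMs) =>
        expectedMs + graceMs;

    // How far past the deadline the elapsed time is (0 if still within it).
    public static int OverdueMs(int expectedMs, int graceMs, int elapsedMs) =>
        Math.Max(0, elapsedMs - DeadlineMs(expectedMs, graceMs));
}
```

With the values quoted above, the deadline is 10,000 + 5,000 = 15,000 milliseconds, so an elapsed time of 16,533 milliseconds is 1,533 milliseconds past it.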

I suggest that you take this to the server vendor. Which OPC server are you connecting to? Do you have a reliable network?

If necessary - that is, if the above finding is genuinely contested (there can still be a problem on the QuickOPC side, too) - it would be possible to take a Wireshark capture of the OPC UA communication, and see precisely what is happening and where the issue comes from. If it comes that far, I'd need to give you further instructions (do not do it right now).

Best regards
Last edit: 08 May 2019 07:22 by support.
