Professional OPC
Development Tools

Random Disconnect from OPC server

22 Nov 2012 09:00 #1109 by support
Hello,
thanks for the information. I actually have something which I think is "good" news: the stress test application has now crashed. It took over a day. The not-so-good part is that I will have to do this repeatedly, each time obtaining a bit more information. The problem is some kind of memory corruption, which means that the failure does not show when the corruption happens, but later, at a seemingly unrelated moment. Please trust me that we are working on this; it will just take time. Assuming that these crashes keep happening here, in the end we should be able to find and fix the cause. The fact that it now happens here makes it much easier to address.
With regard to your question: I think the timeouts that you get may be the initial symptom of something going wrong on our side. They are probably not correct, but you can handle those at the item level. At some later time, however, the problem on our side causes the ReadMultipleItems method itself to fail, and that causes a problem on your side, because the failure gets transformed into an exception that your current code does not handle. I think that the generated C++ wrapper throws _com_error ( msdn.microsoft.com/en-us/libra ...) on any failed HRESULT, so that's the exception you should be catching.
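As a minimal sketch of that catch pattern, the block below wraps a call so that a method-level failure becomes a printable message instead of an unhandled exception. It is written to compile without Windows headers, so `ComFailure` is a hypothetical stand-in for `_com_error`, and `guardedRead` is an illustrative helper that is not part of any library. In real code built with the compiler-generated wrappers you would presumably include `<comdef.h>`, catch `const _com_error&`, and read the HRESULT from its `Error()` member.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for _com_error so this sketch builds anywhere.
// Only the Error() accessor is mirrored; it returns the failed HRESULT.
class ComFailure {
public:
    explicit ComFailure(std::int32_t hr) : hr_(hr) {}
    std::int32_t Error() const { return hr_; }
private:
    std::int32_t hr_;
};

// Runs `call` (for example, a lambda that invokes ReadMultipleItems).
// Returns true if the call completed. If the call throws ComFailure,
// stores a readable message in `message` and returns false, so the
// exception does not escape and terminate the program.
template <typename Call>
bool guardedRead(Call call, std::string& message) {
    try {
        call();
        return true;
    } catch (const ComFailure& e) {
        char buf[64];
        std::snprintf(buf, sizeof buf, "method-level failure, HRESULT 0x%08lX",
                      static_cast<unsigned long>(
                          static_cast<std::uint32_t>(e.Error())));
        message = buf;
        return false;
    }
}
```

Catching the exception only keeps the client from terminating on the unhandled error; it does not repair the component, so per-item errors still need to be checked as before.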
Best regards

21 Nov 2012 15:10 #1106 by algorithmica
Hi,
The computer running the OPC client is a Windows 7 Ultimate 64-bit machine with a Xeon E5520 CPU. I'm using MS Visual Studio 2010 to compile my code. On the same network is another computer running the OPC server. This computer is really just a communicator to the control system of a major industrial plant.
So far, I have not been able to find any combination of factors that indicates an incipient failure. My intuition tells me that it is hidden in the complexity of the whole system, which involves a number of other computers. Prior to a crash, I have usually observed a large number of time-out errors. Could it have to do with the load on the computer running the server (it is sometimes heavy)? Could it have to do with some error in the server?
Occasionally we have had to restart the server process. I think, but do not know, that the connection between client and server can only be re-established so many times after a connection break, and that it quits when that limit is reached. Is this plausible?
An added complication is that I have no control over the computer running the server or the software it runs. This is not only a physical problem but an administrative one: I cannot get this control. Any attempt to simulate the error on the basis of your simulation server would therefore simplify the whole setup tremendously, which will probably cut out the cause ... I will think about doing this.
I think all three of your options are good and should be done simultaneously if possible.
Regarding the exception handling, would I need to write something like
try { ReadMultipleItems( ... ); }
catch(OpcException& e) { cout << e.GetBaseException().Message; }
Is that about right? Will this prevent crashing?
Thank you!
Best, Patrick

21 Nov 2012 14:31 #1105 by support
Hello.
I have worked on the problem you reported. The information you have provided makes it clear that there is something wrong in the component, but even after close analysis, it was not sufficient to allow me to find the cause. I will really need to reproduce it here in order to be able to fix it. I am now using a stress test program that, as far as I can tell, does things similar to yours: it reads 1-1000 items randomly, waits a little, and repeats this in a loop. Unfortunately, there has been no crash so far (since yesterday).
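The shape of such a stress loop can be sketched as follows. This is an illustration only, not the actual test program: it reads "1-1000 items randomly" as a random batch size with randomly chosen item IDs, and `pickBatch`, `stressLoop` and the `readItems` callback are hypothetical names standing in for whatever performs the real read.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <thread>
#include <vector>

// Picks between 1 and maxCount item IDs at random (repeats allowed) from pool.
std::vector<std::string> pickBatch(std::mt19937& rng,
                                   const std::vector<std::string>& pool,
                                   std::size_t maxCount) {
    assert(!pool.empty() && maxCount >= 1);
    std::uniform_int_distribution<std::size_t> countDist(1, maxCount);
    std::uniform_int_distribution<std::size_t> indexDist(0, pool.size() - 1);
    std::vector<std::string> batch(countDist(rng));
    for (auto& id : batch) id = pool[indexDist(rng)];
    return batch;
}

// Runs `iterations` rounds: pick a batch of up to 1000 items,
// hand it to `readItems`, then wait for `pause` before the next round.
void stressLoop(std::mt19937& rng,
                const std::vector<std::string>& pool,
                int iterations,
                std::chrono::milliseconds pause,
                const std::function<void(const std::vector<std::string>&)>& readItems) {
    for (int i = 0; i < iterations; ++i) {
        readItems(pickBatch(rng, pool, 1000));
        std::this_thread::sleep_for(pause);
    }
}
```

Seeding the generator with a fixed value makes a run repeatable, which may help when a crash only shows up after many hours.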
I can now see the following options; I can pursue one of them or several at once:

I can continue running the test, and possibly write even more demanding tests, in an attempt to reproduce the problem.
I can modify the code in suspected spots (in a somewhat "blind" way) to put in more checks, safeguards etc., even though no clear bug could be identified so far. Then I can deliver the modified binaries to you for a re-test.
Or would you be able to figure out how to cause the crash in an environment that I can reproduce? Do you think you can cause the crash with our simulation server? And, by the way, can you send me the details of the system you are using (OS version & service pack, bitness, number of CPUs)? I will deploy my test on a similar computer and let it run there.

Regards

21 Nov 2012 14:12 #1104 by support
Hello.
Here is some explanation of my previous post:
In general, for methods like ReadMultipleItems, we try to report all errors through the elements in the result array - the Exception property of the DAVtq object that you are already testing. Almost all errors are reported in this way; specifically, anything that has to do with communication problems with the target OPC server is reported in this way. The advantage is that you have full control over how to test for the errors, and all errors are on a per-item level, allowing you to test them one by one.
As it turns out, however, there are some errors that cannot be reported in this way. In such a case, the method itself (ReadMultipleItems or a similar method) returns a failed HRESULT. There are, very roughly, four main areas where this can happen:

1. An invalid argument passed to the method. Note that this is not about "benign" things such as an invalid OPC item ID - as far as the method is concerned, any string is a valid argument. This case is about programming errors, such as passing in a NULL argument where it should not be, or, let's say, a float number instead of an array of strings, etc.
2. System errors that prevent the method from being executed or finalized. For example, an "Out of memory" condition.
3. Errors reported by the system for the communication between your program and the component. Since the component lives in a separate process, there are some things that can go wrong, and some of them cannot be fully prevented. For example, an administrator can terminate the component's process. In such a case, an RPC error will be returned for any further attempts to communicate with already existing instances of the component's objects.
4. A bug in the component (such as a crash of it - that's what you have actually observed).

If you do everything right, you can prevent #1 from happening. And, if we do everything right, we can prevent #4 from happening. But there are still edge cases (#2 and #3) that cannot be fully prevented. For this reason, your code should test the HRESULT of the method, and act appropriately. I do not know the logic of your application, but one possible approach is to treat such an error as if it were an error reported for every item involved in the operation.
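One way to picture the "error on every item" approach is the sketch below. It is a portable illustration with assumed names: `ItemOutcome` is a hypothetical per-item record (the real result elements expose an Exception property instead), `std::int32_t` stands in for HRESULT, and `failed` applies the same sign test as the Windows `FAILED` macro.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-item outcome used by the application's own logic.
struct ItemOutcome {
    std::string itemId;
    bool ok;
    std::int32_t errorCode;  // 0 when ok, otherwise the failure code
};

// Same test as the FAILED macro: a negative HRESULT means failure.
inline bool failed(std::int32_t hr) { return hr < 0; }

// When the whole method call fails with `hr`, produce one failed outcome
// per requested item, so downstream code can keep handling errors item by item.
std::vector<ItemOutcome> outcomesForMethodFailure(
        const std::vector<std::string>& itemIds, std::int32_t hr) {
    assert(failed(hr));
    std::vector<ItemOutcome> outcomes;
    outcomes.reserve(itemIds.size());
    for (const auto& id : itemIds) {
        outcomes.push_back({id, false, hr});
    }
    return outcomes;
}
```

With this shape, the code that already inspects per-item errors needs no second error path for method-level failures.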
What has happened in your case is that the wrapper classes generated by the C++ compiler turn any failed HRESULT of a method into an exception. You can see that in the pictures you have sent to me: there is a line in the IEasyDAClient::ReadMultipleItems wrapper that reads:
if (FAILED(_hr)) _com_issue_errorex(_hr, this, __uuidof(this));
The _com_issue_errorex function throws an exception in the error cases described above. You need to either catch that exception type, or not use the wrapper (or use raw_ReadMultipleItems in the wrapper) and simply test the HRESULT from the method call - if done in this way, no exception will be thrown and no exception needs to be caught.
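The non-throwing route can be sketched like this. The block is a portable illustration, not the wrapper's real interface: `HResult` stands in for HRESULT, `rawCall` models a call such as raw_ReadMultipleItems that returns a status code, and `checkRawCall` and `CallStatus` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using HResult = std::int32_t;  // stand-in for the Windows HRESULT type

// Same test as the FAILED macro: only negative codes are failures,
// so success codes other than 0 (such as S_FALSE, which is 1) pass.
inline bool failed(HResult hr) { return hr < 0; }

struct CallStatus {
    bool ok;     // true when the call reported success
    HResult hr;  // the code the call returned, kept for logging
};

// Invokes `rawCall`, which returns a status code instead of throwing,
// and reports whether the call succeeded. No try/catch is involved.
CallStatus checkRawCall(const std::function<HResult()>& rawCall) {
    const HResult hr = rawCall();
    return CallStatus{!failed(hr), hr};
}
```

Which route fits better depends on the application: the exception route keeps the convenient wrapper signatures, while the status-code route keeps all error handling in ordinary control flow.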
On a separate note, I have some results from the testing; I will post them here shortly.

20 Nov 2012 08:47 #1103 by algorithmica
Hello, that sounds great. Thank you.
However, I have very little experience with this aspect. All I really want is for the entire chain (OPC server, QuickOPC and my code) not to crash, and to output a message whenever things do not work properly. Can you provide a piece of sample code showing how I can catch the exception so that I can output the message without a crash? Thank you very much!
In the light of the previous crashes, I am not sure what these "more serious things" can be, since the crash always seems to occur in connection with OPC (either QuickOPC or the OPC server).
Best, Patrick

20 Nov 2012 08:14 #1102 by support
Hello, a bit more on "The RPC server is not available.": Looking further at your screenshots, I now realize that this code is returned as the result of the ReadMultipleItems call. In such a case, it basically means that not the target OPC server but rather the EasyOPC process is not available - which is the case, because of the ASSERT you have caught. So on our side, we need to fix the ASSERT; but on your side, you need to be prepared to handle this better as well. The same applies to the error 0x80010105 (The server threw an exception) seen in your earlier posts: EasyOPC should not be returning this error; but when using the COM wrappers provided by the C++ compiler, the failure HRESULTs from method calls are transformed into exceptions, so please think about how best to handle those (these will never be exceptions related to actual OPC operations - they will be more serious things).

20 Nov 2012 07:26 #1101 by support
Hello,
thanks for this.
"The RPC server is not available." should not cause an ASSERT like it did, but otherwise it is quite a common error, usually caused by disrupted network communication, OPC server crashes or unexpected terminations, etc.
I am finishing a different task, but should be able to switch to your issue today or tomorrow.
Best regards

19 Nov 2012 10:25 #1099 by algorithmica
Here is a new observation of a crash with a slightly different outcome. Please see the screenshots. The error message translates into English as "The RPC server is not available." When do you think we can try some sort of patch?
Thank you.
Best, Patrick

15 Nov 2012 08:42 #1097 by algorithmica
Sorry for the confusion. The new post is not so important. Let's focus on the crash in reading multiple items. Thank you. Best, Patrick

14 Nov 2012 11:05 #1096 by support
Hello, actually, this 2nd post has somewhat confused me. Are you saying that you get a crash when accessing the results (at the line where the Exception property is accessed), and that it looks like some kind of invalid pointer in the data that you received?
Anyway, I'd rather focus on the original post for now, because that is something that should not be happening and is probably the same issue as you have encountered before.
Regards,

Moderators: support