Readfunction gets stuck after disconnection

25 Nov 2016 11:01 #4596 by support
Thank you. Yes, the WaitForXXXX functions on two threads also "point" in the direction of some deadlock.

I think that although you are calling us from Delphi, it should be possible to obtain a .NET call stack by attaching a .NET debugger to the process. Can you get hold of Visual Studio and try it? It is possible that a free (Community or so) edition would do. If not, it may even be possible to use "old fashioned" WinDbg (with SOS), although for that I would have to locate instructions.

At this moment, I am not very keen on trying to run your program on our side, because I think it is likely that it won't show the problem here. We have already tried to reproduce it, without success. The differences that cause this usually lie in things like the number of CPUs, processor speed, etc., and not in the fact that you are doing it from Delphi, or somehow differently from us here. If, on the other hand, you could put together a virtual machine which consistently shows the problem, that would be a win: in that case, it should be possible for us to take over the VM and perform our investigation there.

Best regards

24 Nov 2016 08:04 #4591 by RH
Dear Mr. Zahradnik,

thank you for your support. I really would like to provide some callstacks, but as I use the COM version from my Delphi program, the stacks are not very readable, except for my own code.

Most other threads belong to ntdll and clr.dll when I look at them in Process Explorer. In my Delphi IDE, I can only identify your threads because their stacks refer to the .NET Framework:



And as I assumed before, a WaitForMultipleObjects call is waiting for something that never happens. There is one more thread referring to the .NET Framework, which has a WaitForSingleObject on top of its call stack. I don't know if that could help you any further?

I could maybe provide a test program if that helps. Then you could use it with some modified version of your lib which creates log files or something?

Best regards,
Ralf Herrmann

24 Nov 2016 06:59 - 24 Nov 2016 07:00 #4590 by support
Dear Sir,

thank you for the detailed explanation. While there are multiple possible causes, the most likely one is a deadlock in some internal QuickOPC processing code.

There is a chance that we might be able to find the reason by investigating the call stacks, if you can provide some cooperation. What is needed is either to run the program under the debugger, or to attach the debugger to the already running program later, when the problem occurs. Then break into the program (pause it) and give us the call stacks of the running threads.

This is somewhat demanding because there will be many threads. To make it a bit easier, I think we can make do with only the threads that have at least SOME QuickOPC code in their call stack; the others can be ignored. In Visual Studio (in the Threads window, I believe), it is possible to switch to a thread and then view its call stack in the Call Stack window. You can select the whole call stack (Ctrl+A), copy it, and then paste it into a document. This has to be repeated for all threads of interest.

We do not "obfuscate" our assemblies, therefore all the method names are preserved, and it is pretty obvious which code belongs to "us".

It would be nice if you could do this.

Best regards
Zbynek Zahradnik
Last edit: 24 Nov 2016 07:00 by support.

23 Nov 2016 12:00 #4588 by RH
And again, I have to correct my assumption that it has something to do with the Siemens OPC server. I could sometimes reproduce it on both servers, but the Siemens server session hangs more often. This is probably a coincidence.

Some short explanation on my program:

My program creates several instances of OPC clients, one for each server. I set the Isolated-Flag = true to get independent instances. My program creates one thread for each client, in which the read/write operations are executed (ReadMultiple, WriteMultipleValues). Before each operation, I ping the server and only perform the read/write operation if the ping succeeds. That already helped a lot, but I guess there is still a problem when the connection is lost at the wrong moment. The good thing is that only one instance blocks and only one thread is locked when it happens. All the others keep running.
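The structure described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Delphi program and not QuickOPC code: `ping`, `read_multiple`, `guarded_read` and `run_clients` are hypothetical names, and the callables stand in for the real ping and ReadMultiple calls.

```python
import threading
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-ins: ping() reports reachability, read_multiple() does the read.
Ping = Callable[[], bool]
ReadMultiple = Callable[[Sequence[str]], list]


def guarded_read(ping: Ping, read_multiple: ReadMultiple,
                 node_ids: Sequence[str]) -> Optional[list]:
    """Read only if the server answers the ping; return None otherwise."""
    if not ping():
        return None
    # The connection can still drop between ping() and this call,
    # so the guard narrows the risky window but does not close it.
    return read_multiple(node_ids)


def run_clients(clients: Dict[str, Tuple[Ping, ReadMultiple]],
                node_ids: Sequence[str],
                cycles: int = 1) -> Dict[str, List[Optional[list]]]:
    """Run one worker thread per client and collect each cycle's result."""
    results: Dict[str, List[Optional[list]]] = {name: [] for name in clients}

    def worker(name: str, ping: Ping, read_multiple: ReadMultiple) -> None:
        for _ in range(cycles):
            results[name].append(guarded_read(ping, read_multiple, node_ids))

    threads = [threading.Thread(target=worker, args=(name, ping, read))
               for name, (ping, read) in clients.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The comment inside `guarded_read` marks the remaining gap: if the cable is pulled after the ping succeeds but before the read completes, the read is still issued against a dead connection, which matches the observation that the ping helps a lot but does not remove the hang entirely.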

I tried the following workaround: after detecting a hanging thread, I kill it (not recommended though, sometimes the application shuts down directly), free the OPC client, and create a new client and thread. But even then, I can't reconnect. That may be due to internal structures which I don't know about. Does this information help you in any way to find the reason for my issues?

Can I collect some diagnostic information for you to get on track? I also tried on a different computer with different OS, but the problem occurred as well.

If there is no other way to get it fixed, I will probably have to go for one process per OPC connection, which I can probably kill and recreate with less damage. But that's not a nice structure in my opinion, and it is more difficult to handle because of the interprocess communication that would be needed.
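As a rough idea of what the process-per-connection fallback could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the worker is a child Python interpreter running a one-line stand-in script, the "interprocess communication" is just the child's standard output, and `run_isolated` is a hypothetical helper name.

```python
import subprocess
import sys
from typing import Tuple

# Stand-ins for a per-connection worker program (hypothetical).
HEALTHY_WORKER = "print('value=42')"
HANGING_WORKER = "import time; time.sleep(60)"


def run_isolated(worker_code: str, timeout_s: float) -> Tuple[bool, str]:
    """Run one operation in a child process; kill the child if it takes too long.

    Returns (ok, output), where output is whatever the child printed.
    """
    proc = subprocess.Popen([sys.executable, "-c", worker_code],
                            stdout=subprocess.PIPE, text=True)
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode == 0, out.strip()
    except subprocess.TimeoutExpired:
        # Killing the process removes all of its threads and handles at once,
        # which is what killing a single hung thread cannot guarantee.
        proc.kill()
        proc.communicate()  # reap the child and close the pipe
        return False, ""
```

A supervisor would call `run_isolated` (or keep a longer-lived child around) per connection and simply start a fresh child after a timeout. The cost is the one already mentioned: results have to cross a process boundary instead of being shared in memory.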

22 Nov 2016 11:31 #4586 by RH
I also did some more testing, using an OPC server device from a different manufacturer. With that device, I couldn't reproduce the error.

It seems that it is an issue with the integrated OPC server in the Siemens Touchpanel. I posted another thread here, where I described problems with using the secured endpoints on the Siemens Touchpanel. You told me that the server does not meet the OPC standards very well. Maybe this issue happens due to some deviations from the standard.

Would it be helpful for you if we sent you a device for temporary testing? Then you could make your library more tolerant of this connectivity issue. Siemens panels are widespread, so it might be relevant for other users, too.

19 Nov 2016 11:28 #4582 by support
Unfortunately this issue did not show up during the IOPWS testing.

09 Nov 2016 07:27 #4548 by support
Yes, the ping test could have reduced the number of times the read call is actually made, and therefore made the error disappear.

I was not able to reproduce the issue so far, but we will be testing our software extensively next week at the OPC Interoperability Workshop, including the "disconnection" test. I will keep a close eye on it to see whether we can "catch" it live, and I will inform you afterwards.

Regards

08 Nov 2016 11:21 #4547 by RH
Maybe there were fewer problems because of fewer read calls from my application. I also tried adding some error handling in my code to prevent read calls if the server cannot be pinged, although your manual does not recommend adding one's own error handling. But maybe it helped in this case.

I use QuickOPC 5.40.315.1 (version of the dlls in the program folder).
The server uses the opc.tcp: protocol.

08 Nov 2016 10:48 #4546 by support
If you have verified that nothing happens after 10 minutes, then you are right, it would make no sense to actually try to reduce that timeout. It looks like a problem inside QuickOPC.

I have to disappoint you with the ReadMultiple function: the methods that work with just one element are just very thin wrappers over the ReadMultiple function. It is therefore very unlikely that there would be a problem in them, but not in ReadMultiple. I suppose that (sadly) it was just a coincidence.

Can you please tell me

1) Which QuickOPC version and build are you using?
2) Which protocol: opc.tcp: or http:?

Thank you

07 Nov 2016 11:39 - 07 Nov 2016 15:39 #4545 by RH
I just tested the ReadMultiple function, and there is no freeze problem there. If I disconnect my network cable and reconnect it again, it takes 5-10 seconds and the communication is running again. Maybe this helps you find a possible bug in the single read function.

Edit: I also get it freezing with the multiple read function. It seems to be a timing issue, depending on when exactly the connection is lost...
Last edit: 07 Nov 2016 15:39 by RH.
